Perl, "the Swiss Army chainsaw of scripting languages"
This post uses Perl, regular expressions and sort to parse and format a plaintext log file, and finally writes the output as comma-separated values (CSV) for import into e.g. Excel.
The approach can easily be adapted to parse other plaintext files in the same manner.
Here is the excerpt of an Nginx access.log file that we are going to parse:
$ cat ./access.log
216.218.206.66 - - [01/Feb/2015:05:35:11 +0000] "GET / HTTP/1.1" 403 162 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //MyAdmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //phpMyAdmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //pma/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //phpmyadmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //myadmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET /muieblackcat HTTP/1.1" 404 162 "-" "-"
111.251.50.236 - - [01/Feb/2015:07:17:47 +0000] "CONNECT mx0.mail2000.com.tw:25 HTTP/1.0" 400 166 "-" "-"
23.239.196.71 - - [01/Feb/2015:07:45:48 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)"
69.171.237.116 - - [01/Feb/2015:11:06:01 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 301 178 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.237.116 - - [01/Feb/2015:11:06:02 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 200 57525 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
77.37.231.208 - - [01/Feb/2015:11:26:03 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0"
173.252.110.115 - - [01/Feb/2015:11:31:20 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 301 178 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
173.252.110.119 - - [01/Feb/2015:11:31:21 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 200 57525 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
61.240.144.66 - - [01/Feb/2015:12:42:47 +0000] "GET / HTTP/1.0" 200 612 "-" "masscan/1.0 (https://github.com/robertdavidgraham/masscan)"
108.166.85.126 - - [01/Feb/2015:16:05:43 +0000] "GET /admin/config.php HTTP/1.0" 499 0 "-" "-"
23.23.38.251 - - [01/Feb/2015:16:24:00 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB6 (.NET CLR 3.5.30729)"
23.23.38.251 - - [01/Feb/2015:16:24:01 +0000] "GET / HTTP/1.1" 200 7105 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB6 (.NET CLR 3.5.30729)"
185.5.51.50 - - [01/Feb/2015:16:56:48 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:50 +0000] "GET / HTTP/1.1" 200 7105 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:50 +0000] "GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1" 200 9977 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1" 200 39386 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1" 200 2698 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1" 200 3075 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /content/images/2015/01/lars.jpg HTTP/1.1" 200 47097 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /assets/fonts/casper-icons.woff HTTP/1.1" 200 2260 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:57:20 +0000] "GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1" 200 6642 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
207.34.25.76 - - [01/Feb/2015:18:01:13 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "R6_CommentReader(www.radian6.com/crawler)"
71.11.195.254 - - [01/Feb/2015:21:37:39 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 166 "-" "-"
175.44.8.98 - - [01/Feb/2015:22:12:06 +0000] "POST /ghost_linux_init_script/ HTTP/1.1" 301 178 "-" "-"
78.133.20.10 - - [01/Feb/2015:22:42:33 +0000] "GET / HTTP/1.1" 200 72 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
78.133.20.10 - - [01/Feb/2015:22:42:33 +0000] "GET /favicon.ico HTTP/1.1" 404 162 "https://gw4.node25.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
31.202.241.242 - - [02/Feb/2015:01:15:13 +0000] "GET / HTTP/1.0" 301 178 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
207.145.97.131 - - [02/Feb/2015:04:00:25 +0000] "GET / HTTP/1.1" 400 166 "-" "-"
207.145.97.131 - - [02/Feb/2015:04:00:26 +0000] "GET //Net_work.xml HTTP/1.1" 400 166 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:10 +0000] "GET / HTTP/1.1" 403 162 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:10 +0000] "GET /robots.txt HTTP/1.1" 403 162 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:12 +0000] "" 400 0 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:13 +0000] "" 400 0 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:13 +0000] "" 400 0 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:17 +0000] "quit" 400 166 "-" "-"
54.89.61.205 - - [02/Feb/2015:08:25:02 +0000] "GET /robots.txt HTTP/1.1" 200 48 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.197.168.201 - - [02/Feb/2015:08:25:02 +0000] "GET /robots.txt HTTP/1.1" 200 48 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.89.61.205 - - [02/Feb/2015:08:25:02 +0000] "GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1" 200 6638 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.197.168.201 - - [02/Feb/2015:08:25:02 +0000] "GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1" 200 5682 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.160.249.160 - - [02/Feb/2015:08:25:03 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.160.249.160 - - [02/Feb/2015:08:25:03 +0000] "GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1" 200 5682 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.82.74.230 - - [02/Feb/2015:08:25:03 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.6; +http://flipboard.com/browserproxy)"
54.82.74.230 - - [02/Feb/2015:08:25:04 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 200 57525 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.6; +http://flipboard.com/browserproxy)"
54.160.249.160 - - [02/Feb/2015:08:25:08 +0000] "GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1" 200 5682 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
186.15.3.50 - - [02/Feb/2015:11:13:54 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 166 "-" "-"
176.58.116.39 - - [02/Feb/2015:12:01:46 +0000] "GET / HTTP/1.1" 200 7107 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:46 +0000] "GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:47 +0000] "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:47 +0000] "GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:47 +0000] "GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:48 +0000] "GET /content/images/2015/01/lars.jpg HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:54 +0000] "GET / HTTP/1.1" 200 7107 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:54 +0000] "GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /content/images/2015/01/lars.jpg HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
Let's say we want to:
- Extract the IP address, date, request, URN and HTTP Status Code.
- Ignore duplicates from the same day.
One might think that "awk" and "sort" would do. But the timestamp includes "hour:minute:seconds", so requests repeated on the same day never compare as equal, and working around that quickly gets convoluted:
$ awk '{print $1, $4, $6, $7}' access.log |sort -u
108.166.85.126 [01/Feb/2015:16:05:43 "GET /admin/config.php
111.251.50.236 [01/Feb/2015:07:17:47 "CONNECT mx0.mail2000.com.tw:25
173.252.110.115 [01/Feb/2015:11:31:20 "GET /content/images/2015/01/ghost_login_owner.png
173.252.110.119 [01/Feb/2015:11:31:21 "GET /content/images/2015/01/ghost_login_owner.png
175.44.8.98 [01/Feb/2015:22:12:06 "POST /ghost_linux_init_script/
176.58.116.39 [02/Feb/2015:12:01:46 "GET /
176.58.116.39 [02/Feb/2015:12:01:46 "GET /assets/css/screen.css?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:47 "GET /assets/js/index.js?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:47 "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:47 "GET /public/jquery.min.js?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:48 "GET /content/images/2015/01/lars.jpg
176.58.116.39 [02/Feb/2015:12:01:54 "GET / <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:54 "GET /assets/css/screen.css?v=cfc2d462d6 <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /assets/js/index.js?v=cfc2d462d6 <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /content/images/2015/01/lars.jpg <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /public/jquery.min.js?v=cfc2d462d6 <---Duplicate
185.5.51.50 [01/Feb/2015:16:56:48 "GET /
185.5.51.50 [01/Feb/2015:16:56:50 "GET / <---Duplicate
185.5.51.50 [01/Feb/2015:16:56:50 "GET /assets/css/screen.css?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:56:51 "GET /assets/fonts/casper-icons.woff
185.5.51.50 [01/Feb/2015:16:56:51 "GET /assets/js/index.js?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:56:51 "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:56:51 "GET /content/images/2015/01/lars.jpg
185.5.51.50 [01/Feb/2015:16:56:51 "GET /public/jquery.min.js?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:57:20 "GET /identifying-services-needing-restart-after-updating-linux-packages/
186.15.3.50 [02/Feb/2015:11:13:54 "GET /tmUnblock.cgi
198.20.69.74 [02/Feb/2015:06:18:10 "GET /
198.20.69.74 [02/Feb/2015:06:18:10 "GET /robots.txt
198.20.69.74 [02/Feb/2015:06:18:12 "" 400
198.20.69.74 [02/Feb/2015:06:18:13 "" 400 <---Duplicate
198.20.69.74 [02/Feb/2015:06:18:17 "quit" 400
199.180.112.34 [01/Feb/2015:05:59:48 "GET /muieblackcat
199.180.112.34 [01/Feb/2015:05:59:48 "GET //myadmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //MyAdmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //phpmyadmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //phpMyAdmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //pma/scripts/setup.php
207.145.97.131 [02/Feb/2015:04:00:25 "GET /
207.145.97.131 [02/Feb/2015:04:00:26 "GET //Net_work.xml
207.34.25.76 [01/Feb/2015:18:01:13 "GET /robots.txt
216.218.206.66 [01/Feb/2015:05:35:11 "GET /
23.23.38.251 [01/Feb/2015:16:24:00 "GET /
23.23.38.251 [01/Feb/2015:16:24:01 "GET / <---Duplicate
23.239.196.71 [01/Feb/2015:07:45:48 "GET /
31.202.241.242 [02/Feb/2015:01:15:13 "GET /
54.160.249.160 [02/Feb/2015:08:25:03 "GET /host-your-own-blog-with-ghost-nginx-on-linux/
54.160.249.160 [02/Feb/2015:08:25:03 "GET /robots.txt
54.160.249.160 [02/Feb/2015:08:25:08 "GET /host-your-own-blog-with-ghost-nginx-on-linux/
54.197.168.201 [02/Feb/2015:08:25:02 "GET /host-your-own-blog-with-ghost-nginx-on-linux/
54.197.168.201 [02/Feb/2015:08:25:02 "GET /robots.txt
54.82.74.230 [02/Feb/2015:08:25:03 "GET /content/images/2015/01/ghost_login_owner.png
54.82.74.230 [02/Feb/2015:08:25:04 "GET /content/images/2015/01/ghost_login_owner.png <---Duplicate
54.89.61.205 [02/Feb/2015:08:25:02 "GET /identifying-services-needing-restart-after-updating-linux-packages/
54.89.61.205 [02/Feb/2015:08:25:02 "GET /robots.txt
61.240.144.66 [01/Feb/2015:12:42:47 "GET /
69.171.237.116 [01/Feb/2015:11:06:01 "GET /content/images/2015/01/ghost_login_owner.png
69.171.237.116 [01/Feb/2015:11:06:02 "GET /content/images/2015/01/ghost_login_owner.png <---Duplicate
71.11.195.254 [01/Feb/2015:21:37:39 "GET /tmUnblock.cgi
77.37.231.208 [01/Feb/2015:11:26:03 "GET /
78.133.20.10 [01/Feb/2015:22:42:33 "GET /
78.133.20.10 [01/Feb/2015:22:42:33 "GET /favicon.ico
Clearly another approach is needed if we are to ignore duplicates from the same day easily.
Stripping the "hour:minute:seconds" part of the timestamp before the lines reach "sort -u" fixes that.
So let's parse the log lines with Perl and a regular expression first; the duplicates get removed a few steps further down.
Perl, the right tool for the job.
$ perl -lane 'print "$1 $2 $3 $4 " if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log
216.218.206.66 01/Feb/2015 GET / HTTP/1.1 403
199.180.112.34 01/Feb/2015 GET //MyAdmin/scripts/setup.php HTTP/1.1 404
199.180.112.34 01/Feb/2015 GET //phpMyAdmin/scripts/setup.php HTTP/1.1 404
199.180.112.34 01/Feb/2015 GET //pma/scripts/setup.php HTTP/1.1 404
199.180.112.34 01/Feb/2015 GET //phpmyadmin/scripts/setup.php HTTP/1.1 404
199.180.112.34 01/Feb/2015 GET //myadmin/scripts/setup.php HTTP/1.1 404
199.180.112.34 01/Feb/2015 GET /muieblackcat HTTP/1.1 404
111.251.50.236 01/Feb/2015 CONNECT mx0.mail2000.com.tw:25 HTTP/1.0 400
23.239.196.71 01/Feb/2015 GET / HTTP/1.1 301
69.171.237.116 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 301
69.171.237.116 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 200
77.37.231.208 01/Feb/2015 GET / HTTP/1.1 301
173.252.110.115 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 301
173.252.110.119 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 200
61.240.144.66 01/Feb/2015 GET / HTTP/1.0 200
108.166.85.126 01/Feb/2015 GET /admin/config.php HTTP/1.0 499
23.23.38.251 01/Feb/2015 GET / HTTP/1.1 301
23.23.38.251 01/Feb/2015 GET / HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET / HTTP/1.1 301
185.5.51.50 01/Feb/2015 GET / HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET /content/images/2015/01/lars.jpg HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET /assets/fonts/casper-icons.woff HTTP/1.1 200
185.5.51.50 01/Feb/2015 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1 200
207.34.25.76 01/Feb/2015 GET /robots.txt HTTP/1.1 301
71.11.195.254 01/Feb/2015 GET /tmUnblock.cgi HTTP/1.1 400
175.44.8.98 01/Feb/2015 POST /ghost_linux_init_script/ HTTP/1.1 301
78.133.20.10 01/Feb/2015 GET / HTTP/1.1 200
78.133.20.10 01/Feb/2015 GET /favicon.ico HTTP/1.1 404
31.202.241.242 02/Feb/2015 GET / HTTP/1.0 301
207.145.97.131 02/Feb/2015 GET / HTTP/1.1 400
207.145.97.131 02/Feb/2015 GET //Net_work.xml HTTP/1.1 400
198.20.69.74 02/Feb/2015 GET / HTTP/1.1 403
198.20.69.74 02/Feb/2015 GET /robots.txt HTTP/1.1 403
198.20.69.74 02/Feb/2015 quit 400
54.89.61.205 02/Feb/2015 GET /robots.txt HTTP/1.1 200
54.197.168.201 02/Feb/2015 GET /robots.txt HTTP/1.1 200
54.89.61.205 02/Feb/2015 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1 200
54.197.168.201 02/Feb/2015 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1 200
54.160.249.160 02/Feb/2015 GET /robots.txt HTTP/1.1 301
54.160.249.160 02/Feb/2015 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1 200
54.82.74.230 02/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 301
54.82.74.230 02/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 200
54.160.249.160 02/Feb/2015 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1 200
186.15.3.50 02/Feb/2015 GET /tmUnblock.cgi HTTP/1.1 400
176.58.116.39 02/Feb/2015 GET / HTTP/1.1 200
176.58.116.39 02/Feb/2015 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /content/images/2015/01/lars.jpg HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET / HTTP/1.1 200
176.58.116.39 02/Feb/2015 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1 304
176.58.116.39 02/Feb/2015 GET /content/images/2015/01/lars.jpg HTTP/1.1 304
Before we go further, let me explain the regular expression used in the previous command.
^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})
^ # Match start of string.
###
# 1st capturing group: the IP address.
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
\d{1,3} # Match a digit [0-9] between 1 and 3 times (greedy).
\. # Match the character "." literally.
###
\s # Match any whitespace character, such as a space or a tab.
\- # Match the character "-" literally.
\s # Match any whitespace character, such as a space or a tab.
\- # Match the character "-" literally.
\s # Match any whitespace character, such as a space or a tab.
\[ # Match the character "[" literally.
###
# 2nd capturing group: the date.
([^:]+)
[^:]+ # Match anything that is not a ":", one or more times (greedy).
###
[^"]+ # Match anything that is not a double quote, one or more times (greedy). This skips the rest of the timestamp.
" # Match the opening double quote literally (not captured).
###
# 3rd capturing group: the request.
([^"]+)
[^"]+ # Match anything that is not a double quote, one or more times (greedy). Lines with an empty request ("") therefore do not match and are skipped.
###
" # Match the closing double quote literally (not captured).
\s # Match any whitespace character, such as a space or a tab.
###
# 4th capturing group: the HTTP status code.
(\d{3})
\d{3} # Match a digit [0-9] exactly 3 times.
For more on regular expressions: http://www.regular-expressions.info/tutorial.html
Let's make it "pretty".
We move the HTTP status code to the 3rd column and use the printf function to line up the columns, which makes the output more legible.
$ perl -lane 'printf ("%-16s %-12s %-4s %s\n", $1, $2, $4, $3) if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log
216.218.206.66 01/Feb/2015 403 GET / HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //MyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //phpMyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //pma/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //phpmyadmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //myadmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET /muieblackcat HTTP/1.1
111.251.50.236 01/Feb/2015 400 CONNECT mx0.mail2000.com.tw:25 HTTP/1.0
23.239.196.71 01/Feb/2015 301 GET / HTTP/1.1
69.171.237.116 01/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116 01/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
77.37.231.208 01/Feb/2015 301 GET / HTTP/1.1
173.252.110.115 01/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
173.252.110.119 01/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
61.240.144.66 01/Feb/2015 200 GET / HTTP/1.0
108.166.85.126 01/Feb/2015 499 GET /admin/config.php HTTP/1.0
23.23.38.251 01/Feb/2015 301 GET / HTTP/1.1
23.23.38.251 01/Feb/2015 200 GET / HTTP/1.1
185.5.51.50 01/Feb/2015 301 GET / HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET / HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /content/images/2015/01/lars.jpg HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/fonts/casper-icons.woff HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
207.34.25.76 01/Feb/2015 301 GET /robots.txt HTTP/1.1
71.11.195.254 01/Feb/2015 400 GET /tmUnblock.cgi HTTP/1.1
175.44.8.98 01/Feb/2015 301 POST /ghost_linux_init_script/ HTTP/1.1
78.133.20.10 01/Feb/2015 200 GET / HTTP/1.1
78.133.20.10 01/Feb/2015 404 GET /favicon.ico HTTP/1.1
31.202.241.242 02/Feb/2015 301 GET / HTTP/1.0
207.145.97.131 02/Feb/2015 400 GET / HTTP/1.1
207.145.97.131 02/Feb/2015 400 GET //Net_work.xml HTTP/1.1
198.20.69.74 02/Feb/2015 403 GET / HTTP/1.1
198.20.69.74 02/Feb/2015 403 GET /robots.txt HTTP/1.1
198.20.69.74 02/Feb/2015 400 quit
54.89.61.205 02/Feb/2015 200 GET /robots.txt HTTP/1.1
54.197.168.201 02/Feb/2015 200 GET /robots.txt HTTP/1.1
54.89.61.205 02/Feb/2015 200 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
54.197.168.201 02/Feb/2015 200 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.160.249.160 02/Feb/2015 301 GET /robots.txt HTTP/1.1
54.160.249.160 02/Feb/2015 200 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.82.74.230 02/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230 02/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.160.249.160 02/Feb/2015 200 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
186.15.3.50 02/Feb/2015 400 GET /tmUnblock.cgi HTTP/1.1
176.58.116.39 02/Feb/2015 200 GET / HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /content/images/2015/01/lars.jpg HTTP/1.1
176.58.116.39 02/Feb/2015 200 GET / HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /content/images/2015/01/lars.jpg HTTP/1.1
Much better!
Now remove the duplicate lines by piping the output through "sort -u":
$ perl -lane 'printf ("%-16s %-12s %-4s %s\n", $1, $2, $4, $3) if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log |sort -u
108.166.85.126 01/Feb/2015 499 GET /admin/config.php HTTP/1.0
111.251.50.236 01/Feb/2015 400 CONNECT mx0.mail2000.com.tw:25 HTTP/1.0
173.252.110.115 01/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
173.252.110.119 01/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
175.44.8.98 01/Feb/2015 301 POST /ghost_linux_init_script/ HTTP/1.1
176.58.116.39 02/Feb/2015 200 GET / HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /content/images/2015/01/lars.jpg HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/fonts/casper-icons.woff HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /content/images/2015/01/lars.jpg HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET / HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 301 GET / HTTP/1.1
186.15.3.50 02/Feb/2015 400 GET /tmUnblock.cgi HTTP/1.1
198.20.69.74 02/Feb/2015 400 quit
198.20.69.74 02/Feb/2015 403 GET / HTTP/1.1
198.20.69.74 02/Feb/2015 403 GET /robots.txt HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET /muieblackcat HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //myadmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //MyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //phpmyadmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //phpMyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //pma/scripts/setup.php HTTP/1.1
207.145.97.131 02/Feb/2015 400 GET / HTTP/1.1
207.145.97.131 02/Feb/2015 400 GET //Net_work.xml HTTP/1.1
207.34.25.76 01/Feb/2015 301 GET /robots.txt HTTP/1.1
216.218.206.66 01/Feb/2015 403 GET / HTTP/1.1
23.23.38.251 01/Feb/2015 200 GET / HTTP/1.1
23.23.38.251 01/Feb/2015 301 GET / HTTP/1.1
23.239.196.71 01/Feb/2015 301 GET / HTTP/1.1
31.202.241.242 02/Feb/2015 301 GET / HTTP/1.0
54.160.249.160 02/Feb/2015 200 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.160.249.160 02/Feb/2015 301 GET /robots.txt HTTP/1.1
54.197.168.201 02/Feb/2015 200 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.197.168.201 02/Feb/2015 200 GET /robots.txt HTTP/1.1
54.82.74.230 02/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230 02/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.89.61.205 02/Feb/2015 200 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
54.89.61.205 02/Feb/2015 200 GET /robots.txt HTTP/1.1
61.240.144.66 01/Feb/2015 200 GET / HTTP/1.0
69.171.237.116 01/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116 01/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
71.11.195.254 01/Feb/2015 400 GET /tmUnblock.cgi HTTP/1.1
77.37.231.208 01/Feb/2015 301 GET / HTTP/1.1
78.133.20.10 01/Feb/2015 200 GET / HTTP/1.1
78.133.20.10 01/Feb/2015 404 GET /favicon.ico HTTP/1.1
Want to sort it by URN/file path? Pipe the result through a second sort keyed on the 5th whitespace-separated field:
$ perl -lane 'printf ("%-16s %-12s %-4s %s\n", $1, $2, $4, $3) if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log |sort -u|sort -k5,5
198.20.69.74 02/Feb/2015 400 quit
176.58.116.39 02/Feb/2015 200 GET / HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET / HTTP/1.1
185.5.51.50 01/Feb/2015 301 GET / HTTP/1.1
198.20.69.74 02/Feb/2015 403 GET / HTTP/1.1
207.145.97.131 02/Feb/2015 400 GET / HTTP/1.1
216.218.206.66 01/Feb/2015 403 GET / HTTP/1.1
23.23.38.251 01/Feb/2015 200 GET / HTTP/1.1
23.23.38.251 01/Feb/2015 301 GET / HTTP/1.1
23.239.196.71 01/Feb/2015 301 GET / HTTP/1.1
31.202.241.242 02/Feb/2015 301 GET / HTTP/1.0
61.240.144.66 01/Feb/2015 200 GET / HTTP/1.0
77.37.231.208 01/Feb/2015 301 GET / HTTP/1.1
78.133.20.10 01/Feb/2015 200 GET / HTTP/1.1
108.166.85.126 01/Feb/2015 499 GET /admin/config.php HTTP/1.0
176.58.116.39 02/Feb/2015 304 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/fonts/casper-icons.woff HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
173.252.110.115 01/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
173.252.110.119 01/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230 02/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230 02/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116 01/Feb/2015 200 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116 01/Feb/2015 301 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /content/images/2015/01/lars.jpg HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /content/images/2015/01/lars.jpg HTTP/1.1
78.133.20.10 01/Feb/2015 404 GET /favicon.ico HTTP/1.1
175.44.8.98 01/Feb/2015 301 POST /ghost_linux_init_script/ HTTP/1.1
54.160.249.160 02/Feb/2015 200 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.197.168.201 02/Feb/2015 200 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
54.89.61.205 02/Feb/2015 200 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET /muieblackcat HTTP/1.1
111.251.50.236 01/Feb/2015 400 CONNECT mx0.mail2000.com.tw:25 HTTP/1.0
199.180.112.34 01/Feb/2015 404 GET //myadmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //MyAdmin/scripts/setup.php HTTP/1.1
207.145.97.131 02/Feb/2015 400 GET //Net_work.xml HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //phpmyadmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //phpMyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34 01/Feb/2015 404 GET //pma/scripts/setup.php HTTP/1.1
176.58.116.39 02/Feb/2015 304 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50 01/Feb/2015 200 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
198.20.69.74 02/Feb/2015 403 GET /robots.txt HTTP/1.1
207.34.25.76 01/Feb/2015 301 GET /robots.txt HTTP/1.1
54.160.249.160 02/Feb/2015 301 GET /robots.txt HTTP/1.1
54.197.168.201 02/Feb/2015 200 GET /robots.txt HTTP/1.1
54.89.61.205 02/Feb/2015 200 GET /robots.txt HTTP/1.1
186.15.3.50 02/Feb/2015 400 GET /tmUnblock.cgi HTTP/1.1
71.11.195.254 01/Feb/2015 400 GET /tmUnblock.cgi HTTP/1.1
Output in CSV format, for import into a spreadsheet application like Excel.
$ perl -lane 'print "$1,$2,$4,$3" if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log |sort -u > access.log.csv
That's it!
Hope you enjoyed it!
Lars Bjaerris