Parsing Plaintext Logfiles On The Commandline Using Perl.

Perl, "the Swiss Army chainsaw of scripting languages"

This post uses Perl, regular expressions and sort to parse and format a plaintext log file,
and finally writes the output as comma-separated values (CSV) for import into e.g. Excel.
The approach can easily be modified to parse other plaintext files in the same manner.

Here is a snippet of the Nginx access.log file we are going to parse:

$ cat ./access.log

216.218.206.66 - - [01/Feb/2015:05:35:11 +0000] "GET / HTTP/1.1" 403 162 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //MyAdmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //phpMyAdmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //pma/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //phpmyadmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET //myadmin/scripts/setup.php HTTP/1.1" 404 56 "-" "-"
199.180.112.34 - - [01/Feb/2015:05:59:48 +0000] "GET /muieblackcat HTTP/1.1" 404 162 "-" "-"
111.251.50.236 - - [01/Feb/2015:07:17:47 +0000] "CONNECT mx0.mail2000.com.tw:25 HTTP/1.0" 400 166 "-" "-"
23.239.196.71 - - [01/Feb/2015:07:45:48 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)"
69.171.237.116 - - [01/Feb/2015:11:06:01 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 301 178 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.237.116 - - [01/Feb/2015:11:06:02 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 200 57525 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
77.37.231.208 - - [01/Feb/2015:11:26:03 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0"
173.252.110.115 - - [01/Feb/2015:11:31:20 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 301 178 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
173.252.110.119 - - [01/Feb/2015:11:31:21 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 200 57525 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
61.240.144.66 - - [01/Feb/2015:12:42:47 +0000] "GET / HTTP/1.0" 200 612 "-" "masscan/1.0 (https://github.com/robertdavidgraham/masscan)"
108.166.85.126 - - [01/Feb/2015:16:05:43 +0000] "GET /admin/config.php HTTP/1.0" 499 0 "-" "-"
23.23.38.251 - - [01/Feb/2015:16:24:00 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB6 (.NET CLR 3.5.30729)"
23.23.38.251 - - [01/Feb/2015:16:24:01 +0000] "GET / HTTP/1.1" 200 7105 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB6 (.NET CLR 3.5.30729)"
185.5.51.50 - - [01/Feb/2015:16:56:48 +0000] "GET / HTTP/1.1" 301 178 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:50 +0000] "GET / HTTP/1.1" 200 7105 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:50 +0000] "GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1" 200 9977 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1" 200 39386 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1" 200 2698 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1" 200 3075 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /content/images/2015/01/lars.jpg HTTP/1.1" 200 47097 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:56:51 +0000] "GET /assets/fonts/casper-icons.woff HTTP/1.1" 200 2260 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
185.5.51.50 - - [01/Feb/2015:16:57:20 +0000] "GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1" 200 6642 "https://bjaerris.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
207.34.25.76 - - [01/Feb/2015:18:01:13 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "R6_CommentReader(www.radian6.com/crawler)"
71.11.195.254 - - [01/Feb/2015:21:37:39 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 166 "-" "-"
175.44.8.98 - - [01/Feb/2015:22:12:06 +0000] "POST /ghost_linux_init_script/ HTTP/1.1" 301 178 "-" "-"
78.133.20.10 - - [01/Feb/2015:22:42:33 +0000] "GET / HTTP/1.1" 200 72 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
78.133.20.10 - - [01/Feb/2015:22:42:33 +0000] "GET /favicon.ico HTTP/1.1" 404 162 "https://gw4.node25.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
31.202.241.242 - - [02/Feb/2015:01:15:13 +0000] "GET / HTTP/1.0" 301 178 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
207.145.97.131 - - [02/Feb/2015:04:00:25 +0000] "GET / HTTP/1.1" 400 166 "-" "-"
207.145.97.131 - - [02/Feb/2015:04:00:26 +0000] "GET //Net_work.xml HTTP/1.1" 400 166 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:10 +0000] "GET / HTTP/1.1" 403 162 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:10 +0000] "GET /robots.txt HTTP/1.1" 403 162 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:12 +0000] "" 400 0 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:13 +0000] "" 400 0 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:13 +0000] "" 400 0 "-" "-"
198.20.69.74 - - [02/Feb/2015:06:18:17 +0000] "quit" 400 166 "-" "-"
54.89.61.205 - - [02/Feb/2015:08:25:02 +0000] "GET /robots.txt HTTP/1.1" 200 48 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.197.168.201 - - [02/Feb/2015:08:25:02 +0000] "GET /robots.txt HTTP/1.1" 200 48 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.89.61.205 - - [02/Feb/2015:08:25:02 +0000] "GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1" 200 6638 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.197.168.201 - - [02/Feb/2015:08:25:02 +0000] "GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1" 200 5682 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.160.249.160 - - [02/Feb/2015:08:25:03 +0000] "GET /robots.txt HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.160.249.160 - - [02/Feb/2015:08:25:03 +0000] "GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1" 200 5682 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.82.74.230 - - [02/Feb/2015:08:25:03 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.6; +http://flipboard.com/browserproxy)"
54.82.74.230 - - [02/Feb/2015:08:25:04 +0000] "GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1" 200 57525 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.6; +http://flipboard.com/browserproxy)"
54.160.249.160 - - [02/Feb/2015:08:25:08 +0000] "GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1" 200 5682 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
186.15.3.50 - - [02/Feb/2015:11:13:54 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 166 "-" "-"
176.58.116.39 - - [02/Feb/2015:12:01:46 +0000] "GET / HTTP/1.1" 200 7107 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:46 +0000] "GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:47 +0000] "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:47 +0000] "GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:47 +0000] "GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:48 +0000] "GET /content/images/2015/01/lars.jpg HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:54 +0000] "GET / HTTP/1.1" 200 7107 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:54 +0000] "GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"
176.58.116.39 - - [02/Feb/2015:12:01:55 +0000] "GET /content/images/2015/01/lars.jpg HTTP/1.1" 304 0 "https://bjaerris.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/7.1.3 Safari/537.85.12"

Let's say we want to:

  1. Extract the IP address, date, request (method and URN) and HTTP status code.
  2. Ignore duplicates from the same day.

Here one might think that "awk" and "sort" would do. But because the timestamp includes "hour:minute:seconds", requests that differ only in their time of day are not treated as duplicates by "sort -u", and working around that quickly becomes convoluted:

$ awk '{print $1, $4, $6, $7}' access.log |sort -u

108.166.85.126 [01/Feb/2015:16:05:43 "GET /admin/config.php
111.251.50.236 [01/Feb/2015:07:17:47 "CONNECT mx0.mail2000.com.tw:25
173.252.110.115 [01/Feb/2015:11:31:20 "GET /content/images/2015/01/ghost_login_owner.png
173.252.110.119 [01/Feb/2015:11:31:21 "GET /content/images/2015/01/ghost_login_owner.png
175.44.8.98 [01/Feb/2015:22:12:06 "POST /ghost_linux_init_script/
176.58.116.39 [02/Feb/2015:12:01:46 "GET /
176.58.116.39 [02/Feb/2015:12:01:46 "GET /assets/css/screen.css?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:47 "GET /assets/js/index.js?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:47 "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:47 "GET /public/jquery.min.js?v=cfc2d462d6
176.58.116.39 [02/Feb/2015:12:01:48 "GET /content/images/2015/01/lars.jpg
176.58.116.39 [02/Feb/2015:12:01:54 "GET /                                              <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:54 "GET /assets/css/screen.css?v=cfc2d462d6            <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /assets/js/index.js?v=cfc2d462d6               <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6      <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /content/images/2015/01/lars.jpg               <---Duplicate
176.58.116.39 [02/Feb/2015:12:01:55 "GET /public/jquery.min.js?v=cfc2d462d6             <---Duplicate
185.5.51.50 [01/Feb/2015:16:56:48 "GET /
185.5.51.50 [01/Feb/2015:16:56:50 "GET /                                                <---Duplicate
185.5.51.50 [01/Feb/2015:16:56:50 "GET /assets/css/screen.css?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:56:51 "GET /assets/fonts/casper-icons.woff
185.5.51.50 [01/Feb/2015:16:56:51 "GET /assets/js/index.js?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:56:51 "GET /assets/js/jquery.fitvids.js?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:56:51 "GET /content/images/2015/01/lars.jpg
185.5.51.50 [01/Feb/2015:16:56:51 "GET /public/jquery.min.js?v=cfc2d462d6
185.5.51.50 [01/Feb/2015:16:57:20 "GET /identifying-services-needing-restart-after-updating-linux-packages/
186.15.3.50 [02/Feb/2015:11:13:54 "GET /tmUnblock.cgi
198.20.69.74 [02/Feb/2015:06:18:10 "GET /
198.20.69.74 [02/Feb/2015:06:18:10 "GET /robots.txt
198.20.69.74 [02/Feb/2015:06:18:12 "" 400
198.20.69.74 [02/Feb/2015:06:18:13 "" 400                                               <---Duplicate
198.20.69.74 [02/Feb/2015:06:18:17 "quit" 400
199.180.112.34 [01/Feb/2015:05:59:48 "GET /muieblackcat
199.180.112.34 [01/Feb/2015:05:59:48 "GET //myadmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //MyAdmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //phpmyadmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //phpMyAdmin/scripts/setup.php
199.180.112.34 [01/Feb/2015:05:59:48 "GET //pma/scripts/setup.php
207.145.97.131 [02/Feb/2015:04:00:25 "GET /
207.145.97.131 [02/Feb/2015:04:00:26 "GET //Net_work.xml
207.34.25.76 [01/Feb/2015:18:01:13 "GET /robots.txt
216.218.206.66 [01/Feb/2015:05:35:11 "GET /
23.23.38.251 [01/Feb/2015:16:24:00 "GET /
23.23.38.251 [01/Feb/2015:16:24:01 "GET /                                               <---Duplicate
23.239.196.71 [01/Feb/2015:07:45:48 "GET / 
31.202.241.242 [02/Feb/2015:01:15:13 "GET /
54.160.249.160 [02/Feb/2015:08:25:03 "GET /host-your-own-blog-with-ghost-nginx-on-linux/
54.160.249.160 [02/Feb/2015:08:25:03 "GET /robots.txt
54.160.249.160 [02/Feb/2015:08:25:08 "GET /host-your-own-blog-with-ghost-nginx-on-linux/
54.197.168.201 [02/Feb/2015:08:25:02 "GET /host-your-own-blog-with-ghost-nginx-on-linux/
54.197.168.201 [02/Feb/2015:08:25:02 "GET /robots.txt
54.82.74.230 [02/Feb/2015:08:25:03 "GET /content/images/2015/01/ghost_login_owner.png
54.82.74.230 [02/Feb/2015:08:25:04 "GET /content/images/2015/01/ghost_login_owner.png   <---Duplicate
54.89.61.205 [02/Feb/2015:08:25:02 "GET /identifying-services-needing-restart-after-updating-linux-packages/
54.89.61.205 [02/Feb/2015:08:25:02 "GET /robots.txt
61.240.144.66 [01/Feb/2015:12:42:47 "GET /
69.171.237.116 [01/Feb/2015:11:06:01 "GET /content/images/2015/01/ghost_login_owner.png
69.171.237.116 [01/Feb/2015:11:06:02 "GET /content/images/2015/01/ghost_login_owner.png <---Duplicate
71.11.195.254 [01/Feb/2015:21:37:39 "GET /tmUnblock.cgi
77.37.231.208 [01/Feb/2015:11:26:03 "GET /
78.133.20.10 [01/Feb/2015:22:42:33 "GET /
78.133.20.10 [01/Feb/2015:22:42:33 "GET /favicon.ico

Clearly another approach is needed if we're to easily ignore duplicates from the same day.
Removing the "hour:minute:seconds" part of the timestamp before the output reaches "sort -u" will fix that.
So let's parse the log lines again, this time using Perl and regular expressions, and keep only the date. The duplicates are removed with "sort -u" further down.

Perl, the right tool for the job.

$ perl -lane 'print "$1 $2 $3 $4 " if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log

216.218.206.66 01/Feb/2015 GET / HTTP/1.1 403 
199.180.112.34 01/Feb/2015 GET //MyAdmin/scripts/setup.php HTTP/1.1 404 
199.180.112.34 01/Feb/2015 GET //phpMyAdmin/scripts/setup.php HTTP/1.1 404 
199.180.112.34 01/Feb/2015 GET //pma/scripts/setup.php HTTP/1.1 404 
199.180.112.34 01/Feb/2015 GET //phpmyadmin/scripts/setup.php HTTP/1.1 404 
199.180.112.34 01/Feb/2015 GET //myadmin/scripts/setup.php HTTP/1.1 404 
199.180.112.34 01/Feb/2015 GET /muieblackcat HTTP/1.1 404 
111.251.50.236 01/Feb/2015 CONNECT mx0.mail2000.com.tw:25 HTTP/1.0 400 
23.239.196.71 01/Feb/2015 GET / HTTP/1.1 301 
69.171.237.116 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 301 
69.171.237.116 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 200 
77.37.231.208 01/Feb/2015 GET / HTTP/1.1 301 
173.252.110.115 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 301 
173.252.110.119 01/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 200 
61.240.144.66 01/Feb/2015 GET / HTTP/1.0 200 
108.166.85.126 01/Feb/2015 GET /admin/config.php HTTP/1.0 499 
23.23.38.251 01/Feb/2015 GET / HTTP/1.1 301 
23.23.38.251 01/Feb/2015 GET / HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET / HTTP/1.1 301 
185.5.51.50 01/Feb/2015 GET / HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET /content/images/2015/01/lars.jpg HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET /assets/fonts/casper-icons.woff HTTP/1.1 200 
185.5.51.50 01/Feb/2015 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1 200 
207.34.25.76 01/Feb/2015 GET /robots.txt HTTP/1.1 301 
71.11.195.254 01/Feb/2015 GET /tmUnblock.cgi HTTP/1.1 400 
175.44.8.98 01/Feb/2015 POST /ghost_linux_init_script/ HTTP/1.1 301 
78.133.20.10 01/Feb/2015 GET / HTTP/1.1 200 
78.133.20.10 01/Feb/2015 GET /favicon.ico HTTP/1.1 404 
31.202.241.242 02/Feb/2015 GET / HTTP/1.0 301 
207.145.97.131 02/Feb/2015 GET / HTTP/1.1 400 
207.145.97.131 02/Feb/2015 GET //Net_work.xml HTTP/1.1 400 
198.20.69.74 02/Feb/2015 GET / HTTP/1.1 403 
198.20.69.74 02/Feb/2015 GET /robots.txt HTTP/1.1 403 
198.20.69.74 02/Feb/2015 quit 400 
54.89.61.205 02/Feb/2015 GET /robots.txt HTTP/1.1 200 
54.197.168.201 02/Feb/2015 GET /robots.txt HTTP/1.1 200 
54.89.61.205 02/Feb/2015 GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1 200 
54.197.168.201 02/Feb/2015 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1 200 
54.160.249.160 02/Feb/2015 GET /robots.txt HTTP/1.1 301 
54.160.249.160 02/Feb/2015 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1 200 
54.82.74.230 02/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 301 
54.82.74.230 02/Feb/2015 GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1 200 
54.160.249.160 02/Feb/2015 GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1 200 
186.15.3.50 02/Feb/2015 GET /tmUnblock.cgi HTTP/1.1 400 
176.58.116.39 02/Feb/2015 GET / HTTP/1.1 200 
176.58.116.39 02/Feb/2015 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /content/images/2015/01/lars.jpg HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET / HTTP/1.1 200 
176.58.116.39 02/Feb/2015 GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1 304 
176.58.116.39 02/Feb/2015 GET /content/images/2015/01/lars.jpg HTTP/1.1 304

Before we go further, let me explain the regular expression used in the previous command.

^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})

^            # Match start of string.
###
# 1st capturing group: the IP address.
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
\d{1,3}      # Match a digit [0-9] between 1 and 3 times (greedy).
\.           # Match the character "." literally.
###
\s           # Match any whitespace character, such as [ \t\r\n\f].
\-           # Match the character "-" literally.
\s           # Match any whitespace character, such as [ \t\r\n\f].
\-           # Match the character "-" literally.
\s           # Match any whitespace character, such as [ \t\r\n\f].
\[           # Match the character "[" literally.
###
# 2nd capturing group: the date, up to the first ":".
([^:]+)
[^:]+        # Match anything not a ":", one or more times (greedy).
###
[^"]+        # Match anything not a '"', one or more times (greedy). This skips the time and timezone.
"            # Match the character '"' literally. It is outside the group, so it is not captured.
###
# 3rd capturing group: the request, without the surrounding quotes.
([^"]+)
[^"]+        # Match anything not a '"', one or more times (greedy).
             # Because of the "+", lines with an empty request ("") do not match and are skipped.
###
"            # Match the character '"' literally.
\s           # Match any whitespace character, such as [ \t\r\n\f].
###
# 4th capturing group: the HTTP status code.
(\d{3})
\d{3}        # Match a digit [0-9] exactly 3 times.

For more on regular expressions: http://www.regular-expressions.info/tutorial.html

Let's make it "pretty".

Let's move the HTTP status code to the 3rd column and use the printf function to line up the columns to make the output more legible.

$ perl -lane 'printf ("%-16s %-12s %-4s %s\n", $1, $2, $4, $3) if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log

216.218.206.66   01/Feb/2015  403  GET / HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //MyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //phpMyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //pma/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //phpmyadmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //myadmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET /muieblackcat HTTP/1.1
111.251.50.236   01/Feb/2015  400  CONNECT mx0.mail2000.com.tw:25 HTTP/1.0
23.239.196.71    01/Feb/2015  301  GET / HTTP/1.1
69.171.237.116   01/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116   01/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
77.37.231.208    01/Feb/2015  301  GET / HTTP/1.1
173.252.110.115  01/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
173.252.110.119  01/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
61.240.144.66    01/Feb/2015  200  GET / HTTP/1.0
108.166.85.126   01/Feb/2015  499  GET /admin/config.php HTTP/1.0
23.23.38.251     01/Feb/2015  301  GET / HTTP/1.1
23.23.38.251     01/Feb/2015  200  GET / HTTP/1.1
185.5.51.50      01/Feb/2015  301  GET / HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET / HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /content/images/2015/01/lars.jpg HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/fonts/casper-icons.woff HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
207.34.25.76     01/Feb/2015  301  GET /robots.txt HTTP/1.1
71.11.195.254    01/Feb/2015  400  GET /tmUnblock.cgi HTTP/1.1
175.44.8.98      01/Feb/2015  301  POST /ghost_linux_init_script/ HTTP/1.1
78.133.20.10     01/Feb/2015  200  GET / HTTP/1.1
78.133.20.10     01/Feb/2015  404  GET /favicon.ico HTTP/1.1
31.202.241.242   02/Feb/2015  301  GET / HTTP/1.0
207.145.97.131   02/Feb/2015  400  GET / HTTP/1.1
207.145.97.131   02/Feb/2015  400  GET //Net_work.xml HTTP/1.1
198.20.69.74     02/Feb/2015  403  GET / HTTP/1.1
198.20.69.74     02/Feb/2015  403  GET /robots.txt HTTP/1.1
198.20.69.74     02/Feb/2015  400  quit
54.89.61.205     02/Feb/2015  200  GET /robots.txt HTTP/1.1
54.197.168.201   02/Feb/2015  200  GET /robots.txt HTTP/1.1
54.89.61.205     02/Feb/2015  200  GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
54.197.168.201   02/Feb/2015  200  GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.160.249.160   02/Feb/2015  301  GET /robots.txt HTTP/1.1
54.160.249.160   02/Feb/2015  200  GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.82.74.230     02/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230     02/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.160.249.160   02/Feb/2015  200  GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
186.15.3.50      02/Feb/2015  400  GET /tmUnblock.cgi HTTP/1.1
176.58.116.39    02/Feb/2015  200  GET / HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /content/images/2015/01/lars.jpg HTTP/1.1
176.58.116.39    02/Feb/2015  200  GET / HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /content/images/2015/01/lars.jpg HTTP/1.1

Much better!

Removing duplicate lines.

$ perl -lane 'printf ("%-16s %-12s %-4s %s\n", $1, $2, $4, $3) if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log |sort -u

108.166.85.126   01/Feb/2015  499  GET /admin/config.php HTTP/1.0
111.251.50.236   01/Feb/2015  400  CONNECT mx0.mail2000.com.tw:25 HTTP/1.0
173.252.110.115  01/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
173.252.110.119  01/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
175.44.8.98      01/Feb/2015  301  POST /ghost_linux_init_script/ HTTP/1.1
176.58.116.39    02/Feb/2015  200  GET / HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /content/images/2015/01/lars.jpg HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/fonts/casper-icons.woff HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /content/images/2015/01/lars.jpg HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET / HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  301  GET / HTTP/1.1
186.15.3.50      02/Feb/2015  400  GET /tmUnblock.cgi HTTP/1.1
198.20.69.74     02/Feb/2015  400  quit
198.20.69.74     02/Feb/2015  403  GET / HTTP/1.1
198.20.69.74     02/Feb/2015  403  GET /robots.txt HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET /muieblackcat HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //myadmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //MyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //phpmyadmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //phpMyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //pma/scripts/setup.php HTTP/1.1
207.145.97.131   02/Feb/2015  400  GET / HTTP/1.1
207.145.97.131   02/Feb/2015  400  GET //Net_work.xml HTTP/1.1
207.34.25.76     01/Feb/2015  301  GET /robots.txt HTTP/1.1
216.218.206.66   01/Feb/2015  403  GET / HTTP/1.1
23.23.38.251     01/Feb/2015  200  GET / HTTP/1.1
23.23.38.251     01/Feb/2015  301  GET / HTTP/1.1
23.239.196.71    01/Feb/2015  301  GET / HTTP/1.1
31.202.241.242   02/Feb/2015  301  GET / HTTP/1.0
54.160.249.160   02/Feb/2015  200  GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.160.249.160   02/Feb/2015  301  GET /robots.txt HTTP/1.1
54.197.168.201   02/Feb/2015  200  GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.197.168.201   02/Feb/2015  200  GET /robots.txt HTTP/1.1
54.82.74.230     02/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230     02/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.89.61.205     02/Feb/2015  200  GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
54.89.61.205     02/Feb/2015  200  GET /robots.txt HTTP/1.1
61.240.144.66    01/Feb/2015  200  GET / HTTP/1.0
69.171.237.116   01/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116   01/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
71.11.195.254    01/Feb/2015  400  GET /tmUnblock.cgi HTTP/1.1
77.37.231.208    01/Feb/2015  301  GET / HTTP/1.1
78.133.20.10     01/Feb/2015  200  GET / HTTP/1.1
78.133.20.10     01/Feb/2015  404  GET /favicon.ico HTTP/1.1

Want to sort it based on URN/file-path?

$ perl -lane 'printf ("%-16s %-12s %-4s %s\n", $1, $2, $4, $3) if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log |sort -u|sort -k5,5

198.20.69.74     02/Feb/2015  400  quit
176.58.116.39    02/Feb/2015  200  GET / HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET / HTTP/1.1
185.5.51.50      01/Feb/2015  301  GET / HTTP/1.1
198.20.69.74     02/Feb/2015  403  GET / HTTP/1.1
207.145.97.131   02/Feb/2015  400  GET / HTTP/1.1
216.218.206.66   01/Feb/2015  403  GET / HTTP/1.1
23.23.38.251     01/Feb/2015  200  GET / HTTP/1.1
23.23.38.251     01/Feb/2015  301  GET / HTTP/1.1
23.239.196.71    01/Feb/2015  301  GET / HTTP/1.1
31.202.241.242   02/Feb/2015  301  GET / HTTP/1.0
61.240.144.66    01/Feb/2015  200  GET / HTTP/1.0
77.37.231.208    01/Feb/2015  301  GET / HTTP/1.1
78.133.20.10     01/Feb/2015  200  GET / HTTP/1.1
108.166.85.126   01/Feb/2015  499  GET /admin/config.php HTTP/1.0
176.58.116.39    02/Feb/2015  304  GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/css/screen.css?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/fonts/casper-icons.woff HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/js/index.js?v=cfc2d462d6 HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /assets/js/jquery.fitvids.js?v=cfc2d462d6 HTTP/1.1
173.252.110.115  01/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
173.252.110.119  01/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230     02/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
54.82.74.230     02/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116   01/Feb/2015  200  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
69.171.237.116   01/Feb/2015  301  GET /content/images/2015/01/ghost_login_owner.png HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /content/images/2015/01/lars.jpg HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /content/images/2015/01/lars.jpg HTTP/1.1
78.133.20.10     01/Feb/2015  404  GET /favicon.ico HTTP/1.1
175.44.8.98      01/Feb/2015  301  POST /ghost_linux_init_script/ HTTP/1.1
54.160.249.160   02/Feb/2015  200  GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
54.197.168.201   02/Feb/2015  200  GET /host-your-own-blog-with-ghost-nginx-on-linux/ HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
54.89.61.205     02/Feb/2015  200  GET /identifying-services-needing-restart-after-updating-linux-packages/ HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET /muieblackcat HTTP/1.1
111.251.50.236   01/Feb/2015  400  CONNECT mx0.mail2000.com.tw:25 HTTP/1.0
199.180.112.34   01/Feb/2015  404  GET //myadmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //MyAdmin/scripts/setup.php HTTP/1.1
207.145.97.131   02/Feb/2015  400  GET //Net_work.xml HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //phpmyadmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //phpMyAdmin/scripts/setup.php HTTP/1.1
199.180.112.34   01/Feb/2015  404  GET //pma/scripts/setup.php HTTP/1.1
176.58.116.39    02/Feb/2015  304  GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
185.5.51.50      01/Feb/2015  200  GET /public/jquery.min.js?v=cfc2d462d6 HTTP/1.1
198.20.69.74     02/Feb/2015  403  GET /robots.txt HTTP/1.1
207.34.25.76     01/Feb/2015  301  GET /robots.txt HTTP/1.1
54.160.249.160   02/Feb/2015  301  GET /robots.txt HTTP/1.1
54.197.168.201   02/Feb/2015  200  GET /robots.txt HTTP/1.1
54.89.61.205     02/Feb/2015  200  GET /robots.txt HTTP/1.1
186.15.3.50      02/Feb/2015  400  GET /tmUnblock.cgi HTTP/1.1
71.11.195.254    01/Feb/2015  400  GET /tmUnblock.cgi HTTP/1.1

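Output in CSV format, for import into a spreadsheet application like Excel.
The fields are passed to printf as arguments rather than interpolated into the format string, so that a "%" in a request (e.g. "%20") is not treated as a format specifier.

$ perl -lane 'printf ("%s,%s,%s,%s\n", $1, $2, $4, $3) if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-\s\-\s\[([^:]+)[^"]+"([^"]+)"\s(\d{3})/' ./access.log |sort -u > access.log.csv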

That's it!

Hope you enjoyed it!

Lars Bjaerris