Sunday, April 20, 2014

awk and log parsing
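
These one-liners assume the standard Apache combined log format, one request per line; a sample line (with made-up values) looks like this:

203.0.113.7 - - [20/Apr/2014:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 4523 "http://www.example.com/" "Mozilla/5.0 (compatible; Googlebot/2.1)"

The first whitespace-separated field (awk's $1) is the visitor's IP address, and splitting on the quote character gives the request line, the status and size, the referrer, and the user agent as fields 2, 3, 4 and 6 of cut.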


1. Find the total number of unique visitors:

cat access.log | awk '{print $1}' | sort | uniq -c | wc -l
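
Since only the line count matters here, sort -u is an equivalent shorthand for sort | uniq:

cat access.log | awk '{print $1}' | sort -u | wc -l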

2. Find the number of unique visitors today:

cat access.log | grep `date '+%d/%b/%Y'` | awk '{print $1}' | sort | uniq -c | wc -l
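
The backticks splice in today's date in the same day/month/year form the log timestamp uses; on the day of this post, for example:

date '+%d/%b/%Y'
20/Apr/2014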

3. Find the number of unique visitors this month:

cat access.log | grep `date '+%b/%Y'` | awk '{print $1}' | sort | uniq -c | wc -l

4. Find the number of unique visitors on an arbitrary date – for example, March 22nd, 2007:

cat access.log | grep 22/Mar/2007 | awk '{print $1}' | sort | uniq -c | wc -l

5. (based on #3) Find the number of unique visitors for the month of March 2007:

cat access.log | grep Mar/2007 | awk '{print $1}' | sort | uniq -c | wc -l

6. Show sorted statistics of the number of requests made by each visitor's IP address:

cat access.log | awk '{print "requests from " $1}' | sort | uniq -c | sort -n
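
The output looks something like this (made-up addresses), busiest visitors at the bottom:

      3 requests from 198.51.100.23
     17 requests from 203.0.113.7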

7. Similarly, by grepping for a date as in the tips above, the same statistics are produced for that date:

cat access.log | grep 26/Mar/2007 | awk '{print "requests from " $1}' | sort | uniq -c | sort

1 - Most Common 404s (Page Not Found)
cut -d'"' -f2,3 /var/log/apache/access.log | awk '$4==404{print $4" "$2}' | sort | uniq -c | sort -rg
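
To see why $4 is the status code and $2 the path, trace a sample line (made-up values) through the cut, which keeps quote-delimited fields 2 and 3:

203.0.113.7 - - [20/Apr/2014:10:15:32 +0000] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"

becomes

GET /missing.html HTTP/1.1" 404 512

so awk sees $1=GET, $2=/missing.html, $3=HTTP/1.1", $4=404 and $5=512.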

2 - Count requests by HTTP code

cut -d'"' -f3 /var/log/apache/access.log | cut -d' ' -f2 | sort | uniq -c | sort -rg

3 - Largest Images
cut -d'"' -f2,3 /var/log/apache/access.log | grep -E '\.jpg|\.png|\.gif' | awk '{print $5" "$2}' | sort | uniq | sort -rg

4 - Filter Your IP's Requests
Replace your.ip.address below (a placeholder) with your own address to watch the live log with your own requests excluded:
tail -f /var/log/apache/access.log | grep -v 'your.ip.address'

5 - Top Referring URLS
cut -d'"' -f4 /var/log/apache/access.log | grep -v '^-$' | grep -v '^http://www.yoursite.com' | sort | uniq -c | sort -rg

6 - Watch Crawlers Live
For this we need an extra file which we'll call bots.txt. Here's the contents:


Bot
Crawl
ia_archiver
libwww-perl
spider
Mediapartners-Google
slurp
wget
httrack


This just helps us filter out common user-agent strings used by crawlers.
Here's the command:
tail -f /var/log/apache/access.log | grep -f bots.txt
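
Note that grep -f matches the patterns case-sensitively; add -i to also catch variants such as Wget/1.14 or Spider:

tail -f /var/log/apache/access.log | grep -i -f bots.txt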

7 - Top Crawlers
This command will show you all the spiders that crawled your site with a count of the number of requests.
cut -d'"' -f6 /var/log/apache/access.log | grep -f bots.txt | sort | uniq -c | sort -rg


How To Get A Top Ten
You can easily turn the commands above that aggregate (the ones using uniq) into a top ten by adding this to the end:
| head

That is, pipe the output to the head command.
Simple as that.
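
For example, to list the ten busiest visitor IPs, sort the counts in reverse before the head:

cat access.log | awk '{print $1}' | sort | uniq -c | sort -rg | head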

Zipped Log Files
If you want to run the above commands on a log-rotated (compressed) file, you can adjust easily: start with zcat on the file, then pipe into the first command with the filename dropped.

So this:
cut -d'"' -f3 /var/log/apache/access.log | cut -d' ' -f2 | sort | uniq -c | sort -rg
Would become this:
zcat /var/log/apache/access.log.1.gz | cut -d'"' -f3 | cut -d' ' -f2 | sort | uniq -c | sort -rg
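
zcat also accepts several files at once, so (assuming the usual logrotate naming) you can cover every compressed rotated log in one pass:

zcat /var/log/apache/access.log.*.gz | cut -d'"' -f3 | cut -d' ' -f2 | sort | uniq -c | sort -rg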
