Using Fabric and GoAccess with nginx logs

Posted on Thu 25 May 2017 in Web

The lazy version

So why bother, I mean we've got Google Analytics? Well yes, but I like to take ownership of my logs and reporting, even if in slightly lazy ways. I have also noticed lately that Google Analytics is being targeted by more ad blockers, so I like to always have a backup plan for user-friendly reports on my web sites.

I will eventually have to update this to a slightly cleaner method, but for now here is a little explanation of the way I handle site reporting on the websites I deal with. Quick background on the setup:

Server side:

  1. Debian Stable (Jessie)

  2. Nginx

  3. GoAccess

All of these are installed via standard Debian packages.
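
For reference, on Debian that boils down to something like:

    # Install the server-side pieces from the standard Debian repositories
    apt-get install nginx goaccess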

Client side:

  1. python3

  2. fabric3 (API-compatible with Fabric but built to run on Python 3)

Now most would say to use a virtualenv, but for tools I tend to use universally (glances, fabric, pelican, etc.) I skip the virtualenv because I just want them ready on a system-wide basis.
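
In practice that just means a system-wide pip install, outside any virtualenv, along the lines of:

    # Install fabric3 (and its fab command) system-wide
    sudo pip3 install fabric3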

First of all, each of my sites has a script that looks like this. On my to-do list is making the script universal: turning it into a Python utility that takes the site paths as variables, so it all becomes just one script.

    #!/bin/bash

    # Variables
    LOG_PATH="/var/log/nginx"
    RPT_PATH="setup a writable report path"
    TMP_PATH="setup a writable tmp path"

    # Get files more than a week but less than a month old
    for i in $(find "$LOG_PATH" -name 'access.log*.gz' -mtime -31 -mtime +6); do
      zcat "$i" | grep mack-z.com >> "$TMP_PATH/month.log"
    done

    # Get files more than a day but less than a week old
    for i in $(find "$LOG_PATH" -name 'access.log*.gz' -mtime -7 -mtime +0); do
      zcat "$i" | grep mack-z.com >> "$TMP_PATH/month.log"
      zcat "$i" | grep mack-z.com >> "$TMP_PATH/week.log"
    done

    # Get files less than a day old
    for i in $(find "$LOG_PATH" -name 'access.log*.gz' -mtime -1); do
      zcat "$i" | grep mack-z.com >> "$TMP_PATH/month.log"
      zcat "$i" | grep mack-z.com >> "$TMP_PATH/week.log"
      zcat "$i" | grep mack-z.com >> "$TMP_PATH/day.log"
    done

    # Get current uncompressed logs
    for i in $(find "$LOG_PATH" -name 'access.log*' -not -name '*.gz' -mtime -1); do
      grep mack-z.com "$i" >> "$TMP_PATH/month.log"
      grep mack-z.com "$i" >> "$TMP_PATH/week.log"
      grep mack-z.com "$i" >> "$TMP_PATH/day.log"
    done

    # Create regular reports
    goaccess -f "$TMP_PATH/month.log" -a > "$RPT_PATH/SiteMonth.html"
    goaccess -f "$TMP_PATH/week.log" -a > "$RPT_PATH/SiteWeek.html"
    goaccess -f "$TMP_PATH/day.log" -a > "$RPT_PATH/SiteDay.html"

    # Clean up temp files
    rm "$TMP_PATH"/*.log

So, to explain another part of this: I don't maintain a separate log file for each site. I must confess I am not dealing with volume levels that would require that, and I would rather just deal with one log for everything. So what these scripts do is pull the specific site's entries out of each log; that is what gets built in the temp area, broken out by ranges of time. For a podcast site/feed I used to run, there were additional steps to pull requests containing .mp3 to try to get a handle on actual episode downloads, as sketched below.
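
Those extra steps slotted in before the report step, along these lines (the episodes.log and EpisodeMonth.html names here are just for illustration):

    # Pull requests for .mp3 files out of the month's site traffic
    grep '\.mp3' "$TMP_PATH/month.log" >> "$TMP_PATH/episodes.log"

    # Report on likely episode downloads
    goaccess -f "$TMP_PATH/episodes.log" -a > "$RPT_PATH/EpisodeMonth.html"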

Next, GoAccess is used to create the HTML reports and store them in the report directory. I should probably also mention that if you are using nginx like I am, you will need the following lines in your /etc/goaccess.conf to properly report off of nginx logs:

    log-format %h %^[%d:%t %^]  "%r" %s %b "%R" "%u" %T %^
    date-format %d/%b/%Y
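
If you want to sanity-check that those lines actually parse your logs, a throwaway run straight against the live log works (goaccess picks up /etc/goaccess.conf by default):

    # One-off report to confirm the log-format matches your nginx logs
    goaccess -f /var/log/nginx/access.log -a > /tmp/test.html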

So where does Fabric come in? Quite simply, here is an excerpt from my fabfile.py:

    from fabric.api import *

    [...]

    def get_rpts(mysite):
        # Run the site's report scripts (like the one above) on the server
        sudo("base path of all sites" + mysite + "/scripts/*.sh", mysite)
        # Download the generated HTML reports into a local directory
        # named after the site
        get("base path of all sites" + mysite + "/SiteReports/*", mysite)

    [...]

I actually have some other scripts for archives, but that is for another time; that is why I execute *.sh rather than a single named script.

The end result is that all the HTML reports get downloaded to a local directory for each site, ready to review.
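
For completeness, a task like that is kicked off from the client with Fabric's command line; the host name here is a placeholder:

    # Run get_rpts for one site against one server;
    # the site name is passed as the task argument
    fab -H myserver.example.com get_rpts:mack-z.com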