I’ve been hacking around with varnish a lot lately. If you don’t know what varnish is, it is an open-source, in-memory caching HTTP reverse proxy and really just an amazing piece of software. You can learn more here. Varnish, in keeping with it’s philosophy of being low-footprint and high-performance, does not by default log anything to slow disks. Instead it keeps an ephemeral log in memory. However, it does ship with a utility called [shell]varnishncsa[/shell] which translates it’s in-memory log into a standard NCSA log format (common log format). You’re familiar with this format if you’ve ever looked at an apache or nginx access log.
The reason I’m interested in enabling these disk logs via varnishncsa is because I’ve also been playing around with Logstash, Elasticsearch, and Kibana recently to centralize logs and create meaningful graphs from the data. The easiest way I’ve found to plug varnish data into the logstash-forwarder daemon is to enable varnishncsa and have logstash parse them.
By default, varnishncsa doesn’t automatically separate logs into individual vhosts. This is a problem right away if you have multiple domains on one server, but if you think about it, it wouldn’t make sense for them to try. Both varnish and varnishncsa are highly configurable, so we can make them do whatever we want. So let’s get to it.
Here’s what we want.
1) Individual log files for each vhost.
2) We want this logging mechanism to be started and stopped via a standard sysv init script.
3) Bonus points: Automate our configuration.
4) Super bonus points: Create an upstart script? I haven’t gotten to this yet but I will.*
The best way I found to do this is to spawn an individual [shell]varnishncsa[/shell] process per virtual host, and specify which [shell]Host[/shell] header to look for in the request, using a Varnish Shared Memory Language Query (VSL), using the [shell]-q[/shell] option. For instance:
[shell]
varnishncsa -a -D \
-q ‘ReqHeader:Host ~ “^(www\.)?kevops.com$”‘ \
-w /var/log/varnish/kevops.com.access.log \
-P /var/run/varnishncsa.kevops.com.pid \
-F ‘%{X-Forwarded-For}i %l %u %t “%r” %s %b “%{Referer}i” “%{User-agent}i”‘
[/shell]
Most of these options are pretty self-explanatory. The most important one is [shell]-q[/shell] which we’ll get to in a minute. But I want to first explain the last option ([shell]-F [/shell]). I’m specifying the log format I want explicitly because I need to modify the first field only. I am replacing the standard client ip with the value of a header for “X-Forwarded-For”. I’m doing this becuase I have an SSL terminator (pound) in front of varnish for HTTPS requests, and further upstream, a loadbalancer. Both of these inject/append to this header as necessary so that we know the original client ip. If these situations don’t apply to you, simply remove the entire [shell]-F[/shell] option line above. (and the trailing slash in the preceeding line of course)
So let’s talk about that [shell]-q[/shell] option:
[shell]
-q ‘ReqHeader:Host ~ “^(www\.)kevops\.com$”‘
[/shell]
This is VSL. It’s very intuitive, especially if you’ve spent some time hacking around with varnish’s domain-specific configuration language VCL, which I’m sure you have if you have been using and tuning Varnish. Since varnishncsa just uses the in-memory varnish log, it’s parsing behavior is very similar to the varnishlog utility. We’re taking all HTTP requests whose Host header was exactly kevops.com, or exactly www.kevops.com and writing those to a specific log file. You could use this [shell]-q[/shell] option to write very specific log files if you wanted, based on whatever HTTP headers (or anything) you want. I haven’t gotten that creative yet, or seen a need. But you certainly could.
Anyway that’s one virtual host, we’ve daemonized (backgrounded it) with the [shell]-D[/shell] option, and it’s logging requests. Awesome. But what if we have 15 different vhosts on this server and we want to log all of them to individual files?
Here’s what I did. This might seem a bit hacky, but it’s the right way to use the tools we’re given in this case, in my opinion. First, create a wrapper script. Let’s put it somewhere in our [shell]$PATH[/shell], like [shell]/usr/local/bin/varnishncsa-wrapper[/shell]
[shell]
#!/bin/bash
# Wrapper script for varnishncsa per vhost
# Invoked by our sysv init script /etc/init.d/varnish-logger
varnishncsa -a -D \
-q ‘ReqHeader:Host ~ “^(www\.)?example\.com$”‘ \
-w /var/log/varnish/example.com.access.log \
-P /var/run/varnishncsa.example.com.pid \
varnishncsa -a -D \
-q ‘ReqHeader:Host ~ “^(www\.)?example2\.com$”‘ \
-w /var/log/varnish/example2.com.access.log \
-P /var/run/varnishncsa.example2.com.pid \
varnishncsa -a -D \
-q ‘ReqHeader:Host ~ “^(www\.)?example3\.com$”‘ \
-w /var/log/varnish/example3.com.access.log \
-P /var/run/varnishncsa.example3.com.pid \
# and etc, for each virtual host…
[/shell]
That’s already tedious, so we’ll need to automate this obviously. We’ll get to that later. If you have this script created for all or some of your virtual hosts, next we need to create our sysv init script, [shell]/etc/init.d/varnish-logger[/shell]
[shell]
#!/bin/sh
# Simple init script for starting/stopping logging vhosts via varnishncsa processes
### BEGIN INIT INFO
# Provides: varnishncsa-wrapper
# Required-Start: $local_fs $remote_fs $network varnish
# Required-Stop: $local_fs $remote_fs $network varnish
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Vhosts wrapper script to spawn varnishncsa daemons
# Description: This script provides logging for varnish in NCSA format for all vhosts
### END INIT INFO
DAEMON=”/usr/local/bin/varnishncsa-wrapper”
method=”$1″
do_stop()
{
killall -9 varnishncsa >/dev/null 2>&1
ps auxf | grep varnishncsa | grep -v grep > /dev/null 2>&1 || return 0
return 1
}
do_restart()
{
if do_stop; then
${DAEMON} && return 0
fi
return 1
}
case “$method” in
*start)
do_restart && echo “[OK]” && exit 0
echo “[FAIL]” && exit 1
;;
stop)
do_stop && echo “[OK]” && exit 0
echo “[FAIL]” && exit 1
;;
status)
pgrep varnishncsa > /dev/null 2>&1 && echo “Running…” && exit 0
echo “Not running…” && exit 1;
;;
esac
[/shell]
Yep, it’s very dumb and very hacky. I’m aware. I don’t care. All it needs to do is start and stop logging for varnish HTTP requests and separate requests that match valid Hosts. And this does the job. Improvements will come later.
Speaking of improvements, let’s automate the creation of the wrapper script. So this gets kind of weird. If you are using configuration management tools like salt, puppet, chef, ansible, then I don’t need to tell you how to automate this. You are probably already writing a state, recipe, manifest, or playbook, and making what I did better, and more suited for your needs. I am using Salt right now, so building the wrapper script is trivial.
It looks something like:
[shell]
#!/bin/bash
# Simple wrapper for varnishncsa to individually log vhosts
# Generated by SaltStack
{% for website in websites %}
# http
echo “Starting varnish NCSA logging for HTTP traffic /var/log/varnish/{{ website }}.access.log …”
varnishncsa -a -D \
-q ‘ReqHeader:Host ~ “^(www\.)?{{ website }}$”‘ \
-w /var/log/varnish/{{ website }}.access.log \
-P /var/run/varnishncsa.{{ website }}.pid \
-F ‘%{X-Forwarded-For}i %l %u %t “%r” %s %b “%{Referer}i” “%{User-agent}i”‘
{% endfor %}
[/shell]
If you are still reading this, you are:
1) Writing some bash script that creates the wrapper script for each host and you’ve gotten what you need from me and are happy.
2) You’ve digested it and are taking bits of it for your own use with configuration management tools of your choice.
3) You’re absolutely insane to have read this far.
* – I couldn’t figure this out. I tried for 15 minutes and gave up since [shell]system[/shell] works with both types for now.
Do you picture any way to pipe varnish logs into WHM apache logs so every virtualhosts gets integrated varnish + apache logs for awstats or webalizer? With Varnish one miss part of the logs and specially the bytes transferred.