Ever seen the snippet below for Varnish virtual hosts and wondered how you’re going to manage a dozen websites with a dozen if statements in your VCL file?
if (! req.http.Host) {
    error 404 "Need a host header";
}
set req.http.Host = regsub(req.http.Host, "^www\.", "");
set req.http.Host = regsub(req.http.Host, ":80$", "");
if (req.http.Host == "something.com") {
    include "/etc/varnish/site-something.com.vcl";
} elsif (req.http.Host == "somethingelse.com") {
    include "/etc/varnish/site-somethingelse.com.vcl";
}
Varnish is great, but it lacks documentation and tutorials on setting up virtual hosts the right way.
Varnish Virtual Hosts
Why do we need virtual hosts in Varnish at all? It’s a caching server. It doesn’t care about the domain name in a request. It simply passes the request along to the backend server or, if the response is already in the Varnish cache, serves it directly without talking to Nginx or Apache.
But we do need virtual hosts in Varnish, because different sites use different technologies, different login pages and, most importantly, different cookie names. Cookies are the primary reason Varnish virtual hosts are needed: they let us filter different cookies for different sites.
In general, we need Varnish to distinguish between sites so that it can adjust its caching policy for each specific website.
There is no built-in way to do this, and there likely never will be. However, once you understand how VCL works, you can define your virtual hosts much the way you do in Nginx: through sites-available and sites-enabled directories. So let’s go.
How Varnish VCL works
Before we proceed to implementing Varnish virtual hosts, let’s review the most important thing about VCL – how include files work.
When you start with a new Varnish installation, you begin coding in default.vcl. However, you have to realize one thing. There is another file with very basic default VCL rules which Varnish carries internally; let’s call it builtin.vcl. Varnish appends the routines from builtin.vcl after the routines in our default.vcl, so they run after the ones in our VCL file.
The two files may define the same routine, e.g. vcl_recv in both files, and both definitions run on every request, in this order:
- first, default.vcl
- last, builtin.vcl
So when the same routine is defined more than once, the definitions stack up in the order they appear, and the one from the last included file is called last.
If we include another file, say my.vcl, below vcl_recv in default.vcl and define vcl_recv in there as well, Varnish will run them in this order:
- vcl_recv from default.vcl
- vcl_recv from my.vcl
- vcl_recv from builtin.vcl
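The ordering above can be observed with a minimal sketch like the one below. It assumes the std VMOD is available and uses a placeholder backend address; the log messages are only there for illustration and are not part of any real configuration.

```vcl
# --- default.vcl ---
vcl 4.0;
import std;

backend default {
    # Assumed backend address, adjust to your setup
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Runs first: this is the first vcl_recv definition Varnish sees
    std.log("vcl_recv from default.vcl");
}

# Pulls my.vcl in at this point, after the vcl_recv above
include "my.vcl";

# --- my.vcl ---
sub vcl_recv {
    # Runs second, because the include comes after the sub above
    std.log("vcl_recv from my.vcl");
}

# The built-in vcl_recv is appended by Varnish itself and runs last
```

With this in place, the two messages should show up in varnishlog as VCL_Log records, in the order listed above.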
How is including multiple files useful?
To make things flexible, Varnish does not call the later definitions of a routine if the current one executes a return(...) statement.
It means that we can prevent the default Varnish behavior (found in the built-in VCL) by running specific logic in the same routine, and we can extend things further using include files.
So if vcl_recv in default.vcl executed return(...), then Varnish would only run:
- vcl_recv from default.vcl
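Note that a return only cuts the chain short on the requests that actually reach it. As a rough sketch, assuming a hypothetical /static/ path that needs no per-site handling, a fragment like this in default.vcl would skip every later vcl_recv for static files only:

```vcl
sub vcl_recv {
    # Hypothetical rule: static assets need no per-site cookie handling
    if (req.url ~ "^/static/") {
        unset req.http.Cookie;
        # Ends vcl_recv here: included files and the built-in vcl_recv are skipped
        return (hash);
    }
    # No return on this path, so later vcl_recv definitions still run
}
```

All other requests fall through to the vcl_recv definitions in the included files and, finally, to the built-in one.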
Varnish Virtual Hosts strategy
So here’s the strategy we should start with when we code our VCL for multiple hosts. Let’s walk through it using the same routine, vcl_recv, which is the most important one, since it commonly holds rules for filtering cookies or setting backend hints.
We assume you’re using CentOS/RHEL-based paths; adjust accordingly for Debian-derived systems.
First, create a directory holding your virtual hosts:
mkdir /etc/varnish/sites-enabled
Suppose we have a site a.example.com, a WordPress blog with comments disabled. We want it to ignore all cookies except in the login and admin areas (/wp-login and /wp-admin). Let’s create the virtual host file:
nano /etc/varnish/sites-enabled/a.example.com.vcl
And paste in:
sub vcl_recv {
    if (req.http.host == "a.example.com") {
        # ignore all cookies on a WP site without comments (except for admin areas)
        if (req.url !~ "^/wp-(login|admin)") {
            unset req.http.cookie;
        }
    }
}
Now, another website of ours, b.example.com, is very different. It’s a Trac ticketing site that runs as a standalone Python app on a different port!
nano /etc/varnish/sites-enabled/b.example.com.vcl
And paste in:
backend trac {
    .host = "127.0.0.1";
    .port = "3050";
}
sub vcl_recv {
    if (req.http.host == "b.example.com") {
        set req.backend_hint = trac;
    }
}
Another website of ours, c.example.com, runs WordPress with the WooCommerce plugin. We don’t want to cache WooCommerce pages there. So we run:
nano /etc/varnish/sites-enabled/c.example.com.vcl
And paste in:
sub vcl_recv {
    if (req.http.host == "c.example.com") {
        # the "?" is escaped so that it matches a literal question mark
        if (req.url ~ "/(cart|my-account|checkout|addons)|\?add-to-cart=") {
            return (pass);
        }
    }
}
We use Google Analytics tracking on every website. So let’s create handling for all the hosts in the file /etc/varnish/catch-all.vcl with the following:
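sub vcl_recv {
    if (req.http.Cookie) {
        # strip Google Analytics cookies; drop the header if nothing is left
        set req.http.Cookie = regsuball(req.http.Cookie, "_ga=[^;]+(; )?", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "_gat=[^;]+(; )?", "");
        if (req.http.Cookie ~ "^\s*$") { unset req.http.Cookie; }
    }
}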
Next, we want to put everything together.
Update default.vcl in the following way:
vcl 4.0;
...
sub vcl_recv {
    ....
    # Normalize the header, remove the www and port
    set req.http.host = regsub(req.http.host, "^www\.", "");
    set req.http.host = regsub(req.http.host, ":[0-9]+", "");
}
...
# at the very bottom:
include "all-vhosts.vcl";
include "catch-all.vcl";
Create the all-vhosts.vcl file. It should contain:
include "sites-enabled/a.example.com.vcl";
include "sites-enabled/b.example.com.vcl";
include "sites-enabled/c.example.com.vcl";
Now we can reload Varnish by running service varnish reload. Varnish will handle each website in its own way. Our main VCL file will not be cluttered with dozens of if statements, and we can always disable special handling for a site by commenting out its include in the all-vhosts.vcl file and reloading again.
The basic rules of placing VCL logic this way are the following:
- vcl_recv() in default.vcl should contain things like normalising headers. It is crucial that this procedure does not call return(...)
- vcl_recv() in virtual host files like sites-enabled/a.example.com.vcl should contain filtering that is specific to this domain and may optionally call return(...) to halt further processing or filtering. It may also contain backend hints or rules to skip the cache for specific URLs
- vcl_recv() in catch-all.vcl should contain just very common filtering, e.g. Google Analytics cookies or anything that is common for all the sites
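As a rough template that follows these rules, a new virtual host file might look like the sketch below. The site d.example.com and its /account path are hypothetical and only illustrate where an optional return(...) fits.

```vcl
# sites-enabled/d.example.com.vcl (hypothetical site)
sub vcl_recv {
    if (req.http.host == "d.example.com") {
        # Site-specific rule: never cache the account area
        if (req.url ~ "^/account") {
            # Halts here: catch-all.vcl and the built-in vcl_recv are skipped
            return (pass);
        }
        # Falling through lets catch-all.vcl strip the shared cookies next
    }
}
```

Like the other sites, such a file only takes effect once its include line is added to all-vhosts.vcl and Varnish is reloaded.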
You can start with the following sample configuration. Feel free to fork or send pull requests.