There are often cases when you need Varnish to serve the same cached page whether or not its URL contains certain query parameters.
The most common example of this is when Google (AdWords, Analytics, etc.) adds tracking parameters to your website URLs. Namely, the gclid and utm_* parameters are appended to the final URL. Because Varnish treats every distinct URL as a separate object, this causes it to hold multiple cache entries for a single page.
The solution is quite simple. In VCL we can rewrite the URL before Varnish looks it up in the cache and before it reaches our backend (Nginx). Simplicity is beauty: we strip the specific tracking parameters, so every variant of the URL maps to a single cache entry.
How to change your VCL to strip ?gclid and ?utm parameters
Add the following to your vcl_recv subroutine (between sub vcl_recv { and the closing brace }):
if (req.url ~ "(\?|&)(gclid|utm_[a-z]+)=") {
    # strip each tracking parameter together with its value
    set req.url = regsuball(req.url, "(gclid|utm_[a-z]+)=[-_A-Za-z0-9+()%.]+&?", "");
    # remove trailing question mark and ampersand from URL
    set req.url = regsub(req.url, "[?&]+$", "");
}
You can test the main regex in question by visiting this link. I made sure that it will work in all possible cases, including the case when the parameter’s value has round brackets.
The code will strip out Google Analytics campaign variables properly. Those variables are only needed by the JavaScript running on the page, so the backend never has to see them. The variables in question are utm_source, utm_medium, utm_campaign, gclid, etc.
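To check the rewrite against real traffic, you can log the URL before and after stripping. The following is only a minimal sketch under a few assumptions: the backend address, the X-Original-URL header name and the url_strip log prefix are placeholders chosen for illustration, and the value character class is written with an explicit A-Za-z range.

```vcl
vcl 4.0;

import std;

# Placeholder backend; keep the one already defined in your own VCL.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Remember the URL as the client sent it (hypothetical header name).
    set req.http.X-Original-URL = req.url;

    if (req.url ~ "(\?|&)(gclid|utm_[a-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|utm_[a-z]+)=[-_A-Za-z0-9+()%.]+&?", "");
        set req.url = regsub(req.url, "[?&]+$", "");
    }

    # Examples of the rewrite:
    #   /page?gclid=abc123&foo=bar              -> /page?foo=bar
    #   /page?foo=bar&gclid=abc123              -> /page?foo=bar
    #   /page?utm_source=google&utm_medium=cpc  -> /page
    if (req.url != req.http.X-Original-URL) {
        std.log("url_strip: " + req.http.X-Original-URL + " -> " + req.url);
    }

    # Do not pass the helper header on to the backend.
    unset req.http.X-Original-URL;
}
```

The log lines appear in varnishlog under the VCL_Log tag, so something like varnishlog -q 'VCL_Log ~ "url_strip"' should show only the requests whose URL was rewritten.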
vmod-querystring
You may want to look into using vmod-querystring for the same purpose. It has the advantage of a smaller memory footprint, especially if you have long URLs.
Installing vmod-querystring for CentOS/RHEL 7 and Varnish 4.x
sudo yum -y install https://extras.getpagespeed.com/release-latest.rpm
sudo yum -y install vmod-querystring
Installing vmod-querystring for CentOS/RHEL 7 and Varnish 6.0.x LTS
sudo yum -y install https://extras.getpagespeed.com/release-latest.rpm
sudo yum -y install yum-utils
sudo yum-config-manager --enable getpagespeed-extras-varnish60
sudo yum -y install vmod-querystring
Installing vmod-querystring for CentOS/RHEL 8 and Varnish 6.0.x LTS
sudo yum -y install https://extras.getpagespeed.com/release-latest.rpm
sudo yum -y install vmod-querystring
Using vmod-querystring for stripping (marketing) URL parameters
You can get documentation for the module by running man vmod_querystring.
But here’s a simple snippet of VCL to illustrate how you can strip marketing parameters using this VMOD:
import std;
import querystring;

sub vcl_init {
    # build the filter once, when the VCL is loaded
    new tracking_params_filter = querystring.filter();
    tracking_params_filter.add_string("gclid");
    tracking_params_filter.add_glob("utm_*"); # google analytics parameters
}

sub vcl_recv {
    # log the tracking parameters found in the URL
    std.log("tracking_params_filter:" + tracking_params_filter.extract(req.url, mode = keep));
    # remove them from the URL
    set req.url = tracking_params_filter.apply(req.url);
}
As you can see, using this VMOD allows for cleaner VCL, because it often lets you do things without fancy regexes.
But if you need to match parameter names that can only be expressed with a regex, this VMOD also has an .add_regex method.
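As a rough sketch of the regex variant, the filter below swaps the glob for .add_regex. The pattern and the backend address are illustrative assumptions, and I am assuming the expression is matched against the parameter name, as with add_string and add_glob; check man vmod_querystring for the exact behaviour in your version.

```vcl
vcl 4.0;

import querystring;

# Placeholder backend; keep the one already defined in your own VCL.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_init {
    new tracking_params_filter = querystring.filter();
    tracking_params_filter.add_string("gclid");
    # Hypothetical pattern: utm_ followed by lowercase letters only
    tracking_params_filter.add_regex("^utm_[a-z]+$");
}

sub vcl_recv {
    # drop every parameter matched by the filter
    set req.url = tracking_params_filter.apply(req.url);
}
```

A regex is worth the extra complexity only when a glob is too permissive for your parameter names. For plain prefixes such as utm_, the glob from the earlier snippet is easier to read.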