Achieving a tenfold increase in Varnish throughput by replacing libvmod-curl with native request restarts

We implemented authentication offloading in Varnish and later realized that it had become a significant performance bottleneck. Here, we go through why and how we built it, and how we changed the offloading implementation to win back performance.

OAuth request offloading

At Showmax, we use OAuth 2.0 to authenticate calls to our APIs. Our backend microservices are written in a few different languages (mostly Ruby, Python, and Go). For each of those languages, we could have implemented a library that takes the OAuth token from the incoming API request and fetches the information about the user from the OAuth backend. But we wanted to make our lives easier and have consistent behavior across all of the backends, regardless of the language used. So we implemented OAuth offloading in Varnish, which we use (among other things) as our API Gateway.

OAuth offloading in Varnish

The initial implementation was straightforward. There’s a Varnish module called libvmod-curl that provides cURL bindings inside your VCL. You can intercept the requests that need authentication in vcl_recv, call the OAuth backend to validate the token, and then enrich the request with the user’s information. Here’s a simplified example:

import curl;

sub vcl_recv {
    # Pass the client IP and the bare token to the OAuth service.
    curl.header_add("OAuth-Ip: " + req.http.X-Forwarded-For);
    curl.header_add("OAuth-Token: " + regsub(req.http.Authorization, "^Bearer ", ""));
    # Blocking call: the worker waits here for the OAuth response.
    curl.fetch("http://localhost:6081/oauth/token_validation");

    if (curl.status() == 200) {
        # Token is valid: expose the user data to the backend as headers.
        set req.http.OAuth-Country = curl.header("OAuth-Country");
        set req.http.OAuth-User-Id = curl.header("OAuth-User-Id");
        set req.http.OAuth-User-Scope = curl.header("OAuth-User-Scope");
    } else if (curl.status() == 0) {
        # No HTTP response was received from the OAuth service.
        return (synth(502, "Bad Gateway"));
    } else {
        return (synth(401, "Unauthorized"));
    }
}

This piece of code forwards the OAuth token provided in the Authorization HTTP header to the OAuth service, and then enriches the request headers with data about the user’s country, ID, and scope. When the request reaches the backend responsible for returning the response, that backend can just look at the values of the headers and does not need to fetch this information on its own. This solution is completely language-agnostic and transparent for the backend services.

Why change what’s working?

We have been using the setup described above in production for a couple of years now and we have never had any issues. But, we recently introduced sports streaming for our customers and that has completely changed the game.

Sure, we had our weekend evening spikes even when we were just a pure VOD service. But those were nothing compared to the situation when you have hordes of customers wanting to access the service just before an important rugby match is about to start.

Mild panic just before the start of a major event

When the first major sporting events were published on our service, we ran into performance issues. To make sure that our platform could sustain the load, we built proper load-testing tooling (more on that in a future post). Sure enough, one of the bottlenecks it revealed was the throughput of our Varnish instances.

As mentioned above, we use Varnish as our API Gateway, meaning that it handles all the incoming API requests. Our load testing revealed that one Varnish instance with our configuration, running on a machine with an Intel® Xeon® Processor E3-1275 quad-core CPU, could only handle around 7,000 requests per second before it maxed out all the cores. This was not enough to satisfy our new requirements for sports streaming.

Yes, we could have scaled that layer horizontally and added more machines to the pool. But before spending additional money on infrastructure, we decided to take a look at ways to improve the throughput of the Varnish instances.

Figuring out what was wrong

Finding out that Varnish was having performance issues was not that hard. When we ran htop on the Varnish instances during the load test, we saw that every CPU was maxed out by the varnishd process at a mere 7,000 requests per second on one instance. Since we were expecting our own backend services to be the bottleneck, we were quite surprised by this finding. We had been expecting the throughput to be at least an order of magnitude higher.

The hard part was figuring out what exactly was causing Varnish to be so slow. We unleashed strace on the varnishd process, but we did not learn much except that Varnish kept spawning and destroying a lot of threads. We debated over lunch and found this suspicious.

Varnish is supposed to use a thread pool with workers for processing the requests. So why would it be spawning new threads? And then it hit us - it was caused by our OAuth offloading. Using the libvmod-curl module on the hot path meant that, for each incoming request, Varnish had to spawn a new thread to process the call to the OAuth backend.

We tested our assumption by replacing the curl.fetch() call with a synthetic response (using the synth() function). The throughput immediately jumped all the way to 130,000 requests per second - a nearly 20x improvement. Based on this experiment, we concluded that using blocking operations inside the VCL on the hot path was probably a very bad idea. The libvmod-curl module was not to blame; our architecture was.

To verify this hypothesis, we tried doing a very short sleep before returning the synthetic response mentioned above, using the following code:


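# Inline C has to be enabled (vcc_allow_inline_c) for this to compile.
C{
   #include <unistd.h>
}C

sub vcl_recv {
    C{
        /* Block the worker thread for 15 ms. */
        usleep(15 * 1000);
    }C
    return (synth(200, "OK"));
}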

As we expected, the performance dropped dramatically - down to roughly 8,000 requests per second. Since the latency of our OAuth service can spike quite high (up to 100ms) under heavy load, we had effectively built an architecture capable of reliably killing our Varnish instances. Not exactly what we wanted.

Request restarts FTW

Knowing that using a blocking module on the hot path has a significant negative impact on performance, we decided to try out a native Varnish way of achieving the same result - restarting the requests internally.

Here’s a crude example of how you might rewrite the previous VCL without using libvmod-curl:

sub vcl_recv {
    # First pass: turn the request into a token validation call.
    if (req.restarts == 0) {
        # Remember the original request so it can be restored later.
        set req.http.Orig-Method = req.method;
        set req.http.Orig-Url = req.url;

        set req.http.Oauth-Ip = req.http.X-Forwarded-For;
        set req.http.Oauth-Token = regsub(req.http.Authorization, "^Bearer ", "");

        # Point the request at the OAuth backend.
        set req.method = "GET";
        set req.url = "/oauth/token_validation";
        set req.backend_hint = oauth.backend();

        return (hash);
    }
}

sub vcl_deliver {
    # On the first pass, resp is the answer from the OAuth backend.
    if (req.restarts == 0) {
        if (resp.status == 200) {
            # Restore the original request.
            set req.method = req.http.Orig-Method;
            set req.url = req.http.Orig-Url;

            # Remove the helper headers before calling the real backend.
            unset req.http.Orig-Method;
            unset req.http.Orig-Url;
            unset req.http.Oauth-Ip;
            unset req.http.Oauth-Token;

            # Enrich the request with the user data.
            set req.http.OAuth-Country = resp.http.OAuth-Country;
            set req.http.OAuth-User-Id = resp.http.OAuth-User-Id;
            set req.http.OAuth-User-Scope = resp.http.OAuth-User-Scope;

            # Second pass: send the restored request to its backend.
            return (restart);
        } else if (resp.status == 503) {
            return (synth(502, "Bad Gateway"));
        } else {
            return (synth(401, "Unauthorized"));
        }
    }
}

sub vcl_backend_fetch {
    # Token validation is a plain GET, so drop the body of the original request.
    if (bereq.url == "/oauth/token_validation") {
        unset bereq.body;
    }
}

With this configuration, every request initially hits the OAuth backend, and the response is used to enrich the request with the information about the user. Once that’s done, the original URL and method are restored, the request is restarted, and it is sent to the appropriate backend service.

Obviously, in the real world you will want to implement some sanity checks, the possibility to skip the OAuth offloading for specific requests, and so on. But that will all depend on your environment. We also suggest using the awesome vmod_var for passing variables around instead of spamming the request HTTP headers as shown in the example. Finally, don’t forget to adjust your vcl_hash accordingly to make sure that your OAuth requests will be properly cached.
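To make those suggestions a bit more concrete, here is a minimal sketch of what such additions might look like. It is not our production configuration. The "/public/" prefix and the OAuth-Skip header are hypothetical names used only for illustration, and hashing on the client IP assumes that the OAuth response (the country, for example) depends on it. These subroutines would have to appear before the ones shown above, because Varnish runs same-named subroutines in the order in which they are defined.

```vcl
sub vcl_recv {
    if (req.restarts == 0) {
        # Sanity check: never trust user data headers sent by the client.
        unset req.http.OAuth-Country;
        unset req.http.OAuth-User-Id;
        unset req.http.OAuth-User-Scope;
        unset req.http.OAuth-Skip;

        # Hypothetical rule: endpoints under /public/ need no token.
        # The marker header tells vcl_deliver not to restart.
        if (req.url ~ "^/public/") {
            set req.http.OAuth-Skip = "1";
            return (hash);
        }

        # Sanity check: reject requests without a bearer token early,
        # before they ever reach the OAuth backend.
        if (req.http.Authorization !~ "^Bearer .+") {
            return (synth(401, "Unauthorized"));
        }
    }
}

sub vcl_deliver {
    # Skipped requests were never sent to the OAuth backend,
    # so their response goes straight to the client.
    if (req.http.OAuth-Skip) {
        return (deliver);
    }
}

sub vcl_hash {
    # Cache token validations per token (and, by assumption, per IP),
    # not per original URL.
    if (req.restarts == 0 && req.url == "/oauth/token_validation") {
        hash_data(req.url);
        hash_data(req.http.Oauth-Token);
        hash_data(req.http.Oauth-Ip);
        return (lookup);
    }
}
```

How long a validation result stays in the cache still depends on the TTL of the OAuth response, so that needs to be short enough for revoked tokens to stop working in a reasonable time.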

Putting it all together

Even though changing the way we call the OAuth backend from Varnish seemed difficult initially, it went surprisingly well, and we were able to prepare, test, and deploy the changes in a matter of days. We ran into a couple of issues, but they were mostly specific to our environment. For example, we initially did not change the value of the Accept header when calling the OAuth service, which caused a bunch of HTTP 400s for requests that were expecting a binary/octet-stream response. But overall, it was a smooth transition.
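The Accept header issue can be handled the same way as the method and the URL: save the original value, override it for the OAuth call, and restore it before the restart. The following is only a sketch. The Orig-Accept header name and the "application/json" value are assumptions, so use whatever your OAuth service actually accepts. As before, these subroutines are meant to be defined before the main ones, so that they run first.

```vcl
sub vcl_recv {
    if (req.restarts == 0) {
        # Drop any client-supplied copy of the helper header.
        unset req.http.Orig-Accept;

        # Remember what the client asked for, if anything.
        if (req.http.Accept) {
            set req.http.Orig-Accept = req.http.Accept;
        }

        # Ask the OAuth service for a format it accepts (assumed value).
        set req.http.Accept = "application/json";
    }
}

sub vcl_deliver {
    if (req.restarts == 0) {
        # Put the original Accept header back before the restart.
        if (req.http.Orig-Accept) {
            set req.http.Accept = req.http.Orig-Accept;
        } else {
            unset req.http.Accept;
        }
        unset req.http.Orig-Accept;
    }
}
```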

After putting all of the changes together and deploying them to our staging environment, we ran wrk against one of the Varnish instances, full of expectations. It was quite a relief to find that we had increased the throughput of the single instance from 7,000 to 70,000 requests per second. And most of all, Varnish was no longer the bottleneck. It was the backend service that it was talking to.

Conclusion

When we launched sports streaming on Showmax, it brought tons of new and interesting challenges - almost all related to performance. When we found out that we had issues with Varnish throughput, we could have just bought more machines and been done with it. But that’s not how we operate.

We were determined to understand what the problem was and to fix it properly. That led us to replacing libvmod-curl with native request restarts for our OAuth offloading. Without spending any extra money on the infrastructure or changing the interface for the backends, we were able to improve the throughput by a factor of 10.

If you enjoy solving interesting problems like this one, come and join us. We’re hiring.
