This is the first in a three-part series on sinkholing, targeted attacks, and how to defend your platform.
As our user base (and our popularity) grows, so too does the threat of sudden attacks against our platform. We have encountered many attempts by attackers to brute-force their way into Showmax by guessing username/password combinations and promo codes, or by requesting random URLs in the hope of finding something to abuse. To face this challenge, we started a project called Sinkholing, and it works exactly as it sounds: the goal is to send all unwanted requests into a virtual sinkhole. In this first entry in our series on how and why we adopted sinkholing, I’ll go through its high-level architecture and the components that make it up.
The general architecture of the Sinkholing project has three main parts:
- A mechanism that decides what to block
- Storage for currently-banned offenders
- A tool that will perform the actual blocking of the requests
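To make that split concrete, here is a minimal in-memory sketch of how the three parts could fit together. The class names (Detector, BanStore, Blocker) and the toy "too many failed requests" rule are hypothetical, and the plain dict only stands in for a real shared store; it illustrates the division of labour, not a production implementation.

```python
from __future__ import annotations

from collections import Counter


class Detector:
    """Decides what to block: flags IPs with more than max_failures failed requests (toy rule)."""

    def __init__(self, max_failures: int) -> None:
        self.max_failures = max_failures

    def offenders(self, failed_ips: list[str]) -> set[str]:
        counts = Counter(failed_ips)
        return {ip for ip, n in counts.items() if n > self.max_failures}


class BanStore:
    """Storage for currently banned offenders; a dict stands in for a shared store."""

    def __init__(self) -> None:
        self._bans: dict[str, str] = {}

    def ban(self, ip: str, reason: str) -> None:
        self._bans[ip] = reason

    def is_banned(self, ip: str) -> bool:
        return ip in self._bans


class Blocker:
    """Performs the blocking by consulting the store for every incoming request."""

    def __init__(self, store: BanStore) -> None:
        self.store = store

    def allow(self, ip: str) -> bool:
        return not self.store.is_banned(ip)


# Wire the three parts together on a toy batch of failed requests.
store = BanStore()
detector = Detector(max_failures=2)
for offender in detector.offenders(["10.0.0.1"] * 3 + ["10.0.0.2"]):
    store.ban(offender, "too many failed requests")

blocker = Blocker(store)
print(blocker.allow("10.0.0.1"), blocker.allow("10.0.0.2"))  # False True
```

Keeping the three roles separate means each one can be swapped independently, for example a different detection rule or a different storage backend, without touching the others.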
Before we get to the technical stuff and implementation details, here’s a bit about our platform and infrastructure.
Every request that hits our platform is initially handled by HAProxy on our frontend machines. HAProxy does some basic preprocessing and then passes the request into the welcoming hands of Varnish, which takes care of caching and routing the traffic to our backend microservices. Every request is logged in real time along the way. We use Elasticsearch both to store our logs and as the source of truth for the banning tool.
How to Dig a Sinkhole
The Great Sinkholing Barrier
At the start, we had to decide between two options for blocking the rogue requests: iptables (light and fast) or HAProxy (robust). HAProxy has several advantages over iptables, the biggest being its much richer request-matching capabilities. That means that, going forward, we can match offenders not only by their IP (as with iptables), but also by arbitrary request parameters such as the User-Agent header or the request body. Plus, HAProxy is extensible with custom Lua scripts, and it can easily be configured to log both accepted and blocked traffic into our logging infrastructure. In the future, we can also configure HAProxy to seamlessly redirect banned clients to a Sorry page where they can unban themselves by passing a CAPTCHA-like challenge.
We weighed the pros and cons and decided to go with HAProxy. For every request, HAProxy now checks its datastore to determine whether the request should pass. If the request should be blocked, HAProxy responds with an error message explaining to the user what happened.
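The matching and the pass/block decision described above can be modelled in a few lines of Python. In HAProxy itself this would be expressed with ACLs or a Lua script; the sketch below only mirrors the semantics. The BanRule fields, the 403 status code, and the message text are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    ip: str
    user_agent: str = ""
    body: str = ""


@dataclass
class BanRule:
    """Hypothetical ban rule: every criterion that is set must match."""

    ip: Optional[str] = None
    user_agent: Optional[str] = None
    body_contains: Optional[str] = None

    def __post_init__(self) -> None:
        # A rule without criteria would match every request, so reject it.
        if self.ip is None and self.user_agent is None and self.body_contains is None:
            raise ValueError("a ban rule needs at least one criterion")

    def matches(self, req: Request) -> bool:
        if self.ip is not None and req.ip != self.ip:
            return False
        if self.user_agent is not None and req.user_agent != self.user_agent:
            return False
        if self.body_contains is not None and self.body_contains not in req.body:
            return False
        return True


# Assumed status code and wording; the only requirement is that users learn what happened.
BLOCKED = (403, "Your requests were blocked due to suspicious activity. Please try again later.")


def check(req: Request, rules: list[BanRule]) -> Optional[tuple[int, str]]:
    """Return an error response if any rule matches, else None (let the request through)."""
    for rule in rules:
        if rule.matches(req):
            return BLOCKED
    return None


rules = [
    BanRule(ip="203.0.113.7"),              # IP match, which iptables could also do
    BanRule(user_agent="bad-scanner/1.0"),  # header match, beyond plain IPs
]
print(check(Request("203.0.113.7", "Mozilla/5.0"), rules))   # blocked
print(check(Request("198.51.100.1", "Mozilla/5.0"), rules))  # None
```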
Where to find the truth?
That’s all great, but we still needed to figure out how to identify the offenders. As mentioned, all of our logs are shipped to Elasticsearch, so we already have very powerful tools for identifying attackers. With Kibana and its Lucene query syntax, it becomes quite easy. Would you like to find somebody who is trying to guess a voucher code? No problem, just search for the right http_code range. Looking for password guessers? Again, not a problem. All the data is there.
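A search like the voucher example can also be issued programmatically. The sketch below builds a Lucene query string and wraps it in an Elasticsearch query_string query restricted to a recent time window. The field names request_path, http_code and @timestamp, as well as the /api/voucher path, are assumptions; substitute whatever your log schema uses.

```python
from __future__ import annotations

import json


def lucene_query(path: str, min_code: int, max_code: int) -> str:
    """Lucene query for requests to `path` whose status code lies in [min_code, max_code]."""
    # Hypothetical field names; use whatever your log schema defines.
    return f'request_path:"{path}" AND http_code:[{min_code} TO {max_code}]'


def search_body(query: str, since: str = "now-1h") -> dict:
    """Wrap a Lucene string in an Elasticsearch query_string query limited to recent logs."""
    return {
        "query": {
            "bool": {
                "must": [{"query_string": {"query": query}}],
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        }
    }


# Hypothetical voucher endpoint: look for clients collecting 4xx responses.
voucher_guessers = lucene_query("/api/voucher", 400, 499)
print(voucher_guessers)  # request_path:"/api/voucher" AND http_code:[400 TO 499]
print(json.dumps(search_body(voucher_guessers), indent=2))
```

The same query string should also work in Kibana's search bar when its query language is set to Lucene.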
Doing all of this searching manually is slow, however, and not really sustainable. What we needed was an automated search tool that would alert us when something undesirable happens. ElastAlert (by Yelp) provided the answer: it’s the perfect tool for the job.
For those who have never heard of ElastAlert, it is a service that regularly queries Elasticsearch with a user-defined query and then decides whether to raise an alert. Essentially, ElastAlert ties the whole Sinkholing process together. It is very flexible and can be configured to alert on spikes (an unusually high or low number of events matching a given query) or when a certain threshold has been reached. For example, we can set it to alert us if “an IP made more than 10 requests to our login endpoint within the last minute that ended up with the 403 error code”; basically, if someone is trying to guess our customers’ passwords. That is pretty cool, right? You can either use the predefined alerters (email, messenger, etc.) or create a custom one. We used this feature to write our own alerter that processes Sinkholing alerts and exports the important data, such as the offender’s IP, to a backend store that HAProxy reads from.
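The login rule quoted above boils down to a per-IP sliding-window threshold. In ElastAlert this would typically be a frequency rule (num_events and timeframe, with query_key set to the IP field), but to show the logic itself, here is a small stand-alone Python sketch. The /login path and the LogEvent fields are hypothetical, and it assumes log events arrive in timestamp order.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class LogEvent:
    timestamp: float  # seconds since the epoch
    ip: str
    path: str
    http_code: int


class FailedLoginDetector:
    """Flags an IP once it has more than `threshold` failed logins within `window` seconds."""

    def __init__(self, threshold: int = 10, window: float = 60.0, path: str = "/login") -> None:
        self.threshold = threshold
        self.window = window
        self.path = path  # hypothetical login endpoint
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def observe(self, event: LogEvent) -> bool:
        """Record one event (events must arrive in time order); True means raise an alert."""
        if event.path != self.path or event.http_code != 403:
            return False  # only failed login attempts count
        hits = self._hits[event.ip]
        hits.append(event.timestamp)
        # Slide the window: forget attempts that are at least `window` seconds old.
        while hits and hits[0] <= event.timestamp - self.window:
            hits.popleft()
        return len(hits) > self.threshold


detector = FailedLoginDetector()
alerts = [detector.observe(LogEvent(float(t), "203.0.113.7", "/login", 403)) for t in range(11)]
print(alerts[-1])  # True: 11 failed logins within one minute
```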
Find it, distribute it, ban it!
Because our platform is distributed all over the world and we run more than 15 HAProxy instances on our frontend machines, we need to make sure their lists of banned IPs are synchronized. Since the amount of data required for banning is quite small and we don’t need any features of a classical relational database, a simple key-value store (such as Redis) is an ideal candidate for storage. One more plus: key expiration elegantly takes care of automatic ban expiration.
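The sketch below models the storage side: a custom alerter writes one key per offender with a time-to-live, and since every HAProxy instance would consult the same store, the ban lists stay in sync. With Redis, the write would be SET ban:<ip> <reason> EX <seconds>; here an in-memory class with an injectable clock stands in for Redis so the example runs without a server. ElastAlert hands a custom alerter a list of matches, and sinkhole_alert mimics that shape, but it is a hypothetical helper rather than ElastAlert's Alerter API. The ban: key prefix, the ip field in each match, and the one-hour default are assumptions.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class ExpiringBanStore:
    """In-memory stand-in for a Redis-like store where every ban key carries a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl: float) -> None:
        # Comparable to Redis "SET key value EX ttl".
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop the ban lazily on read
            return None
        return value


def sinkhole_alert(matches: list[dict], store: ExpiringBanStore, ban_seconds: float = 3600) -> None:
    """Hypothetical custom alerter: ban the offender IP found in each match."""
    for match in matches:
        store.set(f"ban:{match['ip']}", match.get("reason", "sinkholed"), ban_seconds)


now = [0.0]  # fake clock so the expiry can be demonstrated instantly
store = ExpiringBanStore(clock=lambda: now[0])
sinkhole_alert([{"ip": "203.0.113.7", "reason": "login brute force"}], store, ban_seconds=600)
print(store.get("ban:203.0.113.7"))  # login brute force
now[0] = 601.0
print(store.get("ban:203.0.113.7"))  # None: the ban expired on its own
```

Letting the store expire keys means no separate cleanup job is needed to lift bans.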
As it turns out, Sinkholing has proven itself to be a pretty useful tool. It protects our infrastructure from excessive usage and prevents hackers from brute-forcing their way into Showmax. In the last month, we applied almost 19,000 bans and blocked more than 1.5M requests coming from 86 different countries. Thanks to Sinkholing, we were also able to resolve some complaints raised through HackerOne (a hacking-as-a-service program that Showmax is part of) about endpoints that were not rate-limited.
That’s it for now. In the next part of this series we’ll dive into the details of the architecture and implementation. Stay tuned.