Setting up Spark Streaming - Part II

In this blog, I’m going to show you how I reduced batch processing time from over 126 seconds to just 17 seconds.

You may want to read Part 1 of this blog, where I wrote about some of the challenges we had integrating Spark Streaming into the Showmax platform and setting it up for production deployment.


…but first, a quick detour for database tuning

Back in production, running Spark Streaming in three-minute batches, it was clear after just a few hours that each batch was taking longer than three minutes to process.

Uh oh.

In effect, I’d swapped a large batch process for a potentially infinitely growing queue of smaller batches, each needing more and more time to process until the whole thing got out of control. Left unfixed, we would have ended up worse off than at the beginning.

I set about tuning Spark parameters and profiling the code for potential optimizations, and spent a lot of time learning about and adjusting those parameters. After a frustrating few days, I finally looked at the database and found I hadn’t added an index to the session_id column. Doh!

Graph showing performance impact of db index

With the index in place, batches were processed in slightly over two minutes. Problem solved, and we were back to production-worthy performance.

Still, in the back of my mind I worried that even two-minute batch processing was too slow, especially considering how fast our data requirements are increasing, so I refocused on optimization. It turned out that the time I’d spent investigating optimization before fixing the database index taught me a lot that I could now use to improve performance.

A number of these optimization steps were easy to implement and resulted in huge gains. You might try them yourself.

Dropping from 126 to just 17 seconds

Repartitioning streams

In Spark Streaming, each stream is modeled as a series of RDDs (Resilient Distributed Datasets). Each RDD is then split into multiple partitions, which are distributed to the machines of the cluster, and Spark executes the computations on the partitions in parallel. By default, a Spark Streaming receiver creates five partitions per second per stream. Before starting the computation, the number of partitions per stream can easily be changed like this:

  // jssc is the JavaStreamingContext; Receiver() stands in for our custom Receiver<T>
  JavaReceiverInputDStream<T> stream = jssc.receiverStream(new Receiver());
  JavaDStream<T> repartitionedStream = stream.repartition(3);

This transformation turns each RDD of the stream into a new RDD with the given number of partitions before further processing. The stream is repartitioned in order to minimize data movement. For more detailed information about repartitioning, I recommend this article from Matthew Powers.

For Showmax, this simple change to three partitions cut batch processing time in half.
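To see why repartitioning makes such a difference, it helps to count the partitions a batch starts with. The sketch below is plain back-of-the-envelope arithmetic; it assumes the “five partitions per second” default comes from Spark Streaming’s standard 200 ms block interval (spark.streaming.blockInterval), which is my reading of the defaults rather than anything specific to our setup:

```java
// Back-of-the-envelope: partitions per batch under Spark Streaming defaults.
// Assumes the default spark.streaming.blockInterval of 200 ms, which is where
// the "five partitions per second per stream" figure comes from.
public class PartitionMath {
    public static void main(String[] args) {
        long blockIntervalMs = 200;            // assumed default block interval
        long batchIntervalMs = 3 * 60 * 1000;  // our three-minute batches
        long defaultPartitions = batchIntervalMs / blockIntervalMs;
        // prints 900
        System.out.println("Partitions per batch per receiver (default): " + defaultPartitions);
        // After stream.repartition(3), each batch RDD has exactly 3 partitions,
        // so Spark schedules far fewer (but larger) tasks per batch.
    }
}
```

Under these assumptions each three-minute batch RDD starts with 900 partitions per receiver, so coalescing down to a handful of partitions changes the task-scheduling picture dramatically.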

Spreading receivers across the cluster

Another way to increase processing speed was to spread the receivers across different nodes of the cluster. In our Showmax setup we use two receivers, each consuming messages from a different queue. Simply by making sure the receivers run on different machines in the cluster, we slightly reduced processing time and saved some unnecessary data transfers. To configure the preferred location of a receiver, simply override its preferredLocation method:

    @Override
    public scala.Option<String> preferredLocation() {
        String myPreferredLocation = "host01.showmax.com";
        return Option.apply(myPreferredLocation);
    }

Changing the number of cores per executor

In the original Spark Streaming setup, I was using three Spark executors, each allocated three CPU cores. By switching the configuration to two cores per executor and adding one extra core, we now have five executors with two cores each.

This cut our batch processing time in half again.
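In terms of deployment configuration, the switch might look like the fragment below. This is a hedged sketch, not our actual launch command: it assumes a YARN deployment (on a standalone cluster you would set spark.executor.cores and spark.cores.max instead of --num-executors), and the class and jar names are hypothetical.

```
# Before: 3 executors x 3 cores = 9 cores total
# After:  5 executors x 2 cores = 10 cores total
spark-submit \
  --num-executors 5 \
  --executor-cores 2 \
  --executor-memory 4G \
  --class com.example.StreamingJob \
  streaming-job.jar
```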

To see whether the performance improvement came from increasing the number of executors or from adding the extra core, I ran another experiment, reducing the number of cores per executor to one (keeping 10 cores in total). Performance actually got worse in this setup, ending up almost the same as with three cores per executor. So it’s worth experimenting to see which configuration is best for your system.

Performance tuning results

Graph showing performance evolution

The graph above tells the story. Before tuning, batch processing took 126 seconds. By the time I’d repartitioned the streams, spread the receivers, and changed the cores per executor, I’d got it down to just 17 seconds.

Overall, the result is better than I hoped for, given so little effort.

Future tuning possibilities

There is definitely room for further performance tuning. In my optimizations I focused mainly on the low-hanging fruit: simple changes with potentially huge impact.

Further performance gains could be achieved by adding multiple receivers per queue. Yet another improvement may lie in finding a better configuration for the number of executors and cores.

I’m looking forward to more experiments to find the ideal number of receivers and ideal executors/cores configuration.

Goal achieved

Now, after all our tinkering, we’ve got close to real-time data. We can monitor how our systems perform almost immediately, responding quickly to solve problems. This is already making my colleagues happier and providing better service to our customers. All the trouble of figuring things out has been more than worth the pain.

Importantly, we can scale this solution. Our previous installation required significant resources, especially memory, as most of the data from the previous 24 hours was in memory at the same time.

Spark Streaming reduces our memory needs considerably (now it’s enough to allocate just 4GB per executor), so we can run multiple analyses simultaneously with fewer resources. For comparison, the same analysis with the batch setup required 20 cores and 180GB of RAM; with Spark Streaming we use 10 cores and 20GB. We could probably do even better, but this is good enough for us.
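Putting those numbers side by side makes the savings concrete. A trivial sketch, with the figures taken from the comparison above:

```java
// Resource comparison: daily batch setup vs. Spark Streaming setup.
public class ResourceComparison {
    public static void main(String[] args) {
        int batchCores = 20, streamingCores = 10;   // CPU cores
        int batchRamGb = 180, streamingRamGb = 20;  // RAM in GB
        // prints "Cores: 20 -> 10 (2x fewer)"
        System.out.println("Cores: " + batchCores + " -> " + streamingCores
                + " (" + (batchCores / streamingCores) + "x fewer)");
        // prints "RAM:   180GB -> 20GB (9x less)"
        System.out.println("RAM:   " + batchRamGb + "GB -> " + streamingRamGb
                + "GB (" + (batchRamGb / streamingRamGb) + "x less)");
    }
}
```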

Instead of having to scale according to the absolute volume of logs per day, we can use just a few minutes as a window. There’s plenty of performance to spare, and we don’t need to worry about keeping up with continued fast growth. Spark streaming is ready for action.

Final thoughts

Yet, in the end, having to solve these problems of my own creation gave me the knowledge that allowed me to tune our setup and improve performance dramatically.

When your rosy forecast turns dark and a newbie mistake leaves you unable to see clearly, just keep going. You’ll learn in the process, and find solutions to other problems along the way to fixing your first mistakes.

Please check the original version of this article at