How Showmax got into live streaming

Part IV - Why so serious?

In the third part of this series, we explained how we trim our live video stream, how we troubleshoot the process (there was a lot of that!), and all of the other intricacies involved.

In this post - the fourth and final in our live streaming series - we focus on various small, often comical, issues that gave us more than a few gray hairs.

On cappuccinos and odometers

With the Cat and the Clock monitoring our live stream, all we had to do on a beautiful Friday morning was fix our morning cappuccinos and check that the stream was up and running.

Unfortunately for us, we never got a chance to drink those cappuccinos. The stream started misbehaving after around 26 hours of runtime.

Looking for answers to this puzzle, we were happy (and rather proud) to find the answer from right here in the Czech Republic. At Masaryk University in Brno, an informatics student named Roman Kollar posted his bachelor’s thesis on the subject of making ffmpeg encoding more robust. He wrote:

Using the -copyts option in ffmpeg, it starts to produce invalid output after 26.51 hours. The logs show that the timestamps are non-monotonous.

[mpegts @ 0x38e3a40] Non-monotonous DTS in output stream 0:0;
      previous: 8589931200, current: 208; changing to 8589931201.

This may result in incorrect timestamps in the output file. This happens because of a timestamp overflow in the MPEG-TS (section 3.3) demuxer. Since timestamps in MPEG-TS have 33 bits and a frequency of 90 kHz, they overflow exactly after 2³³ / 90000 ≈ 95443.72 s ≈ 26.51 h.

Wikipedia corroborates this:

To enable a decoder to present synchronized content, such as audio tracks matching the associated video, at least once each 100 ms a program clock reference (PCR) is transmitted in the adaptation field of an MPEG-2 transport stream packet. The PID with the PCR for an MPEG-2 program is identified by the pcr_pid value in the associated PMT. The value of the PCR, when properly used, is employed to generate a system_timing_clock in the decoder. The system time clock (STC) decoder, when properly implemented, provides a highly accurate time base that is used to synchronize audio and video elementary streams. Timing in MPEG2 references this clock. For example, the presentation time stamp (PTS) is intended to be relative to the PCR. The first 33 bits are based on a 90 kHz clock. The last 9 are based on a 27 MHz clock. The maximum jitter permitted for the PCR is +/- 500 ns.

Basically, like an old mechanical odometer on your car, the counter rolls over to zero after about 26.5 hours. When this happens, players and other downstream devices might not like it.

Odometer

Source: https://www.pexels.com/photo/auto-automobile-automotive-blur-533685/
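The arithmetic behind that rollover is easy to check for yourself. Here is a small Python sketch (the names are ours, not anything from ffmpeg) that computes the overflow point and simulates the 33-bit wraparound:

```python
# MPEG-TS PTS/DTS values are 33-bit counters ticking at 90 kHz, so they
# wrap around like an odometer. Compute when, and show the wrap.

PTS_BITS = 33
CLOCK_HZ = 90_000
WRAP = 1 << PTS_BITS  # 8589934592 ticks

overflow_seconds = WRAP / CLOCK_HZ
print(f"overflow after {overflow_seconds:.2f} s "
      f"(~{overflow_seconds / 3600:.2f} h)")
# overflow after 95443.72 s (~26.51 h)

def pts_at(seconds: float) -> int:
    """Timestamp value after `seconds` of runtime, with 33-bit wraparound."""
    return int(seconds * CLOCK_HZ) % WRAP

print(pts_at(95443.0))  # still huge, just short of the wrap point
print(pts_at(95444.0))  # already wrapped back near zero
```

The second print is exactly the "non-monotonous DTS" situation from the log above: a huge timestamp followed by a tiny one.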

For more in-depth detail on MPEG-TS, you can read the great Guide to MPEG Fundamentals and Protocol Analysis from Tektronix. If you’re interested in this topic - which, if you’ve made it this far, you probably are - we highly recommend reading it!

In the real world, we don’t actually live stream for more than 90 minutes, so we wouldn’t encounter this problem in production. It’s a testing issue. To have an endless stream in staging for test purposes, we implemented a simple workaround - the stream gets restarted every 24 hours. Since it’s only for testing, we don’t really mind the little discontinuity that results, but it sure is interesting to learn about the limits of your system.
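A minimal sketch of such a watchdog - hypothetical, not our production code - could look like this in Python:

```python
import subprocess

# Restart the staging encoder every 24 hours, comfortably before the
# 26.51-hour MPEG-TS timestamp overflow can corrupt the stream.
RESTART_AFTER = 24 * 3600  # seconds

def run_once(cmd, timeout=RESTART_AFTER):
    """Run the encoder until it exits or `timeout` elapses; report why."""
    encoder = subprocess.Popen(cmd)
    try:
        encoder.wait(timeout=timeout)
        return 'exited'       # encoder died on its own
    except subprocess.TimeoutExpired:
        encoder.terminate()   # time is up: restart before the overflow
        encoder.wait()
        return 'restarted'

def run_forever(cmd):
    while True:
        run_once(cmd)         # loop forever, restarting every 24 hours
```

In practice you would hand `run_forever` the full ffmpeg command line; any process supervisor with a periodic-restart feature achieves the same thing.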

On lip sync and dancing; aka “our own Bollywood”

“Has anyone made sure that lip sync works?”

Embarrassed silence. This simple question had not even occurred to us up to this point. Once again, our cappuccinos were destined to turn into iced coffee.

Along with the Cat and the Clock development video stream, we provided a beep every second using ffmpeg’s sine audio source. But, because the beep was not tied to any movement in the video, it did not tell us whether lip sync worked properly.
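For the curious, such a test feed can be generated entirely inside ffmpeg with its lavfi inputs; the sine source’s beep_factor option is what adds the once-per-second beep. The sketch below uses the same arguments-list style as our encoder code, but the exact filter parameters are illustrative assumptions, not our actual command:

```python
# Hypothetical ffmpeg test source: a synthetic test pattern (testsrc)
# plus a sine tone that beeps once per second (via beep_factor).
def test_source_args(duration=10):
    return [
        'ffmpeg',
        '-f', 'lavfi', '-i', 'testsrc=size=1280x720:rate=25',
        '-f', 'lavfi', '-i', 'sine=frequency=220:beep_factor=4',
        '-t', str(duration),
        '-f', 'null', '-',   # discard output; replace with a real sink
    ]
```

Swap the null sink for your streaming output to get an endless, self-describing test stream - though, as we learned, it still tells you nothing about lip sync.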

Turning the camera to ourselves was like watching an old Kung Fu movie where nobody’s mouth matches the dialogue. We had to continue testing the live synchronization, so we talked, danced, and sang in front of the streaming camera. By capturing our own live sound and movement, we could see whether or not our audio and video streaming components remained synchronized in the player.

As we tweaked the FFmpeg parameters and searched for answers to our lip-sync problems, we could see that our video and audio resampling, through the -r and -ar ffmpeg flags, was actually the cause of the problem. We wanted to preserve the original experience as much as possible, and to avoid introducing any stretching/squeezing to fit the timestamps.

This is one of those “less is more” sorts of situations. We just needed to leave the stream rates on our encoder alone instead of adjusting them on the fly. The solution is to configure the source to deliver the audio and video rate as required by the client devices.

@classmethod
def _encode_aac_h264(cls, **params):
    offset_base = 10000000
    return [
        # Audio Section:
        # DO NOT use -ar {RATE}; let FFmpeg use the original rate for
        # proper lip sync.
        '-map', '0:a',
        '-absf', 'aac_adtstoasc',
        '-c:a', 'libfdk_aac',
        '-cutoff', '16000',
        '-b:a', str(params['audio']['bitrate']),
        # Video Section:
        # DO NOT use -r {RATE}, -fflags +genpts, -vsync cfr; let FFmpeg
        # use the original rate for proper lip sync.
        '-movflags', 'isml+frag_keyframe',
        '-ism_offset', str(int(time.time()) * offset_base),
        ...

Problem solved. We learned two important lessons:

First, remember to black-box test your simplest use cases. We all love to focus on interesting technical details, and that makes it easy to forget the human experience of actually watching videos. If something as basic as lip sync does not work, all of our other work is wasted.

Second, our Showmax engineers are surprisingly good at song and dance.

Spity singing

On the brief history of time

Ahhh….Time for morning cappuccino, and for a check of the development live stream.

Hmm, the second hand of the clock has stopped moving.

What the…?

Encoder logs showed the stream rolling merrily along, with no pileup in the logs, and no activation of self-recovery. Everything checked out as normal, except the second hand of the clock in our live video stream was clearly not moving. How could our system monitoring be so wrong?

Just then, Jiri Brunclik picked up the clock, held it to his ear, shook it a few times, and grinned. “It’s broken”.

Improvised test stage

This live streaming series is the collective result of the Content Management System (CMS) team at Showmax. I would like to personally thank Jiří Brunclík, Peter Lisák, and Jan Panáček for all their contributions.

If you enjoyed reading about what we do, do not hesitate to send us an email at geeks@showmax.com.

Please check the original version of this article at