The Showmax Engineering story, Part II.

Getting content to the customer as efficiently as possible

On our blog we usually discuss our work in pretty in-depth terms, or terms that we set on our own. At times, we may speak a language that is unfamiliar, or even unintelligible, to some people. So, we decided to put together a vocabulary resource to help people understand the magic of the SVOD business. This is the second part in this series, again focused on Media Engineering, one of the most important areas of our work. If you missed the first one, you can read it here.

Advanced terms in media encoding

To encode video, you can use different approaches to bitrate control, depending on the encoding scenario (archiving, streaming, or storing on media with limited capacity such as CDs or DVDs). Constant bitrate (CBR) is a type of encoding with a specified target bitrate, where the encoder adjusts compression to produce a stream at the requested bitrate. It can waste bits on scenes with low complexity, and it is not suitable for archiving. However, it’s useful for streaming when you need to predict and control the file size.
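Because CBR fixes the bitrate, the file size follows directly from bitrate × duration. Here is a minimal Python sketch of that arithmetic. It assumes video only and ignores audio and container overhead, and it uses the 5000 kbps 1080p AVC rung of the Showmax ladder as an example input.

```python
def cbr_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Predicted size of a CBR video stream, ignoring audio and container overhead."""
    bits = bitrate_kbps * 1000 * duration_s  # 1 kbps = 1000 bits per second
    return bits / 8  # 8 bits per byte


if __name__ == "__main__":
    # A 2-hour title encoded at 5000 kbps
    size = cbr_size_bytes(5000, 2 * 60 * 60)
    print(f"{size / 1e9:.1f} GB")  # 4.5 GB
```

With VBR or CRF, the same calculation only gives an estimate or an upper bound, which is why CBR is the easiest mode to plan storage and delivery around.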

The opposite approach is variable bitrate (VBR), in which you define the target quality and the encoder tries to achieve it with as few bits as possible, within some defined constraints. Average bitrate (ABR) is a variant of VBR whose goal is to create files with a predictable long-term average bitrate (not to be confused with adaptive bitrate streaming, also abbreviated ABR). This is usually achieved with 2-pass encoding: the first pass analyzes the content, and the second pass performs the actual encoding based on that analysis. It’s best for devices with limited storage space.
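The idea behind 2-pass ABR can be illustrated with a deliberately simplified model: the first pass yields a complexity score per segment, and the second pass hands out bits in proportion to those scores, so that the average hits the target. The complexity scores, the equal segment lengths, and the linear allocation rule are all assumptions for illustration; real encoders such as x264 use considerably more elaborate rate-control models.

```python
from typing import List, Sequence


def allocate_second_pass(complexities: Sequence[float], target_avg_kbps: float) -> List[float]:
    """Toy 2-pass ABR: spread bits proportionally to first-pass complexity.

    Assumes equal-length segments, so the mean of the returned bitrates
    equals the requested average bitrate.
    """
    if not complexities or any(c <= 0 for c in complexities):
        raise ValueError("complexities must be non-empty and positive")
    mean_complexity = sum(complexities) / len(complexities)
    # Complex segments get more than the average, simple ones get less
    return [target_avg_kbps * c / mean_complexity for c in complexities]
```

For example, complexities of 1, 2, and 3 with a 2000 kbps target give 1000, 2000, and 3000 kbps: the static scene donates bits to the busy one, while the long-term average stays predictable.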

Another variant of VBR is Constant Quality (CQ) / Constant Rate Factor (CRF), which, as the name suggests, produces a stream with constant quality over its full length. The trade-off is that the file size is not controlled: it depends on how complex the content is. It is very useful for content archiving, as it preserves content at a consistent, chosen quality level.

For streaming, it is important to keep the bitrate predictable and its peaks within given constraints. This is possible thanks to constrained encoding with the Video Buffering Verifier (VBV), which can be applied to 2-pass VBR or to CRF (the latter is called capped CRF). It controls bitrate peaks and the range of variability. The advantage of capped CRF is that it requires only one pass. However, it’s important to choose a reasonable CRF value: set it too low (higher quality) and the encoder hits the maximum bitrate too often; set it too high and you lose quality unnecessarily.
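To see what the VBV constraint actually checks, here is a simplified leaky-bucket model in Python: the decoder buffer (size `bufsize`) fills at `maxrate`, and each decoded frame drains its own size from it. A frame larger than what the buffer currently holds is an underflow, which an encoder in capped CRF mode avoids by raising quantizers in complex scenes. The frame sizes, the 90% initial fullness, and the per-frame timing are illustrative assumptions, not the exact model of any particular encoder.

```python
from typing import List, Sequence


def vbv_underflows(
    frame_sizes_kbit: Sequence[float],
    fps: float,
    maxrate_kbps: float,
    bufsize_kbit: float,
    initial_fullness: float = 0.9,  # assumption: buffer starts 90% full
) -> List[int]:
    """Return indices of frames that would underflow a simplified VBV buffer."""
    fill_per_frame = maxrate_kbps / fps  # kbit arriving during one frame interval
    level = bufsize_kbit * initial_fullness
    underflows = []
    for i, size in enumerate(frame_sizes_kbit):
        if size > level:
            underflows.append(i)  # decoder would stall waiting for data
        level = max(level - size, 0.0)  # frame is removed at decode time
        level = min(level + fill_per_frame, bufsize_kbit)  # refill, capped at buffer size
    return underflows
```

For example, at 25 fps with a 2800 kbps maxrate (the 720p AVC cap) and an assumed 5600 kbit buffer, steady 112 kbit frames never underflow, while a burst of 2000 kbit frames underflows on the third frame.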

All of these methods affect only the encoding process. Decoding is always the same, regardless of how bits are allocated in the stream. Most encoders, whether free ones like FFmpeg or commercial solutions like AWS Elemental, support all of the methods mentioned above.
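As a concrete sketch, here are the rate-control modes above expressed as FFmpeg command lines for the libx264 encoder, built in Python. The options used (`-b:v`, `-minrate`, `-maxrate`, `-bufsize`, `-pass`, `-crf`) are standard FFmpeg/libx264 options; the default bitrate, the CRF value, and a VBV buffer of twice the maximum rate are assumptions for illustration, not our production settings.

```python
import os
from typing import List, Optional


def x264_commands(
    src: str,
    dst: str,
    mode: str,
    kbps: int = 2800,  # assumed target or cap, e.g. the 720p AVC rung
    crf: int = 23,  # libx264's default CRF
    bufsize_kbps: Optional[int] = None,
) -> List[List[str]]:
    """Return the FFmpeg invocation(s) for one rate-control mode (two for 2-pass)."""
    rate = f"{kbps}k"
    buf = f"{bufsize_kbps or 2 * kbps}k"  # assumption: VBV buffer = 2x maxrate
    base = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264"]
    if mode == "cbr":
        # Same target, minimum, and maximum rate (strict x264 CBR also sets nal-hrd=cbr)
        return [base + ["-b:v", rate, "-minrate", rate, "-maxrate", rate, "-bufsize", buf, dst]]
    if mode == "abr_2pass":
        # Pass 1 only writes analysis stats, so its output is discarded
        first = base + ["-b:v", rate, "-pass", "1", "-an", "-f", "null", os.devnull]
        second = base + ["-b:v", rate, "-pass", "2", dst]
        return [first, second]
    if mode == "crf":
        return [base + ["-crf", str(crf), dst]]
    if mode == "capped_crf":
        # CRF quality target plus a VBV constraint on bitrate peaks
        return [base + ["-crf", str(crf), "-maxrate", rate, "-bufsize", buf, dst]]
    raise ValueError(f"unknown mode: {mode}")
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; for 2-pass encoding, run both commands in order from the same working directory so the second pass finds the first pass’s log file.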

Bonus terms

Sintel and Big Buck Bunny are short, animated “open movies” from the Blender Foundation. Thanks to their licensing, they are very popular within the video engineering community for testing and comparing encoding configurations. Note: watching the same video or scene over and over may have a dark side; check out this lightning talk.

Sintel and Big Buck Bunny posters (CC BY 3.0).

A mezzanine file, also known as an intermediate file, is a compressed version of the original source video without any perceivable quality loss. It uses a standardized format in terms of codec, container, bitrate, frame rate (FPS), and more, and transcoding from mezzanine files is both faster and less bandwidth-hungry than working with the original source. Movie files may be ingested in mezzanine format as part of the deal with studios, or you may create mezzanine files yourself as a preliminary step in your encoding workflow.

Basic keywords

Progressive download is a technique for delivering video that predates adaptive bitrate streaming. The file is transferred from server to client via HTTP and can be played before the download is complete, provided the required metadata is at the beginning of the media file. It’s not a stretch to say that fetching ABR segments is based on progressive download, so ABR streaming is essentially a sequence of progressive downloads.
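For MP4 files, "metadata at the beginning" means that the `moov` box (the metadata) comes before the `mdat` box (the media data) among the top-level boxes. Here is a minimal Python sketch that walks the top-level ISO Base Media File Format boxes and checks that order; it reads only box headers and skips payloads. In FFmpeg, `-movflags +faststart` rewrites a file so that `moov` comes first.

```python
import struct
from typing import List


def top_level_boxes(data: bytes) -> List[str]:
    """List the types of top-level ISO BMFF (MP4) boxes, in file order."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1:  # a 64-bit "largesize" follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        elif size == 0:  # the box extends to the end of the file
            size = len(data) - offset
        if size < 8:
            break  # malformed header; stop instead of looping forever
        boxes.append(box_type.decode("latin-1"))
        offset += size  # skip the payload without parsing it
    return boxes


def moov_before_mdat(data: bytes) -> bool:
    """True if the metadata (moov) precedes the media data (mdat)."""
    boxes = top_level_boxes(data)
    return "moov" in boxes and "mdat" in boxes and boxes.index("moov") < boxes.index("mdat")
```

A file for which this returns `False` can still be played, but only after the player has fetched the `moov` box from the end of the file, which typically means an extra request or waiting for the whole download.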

Adaptive bitrate streaming (ABR) is video streaming over HTTP in which players are given a list of streams with the same content in various codecs, resolutions, and bitrates. The files are split into short parts, usually 2 to 10 seconds long (called segments or chunks), which are described in a manifest/playlist file that the player uses for playback.
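As a rough illustration of segmentation, this Python sketch divides a title into fixed-length segments, assuming every segment has the same nominal duration except possibly the last one. Real packagers cut at keyframes, so actual segment durations can vary slightly.

```python
import math
from typing import List, Tuple


def segment_timeline(total_s: float, segment_s: float) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs; the last segment may be shorter than the rest."""
    if segment_s <= 0:
        raise ValueError("segment duration must be positive")
    count = math.ceil(total_s / segment_s)
    return [(i * segment_s, min(segment_s, total_s - i * segment_s)) for i in range(count)]
```

For example, a 10-minute title cut into 4-second segments yields 150 segments per stream, and every rendition in the ladder is cut at the same boundaries so that the player can switch between them at any segment.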

As a user, you’ve surely noticed that stream quality is often worse at the start of playback but improves after 2 to 10 seconds. The client player downloads segments based on the current network state and device capabilities, continuously monitors the network, and may switch between streams to keep playback running smoothly and without rebuffering (read more about our second stream backup strategy). The player often does not have enough information at the beginning of the playback session, so it starts with lower bitrates, although this depends on the player’s configuration. Switching is possible because individual segments must be in a format that can be played standalone (for example, each one starting with a keyframe).
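The switching logic can be sketched as a simple throughput-based rule: smooth the measured download throughput, keep a safety margin, and pick the highest rung that fits; with no measurement yet, start at the lowest rung. The smoothing factor and the 0.8 safety margin are assumptions, and production players (for example dash.js, hls.js, or ExoPlayer) use more elaborate rules, often taking the buffer level into account as well.

```python
from typing import Optional, Sequence


def update_estimate(previous: Optional[float], sample_kbps: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of measured throughput (alpha is an assumption)."""
    if previous is None:
        return sample_kbps
    return alpha * sample_kbps + (1 - alpha) * previous


def pick_rung(ladder_kbps: Sequence[int], estimate_kbps: Optional[float], safety: float = 0.8) -> int:
    """Highest rung within safety * estimated throughput; lowest rung if nothing is known."""
    rungs = sorted(ladder_kbps)
    if estimate_kbps is None:
        return rungs[0]  # no data yet at session start: play it safe
    budget = estimate_kbps * safety
    fitting = [r for r in rungs if r <= budget]
    return fitting[-1] if fitting else rungs[0]
```

With the AVC caps of the Showmax ladder, the first segment is fetched at 75 kbps; once the estimate settles at, say, 2000 kbps, the budget of 1600 kbps selects the 1400 kbps (576p) rung.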

A bitrate ladder, or encoding ladder, is the set of profiles (codec, resolution, bitrate, etc.), in other words, the set of encoding configurations used to create multiple files for adaptive bitrate streaming, so that video is available on a range of devices and connection bandwidths. The number of rungs in the ladder, and the spacing between them, must be chosen based on the target devices and the network conditions of the target market to be optimal for delivery.

Resolution   AVC (kbps)   HEVC (kbps)
144p         75           75
216p         110          110
288p         200          200
360p         400          300
432p         800          540
576p         1400         1000
720p         2800         1870
1080p        5000         3340

Showmax bitrate ladder: our choice of resolutions and bitrates, focused on users with limited connectivity. We use capped CRF mode, so the bitrates are upper caps.
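The ladder also shows how much headroom HEVC provides over AVC. This short Python snippet computes the relative bitrate saving per rung from the caps in the table; since the values are caps rather than measured averages, the result describes the ladder design, not actual delivered bitrates.

```python
from typing import Dict

# Caps copied from the Showmax ladder above (kbps)
AVC_KBPS = {"144p": 75, "216p": 110, "288p": 200, "360p": 400,
            "432p": 800, "576p": 1400, "720p": 2800, "1080p": 5000}
HEVC_KBPS = {"144p": 75, "216p": 110, "288p": 200, "360p": 300,
             "432p": 540, "576p": 1000, "720p": 1870, "1080p": 3340}


def hevc_savings(avc: Dict[str, int], hevc: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the AVC cap saved by the HEVC cap at each resolution."""
    return {res: 1 - hevc[res] / avc[res] for res in avc}


if __name__ == "__main__":
    for res, saving in hevc_savings(AVC_KBPS, HEVC_KBPS).items():
        print(f"{res:>6}: {saving:.1%}")
```

At 720p and 1080p the HEVC caps are about a third lower than the AVC ones, while the three lowest rungs use identical caps.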

For a decade, encoding within the bitrate ladder used one of two methods, CBR or VBR, with a set of constant parameters. The parameters were chosen for the content catalog as a whole, as a one-size-fits-all solution. This changed in late 2015, when Netflix introduced per-title encoding, which adjusts the bitrate ladder to each individual content asset to achieve the highest possible quality. The approach was quickly adopted industry-wide in various forms, either as parameter optimization techniques or by changing the number of rungs in the ladder and their resolutions. For example, if two consecutive rungs are comparable in quality, the one with the higher bitrate can be dropped. The follow-up step is shot-based encoding, again introduced by Netflix, where encoding parameters are configured per shot (scene) rather than for the whole movie asset.
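The rung-dropping rule can be sketched in a few lines of Python. The quality scores here are hypothetical per-title measurements (for example from a perceptual metric on a 0 to 100 scale), and the minimum-gain threshold is an assumption; comparing each rung with the last rung that was kept, rather than with its immediate neighbor, is a design choice that prevents a chain of small steps from all surviving.

```python
from typing import List, NamedTuple, Sequence


class Rung(NamedTuple):
    label: str
    kbps: int
    quality: float  # hypothetical per-title quality score


def prune_ladder(rungs: Sequence[Rung], min_gain: float) -> List[Rung]:
    """Drop rungs that add less than min_gain quality over the last kept, cheaper rung."""
    kept: List[Rung] = []
    for rung in sorted(rungs, key=lambda r: r.kbps):
        if kept and rung.quality - kept[-1].quality < min_gain:
            continue  # costs more bits without a noticeable quality gain
        kept.append(rung)
    return kept
```

For a title where 432p at 800 kbps scores barely above 360p at 400 kbps, the 432p rung is dropped and viewers step straight from 360p to the next rung that is visibly better.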

HTTP Live Streaming (HLS) is an HTTP-based adaptive bitrate streaming protocol introduced by Apple in 2009. It uses an extended M3U playlist (the master playlist) to list the available streams as links to other extended M3U8 playlists (media or segment playlists), which in turn contain links to the actual segments. It supports MPEG-TS or fragmented MP4 as containers for media segments. HLS is used primarily on Apple devices and in the Safari browser, but it is also supported on Android devices.

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=150000,RESOLUTION=416x234,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=240000,RESOLUTION=416x234,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/lo_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=440000,RESOLUTION=416x234,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/hi_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=640000,RESOLUTION=640x360,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/high/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.5"
http://example.com/audio/index.m3u8

Example of a master playlist with bitrate variants.
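To show how a player reads such a master playlist, here is a minimal Python parser: each `#EXT-X-STREAM-INF` tag carries an attribute list, and the next non-comment line is that variant’s URI. The regular expression handles quoted values such as `CODECS`, which themselves contain commas. It is a sketch that covers only this tag, not a complete HLS parser.

```python
import re
from typing import Dict, List, Optional

# NAME=value pairs, where value is either a quoted string or runs up to the next comma
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')
STREAM_INF = "#EXT-X-STREAM-INF:"


def parse_master_playlist(text: str) -> List[Dict[str, str]]:
    """Return one dict per variant: its attributes plus the URI on the following line."""
    variants = []
    pending: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(STREAM_INF):
            attrs = ATTR_RE.findall(line[len(STREAM_INF):])
            pending = {name: value.strip('"') for name, value in attrs}  # drop quotes
        elif line and not line.startswith("#") and pending is not None:
            pending["URI"] = line
            variants.append(pending)
            pending = None
    return variants
```

Applied to the playlist above, it returns five variants; the last one has only `BANDWIDTH` and `CODECS`, which marks it as the audio-only stream.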

#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:2

#EXTINF:2,
seq-0.ts
#EXTINF:2,
seq-1.ts
#EXTINF:2,
seq-2.ts
#EXTINF:2,
seq-3.ts
#EXTINF:2,
seq-4.ts
#EXTINF:2,
seq-5.ts
#EXT-X-ENDLIST

Example of a variant (media) playlist with actual segments.
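A media playlist like this one is simple to generate. The Python sketch below renders one from a list of segment durations. `#EXT-X-TARGETDURATION` is set to the ceiling of the longest segment, which always satisfies the HLS rule that no rounded `#EXTINF` duration may exceed it, and `#EXT-X-VERSION:3` is added only when decimal durations are present, since those require protocol version 3. The `seq-{}.ts` naming mirrors the example above and is otherwise arbitrary.

```python
import math
from typing import Sequence


def media_playlist(durations: Sequence[float], name_pattern: str = "seq-{}.ts",
                   media_sequence: int = 0) -> str:
    """Render a VOD HLS media playlist for segments with the given durations (seconds)."""
    if not durations:
        raise ValueError("at least one segment is required")
    lines = ["#EXTM3U"]
    if any(d != int(d) for d in durations):
        lines.append("#EXT-X-VERSION:3")  # decimal EXTINF values need version 3+
    lines.append(f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}")
    lines.append(f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}")
    for i, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:g},")
        lines.append(name_pattern.format(media_sequence + i))
    lines.append("#EXT-X-ENDLIST")  # no more segments will be added (VOD)
    return "\n".join(lines) + "\n"
```

Calling `media_playlist([2] * 6)` reproduces the example above, apart from its blank line, which HLS clients ignore.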

Dynamic Adaptive Streaming over HTTP (DASH), also known as MPEG-DASH, is another HTTP-based adaptive bitrate streaming protocol. It is similar to Apple’s HLS, but uses an XML-based manifest, the Media Presentation Description (MPD), to describe the available streams. Segments are typically served in fragmented MP4 containers. DASH is supported by a wide range of web browsers, smart TVs, Android devices, and more.

<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" minBufferTime="PT1.5S" mediaPresentationDuration="PT0H10M0.00S" profiles="urn:mpeg:dash:profile:full:2011">
    <Period start="PT0S" duration="PT0H10M0.00S">
        <AdaptationSet segmentAlignment="true" bitstreamSwitching="true" maxWidth="1920" maxHeight="1080" maxFrameRate="25" par="16:9">
            <ContentComponent id="1" contentType="video"/>
            <SegmentTemplate timescale="1000" duration="10000" media="video-$RepresentationID$$Number$.m4s" startNumber="1" initialization="video_init.mp4"/>
            <Representation id="low" mimeType="video/mp4" codecs="avc1.64000d" width="320" height="180" frameRate="25" sar="1:1" startWithSAP="4" bandwidth="67001"> </Representation>
            <Representation id="mid" mimeType="video/mp4" codecs="avc1.64001e" width="640" height="360" frameRate="25" sar="1:1" startWithSAP="4" bandwidth="228204"> </Representation>
            <Representation id="hd" mimeType="video/mp4" codecs="avc1.64001f" width="1280" height="720" frameRate="25" sar="1:1" startWithSAP="4" bandwidth="580017"> </Representation>
            <Representation id="full" mimeType="video/mp4" codecs="avc1.640028" width="1920" height="1080" frameRate="25" sar="1:1" startWithSAP="4" bandwidth="899712"> </Representation>
        </AdaptationSet>
        <AdaptationSet segmentAlignment="true" bitstreamSwitching="true" lang="und">
            <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="1"/>
            <ContentComponent id="1" contentType="audio"/>
            <SegmentTemplate timescale="1000" duration="9520" media="audio$RepresentationID$$Number$.m4s" startNumber="1" initialization="audio_init.mp4"/>
            <Representation id="aaclc_low" mimeType="audio/mp4" codecs="mp4a.40.02" audioSamplingRate="44100" startWithSAP="1" bandwidth="19042"> </Representation>
            <Representation id="aaclc_high" mimeType="audio/mp4" codecs="mp4a.40.02" audioSamplingRate="44100" startWithSAP="1" bandwidth="66341"> </Representation>
        </AdaptationSet>
    </Period>
</MPD>

Example of a DASH manifest (MPD).
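With a `SegmentTemplate`, the MPD does not list segments one by one; the player derives them. This Python sketch, using only the standard library, computes the segment count from `mediaPresentationDuration` and `duration`/`timescale`, and expands `$RepresentationID$` and `$Number$` into the first segment URL. It assumes the template sits on the `AdaptationSet` (as in the example), supports only the `PT#H#M#S` duration form, and does not handle width-formatted identifiers such as `$Number%05d$`.

```python
import math
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
_DURATION = re.compile(r"PT(?:([\d.]+)H)?(?:([\d.]+)M)?(?:([\d.]+)S)?")


def iso_duration_seconds(value: str) -> float:
    """Parse the PT#H#M#S subset of ISO 8601 durations (no days, months, or years)."""
    match = _DURATION.fullmatch(value)
    if not match:
        raise ValueError(f"unsupported duration: {value}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def describe_mpd(mpd_xml: str) -> List[Tuple[str, int, int, str]]:
    """Return (representation id, bandwidth, segment count, first segment URL) rows."""
    root = ET.fromstring(mpd_xml.encode("utf-8"))  # bytes allow an XML encoding declaration
    total_s = iso_duration_seconds(root.get("mediaPresentationDuration", "PT0S"))
    rows = []
    for aset in root.iterfind(".//mpd:AdaptationSet", NS):
        tmpl = aset.find("mpd:SegmentTemplate", NS)
        if tmpl is None:
            continue  # this sketch only handles AdaptationSet-level templates
        seg_s = int(tmpl.get("duration")) / int(tmpl.get("timescale", "1"))
        count = math.ceil(total_s / seg_s)  # the last segment may be shorter
        first = int(tmpl.get("startNumber", "1"))
        for rep in aset.iterfind("mpd:Representation", NS):
            url = tmpl.get("media").replace("$RepresentationID$", rep.get("id"))
            rows.append((rep.get("id"), int(rep.get("bandwidth")), count,
                         url.replace("$Number$", str(first))))
    return rows
```

For the manifest above, each video representation has 60 ten-second segments (the first for `low` is `video-low1.m4s`), while the audio, with 9.52-second segments, needs 64.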

Smooth Streaming is another HTTP-based adaptive bitrate streaming protocol, developed by Microsoft as an extension of IIS Media Services. It uses an XML-based manifest listing the available streams. The principle is the same as in HLS or DASH; the differences lie in the manifest syntax and the media containers used. It is supported on Windows through the Smooth Streaming Client SDK, and on other operating systems through the Smooth Streaming Porting Kit.

<?xml version="1.0" encoding="utf-8"?>
<SmoothStreamingMedia MajorVersion="2" MinorVersion="1" Duration="1209510000">
    <StreamIndex Type="video" Name="video" Chunks="61" QualityLevels="4" MaxWidth="1280" MaxHeight="720" DisplayWidth="1280" DisplayHeight="720" Url="QualityLevels({bitrate})/Fragments(video={start time})">
        <QualityLevel Index="0" Bitrate="2962000" FourCC="AVC1" MaxWidth="1280" MaxHeight="720" CodecPrivateData="..."/>
        <QualityLevel Index="1" Bitrate="2056000" FourCC="AVC1" MaxWidth="992" MaxHeight="560" CodecPrivateData="..."/>
        <QualityLevel Index="2" Bitrate="1427000" FourCC="AVC1" MaxWidth="768" MaxHeight="432" CodecPrivateData="..."/>
        <QualityLevel Index="3" Bitrate="991000" FourCC="AVC1" MaxWidth="592" MaxHeight="332" CodecPrivateData="..."/>
        <c d="20020000"/>
        ...
        <c d="6670001"/>
    </StreamIndex>
    <StreamIndex Type="audio" Index="0" Name="audio" Chunks="61" QualityLevels="1" Url="QualityLevels({bitrate})/Fragments(audio={start time})">
        <QualityLevel FourCC="AACL" Bitrate="128000" SamplingRate="44100" Channels="2" BitsPerSample="16" PacketSize="4" AudioTag="255" CodecPrivateData="1210"/>
        <c d="20201360"/>
        ...
        <c d="8126985"/>
    </StreamIndex>
</SmoothStreamingMedia>

Example of a Smooth Streaming manifest.
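In Smooth Streaming, fragment URLs come from the `Url` template of each `StreamIndex`: `{bitrate}` is replaced by a `QualityLevel` bitrate and `{start time}` by the fragment’s start time, which is the running sum of the `<c d="..."/>` durations. Times are expressed in the manifest’s TimeScale, which defaults to 10,000,000 units per second (100 ns). The Python sketch below assumes the `<c>` elements carry only `d` durations, without explicit `t` start times.

```python
from itertools import accumulate
from typing import List, Sequence

DEFAULT_TIMESCALE = 10_000_000  # Smooth Streaming default: 100-nanosecond units


def fragment_urls(url_template: str, bitrate: int, chunk_durations: Sequence[int]) -> List[str]:
    """Expand a StreamIndex Url template for every chunk of one quality level."""
    starts = [0, *accumulate(chunk_durations)][:-1]  # each chunk starts where the previous ends
    base = url_template.replace("{bitrate}", str(bitrate))
    return [base.replace("{start time}", str(start)) for start in starts]


def ticks_to_seconds(ticks: int, timescale: int = DEFAULT_TIMESCALE) -> float:
    return ticks / timescale
```

For the manifest above, the first video fragment at 2962000 bps is `QualityLevels(2962000)/Fragments(video=0)`, the next one starts at 20020000 (about 2 seconds), and the whole presentation (`Duration="1209510000"`) is roughly 121 seconds long.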

Files for these streaming protocols usually share the same encoded content (codec); what differs is the manifest and often the container. To produce files for several streaming protocols, a tool called a packager comes in handy. It can remux (repackage) an encoded source file into the format of the desired streaming protocol. Packagers also often support DRM encryption or on-the-fly packaging, which is useful for live streaming.

A popular free tool is Shaka Packager. Here at Showmax, we use a commercial tool called Unified Origin or Unified Packager. Note that encoders can also produce files in streaming protocol formats, but packaging a single encode saves you from encoding the same content multiple times, which wastes both time and storage.
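To illustrate the "encode once, package many times" idea, here is a Python sketch that builds a Shaka Packager command producing both a DASH MPD and an HLS master playlist from the same set of encoded MP4 files. It uses Shaka Packager’s stream descriptor fields (`in`, `stream`, `output`, `playlist_name`, `hls_group_id`) and its `--mpd_output` and `--hls_master_playlist_output` flags; the file names and output layout are hypothetical, and this is not our Unified Packager workflow.

```python
from typing import List, Sequence


def shaka_packager_command(video_files: Sequence[str], audio_file: str, out_dir: str) -> List[str]:
    """Build one Shaka Packager invocation that writes DASH and HLS from the same encodes.

    Stream descriptors are comma-separated, so file names must not contain commas.
    """
    descriptors = [
        f"in={path},stream=video,output={out_dir}/video_{i}.mp4,playlist_name=video_{i}.m3u8"
        for i, path in enumerate(video_files)  # one descriptor per ladder rung
    ]
    descriptors.append(
        f"in={audio_file},stream=audio,output={out_dir}/audio.mp4,"
        "playlist_name=audio.m3u8,hls_group_id=audio"
    )
    return [
        "packager",
        *descriptors,
        "--mpd_output", f"{out_dir}/manifest.mpd",
        "--hls_master_playlist_output", f"{out_dir}/master.m3u8",
    ]
```

If the `packager` binary is on your `PATH`, the list can be run with `subprocess.run(cmd, check=True)`; the video segments are written once and referenced by both manifests.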

We hope you find this resource useful, and we encourage you to read (or re-read) some of our other posts, which may now be a bit clearer. Pushing video bitrate to the limit explains the challenges of African infrastructure we face, and how we manage to stream video at the best possible quality while saving customer data. Another related post, Keep the stream live, despite any infrastructure failure, should also be more accessible after going through these terms.

Enjoy!

Please check the original version of this article at