The Patented Visual Quality Measure that was Designed to Drive Higher Compression Efficiency

At the heart of Beamr’s closed-loop content-adaptive encoding solution (CABR) is a patented quality measure. This measure compares the perceptual quality of each candidate encoded frame to the initial encoded frame, and guarantees that when the bitrate is reduced, the perceptual quality of the target encode is preserved. In contrast to general video quality measures – which aim to quantify any difference between video streams resulting from bit errors, noise, blurring, change of resolution, etc. – Beamr’s quality measure was developed for a very specific task: to reliably and quickly quantify the perceptual quality loss introduced in a video frame by the artifacts of block-based video encoding. In this blog post, we present the components of our patented video quality measure, as shown in Figure 1.

Pre-analysis

Before determining the quality of an encoded frame, the quality measure component performs pre-analysis on the source and initial encoded frames, both to extract data used in the quality measure calculation and to collect information used to configure it. The analysis consists of two parts: part I is performed on the source frame, and part II on the initial encoded frame.


Figure 1. A block diagram of the video quality measure used in Beamr’s CABR engine

The goal of part I of the pre-analysis is to characterize the content, the frame, and areas of interest within a given frame. In this phase, we can determine whether the frame has skin and face areas, rich chroma information typical of 3D animation, or highly localized movement over a static background, as found in cell animation content. The algorithms used are designed for low CPU overhead. For example, our face detection algorithm applies a full detection mechanism at scene changes and a unique, low-complexity adaptive-tracking mechanism on other frames. For skin detection, we use an AdaBoost classifier, which we trained on a marked dataset we created. The classifier uses YUV pixel values and 4×4 luma variance values as input. At this stage, we also calculate the edge map, which we employ in the Edge-Loss-Factor score component described below.
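To make the feature side of this concrete, here is a minimal sketch of the kind of per-block feature extraction described above, assuming Y, U, and V planes of equal resolution. The feature layout and all constants are illustrative assumptions; Beamr's actual features and classifier are proprietary.

```python
import numpy as np

def skin_features(y_plane, u_plane, v_plane):
    """Hypothetical per-4x4-block features of the kind described above:
    mean YUV values plus 4x4 luma variance. Assumes equal-resolution
    planes; the real feature set and classifier are proprietary."""
    h, w = y_plane.shape
    feats = []
    for by in range(0, h - 3, 4):
        for bx in range(0, w - 3, 4):
            y = y_plane[by:by + 4, bx:bx + 4].astype(np.float32)
            u = float(u_plane[by:by + 4, bx:bx + 4].mean())
            v = float(v_plane[by:by + 4, bx:bx + 4].mean())
            feats.append([y.mean(), u, v, y.var()])
    return np.asarray(feats)  # one row per block, fed to the AdaBoost classifier
```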

Part II of the pre-analysis analyzes the characteristics of the frame after the initial encoding. In this phase, we may determine whether the frame has grain, estimate its amount, and use that estimate to configure the quality measure calculation. We also collect information about the complexity of each block, indicated, for example, by the bit usage and quantization level used to encode each block. At this stage, we also calculate the density of local textures in each block or area of the frame, which is used by the texture preservation score component described below.

Quality Measure Process and Components

The quality measure evaluates the quality of a target frame when compared to a reference frame. In the context of CABR, the reference frame is the initial encoded frame and the target frame is the candidate frame of a specific iteration. After performing the two phases of the pre-analysis, we proceed to the actual quality measure calculation, which is described next.

Tiling 

After completing the two phases of the pre-analysis stage, each of the reference and target frames is partitioned into corresponding tiles. The location and dimensions of these tiles are adapted according to the frame resolution and other frame characteristics. For example, we will use smaller tiles in a frame which has highly localized motion. Tiles are also sometimes partitioned further into sub-tiles, for at least some of the quality measure components. A quality metric score is calculated for each tile, and these per-tile scores are perceptually pooled to obtain a frame quality score.
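As a rough illustration of the tiling step, the sketch below partitions a frame into a fixed grid. Tile dimensions here are fixed for simplicity, whereas the actual engine adapts tile locations and sizes to the frame resolution and characteristics such as localized motion.

```python
import numpy as np

def tile_frame(frame, tile_h=128, tile_w=128):
    """Partition a frame into a grid of tiles; edge tiles may be smaller.
    Fixed dimensions are an illustrative assumption, not CABR's adaptive
    tiling logic."""
    h, w = frame.shape[:2]
    return [frame[y:y + tile_h, x:x + tile_w]
            for y in range(0, h, tile_h)
            for x in range(0, w, tile_w)]
```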

The quality score for each tile is calculated as a weighted geometric average of the values calculated for each quality measure component. The components include a local similarity component which determines a pixel-wise difference, an added artifactual edges component, a texture distortion component, an edge loss factor, and a temporal component. We now provide a brief review of these five components of Beamr’s quality measure. 

Local Similarity

The local similarity component evaluates the level of similarity between pixels at the same position in the reference and target tiles. It is somewhat similar to PSNR, but uses adaptive sub-tiling, pooling, and thresholding to provide results that are more perceptually oriented than regular PSNR. In some cases, such as when pre-analysis determined that the frame contains rich chroma content, the calculation of pixel similarity for chroma planes is also included in this component, but in most cases only luma is used. For each sub-tile, regular PSNR is calculated. To give greater weight to low-quality sub-tiles located within otherwise high-quality tiles (as can happen when changes are confined to a small area, even just a few pixels), we perform the pooling using only values below a threshold that depends on the lowest sub-tile PSNR values. We then scale the pooled value using a factor adapted to the brightness of the tile, since distortion in dark areas is more perceptually disturbing than in bright areas. Finally, we clip the local similarity component score so that it lies in the range [0,1], where 1 indicates that the target and reference tiles are perceptually identical.
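The following sketch captures the flavor of this component for a luma-only tile. The sub-tile size, the pooling margin, and the mapping of pooled PSNR to a [0,1] score are all assumptions for illustration, not Beamr's actual parameters.

```python
import numpy as np

def local_similarity(ref_tile, tgt_tile, sub=16, margin=3.0, dark_factor=1.0):
    """Sketch of the local similarity idea: per-sub-tile PSNR, pooling only
    values near the worst sub-tile, a brightness-dependent scale factor,
    and clipping to [0, 1]. All constants are illustrative assumptions."""
    h, w = ref_tile.shape
    psnrs = []
    for y in range(0, h - sub + 1, sub):
        for x in range(0, w - sub + 1, sub):
            r = ref_tile[y:y + sub, x:x + sub].astype(np.float64)
            t = tgt_tile[y:y + sub, x:x + sub].astype(np.float64)
            mse = np.mean((r - t) ** 2)
            psnrs.append(100.0 if mse == 0 else 10 * np.log10(255.0 ** 2 / mse))
    psnrs = np.asarray(psnrs)
    # Pool only sub-tiles close to the worst one, so isolated damage dominates.
    pooled = psnrs[psnrs <= psnrs.min() + margin].mean()
    # Scale by a brightness-dependent factor (distortion is more visible in
    # dark tiles), then map to [0, 1]; this linear mapping is an assumption.
    score = dark_factor * (pooled - 20.0) / 30.0
    return float(np.clip(score, 0.0, 1.0))
```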

Added Artifactual Edges (AAE)

The Added Artifactual Edges score component evaluates additional blockiness introduced in the target tile compared to the reference tile. Blockiness in video coding is a well-known artifact introduced by the independent encoding of each block. Many previous attempts have been made to avoid this artifact, mainly using de-blocking filters, which are integral parts of modern video encoders such as AVC and HEVC. However, the AAE component aims to quantify the extent of the artifact rather than eliminate it. Since we are interested only in the blockiness added in the target frame relative to the reference frame, we evaluate this component on the difference between the target and reference frames. For each horizontal and vertical coding block boundary in the difference block, we evaluate the change, or gradient, across the coding block border and compare it to the local gradient within the coding block on either side. For AVC encoding, for example, this is done along the 16×16 grid of the full frame. We apply soft thresholding to the blockiness value, using threshold values adapted according to information from the pre-analysis stage. For example, in an area recognized as skin, where human vision is more sensitive to artifacts, we use tighter thresholds so that mild blockiness artifacts are more heavily penalized. These calculations result in an AAE score map, containing values in the range [0,1] for each horizontal and vertical block border point. We average the values per block border, and then average these per-block-border averages, excluding or giving low weight to block borders with no added blockiness. The value is then scaled according to the percentage of extremely disturbing blockiness artifacts, i.e. cases where the original blockiness value prior to thresholding was very high, and finally clipped to the range [0,1], with 1 indicating no added artifactual edges in the target tile relative to the reference tile.
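A simplified sketch of the core AAE comparison is shown below, for vertical borders only on a fixed 16×16 grid. The grid, the hard/soft threshold values, and the final mapping are illustrative; the actual measure adapts them and handles horizontal borders the same way.

```python
import numpy as np

def added_artifactual_edges(ref_tile, tgt_tile, grid=16, thresh=2.0):
    """Sketch of the AAE idea: on the difference between target and
    reference, compare the step across each vertical coding-block border
    with the gradients just inside the blocks on either side. Constants
    are illustrative assumptions."""
    diff = tgt_tile.astype(np.float64) - ref_tile.astype(np.float64)
    w = diff.shape[1]
    scores = []
    for x in range(grid, w - 1, grid):      # vertical borders of the block grid
        step = np.abs(diff[:, x] - diff[:, x - 1])            # across the border
        inner = 0.5 * (np.abs(diff[:, x - 1] - diff[:, x - 2]) +
                       np.abs(diff[:, x + 1] - diff[:, x]))   # inside the blocks
        blockiness = float(np.maximum(step - inner, 0.0).mean())
        scores.append(1.0 if blockiness < thresh else
                      float(np.clip(1.0 - (blockiness - thresh) / 10.0, 0.0, 1.0)))
    return float(np.mean(scores)) if scores else 1.0
```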

Texture Distortion

The texture distortion score component quantifies how well texture is preserved in the target tile. Most block-based codecs, including AVC and HEVC, use a frequency transform such as the DCT and quantize the transform coefficients, usually applying more aggressive quantization to the high-frequency components. This can cause two different textural artifacts. The first is a loss of texture detail, or over-smoothing, due to loss of energy in high-frequency coefficients. The second is known as “ringing,” and is characterized by noise around edges or sharp changes in the image. Both artifacts change the local variance of the pixel values: over-smoothing decreases pixel variance, while added ringing or other high-frequency noise increases it. Therefore, we measure the local deviation of pixel values in corresponding blocks of the reference and target frame tiles and compare the values. This process yields a texture tile score in the range [0,1], with 1 indicating no visible texture distortion in the target image tile.
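A minimal sketch of this comparison might look as follows, using the standard deviation of co-located blocks; the block size and the mapping to a [0,1] score are illustrative assumptions.

```python
import numpy as np

def texture_score(ref_tile, tgt_tile, blk=8):
    """Sketch of the texture distortion idea: compare local standard
    deviation of co-located blocks. Both over-smoothing (lower deviation)
    and ringing (higher deviation) reduce the score. Constants are
    illustrative assumptions."""
    h, w = ref_tile.shape
    ratios = []
    for y in range(0, h - blk + 1, blk):
        for x in range(0, w - blk + 1, blk):
            r = ref_tile[y:y + blk, x:x + blk].astype(np.float64).std()
            t = tgt_tile[y:y + blk, x:x + blk].astype(np.float64).std()
            if r < 1.0 and t < 1.0:
                ratios.append(1.0)          # both flat: no texture to distort
            else:
                ratios.append(min(r, t) / max(r, t, 1e-6))
    return float(np.clip(np.mean(ratios), 0.0, 1.0))
```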

Temporal Consistency

The temporal score component evaluates how well the temporal flow of the reference video sequence is preserved in the target video sequence. This is the only component of the quality measure that also uses the preceding target and reference frames. Here we measure two kinds of changes: “new” information introduced in the reference frame which is missing in the target frame, and “new” information in the target frame where there was none in the reference frame. In this context, “new” information refers to information that exists in the current frame but not in the preceding frame. We calculate the Sum of Absolute Differences (SAD) between each co-located 8×8 block in the reference frame and the preceding reference frame, and the SAD between each co-located 8×8 block in the target frame and the preceding target frame. The local (8×8) score is derived from the relation between these two SAD values, and also from the value of the reference SAD, which indicates whether the block is dynamic or static in nature. Figure 2 illustrates the value of the local score for different combinations of the reference and target SAD values. After all local temporal scores are calculated, they are pooled to obtain a tile temporal score component in the range [0,1].

Figure 2. Local temporal score as a function of the reference SAD and target SAD values
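A toy version of the per-block temporal comparison is sketched below; the actual mapping from the two SAD values to a local score (the surface in Figure 2) is more nuanced than the simple ratio used here.

```python
import numpy as np

def temporal_scores(ref, ref_prev, tgt, tgt_prev, blk=8):
    """Sketch of the temporal component: per 8x8 block, compare the SAD of
    the target frame against its predecessor with the SAD of the reference
    frame against its predecessor. The score mapping is an illustrative
    placeholder for the relation shown in Figure 2."""
    h, w = ref.shape
    scores = []
    for y in range(0, h - blk + 1, blk):
        for x in range(0, w - blk + 1, blk):
            sl = np.s_[y:y + blk, x:x + blk]
            sad_ref = np.abs(ref[sl].astype(np.int64) - ref_prev[sl].astype(np.int64)).sum()
            sad_tgt = np.abs(tgt[sl].astype(np.int64) - tgt_prev[sl].astype(np.int64)).sum()
            # Penalize "new" information that appears on only one side.
            denom = max(sad_ref, sad_tgt, 1)
            scores.append(1.0 - abs(sad_ref - sad_tgt) / denom)
    return float(np.mean(scores))
```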

Edge Loss Factor (ELF)

The Edge Loss Factor score component reflects how well edges in the reference image are preserved in the target image. This component uses the input image edge map generated during part I of the pre-analysis. In part II of the pre-analysis, the strength of each edge point in the reference frame is calculated as the largest absolute difference between the edge pixel value and its 8 nearest neighbors. We can optionally discard pixels considered false edges by comparing the pixel’s reference frame edge strength to a threshold, which can be adapted, for example, to be higher in a frame which contains film grain. Once values for all edge pixels have been accumulated, the final value is scaled to provide an ELF tile score component in the range [0,1], with 1 indicating perfect edge preservation.
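The per-pixel edge-strength calculation can be sketched as follows; false-edge thresholding and the final scaling to an ELF tile score are omitted.

```python
import numpy as np

def edge_strength(frame, edge_map):
    """For each pixel flagged in the edge map, take the largest absolute
    difference between the pixel and its 8 neighbors. np.roll wraps at the
    frame borders; a real implementation would pad instead. Thresholding
    of false edges (e.g., due to film grain) is omitted here."""
    f = frame.astype(np.int64)
    strength = np.zeros_like(f)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == dx == 0:
                continue
            neighbor = np.roll(np.roll(f, dy, axis=0), dx, axis=1)
            strength = np.maximum(strength, np.abs(f - neighbor))
    return np.where(edge_map, strength, 0)
```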

Combining the Score Components

The five tile score components described above are combined into a tile score using weighted geometric averaging, where the weights can be adapted according to the codec used or according to the pre-analysis stage. For example, in codecs with good in-loop deblocking filters we can lower the weight of the blockiness component, while in frames with high levels of film grain (as determined by the pre-analysis stage) we can reduce the weight of the texture distortion component. 
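In code, this combination step is simply a weighted geometric mean; the component ordering and example weights below are hypothetical.

```python
import numpy as np

def tile_score(components, weights):
    """Weighted geometric average of the five per-tile component scores.
    Weights are adapted per codec and per pre-analysis results (e.g.,
    lower blockiness weight with strong in-loop deblocking); the values
    passed in the example are made up."""
    c = np.clip(np.asarray(components, dtype=np.float64), 1e-6, 1.0)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return float(np.exp(np.sum(w * np.log(c))))

# e.g., [similarity, AAE, texture, temporal, ELF] with made-up weights:
# tile_score([0.97, 0.97, 0.99, 1.0, 0.98], [2, 2, 1, 1, 1])
```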

Tile Pooling

In the final step of the frame quality score calculation, the tile scores are perceptually pooled to yield a single frame score value. The perceptual pooling uses weights which depend on importance (derived from the pre-analysis stages, such as the presence of face and/or skin in the tile) and on the complexity of blocks in the tile compared to the average complexity of the frame. The weights also depend on the tile score values: we give more weight to low-scoring tiles, in the same way that human viewers are drawn to quality drops even when they occur in isolated areas.
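A sketch of such pooling, with a made-up boost for low-scoring tiles:

```python
import numpy as np

def frame_score(tile_scores, importance):
    """Sketch of perceptual pooling of tile scores into one frame score:
    tiles with higher importance (faces/skin, high complexity) and tiles
    with low scores get more weight. The low-score boost below is an
    illustrative choice, not Beamr's actual weighting."""
    s = np.asarray(tile_scores, dtype=np.float64)
    imp = np.asarray(importance, dtype=np.float64)
    low_score_boost = 1.0 + 4.0 * (1.0 - s)   # the worst tiles dominate
    w = imp * low_score_boost
    return float(np.sum(w * s) / np.sum(w))
```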

Score Configurator

The score configurator block is used to configure the calculations for different use cases. For example, in implementations where latency or performance are tightly bounded, the configurator can apply a fast score calculation which skips some of the stages of pre-analysis and uses a somewhat reduced complexity score. To still guarantee a perceptually identical result, the score calculated in this fast mode can be scaled or compensated to account for the slightly lower perceptual accuracy, and this scaling may in some cases slightly reduce savings.

To learn more about CABR, continue reading “A Deep Dive into CABR, Beamr’s Content-Adaptive Rate Control.”

Authors: Dror Gill & Tamar Shoham

A Deep Dive into CABR, Beamr’s Content-Adaptive Rate Control

Going Inside Beamr’s Frame-Level Content-Adaptive Rate Control for Video Coding

When it comes to video, the tradeoff between quality and bitrate is an ongoing dance. Content producers want to maximize quality for viewers, while storage and delivery costs drive the need to reduce bitrate as much as possible. Content-adaptive encoding addresses this challenge, by striving to reach the “optimal” bitrate for each unique piece of content, be it a full clip or a single scene. Our CABR technology takes it a step further by adapting the encoding at the frame level. CABR is a closed-loop content-adaptive rate control mechanism enabling video encoders to lower the bitrate of their encode, while simultaneously preserving the perceptual quality of the higher bitrate encode. As a low-complexity solution, CABR also works for live or real-time encoding. 

All Eyes are on Video

According to Grand View Research, the global video streaming market is expected to grow at a CAGR of 19.6% from 2019 to 2025. This shift, fueled by the increasing popularity of direct-to-consumer streaming services such as Netflix, Amazon and Hulu, the growth of video on social media networks and user-generated video platforms such as Facebook and YouTube, and other applications like online education and video surveillance, has all eyes on video workflows. Therefore, efficient video encoding, in terms of both encoding and delivery costs and meeting viewers’ rising quality expectations, is at the forefront of video service providers’ minds. Beamr’s CABR solution can reduce bitrates without compromising quality, while keeping a low computational overhead, to enhance video services.

Comparing Content-Adaptive Encoding Solutions

Instead of using fixed encoding parameters, content-adaptive encoding configures the video encoder according to the content of the video clip to reach the optimal tradeoff between bitrate and quality. Various content-adaptive encoding techniques have been used in the past to provide a better user experience with reduced delivery costs. Some of them have been entirely manual, where encoding parameters are hand-tuned for each content category and sometimes, like in the case of high-volume Blu-ray titles, at the scene level. Manual content-adaptive techniques are restricted in the sense that they can’t be scaled, and they don’t provide granularity lower than the scene level. 

Other techniques, such as those used by YouTube and Netflix, use “brute force” encoding of each title by applying a wide range of encoding parameters, and then by employing rate-distortion models or machine learning techniques, try to select the best parameters for each title or scene. This approach requires a lot of CPU resources since many full encodes are performed on each title, at different resolutions and bitrates. Such techniques are suitable for diverse content libraries that are limited in size, such as premium content including TV series and movies. These methods do not apply well to vast repositories of videos such as user-generated content, and are not applicable to live encoding.

Beamr’s CABR solution is different from the techniques described above in that it works in a closed-loop and adapts the encoding per frame. The video encoder first encodes a frame using a configuration based on its regular rate control mechanism, resulting in an initial encode. Then, Beamr’s CABR rate control instructs the encoder to encode the same frame again with various values of encoding parameters, creating candidate encodes. Using a patented perceptual quality measure, each candidate encode is compared with the initial encode, and then the best candidate is selected and placed in the output stream. The best candidate is the one that has the lowest bitrate but still has the same perceptual quality as the initial encode. 

Taking Advantage of Beamr’s CABR Rate Control

In order for Beamr’s CABR technology to encode video to the minimal bitrate and still retain the perceptual quality of a higher bitrate encode, it compresses each video frame to the maximum extent that provides the same visual quality when the video is viewed in motion. Figure 1 shows a block diagram of an encoding solution which incorporates CABR technology. 

Figure 1. A block diagram of the CABR encoding solution

An integrated CABR encoding solution consists of a video encoder and the CABR rate control engine. The CABR engine comprises the CABR control module, which manages the optimization process, and a module which evaluates video quality.

As seen in Figure 2, the CABR encoding process consists of multiple steps. Some of these steps are performed once for each encoding session, some are performed once for each frame, and some are performed for each iteration of candidate frame encoding. When starting a content-adaptive encoding session, the CABR engine and the encoder are initialized. At this stage, we set system-level parameters such as the maximum number of iterations per frame. Then, for each frame, the encoder rate control module selects the frame types by applying its internal logic.

Figure 2. A block diagram of a video encoder incorporating Content Adaptive Bit-Rate encoding.

The encoder provides the CABR engine with each original input frame for pre-analysis within the quality measure calculator. The encoder performs an initial encode of the frame, using its own logic for bit allocation, motion estimation, mode selections, Quantization Parameters (QPs), etc. After encoding the frame, the encoder provides the CABR engine with the reconstructed frame corresponding to this initially encoded frame, along with some side information – such as the frame size in bits and the QP selected for each MacroBlock or Coding Tree Unit (CTU). 

In each iteration, the CABR control module first decides if the frame should be re-encoded at all. This is done, for example, according to the frame type, the bit consumption of the frame, the quality of previous frames or iterations, and according to the maximum number of iterations set for the frame. In some cases, the CABR control module may decide not to re-encode a frame at all – in that case, the initial encoded frame becomes the output frame, and the encoder continues to the next frame. When the CABR control module decides to re-encode, the CABR engine provides the encoder with modified encoding parameters, for example, a proposed average QP for the frame, or the difference from the QP used for the initial encode. Note that the QP or delta QP values are an average value, and QP modulation for each encoding block can still be performed by the encoder. In more sophisticated implementations a QP map of value per encoding block may be provided, as well as additional encoder configuration parameters.

The encoder performs a re-encode of the frame with the modified parameters. Note that this re-encode is not a full encode, since it can utilize many encoding decisions from the initial encode. In fact, the encoder may perform only re-quantization of the frame, reusing all previous motion vectors and mode decisions. Then, the encoder provides the CABR engine with the reconstructed re-encoded frame, which becomes one of the candidate frames. The quality measure module then calculates the quality of the candidate re-encoded frame relative to the initially encoded frame, and this quality score, along with the bit consumption reported by the encoder is provided to the CABR control module, which again determines if the frame should be re-encoded. When that is the case, the CABR control module sets the encoding parameters for the next iteration, and the above process is repeated. If the control module decides that the search for the optimal frame parameters is complete, it indicates which frame, among all previously encoded versions of this frame, should be used in the output video stream. Note that the encoder rate control module receives its feedback from the initial encode of the current frame, and in this way the initial encode of the next frames (which determines the target quality of the bitstream) is not affected. 

The CABR engine can operate in either a serial iterative approach or a parallel approach. In the serial approach, the results from previous iterations can be used to select the QP value for the next iteration. In the parallel approach, all candidate QP values are provided simultaneously and encodes are done in parallel – which reduces latency.
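Putting the per-frame flow together, here is a minimal sketch of the serial variant of the loop. The callables are hypothetical stand-ins for the encoder integration API, and the quality threshold, QP step, and iteration count are illustrative assumptions rather than CABR's actual control logic.

```python
def cabr_encode_frame(initial_encode, re_encode, quality_vs, frame,
                      quality_threshold=0.95, max_iters=4, qp_step=2):
    """Sketch of the serial CABR iteration loop described above.
    initial_encode(frame) and re_encode(frame, qp) are assumed to return
    (bits, avg_qp, recon) tuples; quality_vs scores a candidate against
    the initial encode on a [0, 1] scale. All names are hypothetical."""
    initial = initial_encode(frame)              # encoder's own rate control
    best, qp = initial, initial[1]
    for _ in range(max_iters):
        qp += qp_step                            # coarser quantization, fewer bits
        candidate = re_encode(frame, qp)         # a partial re-encode suffices
        if (quality_vs(initial, candidate) >= quality_threshold
                and candidate[0] < best[0]):
            best = candidate                     # perceptually identical, cheaper
        else:
            break                                # quality dropped: stop searching
    return best                                  # selected frame for the output stream
```

In the parallel variant, the candidate QP values would be chosen up front and all re-encodes launched at once, trading the serial search's adaptivity for lower latency.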

Integrating the CABR Engine with Software & Hardware Encoders

Beamr has integrated the CABR engine into its AVC software encoder, Beamr 4, and into its HEVC software encoder, Beamr 5. However, the CABR engine can be integrated with any software or hardware video encoder, supporting any block-based video standard such as MPEG-2, AVC, HEVC, EVC, VVC, VP9, and AV1. 

To integrate the CABR engine with a video encoder, the encoder must meet several requirements. First and foremost, the encoder should be able to re-encode an input frame (that has already been encoded) with several different encoding parameters (such as QP values), and save the “state” of each of these encodes, including the initial encode. The reason for saving the state is that when the CABR control module selects one of the candidate frame encodes (or the initial encode) as the one to use in the output stream, the encoder’s state should correspond to the state it was in right after encoding that candidate. Encoders that support multi-threaded operation and hardware encoders typically have this capability, since each frame encode is performed by a stateless unit.

Second, the encoder should provide an interface exposing the reconstructed frame and the per-block QP and bit consumption information for the encoded frame. To improve compute performance, we also recommend that the encoder support a partial re-encode mode, where information related to motion estimation, partitioning and mode decisions found in the initial encode can be re-used for re-encoding without being computed again, and only the quantization and entropy coding stages are repeated for each candidate encode. This results in a minimal encoding efficiency drop for the optimized encoding result, with a significant speed-up compared to a full re-encode. As described above, we recommend that the encoder use the initial encode data (QPs, compressed size, etc.) for its rate control state update. However, the selected frame and accompanying data must be used for reference frames and other reference data, such as temporal MV predictors, as it is the only data available in the bitstream for decoding.

When integrating with hardware encoders that support parallel encoding with no increase in latency, we recommend using the parallel search approach where multiple QP values per frame are evaluated simultaneously. If the hardware encoder can perform parallel partial encodes (for example, re-quantization and entropy coding only), while all parallel encodes use the analysis stage of the initial encode, such as motion estimation and mode decisions, better CPU performance will be achieved. 

Sample Results

Below, we provide two sample results of the CABR engine, when integrated with Beamr 5, Beamr’s HEVC software encoder, each illustrating different aspects of CABR.

For the first example, we encoded various 4K 24 FPS source clips to a target bitrate of 10 Mbps. Sample frames from each of the clips can be seen in Figure 3. The clips vary in their content complexity: “Crowd Run” has very high complexity since it has great detail and very significant motion of the runners. “StEM” has medium complexity, with some video compression challenges such as different lighting conditions and reasonably high film grain. Finally, a promotional clip of JPEGmini by Beamr has low complexity due to relatively low motion and simple scenes.  

Figure 3. Sample frames from the test clips. Top: Crowd Run; bottom left: StEM; bottom right: JPEGmini.

We encoded 500 frames from each clip to a target bitrate of 10 Mbps, using the VBR mode of the Beamr 5 HEVC encoder, which performs regular encoding, and using the CABR mode, which creates a lower bitrate, perceptually identical stream. For the high complexity clip “Crowd Run,” where providing excellent quality at such an aggressive bitrate is very challenging, CABR reduced the bitrate by only 3%. For the intermediate complexity clip “StEM,” bitrate savings were higher and reached 17%. For the lowest complexity clip “JPEGmini,” CABR reduced the bitrate by a staggering 45%, while still obtaining excellent quality matching that of the 10 Mbps VBR encode. This wide range of bitrate reductions demonstrates the fully automatic content-adaptive nature of a CABR-enhanced encoder, which reaches a different final bitrate according to the content complexity.

The second example uses a 500 frame 1080p 24 FPS clip from the well-known “Tears Of Steel” movie by the Blender open movie project. The same clip was encoded using the VBR and CABR modes of the Beamr 5 HEVC software encoder, with three target bitrates: 1.5, 3 and 5 Mbps. Savings, in this case, were 13% for the lowest bitrate resulting in a 1.4 Mbps encode, 44% for the intermediate bitrate resulting in an encode of 1.8 Mbps, and 62% for the highest bitrate, resulting in a 2 Mbps encode. Figures 4 and 5 show sample frames from the encoded clips with VBR encoding on the left vs. CABR encoding on the right. The top two images are from encodes to a bitrate of 5 Mbps, while the bottom two were taken from the 1.5 Mbps encodes. As can be seen here, both 5 Mbps target encodes preserve the details, such as the texture of the bottom lip or the two hairs on the forehead above the right eye, while in the lower bitrate encodes these details are somewhat blurred. This is the reason that when starting from different target bitrates, CABR does not converge to the same bitrate. We also see, however, that the more generous the initial encoding, generally the more savings can be obtained. This example shows that CABR adapts not only to the content complexity, but also to the quality of the target encode, and preserves perceptual quality in motion while offering significant savings.

Figure 4. A sample from the “Tears of Steel” 1080p 24 FPS encode to 5 Mbps (top) and 1.5 Mbps (bottom), encoded in VBR mode (left) and CABR mode (right)

Figure 5. Closer view of the face in Figure 4, showing detail of lips and forehead from the encode to 5 Mbps (top) and 1.5 Mbps (bottom), encoded in VBR mode (left) and CABR mode (right).

To learn how our CABR solution leverages our patented quality measure, continue to “The patented visual quality measure that makes all the difference.”

Authors: Dror Gill & Tamar Shoham

They don’t collect baseball cards, but eSports super fans are giving traditional sports a run for their money.

For sports fans who grew up before the 90s, you likely have fond memories of collecting baseball memorabilia or going to your first game. Hearing eSports fans rattle off statistics and kill ratios from their favorite Twitch stream, or boast about their latest Fortnite skin, may bring back memories of poring over your favorite team’s stats in the newspaper on Sundays or trading cards with your friends. Today’s eSports fans may engage a little differently than the traditional sports fans of yore, but they may be the most engaged fans in history – and they’re making their mark.

With over 2.2M creators streaming on Amazon’s popular video game streaming service, Twitch, every month, 517M watching gaming on YouTube, and another 185M consuming their video content on Twitch, eSports viewership has surpassed HBO, Netflix, ESPN & Hulu, combined. The massive online gaming viewership is changing the sporting landscape and technology requirements for fans, content providers, and ISPs alike.

To meet online gamers’ demand for faster and higher quality gaming experiences, Cox Cable recently launched a trial of its Elite Gamer package. Elite Gamer is a premium offer that Cox claims will result in “34% less lag, 55% fewer ping spikes, and 45% less jitter” by speeding up the connection between the player and the desired gaming server.

We believe this marks one of the first of what will become standard practices for ISPs and content providers. When you put the massive amount of content delivered and consumed via Twitch and YouTube into perspective, it’s no wonder that vendors are starting to consider how they will address the bandwidth and technology requirements needed to sustain the eGaming industry. Even a casual gamer needs a download speed of at least 3 Mbps, an upload speed of at least 1 Mbps, and a ping under 150 ms, and those figures multiply with each concurrent player in your household.

At Beamr, we live and breathe optimization. For us, the quality and bandwidth challenges introduced by the gaming industry are an opportunity to see how far we can push the limits of balancing video compression with the highest video quality possible.   

If you are passionate about gaming and are curious about what it takes to deliver a high-quality cloud gaming experience, you will enjoy this episode from our podcast, The Video Insiders, where we interviewed Yuval Noimark from Electronic Arts. Listen to Episode 15 here.

Sources:

https://www.superdataresearch.com/market-data/gaming-video-content/

https://twitchadvertising.tv/audience/

https://www.highspeedinternet.com/resources/how-much-speed-do-i-need-for-online-gaming

Why Game of Thrones is pushing video encoder capabilities to the edge

Game of Thrones entered its eighth and final season on April 14th, 2019. Though Game of Thrones has been a cultural phenomenon since the beginning of its airing, the attention and eyeballs on these final episodes have been higher than ever. Right now, every aspect of season eight is under a microscope and up for discussion, from the battle of Winterfell to theories on the Azor Ahai prophecy, super fans are taking to the internet and social media to debate and swap theories. Yet, even if you’ve installed the Chrome extension GameOfSpoilers to block Game of Thrones news from popular social networks, you probably did not miss all the fans who flocked to social media to report their dissatisfaction with the poor quality of Season 8 Episode 3, “The Longest Night.”


Though not all viewers experienced degraded visual quality for “The Longest Night”, a sufficiently high number did report a poor viewer experience, which triggered TechCrunch to write an article titled “Why did last night’s ‘Game of Thrones’ look so bad?”

And TechCrunch wasn’t alone; The Verge also wrote an entire piece on how to set up your TV for a rewatch of “The Longest Night”, something that should hardly be necessary. After all, how is it possible that fans could need to rewatch an episode, not because the plot was so twisted or complicated that they needed a second pass at deciphering it, but because they couldn’t see what was happening on the screen? And in fact, Game of Thrones super fans were not shy in taking to Twitter with their quality assessments.

Why does this look so bad?

Before you throw a Valyrian steel dagger at your TV, let’s take a close look at what happened to create this poor video quality by diving into the operational structure of the underlying video codecs that are used by all commercial streaming services.

The video compression schemes used in video streaming, including AVC, which is also used by most PayTV cable and satellite distributors, are hybrid block-based video codecs. These codecs use block-based encoding methods, meaning each video frame or picture is partitioned into blocks during the compression process, and they also apply motion estimation between frames. Though the effective compression made possible by these techniques is impressive, hybrid block-based compression schemes are inherently prone to creating blockiness and banding artifacts, which can be particularly evident in dark scenes.

Blockiness is a video artifact where areas of a video image appear to be comprised of small squares, rather than the proper detail and smooth edges the viewer would expect to see. The blockiness artifact happens when not enough detail is preserved in each of the coding blocks, resulting in inconsistencies between adjacent blocks and making each block appear separate from its neighbors.

There are two main causes of blockiness. The first is when there is a mismatch between the content complexity and the target bitrate. It can be present either in highly complex content which is encoded at typical bitrates, or in standard content which is compressed to overly aggressive bitrates. Content providers can avoid this by using content adaptive solutions which match the encoder bitrate to the content complexity. The second cause of blockiness is from poor quality decisions made by the encoder, such as discarding information which is crucial for visual quality.

As noted by TechCrunch, the images in the Game of Thrones episode “The Longest Night” are very dark and have a limited range of colors and brightness levels, basically between grey and dark grey. Encoding this limited range of grey shades, which filled up most of the screen, resulted in “banding” artifacts: visible transitions, which look like “stripes” instead of smooth gradients, that appear when the video is represented by just a few shades of grey. Video suffers from banding when the color or brightness palette being used has too few values to accurately describe the shades present in part or all of the video frame.
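A quick numeric illustration of banding: quantize a smooth dark-grey ramp with a coarse step, as an over-aggressive encoder effectively does, and only a handful of shades survive. The values below are made up but show the effect.

```python
import numpy as np

ramp = np.linspace(16, 48, 1920)    # smooth dark-grey gradient across a 1080p row
step = 8                            # coarse quantization step (illustrative)
banded = np.round(ramp / step) * step
print(np.unique(banded))            # only 5 distinct shades remain: visible stripes
```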

The prevalent assumption, even among some video engineers, is that increasing bitrate is the cure-all for video quality problems. But as we’ll see in this case, it’s likely that even if the bitrate had been doubled, the systemic artifacts would still be present. Thus, the problem is not likely external to the video encoding process; rather, it can only be addressed at the codec implementation level.

That is, the video encoding engine must be improved to prevent situations like this in the future. That, or HBO and other premium content owners could instruct their filmmakers to avoid dark scenes – We’ll stick with Option #1!

In this case, the video quality issues were not caused by the video encoding bitrate being too low. In fact, the bitrate used was more than sufficient to represent the limited range of colors and brightness. The issue was in the decisions made by the specific video encoder used by HBO. These are decisions regarding how to allocate the available bitrate, or how the bits should be used for different elements of the compressed video stream.

Without getting too deep into video compression technology, it is sufficient to say that a compressed video stream consists basically of two types of data.

The first type is prediction data, which enables the creation of a prediction block from previously decoded pixels (in either the same or a reference frame). This prediction block acts as a rough estimate of the source block and is complemented by the residual, or error, block encoded in the stream, which fills in the difference between the predicted block and the actual source video block.

The second key element in how a block-based video encoder works is the rate-distortion algorithm, which optimizes the selection of the prediction modes that the prediction data represents and determines the level of detail to preserve in the residual block. These decisions are made in an attempt to minimize distortion, or maximize quality, for a specific bit allocation derived from the target bitrate.
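These decisions are typically framed as minimizing a Lagrangian cost J = D + λR over the candidate modes. The sketch below is generic textbook rate-distortion optimization, not any specific encoder's code; it shows why skewed distortion numbers in a dark scene can tip the choice toward the cheapest option.

```python
def rd_choose(candidates, lmbda):
    """Standard rate-distortion selection: pick the candidate (prediction
    mode, residual detail level, etc.) minimizing J = D + lambda * R.
    If the distortion values D are skewed by a dark scene, the cheap-rate
    option wins even when it looks visibly worse."""
    return min(candidates, key=lambda c: c["distortion"] + lmbda * c["bits"])

# e.g., rd_choose([{"mode": "skip",  "distortion": 120, "bits": 1},
#                  {"mode": "inter", "distortion": 40,  "bits": 30}], lmbda=2.0)
# picks "inter" (cost 100) over "skip" (cost 122) -- unless D is underestimated.
```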

When a scene is very dark and consists of small variations of pixel colors, the numerical estimates of distortion may be skewed. Components of the video encoder including motion estimation and the rate-distortion algorithm should adapt to optimize the allocations and decisions for this particular use case.

For example, the motion estimation might classify the differences in pixel values as noise instead of actual motion, thus providing inferior prediction information. In another example, if the distortion measures are not tuned correctly, the residual may be considered noise rather than true pixel information and may be discarded or aggressively quantized.

Another common encoder technique that is affected and often “fooled” by very dark scenes is early terminations. Many encoders use this technique to “guess” what would be the best encoding decision they should make, instead of making an exhaustive search of all possibilities, and computing their cost. This technique improves the performance of the video encoder, but in the case of dark scenes with small variations, it can cause the encoder to make the wrong decision.

Some encoding engineers use a technique called “capped CRF” for encoding video at a constant quality instead of a pre-defined bitrate. This is a simple form of “content-adaptive” or “content-aware” encoding, which produces different bitrates for each video clip or scene based on its content. In some implementations, when this technique is used for dark scenes, it can also be “fooled” by the limited range of color and brightness values and may perform very aggressive quantization, removing too much information from the residual blocks and resulting in these blockiness and banding artifacts.
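For reference, one common way to set up capped CRF with ffmpeg and x264 is a CRF quality target combined with a VBV bitrate ceiling. The options below are real ffmpeg/libx264 options; the file names and specific numbers are only examples.

```python
import subprocess

# Capped CRF: constant-quality encoding (-crf) whose bitrate is capped by
# the VBV model (-maxrate/-bufsize). Example values, not a recommendation.
subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-c:v", "libx264",
    "-crf", "23",          # constant-quality target
    "-maxrate", "4M",      # cap the bitrate...
    "-bufsize", "8M",      # ...via the VBV buffer model
    "output.mp4",
], check=True)
```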

In summary, we can conclude that dark scenes can lead to various encoding issues if the encoder is not “well prepared” for this type of content, and it seems that this is what happened with this Game of Thrones episode.

Better luck (and quality) next time.

In order to ensure good quality video across different content types, the encoder must be able to correctly adapt to each and every frame being encoded. In Beamr encoders we tackle this with a combination of tools and algorithms to provide the best video quality possible.

Beamr encoders use unique, patented and patent-pending approaches, to calculate psycho-visual distortions to be used by the rate control module when deciding on prediction modes and on the bit allocations for different components of the compressed video data. This means that the actual visual impact of different decisions is taken into account, resulting in the improved visual quality of the content across a variety of content types.

Beamr encoders offer a wide variety of encoding speeds for different use cases, ranging from lightning fast, which enables a full 4K ABR stack to be generated on a single server for live or VOD, down to slower speeds that squeeze out maximum quality. When airing premium content of the caliber of Game of Thrones, one should opt for maximum quality by using the slower encoder speeds.

At these speeds, the encoder is wary of invoking early termination methods and thus does not overlook the data that may be hiding in the small deviations of the dark pixel values. We invest a huge effort in discovering the optimal combinations for all the internal controls of the algorithm, such as the optimal lambda values for the different rate-distortion cost calculations, optimal values for the deblocking filter (and SAO in the case of Beamr 5 for HEVC), and many other details – none of which are overlooked.

Rather than use a CRF-based approach for constant quality encoding, Beamr employs a more sophisticated technique for content-adaptive encoding called CABR. The content-adaptive bitrate mode operates in a closed loop and examines the quality of each frame using a patented perceptual quality measure. Our perceptual quality measure is specifically tuned to adapt to the “darkness” of the frame and each region within it, which makes it highly effective even when processing very dark scenes such as “The Longest Night”, or fade-in and fade-out sequences.

Looking to the Future

For content providers, viewer expectations and demands for quality will continue to rise each year. A decade ago, you could get by without delivering a consistent experience across devices. Today, not only is video degradation noticed by your viewers, it can have a massive impact on your audience and churn if you’re not delivering an experience in line with their quality expectations.

To see what high quality at the lowest bitrate should look like, try Beamr Transcoder for free or contact our team by sending an email to sales@beamr.com to learn about our comparison tool Beamr View.

HEVC Bitrate Efficiency

Today, video streaming services must offer solutions that can evolve as the demand for content availability across a wide range of devices increases. Though new innovations in display and capture technology are making headlines, the core pillars that differentiate every service are still video quality and user experience.

Beamr’s HEVC & H.264 codecs having been engineered to reduce the bitrate of video files and streams while maintaining the perceptual quality of the content, ultimately reducing the bandwidth required to stream video to every viewer’s device while offering the best visual quality possible.

Leveraging our 44 granted patents, our content-adaptive bitrate (CABR) technology enables a 20 to 40 percent (sometimes higher) reduction in bitrate without any degradation to the video.

We know that this sounds too good to be true, which is why we are providing a real example, including download links to the original files, and special free access to Beamr View so you can test yourself.


To get Beamr View CLICK HERE.

In this example, we are comparing an HEVC VBR encode with an HEVC CABR version.

We took the Test File and created a reference encode using the Beamr 5x VBR rate control, then compared it with a version of the same file encoded with CABR. CABR dropped the bitrate from 3.09 Mbps to 1.44 Mbps.

Would you like to test the results yourself?

To see the results, follow one of these testing methods:

1. Download the pre-encoded files and then use Beamr View to compare the visual quality of the HEVC VBR file against the file encoded with HEVC CABR.

2. Download the Test File and run your own encodes of the Test File using Beamr Transcoder.  Compare the visual quality of your results by comparing the HEVC CABR with the HEVC VBR encoded file using Beamr View.

Below you will find the test files:

Download Original Test File

Download file encoded with HEVC VBR

Download file encoded with HEVC CABR

Comparing Quality

When it comes to assessing and comparing video quality, do you know what to look for? Our team of image scientists has put together the following tips for you to use during your tests.

Quality is in the eye of the beholder

When a user is comparing visual quality, the best measurement tool is the human eye. Visual quality is a subjective measure, meaning that image scientists and video engineers must rely on physically looking at an encoded file to determine whether its visual quality is better or worse than the comparison. Relying on quality metrics such as PSNR and SSIM alone isn’t enough, because even if a video has the highest possible PSNR or SSIM score, it may not have the highest visual quality at a given bitrate.

Speed, bitrate, and rate control

There are multiple methods to encode video, and blocks can be encoded in various ways for speed, bitrate, and quality. To validate your test, you must configure both encoders to operate at similar speeds in order to assess whether the bitrate-to-quality tradeoff is favorable or not. To take it a step further, leveraging rate control enables the user to maintain bitrate limits throughout a clip or video, replicating the needs of a scalable application.

Comparing moving video instead of still frames

The only way to effectively assess the quality of video is to compare moving video rather than still frames. To accurately compare the visual quality of two decoded frames, artifacts, motion inaccuracy, and other visual degradation must be assessed while the content is moving.

MPEG Through the Eyes of Its Chairman [podcast]

In “E08: MPEG Through the Eyes of Its Chairman,” The Video Insiders had the honor of sitting down with Leonardo Chiariglione, the chairman and founder of the MPEG committee, to discuss the history of MPEG and what it means for the next generation of video codecs.

Leonardo brings his extensive experience to the table as he revisits 40 years of codec development and the strategy behind building leading codecs. Listeners dive into a hearty discussion surrounding patent pools and licensing terms and their effect on the success, failure, and adoption of video codecs.

https://youtu.be/tf3VNHr4GGA

Tune in to the full episode “MPEG Through the Eyes of Its Chairman” or watch the snippet above.

Want to join the conversation? Reach out to TheVideoInsiders@beamr.com

Microservices – Good on a Bad Day [podcast]

Live streaming is arguably the least forgiving industry in today’s market. Anyone involved with live streaming workflows understands the sensitivity and high stakes involved with live streaming any event. Your viewers, on the other hand, don’t factor in the complexities of what happens behind the scenes when it comes to their quality expectations – but they certainly notice when something goes awry. In the words of id3as’ Adrian Roe, “What differentiates a great service from a merely good service is what happens when things go wrong.” And that’s where microservices can save the day.

In “Episode 07: Microservices – Good on a Bad Day,” The Video Insiders sit down with Dom Robinson & Dr. Adrian Roe from id3as to discuss how broadcasters are leveraging microservices to solve some of their workflow challenges.

https://youtu.be/Bt2earTfEU8

Press play to hear a snippet from Episode 07 or click here for the full episode.

Want to join the conversation? Reach out to TheVideoInsiders@beamr.com

In the battle between open source & proprietary technology, does video win? [podcast]

Video engineers dedicated to engineering encoding technologies are highly skilled and hyper-focused on developing the foundation for future online media content. Such a limited pool of experts in this field creates a lot of opportunity for growth and development, and it also means there must be a level of camaraderie and cooperation between different methodologies.

In past episodes, you’ve seen The Video Insiders compare codecs head-to-head and debate over their strengths and weaknesses. Today, they are tackling a deeper debate between encoding experts: the advantages and disadvantages of proprietary technology vs. community-driven open source.

In Episode 05, Tom Vaughn surprises The Video Insiders as he talks through his take on open source vs. proprietary technology.

Tune in to “E05: In the battle between open source & proprietary technology, does video win?” here.

https://youtu.be/DS10zDO7FOg

Press play to hear a snippet from Episode 05, or click here for the full episode.

Want to join the conversation? Reach out to TheVideoInsiders@beamr.com

TRANSCRIPTION (lightly edited to improve readability only)

Mark Donnigan: 00:00 In this episode, we talk with a video pioneer who drove a popular open source codec project before joining a commercial codec company. Trust me, you want to hear what he told us about proprietary technology, open source, IP licensing, and royalties.

Announcer: 00:18 The Video Insiders is the show that makes sense of all that is happening in the world of online video, as seen through the eyes of a second generation codec nerd and a marketing guy who knows what iframes and macroblocks are. Here are your hosts, Mark Donnigan and Dror Gill.

Mark Donnigan: 00:35 Okay.

Mark Donnigan: 00:35 Well, welcome back everyone to this very special edition. Every edition is special, isn’t it, Dror?

Dror Gill: 00:43 That’s right. Especially the first editions where everybody’s so excited to see what’s going to happen and how it would evolve.

Mark Donnigan: 00:49 You know what’s amazing, Dror, we had more than 180 downloads in the first 48 hours.

Dror Gill: 00:55 Wow.

Mark Donnigan: 00:56 You know, we’re like encoding geeks. I mean, are there even 180 of us in the world?

Dror Gill: 01:01 I don’t know. I think you should count the number of people who come to Ben Waggoner’s compressionist breakfast at NAB, that’s about the whole industry, right?

Mark Donnigan: 01:09 Yeah. That’s the whole industry.

Mark Donnigan: 01:11 Hey, we want to thank, seriously in all seriousness, all the listeners who have been supporting us and we just really appreciate it. We have an amazing guest lined up for today. This is a little personal for me. It was IBC 2017, I had said something about a product that he was representing, driving, developing at the time. In fact, it was factually true. He didn’t like it so much and we exchanged some words. Here’s the ironic thing, this guy now works for us. Isn’t that amazing, Dror?


Dror Gill: 01:49 Yeah, isn’t that amazing?

Mark Donnigan: 01:52 You know what, and we love each other. The story ended well, talk about a good Hollywood ending.

Mark Donnigan: 01:58 Well, we are talking today with Tom Vaughn. I’m going to let you introduce yourself. Tell the listeners about yourself.

Tom Vaughn: 02:10 Hey Mark, hey Dror. Good to be here.

Tom Vaughn: 02:12 As Mark mentioned, I’m Beamr’s VP of strategy. Joined Beamr in January this year. Before that I was Beamr’s, probably, primary competitor, the person who started and led the x265 project at MulticoreWare. We were fierce competitors, but we were always friendly and always friends. Got to know the Beamr team when Beamr first brought their image compression science from the photo industry to the video industry, which was three or four years ago. Really enjoyed collaborating with them and brainstorming and working with them, and we’ve always been allies in the fight to make new formats successful and deal with some of the structural issues in the industry.

Dror Gill: 03:02 Let me translate. New formats, that means HEVC. Structural issues, that means patent royalties.

Tom Vaughn: 03:08 Yes.

Dror Gill: 03:09 Okay, you can continue.

Tom Vaughn: 03:11 No need to be subtle here.

Tom Vaughn: 03:13 Yeah, we had many discussions over the years about how to deal with the challenging macro environment in the codec space. I decided to join the winning team at Beamr this year, and it’s been fantastic.

Mark Donnigan: 03:28 Well, we’re so happy to have you aboard, Tom.

Mark Donnigan: 03:32 I’d like to just really jump in. You have a lot of expertise in the area of open source, and in the industry, there’s a lot of discussion and debate, and some would even say there’s religion, around open source versus proprietary technology, but you’ve been on both sides and I’d really like to jump into the conversation and have you give us a real quick primer as to what is open source.

Tom Vaughn: 04:01 Well, open source is basically what it says: you can get the full source code to that software. Now, there isn’t just one flavor of open source in terms of the software license that you get; there are many different open source licenses. Some have more restrictions and some have fewer restrictions on what you can do. There are some well known open source software programs and platforms. Linux is probably the most well known; in the multimedia space, there’s FFmpeg and Libav. There’s VLC, the multimedia player. In the codec space, x264, x265, VP9, AV1, et cetera.

Dror Gill: 04:50 I think the main attraction of open source, I think, the main feature is that people from all over the world join together, collaborate, each one contributes their own piece, then somehow this is managed together. Every bug that is discovered, anyone can fix it, because the source is open. This creates kind of a community and together a piece of software is created that is much larger and more robust than anything that a single developer could do on his own.

Tom Vaughn: 05:23 Yeah, ideally the fact that the source code is open means that you have many sets of eyes, not only trying the program, but able to go through the source code and see exactly how it was written and therefore more code review can happen. On the collaboration side, you’re looking for volunteers, and if you can find and energize many, many people worldwide to become enthusiastic and devote time or get their companies motivated to allocate developers full- or part-time to a particular open source project, you get that collaboration from many different types of people with different individual use cases and motivations. There are patches submitted from many different people, but someone has to decide, does that patch get committed or are there problems with that? Should it be changed?

Tom Vaughn: 06:17 Design by committee isn’t always optimal, so someone or some small group has to decide what should be included and what should be left out.

Dror Gill: 06:27 It’s interesting to see, actually, the difference between x264 and x265 in this respect, because x264, the open source implementation of AVC, was led by a group of truly independent developers, and no single company owned or led the development of that open source project. However, with x265, which is the open source implementation of HEVC, your previous company, MulticoreWare, has taken the lead and devoted, I assume, most of the development resources that have gone into the open source development; most of the contributions came from that company, but it is still an open source project.

Tom Vaughn: 07:06 That’s right. x264 was started by some students at a French university, and when they were graduating, leaving the university, they convinced the university to enable them to take the code with them, essentially under an open source license. It was very much grassroots open source beginnings and execution where developers may come and go, but it was a community collaboration.

Tom Vaughn: 07:31 I started x265 at MulticoreWare with a couple of other individuals, and the way we started it was finding some commercial companies who expressed a strong interest in such a thing coming to life and who were early backers commercially. It was quite different. Then, because there’s a small team of full-time developers on it working 40 hours plus a week, that team is moving very fast, it’s organized, it’s within a company. There was less of a need for a community. While we did everything we could to attract more external contributors, attracting contributors is always a challenge of open source projects.

Mark Donnigan: 08:14 What I hear you saying, Tom, is that compared to the x264 project, the x265 project didn’t have as large an independent group of contributors. Is that …?

Tom Vaughn: 08:29 Well, x264 was all independent contributors.

Mark Donnigan: 08:32 That’s right.

Tom Vaughn: 08:33 And still is, essentially. There are many companies that fund x264 developers explicitly. Chip companies will fund individual developers to optimize popular open source software projects for their instruction set. AVX, AVX2, AVX512, essentially, things like that.

Tom Vaughn: 08:58 HEVC is significantly more complex than AVC, and I think, if I recall correctly, x265 already has three times the number of commits as x264, even though it’s been in existence for only a third as long.

Dror Gill: 09:12 So Tom, what’s interesting to me is everybody’s talking about open source software being almost synonymous with free software. Is open source really free? Is it the same?

Tom Vaughn: 09:23 It can be at times. One part depends on the license and the other part depends on how you’re using the software. For example, if it’s a very open license like Apache, or BSD, or UIUC, that’s an attribution only license, and you’re pretty much free to create modifications, incorporate the software in your own works and distribute the resulting system.

Tom Vaughn: 09:49 Software programs like x264 and x265 are licensed under the GNU GPL V2, that is an open source license that has a copyleft requirement. That means if you incorporate that in a larger work and distribute that larger work, you have to open source not only your modifications, but you have to open source the larger work. Most commercial companies don’t want to incorporate some open source software in their commercial product, and then have to open source the commercial product. The owners of the copyright of the GPL V2 code, x264 LLC or MulticoreWare, also offer a commercial license, meaning you get access to that software, not under the GNU GPL V2, but under a separate, different license, in which case for you, it’s not open source anymore. Your commercial license dictates what you can and can’t do. Generally that commercial license doesn’t include the copyleft requirement, so you can incorporate it in some commercial product and distribute that commercial product without open sourcing your commercial product.

Dror Gill: 10:54 Then you’re actually licensing that software as you would license it from a commercial company.

Tom Vaughn: 10:59 Exactly. In that case it’s not open source at all, it’s a commercial license.

Dror Gill: 11:04 It's interesting what you said about the GPL, the fact that anything you compile with it, create derivatives of, or incorporate into your software, you need to open source as well. I think this is what triggered Steve Ballmer in 2001 to say something like, "Open source is a cancer that spreads throughout your company and eats your IP." That was very interesting. I think he meant mostly the GPL, because of that requirement, but the interesting thing is that he said that in 2001, and then in a 2016 interview he said, "I was wrong and I really love Linux." Today, Microsoft itself open sources a lot of its own development.

Mark Donnigan: 11:48 That’s right. Yeah, that’s right.

Mark Donnigan: 11:50 Well Tom, let’s … This has been an awesome discussion. Let’s bring it to a conclusion. When is proprietary technology the right choice and when is open source maybe the correct choice? Can you give the listeners some guidelines?

Tom Vaughn: 12:08 Sure. People are trying to solve problems. Engineers and companies are trying to build products and services, and they have to compete in their own business environment. Let's say you're a video service and you run a video business. The quality of that video, and the efficiency with which you can deliver that video, matter a lot. We know what the advantages of open source are, and all things being equal, people gravitate towards open source a lot, because engineers feel comfortable actually seeing the source code, being able to read through it, and finding bugs themselves if pushed to the limit.

Tom Vaughn: 12:45 At the end of the day, if an open source project can’t produce the winning implementation of something, you shouldn’t necessarily use it just because it’s open source. At the end of the day you have a business to run and what you want is the most performant libraries and platforms to build your business around. If you find that a proprietary implementation in the long run is more cost effective, more efficient, higher performance, and the company that is behind that proprietary implementation is solid and is going to be there for you and provide a contractual commitment to support you, there’s no reason to not choose some proprietary code to incorporate into your product or service.

Tom Vaughn: 13:32 When we're talking about codecs, there are particular qualities I'm looking for. Performance: how fast does it run? How efficiently does it utilize compute resources? How many cores do I need in my server to run this in real time? And compression efficiency: what kind of video quality can I get at a given bit rate under a given set of conditions? I don't want the second-best implementation, I want the best implementation of that standard, because at scale, I can save a lot of money if I have a more efficient implementation of that standard.

Mark Donnigan: 14:01 Those are excellent pointers. It just really comes back to we’re solving problems, right? It’s easy to get sucked into religious debates about some of these things, but at the end of the day we all have an obligation to do what’s right and what’s best for our companies, which includes selecting the best technology, what is going to do the best job at solving the problems.

Mark Donnigan: 14:24 Thank you again for joining us.

Tom Vaughn: 14:25 My pleasure, thank you.

Dror Gill: 14:26 I would also like to thank you for joining us, not only joining us on this podcast, but also joining Beamr.

Mark Donnigan: 14:32 Absolutely.

Mark Donnigan: 14:33 Well, we want to thank you the listener for, again, joining The Video Insiders. We hope you will subscribe. You can go to thevideoinsiders.com and you can stream from your browser, you can subscribe on iTunes. We’re on Spotify. We are on Google Play. We’re expanding every day.

Announcer: 14:57 Thank you for listening to The Video Insiders podcast, a production of Beamr Limited. To begin using Beamr’s codecs today, go to Beamr.com/free to receive up to 100 hours of no-cost HEVC and H.264 transcoding every month.

2018, the Year HEVC Took Flight. [podcast]

By now, most of us have seen the data and know that online video consumption is soaring at a historically unrivaled rate. It's no surprise that, at the height of the streaming era, so many companies are looking to innovate and figure out how to make their workflows, or their customers' workflows, better, less expensive, and faster.

In Episode 4 of The Video Insiders, we caught up with streaming veteran Tim Siglin to discuss HEVC implementation trends that counter previous assumptions, notable 2018 streaming events, and what’s coming in 2019.
Tune in to hear The Video Insiders cover top-of-mind topics:

  • HEVC for lower resolutions
  • Streaming the World Cup
  • Moving from digital broadcast to IP-based infrastructure
  • What consumers aren’t thinking about when it comes to 4K and HDR
  • Looking forward into 2019 & beyond

Tune in to Episode 04: 2018, the Year HEVC Took Flight or watch the video below.

https://www.youtube.com/watch?v=WHsSkrFcJJc

Want to join the conversation? Reach out to TheVideoInsiders@beamr.com

TRANSCRIPTION (lightly edited to improve readability only)

Mark Donnigan: 00:00 On today’s episode, the Video Insiders sit down with an industry luminary who shares results of a codec implementation study, while discussing notable streaming events that took place in 2018 and what’s on the horizon for 2019. Stay tuned. You don’t want to miss receiving the inside scoop on all this and more.

Announcer: 00:22 The Video Insiders is the show that makes sense of all that is happening in the world of online video, as seen through the eyes of a second-generation codec nerd and a marketing guy who knows what I-frames and macroblocks are. Here are your hosts, Mark Donnigan and Dror Gill.

Mark Donnigan: 00:40 Welcome, everyone. I am Mark Donnigan, and I want to say how honored Dror and I are to have you with us. Before I introduce this very special guest and episode, I want to give a shout of thanks for all of the support that we’re receiving. It’s really been amazing.

Dror Gill: 00:58 Yeah. Yeah, it’s been awesome.

Mark Donnigan: 00:59 In the first 48 hours, we received 180 downloads. It’s pretty amazing.

Dror Gill: 01:06 Yeah. Yeah, it is. The industry is not that large, and I think it's really an amazing number, that people are already listening to the show from the start, before word of mouth starts coming out and people spread the news and things like that. We really appreciate it. So, if it's you that is listening, thank you very much.

Mark Donnigan: 01:29 We really do aim for this to be an agenda-free zone. I guess we can put it that way. Obviously, this show is sponsored by Beamr, and we have a certain point of view on things, but the point is, we observed there wasn’t a good place to find out what’s going on in the industry and sort of get unbiased, or maybe it’s better to say unfiltered, information. That’s what we aim to do in every episode.

Mark Donnigan: 01:57 In this one, we’re going to do just that. We have someone who you can definitely trust to know what’s really happening in the streaming video space, and I know he has some juicy insights to share with us. So, without further ado, let’s bring on Tim Siglin.

Tim Siglin: 02:15 Hey, guys. Thank you for having me today and I will definitely try to be either as unfiltered or unbiased as possible.

Mark Donnigan: 02:21 Why don’t you give us a highlight reel, so to speak, of what you’ve done in the industry and, even more specifically, what are you working on today?

Tim Siglin: 02:31 Sure. I have been in streaming now for a little over 20 years. In fact, when Eric Schumacher-Rasmussen came on as the editor at StreamingMedia.com, he said, "You seem to be one of the few people who were there in the early days." It's true. I actually had the honor of writing the 10-year anniversary articles for Streaming Media magazine, and then did the 20-year ones last year.

Tim Siglin: 02:57 My background was motion picture production, and then I got into video conferencing. As part of video conferencing, we were trying to figure out how to include hundreds of people in a video conference without necessarily having two-way feedback from all of them. That's where streaming caught my eye, because ultimately, for video conferencing, we maybe needed 10 subject matter experts who would talk back and forth, and together with them a hundred, then thousands, and now hundreds of thousands, who can listen in and use something like chat or polling to provide feedback.

Tim Siglin: 03:31 For me, the industry went from the early revolutionary days of "Hey, let's change everything. Let's get rid of TV. Let's do broadcast across IP." That was the mantra in the early days. Now, of course, where we are is sort of, I would say, two-thirds of the way there, and we can talk a little bit about that later. The reality is that the old mediums are actually morphing to allow themselves to do IP, which is good, to compete with over the top.

Tim Siglin: 04:01 Ultimately, what I think we'll find, especially when we get to pure IP broadcast with ATSC 3.0 and some other things for over-the-air, is that we will have more mediums to consume on rather than fewer. I remember the early format wars, and of course we're going to talk some in this episode about some of the newer codecs like HEVC. Ultimately, it seems like the industry goes through these cycles of player wars, format wars, browser wars, operating system wars, and we hit brief periods of stability, which we've done with AVC, or H.264, over the last probably eight years.

Tim Siglin: 04:46 Then somebody wants to stir the pot and figure out how to do it better, less expensively, or faster, and we go back into a cycle of trying to decide what the next big thing will be. In terms of what I'm working on now, having been in the industry for almost 21 years: last year, I helped start a not-for-profit called Help Me Stream, which focuses on working with NGOs in emerging economies, trying to help them actually get into the streaming game to get their critical messages out.

Tim Siglin: 05:18 Those might be emerging economies in Africa or South America. The idea is that we in the first world have streaming down cold, but there are a lot of messages that need to get out in emerging economies and emerging markets where they don't necessarily have the expertise. My work is to tie experts here with needs there, and figure out which technologies and services would be the most appropriate and most cost effective.

Mark Donnigan: 05:46 That’s fascinating, Tim.

Tim Siglin: 05:48 The other thing I'm working on here, just briefly, is we're getting ready for the Streaming Media Sourcebook, the 2019 sourcebook. I'm having to step back for the next 15 days and take a really wide look at the industry to figure out what the state of affairs is.

Dror Gill: 06:06 That's wonderful. I think this is exactly the right point, as one year ends and the other begins, to summarize where we've been in 2018 and what the state of the industry is. The fact that you're doing that for the sourcebook ties in very nicely with our desire to hear from you an overview of the major milestones and advancements made in the streaming industry in 2018, and then a look into next year.

Dror Gill: 06:39 Obviously, the move to IP is getting stronger and stronger; now we have the third phase, after analog and digital, which is broadcast over IP. It's interesting what you said about broadcasters not giving up the fight with the pure OTT content providers. They have a huge business. They need to keep their subscribers, lower their churn, and keep people from cutting the cord, so to speak.

Dror Gill: 07:04 The telcos and the cable companies still need to provide the Internet infrastructure on top of which the over-the-top providers deliver their content, but they also need to offer more television and VOD content in order to keep their subscribers. It's very interesting to hear how they're doing it and how they are upgrading themselves to the era of IP.

Tim Siglin: 07:30 I think, Dror, you hit a really major point. I just finished an article on ATSC 3.0 where I talk about using 2019 to prepare for 2020, when that will go live in the U.S. The heavy lift was the analog-to-digital conversion. The slightly easier lift is the conversion from digital to IP, but it still requires significant infrastructure upgrades, and even new transmission equipment, for the over-the-air broadcasters and cable companies to do it correctly.

Dror Gill: 08:07 That's right. On the other hand, I think there is one big advantage to broadcast, even broadcast over-the-air. That is the ability to actually broadcast: the ability to reach millions, tens of millions, hundreds of millions of people over a single channel that everybody is receiving. Whereas in IP, for historic and legacy reasons, we are still limited to doing everything over unicast when delivering to the end user. When you do this, it creates a tremendous load on your network. You need to manage your CDNs.

Dror Gill: 08:46 I think we've witnessed in 2018, on one hand, very large events being streamed to record audiences. But on the other hand, some of them really failed in terms of user experience. It wasn't what people expected, because of the high volume of users, and because more and more people have discovered the ability to stream things over IP to their televisions and mobile devices. Can you share with us some of the experiences you've had, some of the things you're hearing about, in terms of these big events where they had failures, and what were the reasons for those failures?

Tim Siglin: 09:30 I want to reiterate the point you made on the OTA broadcast. It's almost as if you have read the advance copy of my article, which I know you haven't, because it's only gone to the editor.

Dror Gill: 09:42 I don’t have any inside information. I have to say, even though we are the Video Insiders.

Mark Donnigan: 09:47 We are the Video Insiders. That’s right.

Dror Gill: 09:49 We are the Video Insiders, but …

Mark Donnigan: 09:49 But no inside information here.

Dror Gill: 09:51 No inside information. I did not steal that copy.

Tim Siglin: 09:55 What I point out in that article, Dror, which I think will come out in January shortly after CES, is basically this. We in the streaming industry, the OTT space, have done a good job of pushing the traditional mediums to upgrade themselves. One of the things, as you say, with OTA: that ability to do essentially a multicast from a tower wirelessly is a really, really good thing, because of scale. I think about things like the World Cup, the Olympics, and even the presidential funeral that happened here in December; there are large-scale events that we in the OTT space just can't handle, if you're talking about having to build the capacity.

Tim Siglin: 10:39 The irony is, one good ATSC transmission tower could hit as many people as we could handle essentially globally with the unicast (OTT) model. If you look at things like that and then you look at things like EMBMS in the mobile world, where there is that attempt to do essentially a multicast, and it goes to points like the World Cup. I think one of the horror stories in the World Cup was in Australia. There was a mobile provider named Optus who won the rights to actually do half of the World Cup preliminary games. In the first several days, they were so overwhelmed by the number of users who wanted to watch and were watching, as you say, in a unicast model that they ended up having to go back to the company they had bid against who had the other half of the preliminaries and ask them to carry those on traditional television.

Tim Siglin: 11:41 The CEO admitted that it was such a spectacular failure that it damaged the brand of the mobile provider. Instead of the name Optus being used, everybody was referring to it as "Floptus." You don't want your brand being known as the butt of jokes for an event that happens only once every four years and has a number of devotees in your market. And heaven forbid it had been the World Cup for cricket; there would have been riots in the streets in Sydney and Melbourne. Thank goodness it was Australia with soccer as opposed to Australia with cricket.

Tim Siglin: 12:18 It brings home the point that we talk about scale, but it's really hard to get to scale in a unicast environment. The other event, and this one happened, I believe, in late 2017, was the Mayweather fight, a large pay-per-view event that was streamed. It turned out the problem there wasn't so much the streams as the authentication servers, which were overwhelmed in the first five minutes of the fight. With authentication gone, it took down the ability to actually watch the stream.

Tim Siglin: 12:53 For us, it’s not just about the video portion of it, it’s actually about the total ecosystem and who you’re delivering to, whether you’re going to force caps into place because you know you can’t go beyond a certain capacity, or whether you’re going to have to partner up with traditional media like cable service providers or over-the-air broadcasters.

Mark Donnigan: 13:14 It's a really good point, Tim. In the World Cup, the coverage that I saw was more of, I'd almost use the phrase, dashed expectations. Consumers were able to watch it. In most cases, I think it played smoothly. In other words, the video was there, but HDR signaling didn't work or didn't work right. Then it looked odd on some televisions or …

Tim Siglin: 13:40 In high frame rate …

Tim Siglin: 13:43 20 frames a second instead of 60 frames a second.

Mark Donnigan: 13:48 Exactly. What's interesting to me is that the consumer, they're not of course walking around thinking, as we are, about frame rate and color space and resolution. But they are getting increasingly sensitive, to the point where they can look at video now and say, "That's good video," or "That doesn't look right to me." I know we were talking before we started recording about this latest Tom Cruise public service announcement, which is just super fascinating, because it …

Tim Siglin: 14:24 To hear him say motion interpolation.

Mark Donnigan: 14:26 Yeah. Maybe we should tell the audience, since it literally just came out, I think even today. Do you want to tell the audience what Tom Cruise is saying?

Tim Siglin: 14:38 Essentially, Tom Cruise was on the set of Top Gun as they're shooting it, and he and another gentleman did a brief PSA of about a minute, asking people to turn off motion interpolation on their televisions. Motion interpolation essentially takes 24-frame-per-second content and converts it to 30 frames per second by adding phantom frames in the middle. Because Mission Impossible: Fallout is just being released for streaming, Cruise was concerned, and obviously others were concerned, that some of the scenes would not look nearly as good with motion interpolation turned on.
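
To make the mechanics concrete, here is a minimal sketch of where those phantom frames come from. It is illustrative only: real TVs use motion-compensated interpolation rather than this naive blending, and the function name and frame representation here are our own.

```python
import numpy as np

def convert_24_to_30(frames):
    """Naive 24 -> 30 fps conversion: for each output timestamp,
    blend the two nearest source frames. `frames` is a list of
    equal-shape numpy float arrays, one per source frame."""
    src_fps, dst_fps = 24.0, 30.0
    n_out = int(len(frames) * dst_fps / src_fps)
    out = []
    for k in range(n_out):
        pos = k * src_fps / dst_fps      # output time, in source-frame units
        i = int(pos)
        if i >= len(frames) - 1:         # past the last pair: repeat last frame
            out.append(frames[-1])
            continue
        w = pos - i                      # blend weight toward the next frame
        out.append((1.0 - w) * frames[i] + w * frames[i + 1])
    return out
```

Every fifth output frame lands exactly on a source frame; the four in between are the synthesized "phantom" frames the PSA is talking about.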

Tim Siglin: 15:17 I think, Mark, we ought to go to a PSA model, asking for very particular things like, "How do you turn HDR on? How do you …" Those types of things, because those get attention in a way that you and I, or a video engineer, can't.

Dror Gill: 15:33 How do you know if what you're getting is actually 4K or interpolated HD, for example?

Tim Siglin: 15:38 Especially in our part of the industry, because we will call something OTT 4K streaming. That may mean that it fits in a 4K frame, but it doesn’t necessarily mean it’s that number of pixels being delivered.

Dror Gill: 15:52 It can also mean that the top layer in your adaptive bit rate stream is 4K, but then if you don’t have enough bandwidth, you’re actually getting the HD layer or even lower.

Tim Siglin: 16:01 Exactly.

Dror Gill: 16:02 Even though it is a 4K broadcast and it is 4K content, sometimes you can be disappointed by that fact as well.
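
A quick sketch of the point Dror is making: the ladder and selection logic below are hypothetical, but this is roughly how an adaptive player ends up handing you the HD rung of a "4K stream" when measured throughput falls short.

```python
# Hypothetical ABR ladder; labels and bitrates are illustrative only.
LADDER = [           # (label, bitrate in kbps), highest rung first
    ("2160p", 15000),
    ("1080p", 6000),
    ("720p", 3000),
    ("480p", 1200),
]

def pick_rung(measured_kbps, safety=0.8):
    """Return the highest rung whose bitrate fits within a safety
    margin of the measured throughput; fall back to the lowest rung."""
    budget = measured_kbps * safety
    for label, kbps in LADDER:
        if kbps <= budget:
            return label
    return LADDER[-1][0]

print(pick_rung(25000))  # '2160p': enough bandwidth for the 4K layer
print(pick_rung(10000))  # '1080p': the "4K stream" delivers HD instead
```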

Mark Donnigan: 16:11 I have to give a very, very funny story directly related, and this happened probably, I don’t know, maybe, at least 18 months ago, maybe two years ago. I’m sitting on an airplane next to this guy. It’s the usual five-minute, get acquainted before we both turn on our computers. Anyway, when someone asks, “What do you do?” I generally just say, “I work for a video software company,” because how do you explain digital encoding? Most people just sort of stop at that, and don’t really ask more.

Mark Donnigan: 16:44 But this guy is like, “Oh, really?” He said, “So, I just bought a 4K TV and I love it.” He was raving about his new Samsung TV. Of course, he figured I’m a video guy. I would appreciate that. I said, “Hey.” “So, you must subscribe to Netflix.” “Yes. Yes, of course,” he says. I said, “What do you think of the Netflix quality? It looks great, doesn’t it?”

Mark Donnigan: 17:10 He sort of hemmed and hawed. He's like, "Well, it really … I mean, yeah. Yeah, it looks great, but it's not quite … I'm just not sure." Then I said, "I'm going to ask you two questions. First of all, are you subscribed to the 4K plan?" He was. Then I said, "How fast is your Internet at home?" He's like, "I just have the minimum. I don't know. I think it's the 20 megabit package," or whatever it was. I don't remember the numbers.

Mark Donnigan: 17:38 I said, “There’s this thing.” And I gave him like a 30-second primer on adaptive bit rate, and I said, “It is possible, I have no idea of your situation, that you might be watching the HD version.” Anyway, he’s like, “Hah, that’s interesting.” I connect with the guy on LinkedIn. Three days later, I get this message. He says, “I just upgraded my Internet. I now have 4K on my TV. It looks awesome.”

Mark Donnigan: 18:04 On one hand, the whole situation was not surprising and, yet, how many thousands, tens of thousands, maybe millions of people are in the exact same boat? They’ve got this beautiful TV. It could be because they’re running some low-end router in the house. It could be they truly have a low end bandwidth package. There could be a lot of reasons why they’re not getting the bandwidth. They’re so excited about their 4K TV. They’re paying Netflix to get the top layer, the best quality, and they’re not even seeing it. It’s such a pity.

Tim Siglin: 18:37 I had a TSA agent ask me that same question, Mark, when I came through customs. I'm like, "Sure. I'll stand here and answer that question for you." The router was actually what I suggested that he upgrade, because he said his router was like this (old unit).

Mark Donnigan: 18:53 In a lot of homes, it’s a router that’s 15 years old and it just isn’t (up to the task).

Tim Siglin: 18:58 But it brings out the point that even as we're talking about newer codecs and better quality, even if we get to a lower sweet spot in terms of 4K content (streaming bandwidth), or, as we found in the survey that we worked on together, using HEVC for 1080p or 720p, if the routers, if the software in the chain, is not updated, the delivery quality will suffer in a way that people who have a tuned television and have seen its consistent quality aren't certain how to fix when they use an over-the-top service.

Tim Siglin: 19:34 I think this is a key for 2019. As we prepare for ATSC 3.0 on over-the-air broadcast, where people will be able to see pristine 4K, it will actually force those of us in the OTT space to up our game, to make sure that we're figuring out how to deliver across these multiple steps in the process without breaking it.

Dror Gill: 19:54 You really see ATSC 3.0 as a game-changer in 2019?

Tim Siglin: 19:59 What I see it as is the response from the broadcast industry to, A) say that they're still relevant, which I think is a good political move, and B) provide the scale you were talking about, Dror. See, I think what it does is it at least puts us in the OTT space on notice that there will be, in certain first world countries, a really decent quality delivery, free of charge, with commercials, over the air.

Tim Siglin: 20:31 It takes me back to the early days of video compression when, if you had a good class-one engineer and an analog NTSC transmission system, they could give you really good quality if your TV was tuned correctly. It only meant having to tune your TV. It didn't mean having to tune your router, or your cable modem, or the settings on your TV. I think that's where the game-changer may be: those tuner cards, which will send HDR signaling and things like that with the actual transmission, are going to make it much easier for the consumer to consume quality in a free scenario. I think that part of it is a potential game-changer.

Mark Donnigan: 21:19 That's interesting. Tim, we worked together earlier this year on an industry survey that I think would be really, really interesting to talk about with listeners. Shall we pivot into that? Maybe you can share some of the findings there.

Tim Siglin: 21:38 Why don’t you take the lead on why Beamr wanted to do that? Then I’ll follow up with some of the points that we got out of it.

Mark Donnigan: 21:46 Obviously, we are a codec developer. It's important for us to always be addressing the market the way that the market wants to be addressed, meaning that we're developing technologies, solutions, and standards that are going to be adopted. Clearly, if we rewind a year or even 18 months, AV1 had just recently launched, and there were still questions about VP9.

Mark Donnigan: 22:19 Obviously, H.264/AVC is the standard, used everywhere. We felt, "Let's go out to the industry. Let's really find out what the attitudes are, what the thinking is, what's going on 'behind closed doors,' and find out what people are doing." Are they building workflows for these new advanced codecs? How are they going to build those workflows? That was the impetus, if you will, for it.

Mark Donnigan: 22:49 We were very happy, Tim, to work with you on that, and of course Streaming Media assisted us with promoting it. That was the reason we did it. I know there were some findings that were pretty predictable, shall we say, no surprises, but there were some things that I think were maybe a little more surprising. So, maybe you'd like to share some of those.

Tim Siglin: 23:12 Yeah. I'll hit the highlights on that. Let me say, too, one of the things that I really liked about this particular survey: there was another survey that had gone on right around that time that essentially asked, "Are you going to adopt HEVC?" The approach we took with this survey was to say, "Okay. Those of you who've already adopted HEVC, what are the lessons that we can learn from that?"

Tim Siglin: 23:36 We didn't exclude those who were looking at AV1 or some of the other codecs, even VP9, but we wanted to know about those people who used HEVC. Were they using it in pilot projects? Were they thinking about it? Were they using it in actual production? What we found in the survey is that AVC, or H.264, was still clearly dominant in the industry, but that the ramp-up to HEVC was moving along much faster than at least I believed. Mark, I told you when we started the survey question creation, which was about a year ago, before launching it in early 2018, that I expected we wouldn't see a whole lot of people using HEVC in production.

Tim Siglin: 24:23 I was pleasantly surprised to say that I was wrong. In fact, I think you mentioned in our recent Streaming Media West interview that there was a statistic you gave about the number of households that could consume HEVC. Was it north of 50%?

Mark Donnigan: 24:40 Yeah, it's more than 50%. What's interesting about that number is that it actually came from a very large MSO. Of course, they have a very good understanding of what devices are on their network. They found that there was at least one device in at least 50% of their homes that could receive, decode, and play back HEVC. That's about as real world as you can get.

Tim Siglin: 25:06 What was fascinating to me too in this study was, we asked open-ended questions, which is what I've done in research projects for the last 25 years, in both video conferencing and streaming. One of the questions we asked was, "Do you see HEVC as only a 4K solution, or do you see it as an option for lower resolutions?" It turned out, overwhelmingly, people said, "We not only see it for 4K. We see it for high-frame-rate (HFR) 1080p and standard-frame-rate 1080p, with some HDR."

Tim Siglin: 25:40 Not a majority, but a large number of respondents said they would even see it as a benefit at 720p. Because we had a large number of engineers, video engineers, and also people in business development answering these questions, what that tells me is that companies know, because of the unicast problem that Dror pointed out in the beginning, that scaling with a codec that consumes more bandwidth is a good way to lose money; kind of like the joke that the way a rich man can lose money really fast is to invest in an airline.

Tim Siglin: 26:19 If indeed you get scale with AVC, you could find yourself with a really large bill. That look at HEVC as being not just for 4K, HDR, or high frame rate in the future, but also for 1080p with some HDR and high frame rate, tells me that the codec itself, or the promise of the codec, was actually really good. What was even more fascinating to me was the number of companies with AVC pipelines that were actually looking to integrate HEVC into those same production pipelines.

Tim Siglin: 26:55 It was much easier from a process standpoint to integrate HEVC into an AVC pipeline, in other words, H.265 into an H.264 pipeline, than it was to go out of house and look at something like AV1 or VP9, because the work that was done on HEVC builds on what was already in place for AVC. Of course, you've got Apple, who has HLS, HTTP Live Streaming, and a huge ecosystem of iPhones and iPads, laptops and desktops supporting HEVC, not just as a standard for video delivery, but also with the HEIC or HEIF image format, now having all of their devices shoot images using HEVC instead of JPEG. That in and of itself drives forward adoption of HEVC. I think you told me that since that survey came out, probably now seven months ago, you all have continued to see the model of all-in HEVC adoption.

Dror Gill: 28:03 This is what we promote all the time. It’s kind of a movement. Are you all in HEVC or are you doing it just for 4K, just where you have to do it? We really believe in all-in HEVC. Actually, this week, I had an interesting discussion with one of our customers who is using our optimization product for VOD content, to reduce bit-rate of H.264 (streams). He said, “I want to have a product. I want to have a solution for reducing bit-rates on our live channels.”

Dror Gill: 28:32 So, I asked them, “Okay. Why don’t you just switch your codec to HEVC?” He said, “No, I can’t do that.” I said, “Why not?” He said, “You know compatibility and things like that.” I asked, “Okay. What are you using? What are you delivering to?” He said, “We have our own set-top boxes (STB), IP set-top boxes which we give out to our customers. Well, these are pretty new.” So, they support HEVC. I’m okay there. “Then we have an Apple TV app.” “Okay, Apple TV has a 4K version. So, it supports HEVC. All of the latest Apple TV devices have HEVC. That’s fine.” “Then we have smartphone apps, smart TV apps for Android TV and for the LG platform.”

Dror Gill: 29:15 Obviously, TVs support 4K, so I'm okay there. And for delivering to mobile devices, all the high-end devices already support HEVC. He was making this estimate that around 50 to 60% of his viewers are using devices that are HEVC capable. Suddenly, he's thinking, "Yeah, I can do that. I can go all in HEVC. I will continue, of course, to support H.264 for all of the devices that don't support HEVC. But if I can save 50% of the bandwidth for 50 to 60% of my customers, that's a very big savings."
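
The back-of-envelope math behind that estimate is worth making explicit. Both figures below are the rough numbers from the conversation, not measurements:

```python
hevc_capable_share = 0.55   # ~50-60% of viewers on HEVC-capable devices
hevc_bitrate_saving = 0.50  # HEVC at roughly half the AVC bitrate

# AVC-only viewers save nothing; HEVC-capable viewers save ~50%.
overall_saving = hevc_capable_share * hevc_bitrate_saving
print(f"Blended bandwidth saving: {overall_saving:.0%}")  # -> 28%
```

So even with the H.264 fallback still in place, total delivery bandwidth drops by roughly a quarter.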

Mark Donnigan: 29:48 What's interesting about this conversation, Dror, is, first of all, I'm pretty certain that the operator you're talking with is different from the operator whose finding I shared, and they found the exact same thing. This is a consistent theme: pretty much in developed parts of the world, it really is true that 50% or more of the users can today receive HEVC. And this number is only growing. It's not static; it is just growing. Next year, I don't know if that number will be 60% or 70%, but it's going to be even bigger.

Mark Donnigan: 30:27 What's fascinating is that, as we said earlier, the consumer is getting more aware of quality, and they're getting more aware of when they're being underserved. For operators who are serving the lowest common denominator, which is to say, "AVC works across all my devices," it's true, AVC works on all the high-end devices equally well, but you're under-serving a large and growing number of your users.

Mark Donnigan: 31:01 If your competitors are doing the same, then I guess you could say, well, "Who are they going to switch to?" But there are some fast-moving leaders in the space who are either planning to offer or are shortly going to be offering better quality. They're going to be extending HEVC into lower resolutions, and therefore lower bit rates, and consumers are going to begin to see, "Well, wait a second. This service over here that my friend has, or another subscription we have in the household, how come the video looks better?" They just begin to migrate there. I think it's really important when we have these sorts of conversations to connect to this idea: don't underserve your consumer in an effort to be something to everybody.

Tim Siglin: 31:57 I would add two other quick things to that, Mark. One is, we’ve always had this conversation in the industry about the three-legged stool of speed, quality and bandwidth in terms of the encoding.

Mark Donnigan: 32:09 That’s right.

Tim Siglin: 32:09 Two of those are part of the consumer equation: quality and bandwidth. Oftentimes, we've had to make the decision between quality and bandwidth. If the argument is that HEVC as it stands right now, after a couple of years of optimization, can get us to, let's say, 40% bandwidth reduction for equivalent quality, let's not even say 50%, why wouldn't you switch over to something like that?

Tim Siglin: 32:39 Then the second part, and I have to put in a plug for what Eric Schumacher-Rasmussen and the Streaming Media team did at Streaming Media West by having Roger Pantos come and speak, Roger Pantos being of course the inventor of HLS, and I'm not a huge fan of HLS, just because of the latency issues. But he pointed out in his presentation, his tutorial around HLS, that you can put two different codecs in a manifest file. There is absolutely no reason that an OTT provider could not provide both HEVC and AVC within the same manifest file and then allow the consumer device to choose.

Tim Siglin: 33:22 When Dror mentioned the company that has the OTT boxes they give away, they could easily set a flag in those boxes to say, "If you're presented with a manifest file that has AVC and HEVC, go with HEVC to lower the bandwidth overall." The beauty is that at this point it's a technical implementation issue, not a "can we make it work?" question, because we know that it works based on HLS.
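
To make Pantos's point concrete, here is a minimal sketch of what such a dual-codec multivariant playlist can look like. The codec strings, bitrates, and file names are illustrative, not from any real service:

```python
# A minimal HLS multivariant (master) playlist offering the same content
# in both HEVC and AVC. A player, or an operator-controlled set-top box,
# picks whichever variant it can decode, preferring the cheaper HEVC
# rung when hardware support is present.
MASTER_M3U8 = """\
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-STREAM-INF:BANDWIDTH=3500000,RESOLUTION=1920x1080,CODECS="hvc1.2.4.L123.B0"
hevc/1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,CODECS="avc1.640028"
avc/1080p/index.m3u8
"""

with open("master.m3u8", "w") as f:
    f.write(MASTER_M3U8)
```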

Mark Donnigan: 33:54 This is excellent. Tim, let's wrap this up. As I knew it would be, this has just been an awesome conversation. Thank you for sharing all your years of collective experience to give some insight into what's happening in the industry. Let's look at 2019. You've made references to ATSC 3.0, and some of our listeners will be going to CES. Maybe there are some things that they should be looking at or keeping their eyes open for. What can you tell us about 2019?

Tim Siglin: 34:35 Here's what I think 2019 is bringing. In the cloud computing space, and you all at Beamr are part of this conversation, we've moved from having cloud-based solutions that were not at parity with on-premise solutions to actually reaching parity in 2018 between what you could do in an on-premise solution versus the cloud. Now, I think in 2019 what we're going to start seeing is a number of features in cloud-based services, whether it's machine learning (the popular nomenclature is AI, but I really like machine learning as a much better descriptor), whether it's real-time transcoding of live content, whether it's the ability to simultaneously output AVC and HEVC like we've been talking about here, where the cloud-based solutions will move beyond parity with the on-premise solutions.

Tim Siglin: 35:35 There will always be needs for on-premise solutions, from a security standpoint, in certain industries, but I don't think that will inhibit cloud-based growth in 2019. If people are going to CES, one of the things to look at there, for instance, is a big leap in power consumption savings for mobile devices. I'm not necessarily talking about smartphones, because the research I've done says the moment you turn GPS on, you lose 25% of battery. Tablets have the potential to make a resurgence in a number of areas for consumers, and I think we'll see some advances in battery (capacity).

Tim Siglin: 36:19 Part of that goes to HEVC, which as we know is a much harder codec to decode. I think the consumer companies are being forced into thinking about power consumption as HEVC becomes more mainstream. That’s something I think people should pay attention to as well. Then, finally, HDR and surround sound solutions, especially object placement like Dolby Atmos and some of these others, will become much more mainstream as a way to sell flat panels and surround sound systems.

Tim Siglin: 36:56 We've sort of languished in that space. 4K prices have dropped dramatically in the last two years, but we're not yet ready for 8K. But I think we'll see a trend toward fixing some of the audio problems. In the streaming space, to fix those audio problems, we need to be able to encode and encapsulate into sort of the standard surround sound model. Those are three areas that I would suggest people pay attention to.

Mark Donnigan: 37:25 Well, thank you for joining us, Tim. It’s really great to have you on. We’ll definitely do this again. We want to thank you, the listener, for supporting the Video Insiders. Until the next episode. Happy encoding!

Announcer: 37:39 Thank you for listening to the Video Insiders Podcast, a production of Beamr Imaging Limited. To begin using Beamr’s codecs today, go to Beamr.com/free to receive up to 100 hours of no cost HEVC and H.264 transcoding every month.

The Future of 3 Character Codecs. [podcast]

Anyone familiar with the streaming video industry knows that we love our acronyms. You would be hard-pressed to have a conversation about the online video industry without bringing one up…

In today’s episode, The Video Insiders focus on the future of three-character codecs: AVC, VP9, and VVC.

But before we can look at the future, we have to take a moment to revisit the past.  

The year 2018 marks the 15-year anniversary of AVC and in this episode, we visit the process and lifecycle of standardization to adoption and what that means for the future of these codecs.

Tune in to Episode 03: The Future of 3 Character Codecs or watch video below.

https://youtu.be/TmDFpmtnbU8

Want to join the conversation?

Reach out to TheVideoInsiders@beamr.com.

TRANSCRIPTION (lightly edited for improved readability)

Mark Donnigan: 00:49 Well, Hi, Dror!

Dror Gill: 00:50 Is this really episode three?

Mark Donnigan: 00:52 It is, it is episode three. So, today we have a really exciting discussion as we consider the future of codecs named with three characters.

Dror Gill: 01:03 Three character codecs, okay, let’s see.

Mark Donnigan: 01:06 Three character codecs.

Dror Gill: 01:09 I can think of …

Mark Donnigan: 01:09 How many can you name?

Dror Gill: 01:10 Let’s see, that’s today’s trivia question. I can think of AVC, VP9, AV1, and VVC?

Mark Donnigan: 01:21 Well, you just named three that I was thinking about and we’re gonna discuss today! We’ve already covered AV1. Yeah, yeah, you answered correctly, but we haven’t really considered where AVC, VP9, and VVC fit into the codec stew. So when I think about AVC, I’m almost tempted to just skip it because isn’t this codec standard old news? I mean, c’mon. The entire video infrastructure of the internet is enabled by AVC, so what is there to discuss?

Dror Gill: 01:57 Yeah. You're right. It's like the default, but in fact, the interesting thing is that we're in 2018, twenty years after ITU's Video Coding Experts Group issued the call for proposals for a project that at the time was called H.26L. Their target was to double the coding efficiency, which effectively means halving the bit rate necessary for a given level of fidelity. And that's why it was called H.26L; it was supposed to be low bit rate.

Mark Donnigan: 02:33 Ah! That’s an interesting trivia question.

Dror Gill: 02:35 That’s where the L came from!

Mark Donnigan: 02:36 I wonder how many of our listeners knew that? That’s kind of cool. H26L.

Dror Gill: 02:42 But they didn't go it alone. In 2001, for the first time, they joined forces with ISO MPEG, the same Moving Picture Experts Group we discussed in the first episode.

Mark Donnigan: 02:56 That’s right.

Dror Gill: 02:57 And they came together, they joined forces, and they created the JVT, the Joint Video Team, and I think it's a great example of collaboration. There's the ITU, which is a standards body dealing with video communication standards, and ISO MPEG, which is a standards body dealing with video entertainment standards. So, finally they understood that there's no point in developing separate video standards for these two different types of applications, so they got all the experts together in the JVT, and this group developed what was the best video compression standard at the time. It was launched May 30, 2003.

Mark Donnigan: 03:35 Wow.

Dror Gill: 03:36 There was one drawback with this collaboration, in that the video standard was known by two names. There was the ITU name, which is H.264, and then there's the ISO MPEG name, which is AVC. This created some confusion at the start, but I think by now most of our listeners know that H.264 and AVC are one and the same.

Mark Donnigan: 03:57 Yeah, definitely. So, AVC was developed 15 years ago and it’s still around today.

Dror Gill: 04:02 Yeah, yeah. I mean, that's really impressive, and it's not only around, it's the most popular video compression standard in the world today. AVC is used to deliver video over the internet to computers, televisions, and mobile devices, and over cable, satellite, and broadcast, and even on Blu-ray discs. This just shows you how long it takes to get from standardization to adoption, right? 15 years until we get the mass-market adoption and market dominance that H.264/AVC has today.

Dror Gill: 04:31 And the reason it takes so long, as we discussed in our first episode, is that first you need to develop the standard. Then you need to develop the chips that support the standard, and then you need to develop devices that incorporate the chips. Even when initial implementations of the codec are released, they are still not as efficient as they can be, and it takes codec developers more time to refine them and improve the performance and the quality. You need to develop the tools; all of that takes time.

Mark Donnigan: 04:59 It does. Yeah, I have a background in consumer electronics, and because of that, I know with certainty that AVC is gonna be with us for a while, and I'll explain why. It's really simple. Decoding of H.264 is fully supported in every chip set on the market. I mean literally every chip set. There is not a device that supports video which does not also support AVC today. It just doesn't exist; you can't find it anywhere.

Mark Donnigan: 05:26 And then, when you look at encoding technologies for AVC/H.264, they have advanced to the point where you can really achieve state of the art for very low cost. There's just too much market momentum; the encode and decode ecosystems are just massive. When you think about entertainment applications and consumer electronics, for a lot of us, that's the primary market we play in.

Mark Donnigan: 05:51 But if you consider the surveillance and industrial markets, which are absolutely massive: all of these security cameras you see literally everywhere, the drone cameras, they all have AVC encoders in them. Bottom line, AVC isn't going anywhere fast.

Dror Gill: 06:09 You're right, I totally agree with that. It's dominant, and it's here to stay. The problem, as we talked about, is video delivery over the internet; the big problem is the bandwidth bottleneck. With so much video being delivered over the internet, the demand for quality is growing. People want higher resolution, they want HDR, which is high dynamic range, and they want higher frame rates. All of this means you need more and more bit rate to represent the video. The bit rate efficiency required today is beyond standard AVC encoding, and that's where you need external technologies, such as content-adaptive encoding and perceptual optimization, that really help you push AVC to its limits.

Mark Donnigan: 06:54 Yeah. And Dror, I know you’re one of the inventors of a perceptual optimization technique based on a really unique quality measure, which I’ve heard some in the industry believe could even extend the life of AVC from a bit rate efficiency perspective. Tell us about what you developed and what you worked on.

Dror Gill: 07:13 Yeah, that's right. I did have some part in this. We developed a quality measure and a whole application around it, and this is a solution that can reduce the bit rate of AVC by 30%, sometimes even 40%. It doesn't get us exactly to where HEVC starts; 50% is pretty difficult, and not achievable for every content type. But for content distributors that recognize AVC will still be part of their codec mix for at least five years, I think what we've been able to do can really be helpful, and a welcome relief from this bandwidth bottleneck issue.
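
Beamr's quality measure itself is proprietary, so the sketch below is only a generic illustration of the closed-loop idea, not Beamr's algorithm: re-encode at progressively coarser quantization and keep the smallest candidate whose measured quality still matches the initial encode. Here ffmpeg's SSIM filter stands in for the perceptual metric, the 0.99 threshold is an arbitrary placeholder, and a production system would iterate per frame rather than per file.

```python
import re
import subprocess

def encode(src, out, crf):
    """Encode with x264 at a given CRF (higher CRF = smaller file)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), out],
        check=True)

def mean_ssim(reference, candidate):
    """Mean SSIM between two encodes, parsed from the 'All:' figure
    that ffmpeg's ssim filter logs. A stand-in metric only."""
    proc = subprocess.run(
        ["ffmpeg", "-i", reference, "-i", candidate,
         "-lavfi", "ssim", "-f", "null", "-"],
        capture_output=True, text=True)
    return float(re.search(r"All:(\d+\.\d+)", proc.stderr).group(1))

def optimize(src, initial_crf=23, max_crf=32, threshold=0.99):
    """Raise CRF until measured quality drops; keep the last good encode."""
    reference = "initial.mp4"
    encode(src, reference, initial_crf)
    best = reference
    for crf in range(initial_crf + 1, max_crf + 1):
        candidate = f"candidate_crf{crf}.mp4"
        encode(src, candidate, crf)
        if mean_ssim(reference, candidate) < threshold:
            break               # quality no longer preserved: stop
        best = candidate        # smaller file at ~equivalent quality
    return best
```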

Mark Donnigan: 07:52 It sounds like we’re in agreement that for at least the midterm horizon, the medium horizon, AVC is gonna stay with us.

Dror Gill: 08:01 Yeah, yeah. I definitely think so. For some applications and services and certain regions of the world where the device penetration of the latest, high end models is not as high as in other parts, AVC will be the primary codec for some time to come.

Dror Gill: 08:21 Okay, that’s AVC. Now, let’s talk about VP9.

Mark Donnigan: 08:24 Yes, let’s do that.

Dror Gill: 08:25 It's interesting to me; essentially, it's mostly a YouTube codec. It's not a bad codec, and it has some efficiency advantages over AVC, but outside of Google, you don't see any large-scale deployments. By the way, if you look at Wikipedia and read the section that says where VP9 is used, it says VP9 is used mostly by YouTube, with some use by Netflix, and it's used by Wikipedia itself.

Mark Donnigan: 08:50 VP9 is supported fairly well in devices. Though it's obviously hard to say exactly what the penetration is, there is hardware decode support for VP9. Certainly it's ubiquitous on Android, and it's in many of the UHD TV chip sets as well. It's not always enabled, but again, from my background on the hardware side, I know that many of those SOCs do have a VP9 decoder built into them.

Mark Donnigan: 09:23 I guess the question in my mind is, it's talked about, and certainly Google is notable as both a developer and a user, but why hasn't it been adopted?

Dror Gill: 09:33 Well, I think there are several issues here. One of them is compression efficiency. VP9 brings maybe a 20 to 30% improvement in compression efficiency over AVC, but it's not 50%, so you're not doubling your compression efficiency. If you want to replace a codec, that's really a big deal. That's really a huge investment. You need to invest in encoding infrastructure and new players. You need to do compatibility testing. You need to make sure that your packaging and your DRM work correctly, and all of that.

Dror Gill: 10:04 You really want to get a huge benefit to offset this investment. I think people are really looking for that 50% improvement, to double the efficiency, which is what you get with HEVC but not quite with VP9. I think the second point is that VP9, even though it's an open source codec, is developed, and the standard maintained, by Google. And some industry players are kind of afraid of the dominance of Google. Google has taken over the online advertising market.

Mark Donnigan: 10:32 Yes, that’s a good point.

Dror Gill: 10:34 You know, and search, and mobile operating systems; except for Apple, it's all Android. So, those industry players might be thinking, "I don't want to depend on Google for my video compression format." I think this is especially true for traditional broadcasters: cable companies, satellite companies, TV channels that broadcast over the air. These companies traditionally like to go with established international standards, compression technologies that are standardized and have the seal of approval of the ITU and ISO.

Dror Gill: 11:05 They're typically following the traditional codec development path of ISO MPEG: MPEG-2, then AVC, and now HEVC. So what's coming next?

Mark Donnigan: 11:16 Well, our next three letter codec is VVC. Tell us about VVC, Dror.

Dror Gill: 11:21 Yeah, yeah, VVC. I think this is another great example of collaboration between ITU and ISO. Again, they formed a joint video experts team. This time it’s called JVET.

Dror Gill: 12:10 So, JVET has launched a project to develop a new video coding standard. And you know, we had AVC, which was Advanced Video Coding. Then we had HEVC, which is High Efficiency Video Coding. So they thought, what would be the next generation? It's already advanced, it's high efficiency. So the next one they called VVC, which is Versatile Video Coding. The objective of VVC is obviously to provide a significant improvement in compression efficiency over the existing HEVC standard. Development has already started. The JVET group is meeting every few months in some exotic place in the world, and this process will continue. They plan to complete it before the end of 2020. So, essentially, in the next two years they are gonna complete the standard.

Dror Gill: 13:01 Today, already, even though VVC is in early development and they haven't implemented all the tools, they already report 30% better compression efficiency than HEVC. So, we have high hopes that we'll be able to fight the video tsunami that is coming upon us with a much improved standard video codec, which is VVC. I mean, it's improved at least on the technical side, and I understand that they also want to improve the process, right?

Mark Donnigan: 13:29 That’s right, that’s right. Well, technical capabilities are certainly important and we’re tracking of course VVC. 30% better efficiency this early in the game is promising. I wonder if the JVET will bring any learnings from the famous HEVC royalty debacles to VVC because I think what’s in everybody’s mind is, okay, great, this can be much more efficient, technically better. But if we have to go round and round on royalties again, it’s just gonna kill it. So, what do you think?

Dror Gill: 14:02 Yeah, that's right. I think it's absolutely true, and many people in the industry have realized this: you can't just develop a video standard and then handle the patent and royalty issues later. Luckily, some companies have come together and formed an industry group called the Media Coding Industry Forum, or MC-IF. They held their first meeting a few weeks ago in Macau, during MPEG meeting 124. Their purpose statement, let me quote this from their website, and I'll give you my interpretation of it: "The Media Coding Industry Forum (MC-IF) is an open industry forum with the purpose of furthering the adoption of standards, initially focusing on VVC, by establishing them as well-accepted and widely used standards for the benefit of consumers and the industry."

Dror Gill: 14:47 My interpretation is that the group was formed as an effort for companies with an interest in this next-generation video codec to come together and attempt to influence the licensing policy of VVC, to try to agree on a reasonable patent licensing policy in advance, to prevent history from repeating itself. We don't want that whole Hollywood story, the tragedy that took a few years until they reached the happy ending. So what are they actually talking about? This is very interesting. They're talking about having a modular structure for the codec, where the tools of the codec, the features, can be plugged in and out very easily.

Dror Gill: 15:23 So, if some company won't agree to reasonable licensing terms for a feature, this group can just decide not to support that feature, and it will be very easily removed from the standard, or at least from the way that companies implement the standard.

Mark Donnigan: 15:37 That’s an interesting approach. I wonder how technically feasible it is. I think we’ll get into that in some other episodes.

Dror Gill: 15:46 Yeah. That may have some effect on performance.

Mark Donnigan: 15:49 Exactly. And again, are we back in the situation that the Alliance for Open Media is in with AV1, where part of the issue of the slow performance is trying to work around patents? At the end of the day, you end up with a solution that is technically hobbled.

Dror Gill: 16:10 Yeah. I hope it doesn’t go there.

Mark Donnigan: 16:13 Yeah, I hope we’re not there. I think you heard this too, hasn’t Apple joined the consortium recently?

Dror Gill: 16:21 Yeah, yeah, they did. They joined silently as they always do. Silently means that one day somebody discovers their logo… They don’t make any announcement or anything. You just see a logo on the website, and then oh, okay.

Mark Donnigan: 16:34 Apple is in the building.

Mark Donnigan: 16:41 You know, maybe it's good to kind of bring this discussion back to Earth and close out our three-part series by giving the listeners some pointers about how they should be thinking about the next codec that they adopt. I've been giving this some thought as we've been doing these episodes. I think I'll kick it off here, Dror, if you don't mind. I'll share some of my thoughts, and you can jump in.

Mark Donnigan: 17:11 These are complex decisions, of course. I completely agree that billing this as codec wars and codec battles is not helpful at the end of the day. Maybe it makes for a catchy headline, but it’s not helpful. There are real business decisions to be made, and there are technical decisions. I think a good place to start is for somebody who’s listening and saying, “Okay, great, I now have a better understanding of the lay of the land for HEVC, AV1, VP9, and AVC, and of what some of my options are to even further reduce bit rate. But now, what do I do?”

Mark Donnigan: 17:54 And I think a good place to start is to just look at your customers: do they lean towards being early adopters? Are you in a strong economic environment, which is to say, quite frankly, do most of your customers carry around the latest devices, like an iPhone X or a Galaxy S9? If your customers largely lean towards being early adopters and they’re carrying around the latest devices, then you have an obligation to serve them with the highest quality and the best performance possible.

Dror Gill: 18:26 Right. If your customers can receive HEVC, and it’s half the bit rate, then why not deliver it to them? You give them better quality, or you save on your delivery costs with this more efficient codec, and everybody is happy.

Mark Donnigan: 18:37 Absolutely. And again, just using pure logic: if somebody can afford a more-than-$1,000 device in their pocket, the TV hanging on the wall is probably a very new, UHD-capable one, and they probably have a game console in the house. The point is that you can make a pretty strong argument, and an assumption, that you can go what I like to think of as all-in on HEVC, including even standard definition (SD) content.

Mark Donnigan: 19:11 So, the industry has really lost sight, in my mind, of the benefits of HEVC as they apply across the board to all resolutions. All of the major consumer streaming services are delivering 4K using HEVC, but I’m still shocked at how many forget that the same bit rate efficiency advantages that work at 4K apply at 480p. Obviously, the absolute numbers are smaller because the file sizes are smaller, etc.

Mark Donnigan: 19:41 But the point is, 30, 40, 50% savings applies at 4K as it does at 480p. I understand there’s different applications in use cases, right? But would you agree with that?

Dror Gill: 19:55 Yeah, yeah, I surely agree with that. I mean, for 4K, HEVC is really an enabler.

Mark Donnigan: 20:00 That’s right.

Dror Gill: 20:01 With AVC, you would need like 30, 40 megabits for 4K video. Nobody can stream that to the home, but change it to 10, 15 megabits with HEVC and that’s reasonable. You must use HEVC for 4K, otherwise it won’t even fit the pipe. But for all other resolutions, you get the bandwidth advantage, or you can trade it off for a quality advantage and deliver higher quality to your users, or a higher frame rate, or enable HDR. All of these are possibilities with HD and even SD content: give users a better experience using HEVC while streaming to devices they already have. So yeah, I agree, I think it’s an excellent analysis. Obviously, if you’re in an emerging market, or your consumers don’t have high-end devices, then AVC is a good solution. And there are network constraints; there are many places in the world where network connectivity isn’t that great, or rural areas where very large parts of the population are spread out. In these cases, bandwidth is low, and you will get into a bottleneck even with HD.
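
[Editor’s note: the bandwidth arithmetic Dror describes can be sketched in a few lines of Python. The ladder bitrates, the 25 Mbps pipe, and the flat 50% savings figure below are illustrative assumptions drawn from the numbers mentioned in the conversation, not measured results.]

```python
# Rough sketch of the "fits the pipe" argument, with assumed numbers:
# per-resolution AVC bitrates and a flat ~50% HEVC saving at equal quality.
avc_bitrates_mbps = {"2160p (4K)": 35.0, "1080p": 8.0, "480p": 1.5}
hevc_saving = 0.5      # assumed; real savings vary by content (30-50%)
pipe_mbps = 25.0       # assumed last-mile broadband capacity


def fits(mbps: float) -> str:
    return "fits" if mbps <= pipe_mbps else "does NOT fit"


for resolution, avc in avc_bitrates_mbps.items():
    hevc = avc * (1 - hevc_saving)
    print(f"{resolution}: AVC {avc:.1f} Mbps ({fits(avc)} the pipe), "
          f"HEVC {hevc:.1f} Mbps ({fits(hevc)} the pipe), "
          f"saving {avc - hevc:.1f} Mbps")
```

The same percentage saving translates to a much smaller absolute saving at 480p than at 4K, which is Mark’s point above; but only at 4K does the codec choice determine whether the stream fits the pipe at all.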

Mark Donnigan: 21:05 That’s right.

Dror Gill: 21:06 That’s where perceptual optimization can help you: it reduces the bit rate even for AVC and keeps you within the constraints that you have. Then, when your consumers upgrade their devices, and the cycle comes around in a few years when every device has HEVC support, you obviously upgrade your capability and support HEVC across the board.
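
[Editor’s note: the decision logic Mark and Dror describe might look something like the following sketch. The function name, the two-flag interface, and the flat savings factors are hypothetical simplifications; a real service would consult device capability databases and full ABR ladders.]

```python
def pick_codec(device_supports_hevc: bool, available_bandwidth_mbps: float,
               avc_bitrate_mbps: float) -> tuple[str, float]:
    """Hypothetical decision sketch; not a production algorithm."""
    hevc_bitrate_mbps = avc_bitrate_mbps * 0.5  # assumed ~50% HEVC saving
    if device_supports_hevc:
        # Capable device: serve HEVC across the board, SD included.
        return "HEVC", hevc_bitrate_mbps
    # Constrained fallback: AVC, optionally squeezed further by
    # perceptual optimization (content-adaptive encoding).
    optimized_avc_mbps = avc_bitrate_mbps * 0.7  # assumed ~30% saving
    if optimized_avc_mbps <= available_bandwidth_mbps:
        return "AVC (perceptually optimized)", optimized_avc_mbps
    return "AVC", min(avc_bitrate_mbps, available_bandwidth_mbps)


# Example: a viewer on a 3 Mbps link with a device that lacks HEVC decode.
print(pick_codec(device_supports_hevc=False,
                 available_bandwidth_mbps=3.0,
                 avc_bitrate_mbps=4.0))
```

The point of the sketch is only the ordering of the decisions: device capability first, then bandwidth, with perceptual optimization as the lever when a codec upgrade isn’t available.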

Mark Donnigan: 21:30 Yeah, that’s a very important point, Dror: the HEVC adoption curve, in terms of silicon on devices, is in full motion. Just look at the planning life cycles of what goes into hardware, especially on the silicon side; it doesn’t happen overnight. Once these technologies are in the designs, once they are in the dies, once the codec is in silicon, it doesn’t get arbitrarily turned on and off like a light switch.

Mark Donnigan: 22:04 How should somebody be looking at VP9, VVC, and AV1?

Dror Gill: 22:13 Well, VP9 is an easy one: unless you’re Google, you’re very likely gonna skip over this codec. It’s not that VP9 isn’t a viable choice; it’s that it simply doesn’t go as far as HEVC in terms of bit rate efficiency and quality. Maybe two years back we would have considered it as an option for reducing bit rate, but now, with the HEVC support that’s out there, there’s no point in going to VP9; you might as well go to HEVC. As for VVC, the standard is still a few years from being ratified, so we actually don’t have anything to talk about yet.

Dror Gill: 22:49 The important point to remember is that even when VVC launches, it will still be another two to three years after the standard is ratified before you have even a very basic playback ecosystem in place. So, I would tell our listeners: if you’re thinking “why should I adopt HEVC when VVC is just around the corner,” well, that corner is very far away. It’s more like the corner of the Earth than the corner of the next block.

Mark Donnigan: 23:15 That’s right.

Dror Gill: 23:18 So, HEVC today, and VVC will be the next step in a few years. And then there’s AV1. You know, we talked a lot about AV1. No doubt, AV1 has support from huge companies: Google, Facebook, Intel, Netflix, Microsoft. And those engineers know what they’re doing. But by now, it’s quite clear that its compression efficiency is about the same as HEVC’s. Meanwhile, on the licensing side, HEVC Advance removed the royalty on content delivery, so the license situation is much clearer now. Add to this the fact that, at the end of the day, you’re gonna need five to ten times more compute power to encode AV1, reaching effectively the same result. Now, Google, again: it may be that they have unlimited compute resources and they will use them; after all, they developed it.

Dror Gill: 24:13 But for the smaller content providers, all the other ones, the non-Googles of the world, and the broadcasters, given the growing support for HEVC that we expect in a few years, I think it’s obvious: they’re gonna support HEVC, and then a few years later, when VVC is ratified and supported in devices, they’re gonna move to VVC, because that codec does have the required compression efficiency improvement over HEVC.
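
[Editor’s note: a back-of-the-envelope version of this cost argument, under the assumptions stated in the conversation (AV1 at roughly HEVC’s compression efficiency but five to ten times the encoding compute). All figures below are assumptions, not benchmarks.]

```python
# Back-of-the-envelope encode-compute comparison under the episode's
# assumptions: AV1 reaches roughly HEVC's compression efficiency while
# needing 5-10x the encoding compute.
hevc_cost_per_hour = 1.0        # normalized compute units per hour encoded
av1_multipliers = (5.0, 10.0)   # assumed range from the conversation
hours_encoded = 10_000          # hypothetical library size

hevc_total = hours_encoded * hevc_cost_per_hour
for m in av1_multipliers:
    av1_total = hevc_total * m
    # With ~equal compression efficiency, delivery costs are ~unchanged,
    # so the extra encode compute is not offset by bandwidth savings.
    print(f"AV1 at {m:.0f}x compute: {av1_total - hevc_total:,.0f} extra "
          f"units for roughly the same delivered bitrate")
```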

Mark Donnigan: 24:39 Yeah, that’s an excellent summary, Dror. Thank you for breaking this all down for our listeners so succinctly. I’m sure this is really gonna provide massive value. I want to thank our amazing audience, because without you, The Video Insiders podcast would just be Dror and me taking up bits on a server somewhere.

Dror Gill: 24:59 Yeah, talking to ourselves.

Mark Donnigan: 25:01 As you can tell, video is really exciting to us, and we’re so happy that you’ve joined us to listen. Again, this has been a production of Beamr Imaging Limited. Please subscribe on iTunes, and if you would like to try out Beamr codecs in your lab or your production environment, we are giving away up to $100 of HEVC and H.264 encoding every month. That’s each and every month. Just go to https://beamr.com/free and get started immediately.