There are several different video codecs available today for video streaming applications, and more will be released this year. This creates some confusion for video services that need to select their codec of choice for delivering content to their users at the best quality and lowest bitrate, also taking into account the encode compute requirements. For many years, the choice of video codecs was quite simple to make: starting from MPEG-2 (H.262) when it took over digital TV in the late 90s, through H.263 and the closely related MPEG-4 Part 2 dominating video conferencing early in the millennium, and followed by MPEG-4 Part 10 or AVC (H.264), which has been enjoying significant market share for many years now in most video applications and markets including delivery, conferencing and surveillance. Simultaneously, Google’s natural choice for YouTube was their own video codec, VP9.
While HEVC, ratified in 2013, seemingly offered the next logical step, royalty issues put a major spoke in its wheel. Add to this the concern over increased complexity, and the delay in 4K adoption, which was assumed to be the main use case for HEVC, and you get quite a grim picture. This situation triggered a strong desire in the industry to create an independent, royalty-free codec. Significantly reduced timelines in the release of new video codec standards were thrown onto this fire, and we find ourselves somewhat like Alice in Wonderland: signs lead us forward in various directions, but which do we follow?
Let’s begin by presenting our contenders for the “codec with significant market share in future video applications” competition:
We will not discuss LC-EVC (MPEG-5 Part 2), as it is a codec add-on rather than an alternative stand-alone video codec. If you want to learn more about it, https://lcevc.com/ is a good place to start.
If you are hoping that we will crown a single winner in this article – sorry to disappoint: It is becoming apparent that we are not headed towards a situation of one codec to rule them all. What we will do is provide information, highlight some features of each of the codecs, share some insights and opinions and hopefully help arm you for the ongoing codec wars.
The first point of comparison we will address is the origin: where each codec is coming from and what that implies. To date, most of the widely adopted video codecs have been standards created by the Joint Video Experts Team, combining the efforts of the ITU-T Video Coding Experts Group (VCEG) and the ISO Moving Picture Experts Group (MPEG) to create joint standards. AVC and HEVC were born through this process, which involves clear procedures, from the CfP (Call for Proposals), through teams performing evaluation of the compression efficiency and performance requirements of each proposed tool, and up to creating a draft of the proposed standard. A few rounds of editing and fixes yield a final draft, which is ratified to provide the final standard. This process is very well organized and has a long and proven track record of resulting in stable and successful video codecs. AVC, HEVC and VVC are all codecs created in this manner.
The EVC codec is an exception in that it is coming only from MPEG, without the cooperation of ITU-T. This may be related to the ITU VCEG traditionally not being in favor of addressing royalty issues as part of the standardization process, while for the EVC standard, as we will see, this was a point of concern.
Another source for video codecs is specific companies. A particularly successful example is the VP9 codec, developed by Google as a successor to VP8, which was created by On2 Technologies (later acquired by Google). In addition, some companies have tried to push their own open source, royalty-free codecs, such as Daala by Mozilla or Dirac by BBC Research.
A third source of codecs is a consortium or group of several companies that works independently, outside of official international standards bodies such as the ISO or ITU. AV1 is the perfect example of such a codec, where multiple companies have joined forces through the Alliance for Open Media (AOM) to create a royalty-free, open-source video coding format, specifically designed for video transmissions over the Internet. AOM founding members include Google (who contributed their VP9 technology), Microsoft, Amazon, Apple, Netflix, Facebook, Mozilla and others, along with classic “MPEG supporters” such as Cisco and Samsung. The AV1 encoder was built from ‘experiments’, where each considered tool was added into the reference software along with a toggle to turn the experiment on or off, allowing flexibility during the decision process as to which tools will be used for each of the eventual profiles.
An easy point of comparison between the codecs is the timeline. AVC was completed back in May 2003. HEVC was finalized almost 10 years later in April 2013. AV1 bitstream freeze was in March 2018, with validation in June of that year and Errata-1 published in January 2019. As of the 130th MPEG meeting in April 2020, VVC and EVC are both in Final Draft of International Standard (FDIS) stage, and are expected to be ratified this year.
The next point of comparison is the painful issue of royalties. Unless you have been living under a rock you are probably aware that this is a pivotal issue in the codec wars. AVC royalty issues are well resolved and a known and inexpensive licensing model is in place, but for HEVC the situation is more complex. While HEVC Advance unifies many of the patent holders for HEVC, and is constantly bringing more on-board, MPEG LA still represents some others. Velos Media unify yet more IP holders and a few are still unaffiliated and not taking part in any of these pools. Despite the pools finally publishing reasonable licensing models over the last couple of years (over five years after HEVC finalization), the industry is for the most part taking a ‘once bitten, twice shy’ approach to HEVC royalties with some concern over the possibility of other entities coming out of the woodwork with yet further IP claims.
AV1 was a direct attempt to resolve this royalty mess, by creating a royalty-free solution, backed by industry giants, and even creating a legal defense fund to assist smaller companies that may be sued regarding the technology they contributed. Despite AOM never promising to indemnify against third party infringement, this seemed pretty air-tight to many. That is, until early March, when Sisvel announced a patent pool of 14 companies that hold over 1000 patents, which Sisvel claim are essential for the implementation of AV1. About a month later, AOM released a counter statement declaring AOM’s dedication to a royalty-free media ecosystem. Time, and presumably quite a few lawyers, will determine how this particular battle plays out.
VVC initially seemed to be heading down the same IP road as HEVC: According to MPEG regulations, anyone contributing IP to the standard must sign a Fair, Reasonable And Non-Discriminatory (FRAND) licensing commitment. But, as experience shows, that does not guarantee convergence to applicable patent pools. This time, however, the industry has taken action in the form of the Media Coding Industry Forum (MC-IF), an open industry forum established in 2018 with the purpose of furthering the adoption of MPEG standards, initially focusing on VVC. Its goal is to establish these standards as well-accepted and widely used for the benefit of consumers and industry. One of the MC-IF work groups is working on defining “sub-profiles”, which include either royalty-free tools or tools for which MC-IF is able to serve as a registration authority for all relevant IP licensing. If this effort succeeds, we may yet see royalty-free or royalty-known sub-profiles for VVC.
EVC is tackling the royalty issue directly within the standardization process, performed primarily by Samsung, Huawei and Qualcomm, using a combination of two approaches. For EVC-Baseline, only tools which can be shown to be royalty-free are being incorporated. This generally means the technologies are 20+ years old and have the publications to prove it. While this may sound like a rather problematic constraint, once you factor in the facts that AVC technology is all 20+ years old, and a lot of non IP infringing know-how has accumulated over these years, one can conceive that this codec can still significantly exceed AVC compression efficiency. For EVC-Main a royalty-known approach has been adopted, where any entity contributing IP is committed to provide a reasonably priced licensing model within two years of the FDIS, meaning by April 2022.
Now that we have dealt with the elephant in the room, we will highlight some codec features and see how the different codecs compare in this regard. All these codecs use a hybrid block-based coding approach, meaning the encode is performed by splitting the frame into blocks, performing a prediction of the block pixels, obtaining a residual as the difference between the prediction and the actual values, applying a frequency transform to the residual obtaining coefficients which are then quantized, and finally entropy coding those coefficients along with additional data, such as Motion Vectors used for prediction, resulting in the bitstream. A somewhat simplified diagram of such an encoder is shown in FIG 1.
FIGURE 1: HYBRID BLOCK BASED ENCODER
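The steps of the hybrid scheme described above can be sketched in a few lines of Python. This is a toy illustration only: the 4×4 block size, the floating-point DCT, the single uniform quantization step and the function names encode_block and decode_block are all assumptions made for this example, and entropy coding is left out entirely. No real codec works on exactly these terms.

```python
import math

N = 4  # toy block size (assumption for this sketch)

def dct_matrix(n):
    """Orthonormal DCT-II basis, one basis vector per row."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

D = dct_matrix(N)
DT = transpose(D)

def encode_block(block, prediction, qstep):
    """Residual -> 2-D transform -> quantized levels (what would be entropy coded)."""
    residual = [[block[y][x] - prediction[y][x] for x in range(N)] for y in range(N)]
    coeffs = matmul(matmul(D, residual), DT)   # separable 2-D DCT
    return [[round(c / qstep) for c in row] for row in coeffs]

def decode_block(levels, prediction, qstep):
    """Dequantize -> inverse transform -> add the prediction back."""
    coeffs = [[level * qstep for level in row] for row in levels]
    residual = matmul(matmul(DT, coeffs), D)   # inverse of the orthonormal DCT
    return [[round(prediction[y][x] + residual[y][x]) for x in range(N)]
            for y in range(N)]

# Usage: a flat block predicted from a slightly darker flat block.
prediction = [[100] * N for _ in range(N)]
block = [[103] * N for _ in range(N)]
levels = encode_block(block, prediction, qstep=4)
reconstructed = decode_block(levels, prediction, qstep=4)
print(levels)
print(reconstructed)
```

The decoder side mirrors the encoder, which is why an encoder always reconstructs its own output: later predictions must be based on what the decoder will actually see, not on the original pixels.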
The underlying theme of the codec improvements is very much a “more is better” approach. More block sizes and sub-partitioning options, more prediction possibilities, more sizes and types of frequency transforms and more additional tools such as sophisticated in-loop deblocking filters.
We will begin with a look at the block or partitioning schemes supported. The MBs of AVC are always 16×16, CTUs in HEVC and EVC-Baseline are up to 64×64, while for EVC-Main, AV1 and VVC block sizes of up to 128×128 are supported. As block sizes grow larger, they enable efficient encoding of smooth textures in higher and higher resolutions.
Regarding partitioning, while in AVC we had fixed-size Macro-Blocks, in HEVC the Quad-Tree was introduced allowing the Coding-Tree-Unit to be recursively partitioned into four additional sub-blocks. The same scheme is also supported in EVC-Baseline. VVC added Binary Tree (2-way) and Ternary Tree (3-way) splits to the Quad-Tree, thus increasing the partitioning flexibility, as illustrated in the example partitioning in FIG 2. EVC-Main also uses a combined QT, BT, TT approach and in addition has a Split Unit Coding Order feature, which allows it to perform the processing and predictions of the sub-blocks in Right-to-Left order as well as the usual Left-to-Right order. AV1 uses a slightly different partitioning approach which supports up to 10-way splits of each coding block.
Another evolving aspect of partitions is the flexibility in their shape. The ability to split the blocks asymmetrically and along diagonals can help isolate localized changes and create efficient and accurate partitions. This has two important advantages: the need for fine granularity of sub-partitioning is avoided, and two objects separated by a diagonal edge can be correctly represented without introducing a “staircase” effect. The wedge partitioning introduced in AV1 and the geometric partitioning of VVC both support diagonal partitions between two prediction areas, thus enabling very accurate partitioning.
FIGURE 2: Partitioning example combining QT (blue), TT (green) and BT (red)
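To make the recursive Quad-Tree idea concrete, here is a minimal Python sketch. It assumes a simple variance threshold as the split criterion, which is purely an illustrative stand-in: real encoders typically decide splits by comparing rate-distortion costs, and the names variance and quadtree_split are hypothetical. Binary and ternary splits are not modeled.

```python
def variance(frame, x, y, size):
    """Pixel variance of the size x size block whose top-left corner is (x, y)."""
    values = [frame[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def quadtree_split(frame, x, y, size, min_size, threshold):
    """Return the leaf blocks as (x, y, size) tuples.

    A block is split into four equal sub-blocks while it is larger than
    min_size and its variance exceeds threshold (an assumed criterion).
    """
    if size <= min_size or variance(frame, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_split(frame, x + dx, y + dy, half, min_size, threshold)
    return leaves

# Usage: an 8x8 frame that is flat except for one bright pixel in the corner.
frame = [[0] * 8 for _ in range(8)]
frame[0][0] = 100
for leaf in quadtree_split(frame, 0, 0, 8, min_size=2, threshold=0):
    print(leaf)
```

Only the quadrant containing the detail is subdivided further, while the flat quadrants stay as large blocks. This is the behavior that lets larger maximum block sizes pay off on smooth, high-resolution content.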
A good quality prediction scheme which minimizes the residual energy is an important tool for increasing compression efficiency. All video codecs from AVC onwards employ both INTRA prediction, where the prediction is performed using pixels already encoded and reconstructed in the current frame, and INTER prediction, using pixels from previously encoded and reconstructed frames.
AVC supports 9 INTRA prediction modes, or directions in which the current block pixels can be predicted from the pixels adjacent to the block on the left, above and right-above. EVC-Baseline supports only 5 INTRA prediction modes, EVC-Main supports 33, HEVC defines 35 INTRA prediction modes, AV1 has 56 and VVC takes the cake with 65 angular predictions. While the “more is better” paradigm may improve compression efficiency, this directly impacts encoding complexity, as it means the encoder has a more complex decision to make when choosing the optimal mode. AV1 and VVC add additional sophisticated options for INTRA prediction, such as predicting Chroma from Luma in AV1, or the similar Cross-Component Linear Model prediction of VVC. Another interesting tool for INTRA prediction is INTRA Block Copy (IBC), which allows copying of a full block from the already encoded and reconstructed part of the current frame, as the predictor for the current block. This mode is particularly beneficial for frames with complex synthetic texture, and is supported in AV1, EVC-Main and VVC. VVC also supports Multiple Reference Lines, where the number of pixels near the block used for INTRA prediction is extended.
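The link between the number of modes and encoder complexity can be seen in a small Python sketch of an INTRA mode decision. It assumes just three simplified modes (DC, horizontal, vertical) and a plain sum-of-absolute-differences cost. Both the mode set and the cost function are assumptions for illustration and do not reproduce any particular standard.

```python
def predict(mode, left, top, n):
    """Build an n x n prediction from the reconstructed neighbors.

    left: the n pixels in the column to the left of the block
    top:  the n pixels in the row above the block
    """
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(n)]
    if mode == "horizontal":    # copy the left column across
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":            # rounded mean of all neighboring pixels
        dc = (sum(left) + sum(top) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unknown mode: " + mode)

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))

def best_intra_mode(block, left, top, modes=("dc", "horizontal", "vertical")):
    """Try every candidate mode and keep the one with the lowest residual energy."""
    n = len(block)
    costs = {m: sad(block, predict(m, left, top, n)) for m in modes}
    return min(modes, key=lambda m: costs[m]), costs

# Usage: a block of vertical stripes is predicted perfectly from the row above.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(top) for _ in range(4)]
print(best_intra_mode(block, left, top))
```

Every additional mode adds one more candidate to this search loop, so an exhaustive search over 65 angular modes costs far more than one over 9. Practical encoders usually prune the candidate list rather than test every mode.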
The differences in INTER prediction are in the number of references used, Motion Vector (MV) resolution and associated sub-pel interpolation filters, supported motion partitioning and prediction modes. A thorough review of the various INTER prediction tools in each codec is well beyond the scope of this comparison, so we will just point out a few of the new features we are particularly fond of.
Overlapped Block Motion Compensation (OBMC), which was first introduced in Annex F of H.263 and in MPEG-4 Part 2 but not included in any profile, is supported in AV1; though considered for VVC, it was not included in the final draft. This is an excellent tool for reducing those annoying discontinuities at prediction block borders when the block on either side uses a different MV.
FIGURE 3A: OBMC ILLUSTRATION. On the top is regular Motion Compensation which creates a discontinuity due to two adjacent blocks using different parts of reference frame for prediction, on the bottom OBMC with overlap between prediction blocks
FIGURE 3B: OBMC ILLUSTRATION. Zoom into OBMC for the border between middle and left shown blocks, showing the averaging of the two predictions at the crossover pixels.
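The blending idea behind OBMC can be sketched for a single row of pixels crossing a vertical block border. The linear weight ramp, the overlap width and the function name obmc_blend_row are assumptions chosen for this illustration; actual codecs define their own fixed weight tables.

```python
def obmc_blend_row(pred_left, pred_right, boundary, overlap):
    """Blend two predictions of the same pixel row around a block border.

    pred_left:  the row as predicted with the left block's MV
    pred_right: the row as predicted with the right block's MV
    boundary:   index of the first pixel belonging to the right block
    overlap:    pixels on each side of the border that get blended
    """
    blended = []
    for x in range(len(pred_left)):
        if x < boundary - overlap:
            w_right = 0.0                       # left prediction only
        elif x >= boundary + overlap:
            w_right = 1.0                       # right prediction only
        else:
            # Linear ramp across the 2 * overlap crossover pixels (assumed shape).
            w_right = (x - (boundary - overlap) + 0.5) / (2 * overlap)
        blended.append((1 - w_right) * pred_left[x] + w_right * pred_right[x])
    return blended

# Usage: the two MVs point at areas of different brightness.
row = obmc_blend_row([100] * 8, [200] * 8, boundary=4, overlap=2)
print(row)
```

Without blending, the row would jump from 100 to 200 in a single step at the border. With the overlap, the transition is spread over several pixels, which is what removes the visible seam.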
One of the significant limitations of the block matching motion prediction approach is its failure to represent motion that is not purely horizontal and vertical translation, such as zoom or rotation. This is being addressed by support of warped motion compensation in AV1 and even more thoroughly with 6 Degrees-Of-Freedom (DOF) Affine Motion Compensation supported in VVC. EVC-Main takes it a step further with 3 affine motion modes: merge, and both 4DOF and 6DOF Affine MC.
FIGURE 4: AFFINE MOTION PREDICTION
Image credit: Cordula Heithausen – Coding of Higher Order Motion Parameters for Video Compression – ISBN-13: 978-3844057843
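A small Python sketch may help show what the 4DOF and 6DOF affine models mean in practice. It derives the motion vector at any position inside a block from two or three control-point MVs at the block corners, using the usual textbook formulation of the affine model. The function name affine_mv and the floating-point arithmetic are assumptions for this example; the standards use fixed-point arithmetic and derive one MV per small sub-block rather than per pixel.

```python
def affine_mv(x, y, w, h, mv0, mv1, mv2=None):
    """Motion vector at position (x, y) inside a w x h block.

    mv0: MV at the top-left corner
    mv1: MV at the top-right corner
    mv2: MV at the bottom-left corner, or None for the 4-parameter model
    """
    # How the MV changes per pixel when moving horizontally.
    ax = (mv1[0] - mv0[0]) / w
    ay = (mv1[1] - mv0[1]) / w
    if mv2 is None:
        # 4 DOF: translation, rotation and uniform zoom. The vertical
        # gradient is the horizontal gradient rotated by 90 degrees.
        bx, by = -ay, ax
    else:
        # 6 DOF: the vertical gradient is independent of the horizontal one.
        bx = (mv2[0] - mv0[0]) / h
        by = (mv2[1] - mv0[1]) / h
    return (mv0[0] + ax * x + bx * y, mv0[1] + ay * x + by * y)

# Usage: a zoom, where pixels move further the further they are from the corner.
for pos in [(0, 0), (8, 8), (16, 16)]:
    print(pos, affine_mv(pos[0], pos[1], 16, 16, mv0=(0, 0), mv1=(4, 0)))
```

When all control-point MVs are equal, the model collapses to ordinary translational motion, so the classic block-matching case is just a special case of this one.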
Another thing video codecs do is MV (Motion Vector) prediction based on previously found MV values. This reduces bits associated with MV transmission, beneficial at aggressive bitrates and/or when using high granularity motion partitions. It can also help to make the motion estimation process more efficient. While all five codecs define a process for calculating the MV Predictor (MVP), EVC-Main extends this with a history-based MVP, and VVC takes it further with improved spatial and temporal MV prediction.
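As a rough illustration of why MV prediction saves bits, the sketch below uses a component-wise median of three neighboring MVs as the predictor, in the spirit of the classic AVC-style approach. It is deliberately simplified: neighbor availability, reference indices and the history-based candidates mentioned above are not modeled, and the function names are hypothetical.

```python
def median_mvp(left, top, top_right):
    """Component-wise median of three neighboring motion vectors."""
    xs = sorted(mv[0] for mv in (left, top, top_right))
    ys = sorted(mv[1] for mv in (left, top, top_right))
    return (xs[1], ys[1])

def mv_difference(mv, mvp):
    """The MV difference is what gets written to the bitstream instead of the MV."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

# Usage: two neighbors agree on the motion, one is an outlier.
mvp = median_mvp(left=(4, 0), top=(5, 1), top_right=(12, -3))
print(mvp, mv_difference((6, 0), mvp))
```

Because neighboring blocks usually move together, the difference tends to be small and cheap to code, and the median keeps a single outlier neighbor from spoiling the prediction. The predictor also gives the motion search a sensible starting point.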
The frequency transforms applied to the residual data are another arena for the “more is better” approach. AVC uses 4×4 and 8×8 Discrete Cosine Transform (DCT), while EVC-Baseline adds more transform sizes ranging from 2×2 to 64×64. HEVC added the complementary Discrete Sine Transform (DST) and supports multi-size transforms ranging from 4×4 to 32×32. AV1, VVC and EVC-Main all use DCT and DST based transforms with a wide range of sizes including non-square transform kernels.
In-loop filters make a crucial contribution to improving the perceptual quality of block-based codecs, by removing artifacts created by the separate processing and decisions applied to adjacent blocks. AVC uses a relatively simple in-loop adaptive De-Blocking (DB) filter, which is also the case for EVC-Baseline, which uses the filter from H.263 Annex J. HEVC adds an additional Sample Adaptive Offset (SAO) filter, designed to allow for better reconstruction of the original signal amplitudes by applying offsets stored in a lookup table in the bitstream, resulting in increased picture quality and reduction of banding and ringing artifacts. VVC uses similar DB and SAO filters, and adds an Adaptive Loop Filter (ALF) to minimize the error between the original and decoded samples. This is done by using Wiener-based adaptive filters, with suitable filter coefficients determined by the encoder and explicitly signaled to the decoder. EVC-Main uses an ADvanced Deblocking Filter (ADDB) as well as ALF, and further introduces a Hadamard Transform Domain Filter (HTDF) performed on decoded samples right after block reconstruction using 4 neighboring samples. Wrapping up with AV1, a regular DB filter is used as well as a Constrained Directional Enhancement Filter (CDEF) which removes ringing and basis noise around sharp edges, and is the first usage of a directional filter for this purpose by a video codec. AV1 also uses a Loop Restoration filter, for which the filter coefficients are determined by the encoder and signaled to the decoder.
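To give a feel for how a lookup table of offsets can correct reconstructed samples, here is a simplified Python sketch loosely modeled on the band-offset flavor of SAO. It assumes the sample range is divided into 32 equal bands and that offsets are supplied for a few consecutive bands. The edge-offset flavor is omitted, and the function name and parameter layout are assumptions for illustration rather than the normative process.

```python
def sao_band_offset(samples, band_start, offsets, bit_depth=8):
    """Add a signaled offset to every sample that falls in one of the chosen bands.

    samples:    reconstructed sample values
    band_start: index of the first band that receives an offset
    offsets:    one offset per consecutive band, starting at band_start
    """
    shift = bit_depth - 5              # 32 equal bands across the sample range
    max_value = (1 << bit_depth) - 1
    corrected = []
    for s in samples:
        k = (s >> shift) - band_start  # position within the signaled bands
        if 0 <= k < len(offsets):
            s = min(max_value, max(0, s + offsets[k]))  # clip to the valid range
        corrected.append(s)
    return corrected

# Usage: with 8-bit samples each band is 8 values wide, so band 2 covers 16..23.
print(sao_band_offset([0, 16, 23, 24, 47, 48, 255], band_start=2, offsets=[1, -2, 3, -1]))
```

The encoder, which has access to the original picture, picks the bands and offsets that bring the reconstruction closest to the source, and the decoder simply applies them.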
The entropy coding stage varies somewhat among the codecs, partially due to the fact that the Context Adaptive Binary Arithmetic Coding (CABAC) has associated royalties. AVC offers both Context Adaptive Variable Length Coding (CAVLC) and CABAC modes. HEVC and VVC both use CABAC, with VVC adding some improvements to increase efficiency such as better initializations without need for a LUT, and increased flexibility in Coefficient Group sizes. AV1 uses non-binary (multi symbol) arithmetic coding – this means that the entropy coding must be performed in two sequential steps, which limits parallelization. EVC-Baseline uses the Binary Arithmetic Coder described in JPEG Annex D combined with run-level symbols, while EVC-Main employs a bit-plane ADvanced Coefficient Coding (ADCC) approach.
To wrap up the feature highlights section, we’d like to note some features that are useful for specific scenarios. For example, EVC-Main and VVC support Decoder side MV Refinement (DMVR), which is beneficial for distributed systems where some of the encoding complexity is offloaded to the decoder. AV1 and VVC both have tools well suited for screen content, such as support of Palette coding, with AV1 also supporting the Paeth prediction used in PNG images. Support of Film Grain Synthesis (FGS), first introduced in HEVC but not included in any profile, is mandatory in AV1 Professional profile, and is considered a valuable tool for high quality, low bitrate compression of grainy films.
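For readers unfamiliar with it, the Paeth predictor as defined for PNG filtering is short enough to show in full. The Python below follows the PNG definition; how AV1 integrates this predictor into its INTRA modes is not shown here.

```python
def paeth_predictor(left, above, upper_left):
    """Pick whichever neighbor is closest to the estimate left + above - upper_left."""
    estimate = left + above - upper_left
    dist_left = abs(estimate - left)
    dist_above = abs(estimate - above)
    dist_upper_left = abs(estimate - upper_left)
    # Ties are broken in the order left, above, upper-left.
    if dist_left <= dist_above and dist_left <= dist_upper_left:
        return left
    if dist_above <= dist_upper_left:
        return above
    return upper_left

# Usage: predict a pixel from its three already-decoded neighbors.
print(paeth_predictor(left=10, above=20, upper_left=10))
```

The predictor always returns one of the three neighbor values, which suits screen content well: sharp edges and flat color regions are reproduced exactly instead of being smeared by averaging.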
Probably the most interesting question is how do the codecs compare in actual video compression, or what is the Compression Efficiency (CE) of each codec: What bitrate is required to obtain a certain quality or inversely – what quality will be obtained at a given bitrate. While the question is quite simple and well defined, answering it is anything but. The first challenge is defining the testing points – what content, at what bitrates, in what modes. As a simple example, when screen content coding tools exist, the codec will show more of an advantage on that type of content. Different selections of content, rate control methodologies if used (which are outside the scope of the standards), GOP structures and other configuration parameters, have a significant impact on the obtained results.
Another obstacle on the way to a definitive answer stems from how to measure the quality. PSNR is sadly often still used in these comparisons, despite its poor correlation with perceptual quality. But even more sophisticated objective metrics, such as SSIM or VMAF, do not always accurately represent the perceptual quality of the video. On the other hand, subjective evaluation is costly, not always practical at scale, and results obtained in one test may not be repeated when tests are performed with other viewers or in other locations.
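For reference, PSNR is simple to compute, which goes some way towards explaining its popularity. The Python sketch below uses the standard definition for frames given as lists of pixel rows; the function name and the tiny test frames are made up for this example.

```python
import math

def psnr(original, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_errors = [(o - d) ** 2
                      for row_o, row_d in zip(original, decoded)
                      for o, d in zip(row_o, row_d)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf   # identical frames
    return 10 * math.log10(max_value ** 2 / mse)

# Usage: two very different distortions with the same mean squared error.
flat = [[100] * 4 for _ in range(4)]
brighter = [[101] * 4 for _ in range(4)]   # every pixel off by 1
spiky = [row[:] for row in flat]
spiky[0][0] = 104                          # a single pixel off by 4
print(psnr(flat, brighter), psnr(flat, spiky))
```

Both distorted frames get the same score, although a uniform brightness shift and an isolated outlier pixel are unlikely to look equally disturbing to a viewer. This is a small-scale example of the weak link between PSNR and perceived quality.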
So, while you can find endless comparisons available, which might be slightly different and sometimes even entirely contradictory, we will take a more conservative approach, providing estimates based on a cross-section of multiple comparisons in the literature. There seems to be no doubt that among these codecs, AVC has the lowest compression efficiency, while VVC tops the charts. EVC-Baseline seemingly has a compression efficiency which is about 30% higher than AVC, not far from the 40% improvement attributed to HEVC. AV1 and EVC-Main are close, with the decision regarding which one is superior very dependent on who performed the comparisons. They are both approximately 5-10% behind VVC in their compression efficiency.
Now, a look at the performance or computational complexity of each of the candidates. Again, this comparison is rather naïve, as the performance is so heavily dependent on the implementation and testing conditions, rather than on the tools defined by the standard. The ability to parallelize the encoding tasks, the structure of the processor used for testing, and the content type, such as low or high motion or dark vs. bright, are just a few examples of factors that can heavily impact the performance analysis. For example, taking the exact same preset of x264 and running it on the same content with low and high target bitrates can cause a 4x difference in encode runtime. In another example, in the Beamr5 epic face off blog post, the Beamr HEVC encoder is on average 1.6x faster than x265 on the same content with similar quality, and the range of the encode FPS across files for each encoder is on the order of 1.5x. Having said all that, what we will try to do here is provide a very coarse, ball-park estimate as to the relative computational complexity of each of the reviewed codecs. AVC is definitely the lowest complexity of the bunch, with EVC-Baseline only very slightly more complex. HEVC has higher performance demands for both the encoder and decoder. VVC has managed to keep the decoder complexity almost on par with that of the HEVC decoder, but encoding complexity is significantly higher and probably the highest of all 5 reviewed codecs. AV1 is also known for its high complexity, with early versions having introduced the unit Frames Per Minute (FPM) for encoding performance, rather than the commonly used Frames Per Second (FPS). Though recent versions have gone a long way to making matters better, it is still safe to say that complexity is significantly higher than HEVC, and probably still higher than EVC-Main.
In the table below, we have summarized some of the comparison features which we outlined in this blog post.
The Bottom Line
So, what is the bottom line? Unfortunately, life is getting more complicated, and the days of one or two dominant codecs covering almost all of the industry are over. Only time will tell which will have the highest market share in 5 years’ time, but one easy assessment is that with AVC’s current market share estimated at around 70%, this one is not going to disappear anytime soon. AV1 is definitely gaining momentum, and with the giants backing it we expect to see it used a fair bit in online streaming. Regarding the others, it is safe to assume that the improved compression efficiency offered by VVC and EVC-Main, and the attractive royalty situation of EVC-Baseline, along with the growing number of devices that support HEVC in HW, mean that having to support a plurality of codecs in many video streaming applications is the new reality for all of us.