With numerous advantages, AV1 is now supported on about 60% of devices and all major web browsers. To accelerate its adoption, Beamr has introduced an easy, automated upgrade to the codec at the forefront of today's video technology.
Four years ago we explored the different video codecs, analyzing their strengths and weaknesses, and took a look at current and predicted market share. While it is gratifying to see that many of our predictions were pretty accurate, that is accompanied by some degree of disappointment: while AV1's strengths are well known in the industry, a significant change in the adoption of new codecs has yet to materialize.
The bottom line of the 2020 post was: “Only time will tell which will have the highest market share in 5 years’ time, but one easy assessment is that with AVC current market share estimated at around 70%, this one is not going to disappear anytime soon. AV1 is definitely gaining momentum, and with the giants backing we expect to see it used a fair bit in online streaming. “
Indeed, we are living in a multi-codec reality, where AVC still accounts for by far the largest percentage of video content, but adoption of AV1 is starting to increase, with large players such as Netflix and YouTube incorporating it into their workflows, and many others using it for specific high-value use cases.
Thus, we are faced with a mixture of the still-dominant AVC, HEVC (serving primarily UHD and HDR use cases), AV1, and additional codecs such as VP9 and VVC, which are used in quite small amounts.
The Untapped Potential of AV1
So while AV1 adoption is increasing, there is still significant untapped potential. One cause of the slower-than-hoped rollout of AV1 is the obstacle facing adoption of any new standard: a critical mass of hardware decoding support on edge devices.
While coverage for AVC and HEVC is very extensive, for AV1 it has only recently become so, with support across an estimated 60% of devices and all major web browsers, complemented by the efficient software decoding offered by dav1d.
Another obstacle AV1 faces involves the practicalities of deployment. While there is extensive knowledge, within the industry and available online, of how best to configure AVC encoding, and which presets and encoding parameters work well for which use cases, no equivalent body of knowledge exists for AV1. Thus, those who intend to deploy it must first invest in extensive research.
Additionally, AV1 encoding is complicated, requiring much higher processing power for software encoding. In a world that is constantly trying to cut costs and use lower-power solutions, this can pose a problem. Even at the fastest settings, software AV1 encoding is still significantly slower than AVC encoding at typical speeds. This is a strong motivator to upgrade to AV1 using hardware-accelerated solutions (learn more about Beamr's solution to the challenge).
The upcoming codec possibilities are also a deterrent for some. With AV2 in the works, VVC finalized and gaining some traction, and various groups working on AI-based encoding solutions, there will always be players waiting for 'the next big thing' rather than switching out codecs twice.
In a world where JPEG, a 30+ year old standard, is still used in over 70% of websites and is the most popular format on the web for photographic content, it is no surprise that adoption of new video codecs is taking time.
While a multi-codec reality is probably here to stay, we can at least hope that when we revisit this topic in a blog a few years down the line, the balance between deployed codecs will lean more towards the higher-efficiency codecs, like AV1, yielding the best bitrate-quality options for the video world.
This year at IBC 2024 in Amsterdam, we are excited to demonstrate live 4Kp60 optimized streaming with our Content-Adaptive Bitrate (CABR) technology on NVIDIA Holoscan for Media, a software-defined, AI-enabled platform that allows live video pipelines to run on the same infrastructure as AI. Using the CABR GStreamer plugin, premiered at the NAB Show earlier this year, we now support live, quality-driven optimized streaming for 4Kp60 video content.
It is no secret that savvy viewers are coming to expect the high-quality experience of 4K Ultra-High-Definition streamed at 60 frames per second for premium events. What started as a drizzle a few years back has become the high-end norm for recent events such as the 2024 Olympics, where techies were sharing insights on where it could be accessed.
Given that 4K means a whopping four times the pixels of full HD resolution, keeping up with live encoding of 4K at 60 fps can be quite challenging, and can also result in bitrates that are too high to manage.
One possible solution for broadcasters is to encode and transmit at 1080p and rely on the constantly improving upscalers available on TVs to provide the 4K experience, but this of course means they cannot control the user experience. A better solution is to have a platform that is super fast, and can create live 4Kp60 encodes, which combine excellent quality with an optimization process that minimizes the required bitrate for transmission.
Comparison of 4K Live video before and after optimization
Beamr CABR on Holoscan for Media offers exactly that, combining the fast data buses and easy-to-use architecture of Holoscan for Media with Beamr's hardware-accelerated, quality-driven, optimized AV1 encoding. Together, they make it possible to stream super-efficient, lower-bitrate 4K encodes at top-notch quality.
Content-Adaptive Bitrate encoding, or CABR, is Beamr's patented, award-winning technology that uses a quality measure to select, for each frame, the lowest-bitrate candidate that retains the same perceptual quality as the reference frame. In other words, users can enjoy 30-50% lower bitrates, faster delivery of files or live video streams, and an improved user experience, all with exactly the same quality as the original video.
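To make this concrete, here is a minimal sketch of quality-driven, per-frame candidate selection. The encode_frame and perceptual_quality callables, the QP search range and the quality bar are hypothetical stand-ins for illustration only, not Beamr's actual API or the BQM measure:

```python
QUALITY_BAR = 0.99  # assumed score above which two encodes count as perceptually identical

def optimize_frame(frame, base_qp, encode_frame, perceptual_quality):
    """Return the smallest encode of `frame` that is still perceptually
    identical to the reference encode produced at base_qp."""
    reference = encode_frame(frame, qp=base_qp)  # bytes of the reference encode
    best = reference
    # Try progressively coarser quantization (higher QP = fewer bits).
    for qp in range(base_qp + 1, base_qp + 8):
        candidate = encode_frame(frame, qp=qp)
        if perceptual_quality(candidate, reference) < QUALITY_BAR:
            break  # quality dropped below the bar; stop searching
        if len(candidate) < len(best):
            best = candidate  # smaller, yet still perceptually identical
    return best
```

A real implementation evaluates candidates far more cleverly, but the contract is the same: never emit a frame whose perceptual quality deviates from the reference.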
To achieve aggressive bitrates feasible for broadcast of live events, we configure the system to use AV1 encoding. The advanced AV1 format has been around since 2018, yet its full potential has not been realized by many players in the video arena. AV1 raises the bar significantly compared to previous codecs such as AVC (H.264) and HEVC (H.265) in terms of efficiency, performance on GPUs and high quality for real-time video. Combined with CABR, AV1 offers even more. According to our tests, AV1 can reduce data by 50% compared to AVC and by 30% compared to HEVC. We have also shown that CABR-optimized AV1 is beneficial for machine learning tasks.
Putting all three of these technologies together, namely deploying Holoscan for Media with the Beamr CABR solution inside, which in turn uses NVIDIA's hardware-accelerated AV1 encoder, provides a platform with spectacular benefits. With rising demand for high-quality live streaming at high resolution and high fps, at manageable bitrates and with an eye on encoding costs, this solution is definitely an interesting prospect for companies looking to boost their streaming workflows.
Now available: hardware-accelerated, unsupervised codec modernization to AV1 for more efficient video AI workflows
AV1, the new kid on the block of video codecs, is definitely starting to gain traction due to its high compression efficiency and increasing adoption on browsers and end devices. As we mentioned in our previous blog, hardware-accelerated AV1 encoding is a particularly attractive prospect due to the combination of increased efficiency and lightning-fast performance. Hardware-accelerated codec modernization, using Beamr's Content-Adaptive Bitrate (CABR) video optimization process running with NVIDIA video encoding, allows fast, fully automatic upgrading of legacy encodes to perceptually identical AV1 encodes.
Codec modernization essentially delivers a double benefit: both the increased compression efficiency of codecs such as AV1, and the bitrate efficiencies of Beamr's perceptually driven optimization. Over the years we have consistently validated that Beamr CABR technology creates optimized files that are perceptually identical, meaning they look the same to the human eye. Having repeatedly demonstrated that visual quality is indeed preserved, in this blog post we continue to explore how Beamr's optimization lends itself to AI-based workflows.
In our previous case studies, we looked at how the reduced-bitrate, optimized videos behave in machine learning (ML) tasks such as face detection and action recognition training. We showed that results using optimized AVC and HEVC encodes are stable, despite significant reductions in file size: an average reduction of 24% on the source files, and an impressive 3x decrease in size for the cropped AVC-encoded files created by OpenCV.
Now we add codec modernization to the mix, which allows us to reduce the sizes of the cropped encodes further. The AV1-encoded files are smaller by a factor of 4, while still providing very similar training and inference results, as shown by the maximal and average accuracy results obtained in the different experiments and presented in the following table:
                         | Tested on AVC | Tested on optimized AV1
Trained on AVC           | 67.5% (53%)   | 66.4% (53%)
Trained on optimized AV1 | 66.4% (52%)   | 64.8% (53.5%)
Next we decided to ramp up the fun factor and play around with some cool AI applications. Using the open-source Face Fusion project, we took 10 source AVC videos and an image containing our target face, and proceeded to swap the faces in the source videos with our target person. While this is a fun experiment in itself, imagine how much easier it becomes when the source videos are reduced by a factor of 4, with the results looking just the same.
Below is an example showing a frame from the source video, the target face image, and a side-by-side comparison of the video with the replaced, or fused, face when using the original AVC encode (on the left) and the AV1 optimized by Beamr (on the right), looking just as good:
We are just starting to scratch the surface on how Beamr’s technology and offerings, including codec modernization to AV1, can help make AI workflows more efficient without compromising quality or accuracy. We are excited to be on this journey and will continue to explore and add on to the synergies between video optimization, video modernization and video AI solutions.
Beamr’s Content Adaptive Bit Rate solution enables significantly decreasing video file size or bitrates without changing the video resolution or compromising perceptual quality. Since the optimized file is fully standard compliant, it can be used in your workflow seamlessly, whatever your use case, be it video streaming, playback or even part of an AI workflow.
We first launched Beamr Cloud earlier this year, and we are now super excited to announce that our valued partnership with Oracle Cloud Infrastructure (OCI) is enabling us to offer OCI customers more features and better performance.
The performance improvements are due in part to the availability of the powerful NVIDIA L40S GPUs on OCI. In preliminary testing we found that our video encoding workflows run on average about 30% faster on these cards than on the cards we currently use in the Beamr Cloud solution.
This was derived from testing AVC and HEVC NVENC-driven encodes of a set of nine 1080p classic test clips with eight different configurations, comparing encoding wall times on an A10G vs. an L40S GPU. Speedups of up to 55% were observed, with an average just above 30%. The full test data is available here.
Another exciting feature of these cards is that they support AV1 encoding, which means Beamr Cloud will now offer to turn your videos into optimized AV1 encodes, offering even higher bitrate and file-size savings.
What’s the fuss about AV1?
To store and transmit video, substantial compression is needed. From the very earliest efforts to standardize video compression in the 90s, there has been a constant drive to create video compression standards offering increasing efficiency, meaning that the same video quality can be achieved with smaller files or lower bitrates.
As shown in the schematic illustration below, AV1 has come a long way in improving over H.264/AVC, which, despite being 20 years old, remains the most widely adopted standard today. However, the increased compression efficiency is not free: the computational complexity of newer codecs is significantly higher, motivating the adoption of hardware-accelerated encoding options.
With demand for video AI workflows continuing to rise, the ability to perform fully automatic, fast, efficient, optimized video encoding is an important enabler.
Beamr's GPU-powered video compression and optimization occur within the GPU on OCI, right at the heart of these AI workflows, making them extremely well placed to benefit such workflows. We have previously shown in a number of case studies that there is no negative impact on inference or training results when using the optimized files, making the integration of this optimization process into AI workflows a natural choice for cost-savvy developers.
Machine learning for video is an expanding field garnering vast interest, with generative AI for video picking up speed. However, there are significant pain points for these technologies, such as storage and bandwidth bottlenecks when dealing with video content, as well as training and inference speeds.
In the following case study, we show that training an AI network for action recognition using video files compressed and optimized through Beamr Content-Adaptive Bitrate technology (CABR), produces results that are as good as training the network with the original, larger files. The ability to use significantly smaller video files can accelerate machine learning (ML) training and inferencing.
Motivation
Beamr’s CABR enables significantly decreasing video file size without changing the video resolution, compression or file format or compromising perceptual quality. It is therefore a great candidate for resolving file size issues and bandwidth bottlenecks in the context of ML for video.
In a previous case study we looked at the task of people detection in video using pre-trained models. In this case study we cover the more challenging task of training a neural network for action recognition in video, comparing the outcome when using source vs optimized files.
We will start by describing the problem we targeted, and then present the classifier architecture used. We will continue with details on the data sets used and their optimization results, followed by the experiment results, concluding with directions for future work.
Action recognition task
When setting the scope for this case study, it was essential to define a test case that makes full use of the fact that the content is video, as opposed to still images. We therefore selected a task that requires the temporal element of video to perform the classification: action recognition. Viewing individual frames, it is not possible to differentiate between frames captured during walking and running, or between someone jumping or dancing. For that, a sequence of frames is required, which is why this was our task of choice.
Target data set
For the fine-tuning step we collected a set of 160 free-to-use, user-generated video clips, downloaded from the Pexels and Envato stock-video websites. The videos were downloaded in 720p resolution, using the websites' default settings. We selected videos belonging to one of four action classes: running, martial arts, dancing and rope jumping.
In order to use these in the selected architecture, they needed to be cropped to a square input. This was done by manually marking an ROI in each clip, and performing the crop using OpenCV with corresponding OpenH264 encoding at default settings.
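As a rough illustration of this preprocessing step, here is a minimal OpenCV sketch; the ROI coordinates, file names and the avc1 writer settings are illustrative assumptions, not our exact script:

```python
import cv2

def crop_to_square(src_path: str, dst_path: str, x: int, y: int, side: int) -> None:
    """Crop a manually marked square ROI from every frame and re-encode."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    fourcc = cv2.VideoWriter_fourcc(*"avc1")  # H.264, e.g. via OpenH264
    out = cv2.VideoWriter(dst_path, fourcc, fps, (side, side))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(frame[y:y + side, x:x + side])  # (x, y) = ROI top-left corner
    cap.release()
    out.release()

# Hypothetical usage, cropping a 720x720 square from a 1280x720 clip:
# crop_to_square("dance_01.mp4", "dance_01_crop.mp4", x=280, y=0, side=720)
```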
We first optimized the original clip set using the Beamr cloud optimization SaaS, obtaining an average reduction of 24%. This is beneficial when storing the test set for future use and possibly performing other manipulations on it. However, for our test we wanted to compress the set of cropped videos actually used for the training process. Applying the same optimization to these files, created by OpenCV, yielded a whopping 67% average reduction.
Architecture
We selected an encoder-decoder architecture, which is commonly used for classification of video and other time-series inputs. For the encoder we used ResNet-152 pre-trained on ImageNet, followed by 3 fully connected layers with sizes of 1024, 768 and 512. For the decoder we used an LSTM followed by 2 fully connected layers consisting of 512 and 256 neurons.
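A minimal PyTorch sketch of such an encoder-decoder classifier is shown below. The LSTM hidden size and the final classification layer are our assumptions (they are not specified above); the other layer sizes mirror the description:

```python
import torch.nn as nn
from torchvision import models

class CnnEncoder(nn.Module):
    """ResNet-152 features followed by FC layers of 1024, 768 and 512."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the classifier
        self.fc = nn.Sequential(
            nn.Linear(resnet.fc.in_features, 1024), nn.ReLU(),
            nn.Linear(1024, 768), nn.ReLU(),
            nn.Linear(768, 512),
        )

    def forward(self, clip):                       # clip: (batch, time, 3, 224, 224)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))  # (b*t, 2048, 1, 1)
        return self.fc(feats.flatten(1)).view(b, t, 512)

class LstmDecoder(nn.Module):
    """LSTM over per-frame embeddings, then FC layers of 512 and 256."""
    def __init__(self, num_classes=4, hidden=512):  # hidden size is an assumption
        super().__init__()
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),            # assumed final classification layer
        )

    def forward(self, seq):
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])                # classify from the last time step

class ActionClassifier(nn.Module):
    """Composition of the two halves: per-frame CNN, then temporal LSTM."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.encoder, self.decoder = CnnEncoder(), LstmDecoder(num_classes)

    def forward(self, clip):
        return self.decoder(self.encoder(clip))
```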
Pre-training
We performed initial training of the network using the UCF-101 dataset, which consists of 13,320 video clips at a resolution of 240p, classified into 101 possible action classes. The data was split so that 85% of the files were used for training and 15% for validation.
These videos were resized to 224 x 224 before being fed into the classifier. Training used a batch size of 24, and 35 epochs were performed. For the loss function we used cross-entropy, a popular choice for classifier training. The Adaptive Moment Estimation, or Adam, optimizer with a learning rate of 1e-3 was selected for the training process, as its adaptive per-parameter learning rates mitigate the overshoot and oscillation problems that fixed learning rates can cause. This setup yielded 83% accuracy on the validation set.
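A sketch of this training setup follows, assuming a standard PyTorch DataLoader that yields resized clips and labels, and the ActionClassifier sketched earlier; augmentation, checkpointing and scheduler details are omitted:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=35, lr=1e-3, device="cpu"):
    """Cross-entropy training with the Adam optimizer, as described above."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        for clips, labels in train_loader:          # clips: (batch, time, 3, 224, 224)
            clips, labels = clips.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(clips), labels)
            loss.backward()
            optimizer.step()
        # Accuracy on the held-out validation split.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for clips, labels in val_loader:
                preds = model(clips.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch + 1}: validation accuracy {correct / total:.1%}")
```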
Training
We fine-tuned the pre-trained network described above to learn the target data set.
The training was performed on 2.05 GB of cropped videos, and on 0.67 GB of cropped & optimized videos, with 76% of the files used for training and 24% for validation.
Due to the higher resolution of the input in the target data set, the fine-tuning was done using a batch size of only 4. 30 epochs were performed, though convergence was generally achieved by 9-10 epochs. Again we used cross-entropy loss and the Adam optimizer with a learning rate of 1e-3.
Because of the relatively small sample size used here, a difference of one or two classifications can alter the results, so we repeated the training process 10 times for each case in order to gain confidence in the results. The accuracy results for the 10 testing rounds on each of the non-optimized and optimized video sets are presented in the following table.
                     | Minimum Accuracy | Average Accuracy | Maximum Accuracy
Non-Optimized Videos | 56%              | 65%              | 69%
Optimized Videos     | 64%              | 67%              | 75%
To further verify the results we collected a set of 48 additional clips, and tested these independently on each of the trained classifiers. Below we provide the full cross matrix of maximum and mean accuracy obtained for the various cases.
                         | Tested on Non-Optimized | Tested on Optimized
Trained on Non-Optimized | 65%, 53%                | 62%, 50%
Trained on Optimized     | 62%, 50%                | 65%, 50%
Summary & Future work
The results shared above confirm that training a neural network with significantly smaller video files, optimized by Beamr's CABR, has no negative impact on the training process. In this experiment we even saw a slight benefit from training on optimized files; however, it is unclear whether this is a significant conclusion, and we intend to investigate it further. We also see that the cross testing/training yields similar results across the different cases.
This test was an initial, rather small scale experiment. We are planning to expand this to larger scale testing, including distributed training setups in the cloud using GPU clusters, where we expect to see further benefits from the reduced sizes of the files used.
This research is part of our ongoing quest to accelerate adoption and increase accessibility of machine learning and deep learning for video, as well as video analysis solutions.
By reducing video size but not perceptual quality, Beamr's Content Adaptive Bit Rate optimized encoding can make video used for vision AI easier to handle, thus reducing workflow complexity.
Written by: Tamar Shoham, Timofei Gladyshev
Motivation
Machine learning (ML) for video processing is a field that is expanding at a fast pace and presents significant untapped potential. Video is an incredibly rich data source with large storage and bandwidth requirements, making vision AI a high-value problem to solve and one well suited to AI and ML.
Beamr's Content Adaptive Bit Rate solution (CABR) can significantly decrease video file size without changing the video resolution, compression or file format, and without compromising perceptual quality. It therefore interested us to examine how the Beamr CABR solution can be used to help cut down the sizes of video used in the context of ML.
In this case study, we focus on the relatively simple task of people detection in video. We made use of the NVIDIA DeepStream SDK, a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio and image understanding. Using this SDK is a natural choice for Beamr as an NVIDIA Metropolis partner.
In the following we describe the test setup, the data set used, the tests performed and the results obtained. We then present some conclusions and directions for future work.
Test Setup
In this case study, we limited ourselves to comparing detection results on source and reduced-size files by using pre-trained models, making it possible to use unlabeled data.
We collected a set of 9 user-generated content (UGC) video clips, captured on a few different iPhone models. To these we added some clips downloaded from the Pexels free stock-video website. All test clips are in the mp4 or mov file format, containing AVC/H.264 encoded video, with resolutions ranging from 480p to full HD and 4K, and durations ranging from 10 seconds to 1 minute. Further details on the test files can be found in Annex A.
These 14 source files were then optimized using Beamr’s storage optimization solution to obtain files that were reduced in size by 9 – 73%, with an average reduction of 40%. As mentioned above, this optimization results in output files which retain the same coding and file formats and the same resolution and perceptual quality. The goal of this case study is to show that these reduced-size, optimized files also provide aligned ML results.
For this test, we used the NVIDIA DeepStream SDK [5] with the PeopleNet-ResNet34 detector, and calculated the mAP between the detections on the source and optimized file of each pair, for an IoU threshold of 0.5.
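For illustration, here is a simplified sketch of the per-frame comparison underlying such a score; it uses greedy IoU matching between the two detection sets, a simplification of the full mAP computation:

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def match_rate(source_boxes, optimized_boxes, thresh=0.5):
    """Fraction of detections on a source frame that have an
    IoU >= thresh counterpart among the optimized clip's detections."""
    unmatched = list(optimized_boxes)
    hits = 0
    for s in source_boxes:
        best = max(unmatched, key=lambda o: iou(s, o), default=None)
        if best is not None and iou(s, best) >= thresh:
            hits += 1
            unmatched.remove(best)  # each detection may be matched only once
    return hits / len(source_boxes) if source_boxes else 1.0
```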
Results
We found that for files with predictions that align with actual people, the mAP is very high, showing that true detection results are indeed unaffected by replacing the source file with the smaller, easier-to-transfer, optimized file.
An example showing how well they align is provided in Figure 1. This test clip resulted in a mAP[0.5] value of 0.98.
Figure 1: Detections for pexels_video_2160p_2 frame 305 using PeopleNet-ResNet34, source on top, with optimized (54% smaller) below
As the PeopleNet-ResNet34 model was developed specifically for people detection, it has quite stable results, and overall showed high mAP values with a median mAP value of 0.94.
When testing some other models, we did notice that in cases where the detections were unstable, the source and optimized files sometimes created different false positives. It is important to note that, because we did not have labeled data or a ground truth, such detection errors occurring out of sync have a double impact on the mAP value calculated between the detections on the source file and those on the optimized file. This yields poorer results than the mAP values expected when calculating detections against labeled data.
We also noticed cases of detection flicker, where a person is detected in only some of the frames in which they appear. This flicker is not always synchronized between the source and optimized clips, resulting once again in an 'accumulated' or double error in the mAP calculated between them. An example is shown in Figure 2, for a clip with a mAP[0.5] value of 0.92.
Figure 2a: Detections for frame 1170 of the clip pexels_musicians.mp4 using PeopleNet-ResNet34, source on the left and optimized (44% smaller) on the right. Note the detection on the left of the stairs, present only in the source file.
Figure 2b: Same for frame 1171, with no detection in either.
Figure 2c: Frame 1172, detected in both.
Figure 2d: Frame 1173, detected only in the optimized file.
Summary
The experiments described above show that CABR can be applied to videos that undergo ML tasks such as object detection. We showed that when detections are stable, almost identical results will be obtained for the source and optimized clips. The advantages of reducing storage size and transmission bandwidth by using the optimized files make this possibility particularly attractive.
Another possible use for CABR in the context of ML stems from the finding that, for unstable detection results, CABR may have some impact on false positives or missed detections. In this context, it would be interesting to view it as a possible augmentation of labeled data to increase training-set size. In future work, we will investigate the further potential benefits obtained when CABR is incorporated at the training stage, and expand the experiments to include more model types and ML tasks.
This research is all part of our ongoing quest to accelerate adoption and increase the accessibility of video ML/DL and video analysis solutions.
Annex A – test files
Below are details on the test files used in the above experiments. All the files and detection results are available here.
#  | Filename                             | Source     | Bitrate | Dims WxH  | FPS  | Duration [sec] | Saved by CABR
1  | IMG_0226                             | iPhone 3GS | 3.61 M  | 640×480   | 15   | 36.33          | 35%
2  | IMG_0236                             | iPhone 3GS | 3.56 M  | 640×480   | 30   | 50.45          | 20%
3  | IMG_0749                             | iPhone 5   | 17.0 M  | 1920×1080 | 29.9 | 11.23          | 34%
4  | IMG_3316                             | iPhone 4S  | 21.9 M  | 1920×1080 | 29.9 | 9.35           | 26%
5  | IMG_5288                             | iPhone SE  | 14.9 M  | 1920×1080 | 29.9 | 43.40          | 29%
6  | IMG_5713                             | iPhone 5c  | 16.3 M  | 1080×1920 | 29.9 | 48.88          | 73%
7  | IMG_7314                             | iPhone 7   | 15.5 M  | 1920×1080 | 29.9 | 54.53          | 50%
8  | IMG_7324                             | iPhone 7   | 15.8 M  | 1920×1080 | 29.9 | 16.43          | 39%
9  | IMG_7369                             | iPhone 6   | 17.9 M  | 1080×1920 | 29.9 | 10.23          | 30%
10 | pexels_musicians                     | pexels     | 10.7 M  | 1920×1080 | 24   | 60.0           | 44%
11 | pexels_video_1080p                   | pexels     | 4.4 M   | 1920×1080 | 25   | 12.56          | 63%
12 | pexels_video_2160p                   | pexels     | 12.2 M  | 3840×2160 | 25   | 15.24          | 9%
13 | pexels_video_2160p_2                 | pexels     | 15.2 M  | 3840×2160 | 25   | 15.84          | 54%
14 | pexels_video_of_people_walking_1080p | pexels     | 3.5 M   | 1920×1080 | 23.9 | 19.19          | 58%

Table A1: Test files used, with the per-file savings
Easy & Safe Codec Modernization with Beamr using Nvidia GPUs
Following a decade in which AVC/H.264 was the clear ruler of the video encoding world, recent years have seen many video coding options battling to conquer the video arena. For some insights on the race between modern coding standards, check out our corresponding blog post.
Today we want to share how easy it can be to upgrade your content to a new and improved codec in a fast, fully automatic process that guarantees the visual quality of the content will not be harmed. This makes the switchover to newer encoders a smooth, easy and low-cost process that can help accelerate the adoption of new standards such as HEVC and AV1. When this transformation is done using a combination of Beamr's technology and the Nvidia NVENC encoder, via Nvidia's recently released APIs, it becomes a particularly cutting-edge solution, enjoying the benefits of the leading hardware AV1 encoder.
The benefit of switching to more modern codecs lies, of course, in the higher compression efficiency they offer. While the extent of improvement depends heavily on the actual content, bitrates and encoders used, HEVC is considered to offer gains of 30%-50% over AVC, meaning that for the same quality you can spend up to 50% fewer bits. For AV1 this increase is generally a bit higher. As more and more on-device support is added for these newer codecs, the advantage of utilizing them to reduce both storage and bandwidth is clear.
Generally speaking, performing such codec modernization involves some non-trivial steps.
First, you need to get access to the modern encoder you want to use, and know enough about it in order to configure the encoder correctly for your needs. Then you can proceed to encoding using one of the following approaches.
The first approach is to perform bitrate-driven encoding. One possibility is to use conservative bitrates, in which case the potential reduction in size will not be fully achieved. Another is to set target bitrates that reflect the expected savings, in which case there is a risk of losing quality. For example, in an experimental test of files converted from their AVC source to HEVC, we found that on average a bitrate reduction of 50% could be obtained using the Beamr CABR codec modernization approach. However, when the same files were all brute-force encoded to HEVC at 50% reduced bitrate, using the same encoder and configuration, the quality took a hit for some of the files.
This example shows the full AVC source frame on top, with the transcodes to HEVC below it. Note the distortion in the blind HEVC encode, shown on the left, compared to the true-to-source video transformed with CABR on the right.
The second approach is to perform the transcode using a quality-driven encode, for instance the constant QP (Quantization Parameter) or CRF (Constant Rate Factor) encoding modes with conservative values, which will in all likelihood preserve the quality. However, in this case you are likely to unnecessarily 'blow up' some of your files to much higher bitrates. For example, for the UGC content shown below, transcoding to HEVC using a software encoder with CRF set to 21 almost doubled the file size. A sketch of such a quality-driven transcode follows.
Yet another approach is a trial-and-error encode process for each file, or even each scene, manually verifying that a good target encoding setup was selected which minimizes the bitrate while preserving the quality. This is, of course, an expensive and cumbersome process, and entirely unscalable.
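For reference, a quality-driven transcode of this kind can be reproduced with an off-the-shelf tool; below is a sketch using ffmpeg with libx265 at CRF 21 (the exact encoder and settings in our test may have differed):

```python
import subprocess

def crf_transcode(src: str, dst: str, crf: int = 21) -> None:
    """Quality-driven AVC-to-HEVC transcode using a constant rate factor.
    Conservative CRF values will likely preserve quality, but the output
    size is uncontrolled and, as noted above, can exceed the source."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "libx265", "-crf", str(crf),  # quality-driven, not bitrate-driven
         "-c:a", "copy", dst],
        check=True,
    )
```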
By using Beamr CABR this is all done for you under the hood, in a fully automatic process, which makes optimized choices for each and every frame in your video, selecting the lowest bitrate that will still perfectly preserve the source visual quality. When performed using the Nvidia NVENC SDK with interfaces to Beamr’s CABR technology, this transformation is significantly accelerated and becomes even more cost effective.
The codec modernization flow is demonstrated for AVC-to-HEVC conversion in the above high-level block diagram. As shown here, the CABR controller interacts with NVENC, Nvidia's hardware video encoder, using the new APIs Nvidia has created for this purpose. At the heart of the CABR controller lies Beamr's quality measure, BQM, a unique, patented, Emmy-award-winning perceptual video quality measure. BQM has now been adapted and ported to the Nvidia GPU platform, significantly accelerating the optimization process.
The Beamr optimization technology can be used not only for codec modernization, but also to reduce the bitrate of an input video, or of a target encode, while guaranteeing that perceptual quality is preserved, thus creating encodes with the same perceptual quality at lower bitrates or file sizes. In any and every usage of the Beamr CABR solution, size or bitrate is reduced as much as possible while each frame of the optimized encode is guaranteed to be perceptually identical to the reference. The codec modernization use case is particularly exciting, as it puts the ability to migrate to more efficient and sophisticated codecs, previously the preserve of video experts, into the hands of any user with video content.
For more information please contact us at info@beamr.com
The proliferation of AI-generated visual content is creating a new market for media optimization services, with companies like Beamr well positioned to help businesses optimize their video content for reduced storage, faster delivery, and better user experiences.
We are living in a brave new world, where any image or video content we can imagine is at our fingertips, merely a prompt and an AI-based content generation engine away. Platforms like Wochit, Synthesia, Wibbitz, and D-ID are using AI technology to automate the video creation process, making it almost trivial for businesses to create engaging video content at scale. These platforms allow users to create tailored content quickly and efficiently, with minimal time and cost.
Wochit, for example, offers a library of pre-made templates that users can customize with their own branding and messaging. The platform’s AI technology can also automatically generate videos from text, images, and video clips, making it easier for businesses to create engaging video content without needing specialized video production skills.
However, as businesses increasingly rely on AI-generated content to reach their audiences, and can create a multitude of 'perfect fit' videos, storage and bitrates become a significant factor in their operations. When dealing with bandwidth-gobbling video, companies need to ensure that their videos are optimized for fast delivery, high quality, and optimal user experiences. That's where Beamr comes in.
Beamr’s technology uses advanced compression algorithms to automatically optimize image and video content for fast delivery over any network or device, without compromising quality. This means that you will get to keep the full look and feel of the content, and maintain standard compatibility, but reduce the file sizes or bitrates – without having to do anything manually. The underlying, patented and Emmy Award winning technology will guarantee that the perceptual quality is preserved while any unnecessary bits and bytes are removed. This allows businesses to deliver high-quality content that engages their audience and drives results, while also minimizing the impact on network resources and reducing storage and delivery costs.
To demonstrate the synergy between AI-based video content generation and Beamr's optimization technology, we went to Wochit and created the magnificent video showcasing Ferrari shown above. We then applied the Beamr optimization technology and received a perceptually identical optimized video, its file size reduced from the original 8.8MB to 5.4MB, a saving of over 38%.
For our next experiment we took the title of this blog, went to D-ID, and turned the text into a promotional video, using all the default settings. This resulted in the source video shared below.
With an easy drag & drop into the Beamr optimization utility, a completely equivalent video file, using the same codec, resolution and perceptual quality, was obtained, except that its size was reduced by 48%.
Image synthesis using AI is also becoming more and more common. Alongside the already commonplace AI-based image generators such as DALL-E 2, many additional platforms are becoming available, including Midjourney, DreamStudio and Images.ai.
Feeling the tug of the land down under, we headed to https://images.ai/prompt/ and requested an image showing 'a koala eating ice-cream'. The adorable result is shown below on the left. We then put it through the Beamr optimization software and obtained an image with the exact same quality, reduced from the original 212 KB JPEG to a mere 49 KB, perceptually identical, fully standard-compliant JPEG image.
Original version | Optimized version
Beamr is also preparing to launch a new SaaS platform that leverages Nvidia’s accelerated video encoding technology, to further speed up the video optimization process. This will allow businesses to optimize their video content even faster than traditional video encoding services, giving them a competitive edge in the rapidly evolving market for AI-generated video content.
For businesses that use Wochit to create their videos, Beamr's technology can be integrated into the delivery process, ensuring that the videos are optimized for fast delivery and high quality. As the demand for AI-generated video content continues to grow, media optimization services like Beamr will become increasingly important for businesses that want to deliver high-quality image and video content that engages their audience and drives results, keeping them ahead of the curve in this rapidly evolving market.
2023 is a very exciting year for Beamr. In February, Beamr became a public company on NASDAQ:BMR on the premise of making our video optimization technology globally available as a SaaS. This month we are already announcing a second milestone for 2023: the release of the Nvidia driver that enables running our technology on the Nvidia platform. This is the result of a two-year joint project, in which Beamr engineers worked alongside the amazing engineering team at Nvidia to ensure that the Beamr solution can be integrated with all Nvidia codecs: AVC, HEVC and AV1.
The new NVENC driver, just now made public, provides an API that allows external control over NVENC, enabling Nvidia partners such as Beamr to tightly integrate with the NVENC H/W encoders for AVC, HEVC and AV1. Beamr is excited to have been a design partner for development of this API and to be the first company that uses it, to accelerate and reduce costs of video optimization.
This milestone with Nvidia offers some important benefits. A significant cost reduction is achieved when performing Beamr video optimization on this platform. For example, for 4Kp60 content encoded with advanced codecs, running the Beamr video optimization on GPU can cut the cost of optimization by a factor of 10, compared to running on CPU.
Using the Beamr solution integrated on the GPU means that the encoding can be performed using the built-in hardware codecs, which offer very fast, high-frame-rate encoding. This means the combined solution can support live and real-time video encoding, a new use case for the Beamr video optimization technology.
In addition, Nvidia recently announced their AV1 codec, considered to be the highest-quality hardware-accelerated AV1 encoder. In this comparison, Jarred Walton concluded that "From an overall quality and performance perspective, Nvidia's latest Ada Lovelace NVENC hardware comes out as the winner with AV1 as the codec of choice". When using the new driver to combine the Beamr video optimization with this excellent AV1 implementation, a very competitive solution is obtained, with video encoding abilities exceeding other AV1 encoders on the market.
So, how does the new driver actually allow the integration of NVENC codecs with Beamr video optimization technology?
Above you can see a high-level illustration of the system flow. The user's video is ingested, and for each video frame the encoding parameters are controlled by the Beamr Quality Control block, which instructs NVENC on how to encode the frame to reach the target quality while minimizing bit consumption. The new NVENC API layer is what enables the interaction between the Beamr Quality Control and the encoder to create the reduced-bitrate, target optimized video. As part of the effort towards the integrated solution, Beamr also ported its quality measurement IP to the GPU and redesigned it to match the performance of NVENC, thus placing the entire solution on the GPU.
Beamr uses the new API to control the encoder and perform optimization which can reduce bitrate of an input video, or of a target encode, while guaranteeing the perceptual quality is preserved, thus creating encodes with the same perceptual quality at lower bitrates or file sizes.
The Beamr optimization may also be used for automatic, quality guaranteed codec modernization, where input content can be converted to a modern codec such as AV1, while guaranteeing each frame of the optimized encode is perceptually identical to the source video. This allows for faster migration to modern codecs, for example from AVC to HEVC or AVC to AV1, in an automated, always safe process – with no loss of quality.
In the examples below, the risk of blind codec modernization is clearly visible, showcasing the advantage of using Beamr technology for this task. In these examples, we took AVC sources and encoded them to HEVC, to benefit from the increased compression efficiency offered by the more advanced coding standard. On the test set we used, Beamr reduced the source clips by 50% when converting to perceptually identical HEVC streams. We compare these encodes to the results obtained when performing 'brute force' compression to HEVC using 50% lower bitrates. As is clear in these examples, the blind conversion, shown on the left, can introduce disturbing artifacts compared to the source, shown in the middle. The Beamr encodes, shown on the right, preserve the quality perfectly.
This driver release, and the technology enablement it offers, is a significant milestone, but it is just the beginning. Beamr is now building a new SaaS that will allow a scalable, no-code implementation of its technology for reducing storage and networking costs. This service is planned to be publicly available in Q3 of 2023. In addition, Beamr is looking for design partners who will get early access to the service and help us build the best experiences for our customers.
At the same time Beamr will continue to strengthen relationships with existing users by offering them low level API’s for enhanced controls and specific workflow adaptations.
For more information please contact us at info@beamr.com
There are several different video codecs available today for video streaming applications, and more will be released this year. This creates some confusion for video services, which need to select their codec of choice for delivering content to their users at the best quality and lowest bitrate, while also taking into account the encode compute requirements. For many years, the choice of video codec was quite simple: starting from MPEG-2 (H.262) when it took over digital TV in the late 90s, through H.263 and MPEG-4 Part 2, which dominated video conferencing early in the millennium, and on to MPEG-4 Part 10, or AVC (H.264), which has been enjoying significant market share for many years now in most video applications and markets, including delivery, conferencing and surveillance. Simultaneously, Google's natural choice for YouTube was their own video codec, VP9.
While HEVC, ratified in 2013, seemingly offered the next logical step, royalty issues put a major stick in its wheels. Add to this the concern over increased complexity, and the delay in adoption of 4K, which was assumed to be the main use case for HEVC, and you get quite a grim picture. This situation triggered a strong desire in the industry to create an independent, royalty-free codec. Significantly reduced timelines in the release of new video codec standards were thrown onto this fire, and we find ourselves somewhat like Alice in Wonderland: signs leading us forward in various directions, but which do we follow?
Let’s begin by presenting our contenders for the “codec with significant market share in future video applications” competition:
We will not discuss LC-EVC (MPEG-5 Part 2), as it is a codec add-on rather than an alternative stand-alone video codec. If you want to learn more about it, https://lcevc.com/ is a good place to start.
If you are hoping that we will crown a single winner in this article – sorry to disappoint: It is becoming apparent that we are not headed towards a situation of one codec to rule them all. What we will do is provide information, highlight some features of each of the codecs, share some insights and opinions and hopefully help arm you for the ongoing codec wars.
Origin
The first point of comparison we will address is origin: where each codec comes from and what that implies. To date, most of the widely adopted video codecs have been standards created by joint video expert teams combining the efforts of the ITU-T Video Coding Experts Group (VCEG) and the ISO Moving Picture Experts Group (MPEG). AVC and HEVC were born through this process, which involves clear procedures: from the CfP (Call for Proposals), through teams evaluating the compression efficiency and performance requirements of each proposed tool, up to creating a draft of the proposed standard. A few rounds of editing and fixes yield a final draft, which is ratified to provide the final standard. This process is very well organized and has a long and proven track record of producing stable and successful video codecs. AVC, HEVC and VVC are all codecs created in this manner.
The EVC codec is an exception in that it is coming only from MPEG, without the cooperation of ITU-T. This may be related to the ITU VCEG traditionally not being in favor of addressing royalty issues as part of the standardization process, while for the EVC standard, as we will see, this was a point of concern.
Another source of video codecs is individual companies. A particularly successful example is the VP9 codec, developed by Google as a successor to VP8, which was created by On2 Technologies (later acquired by Google). In addition, some companies have tried to push open-source, royalty-free, proprietary codecs, such as Daala by Mozilla or Dirac by BBC Research.
A third source of codecs is a consortium or group of companies working independently, outside official international standards bodies such as the ISO or ITU. AV1 is the perfect example of such a codec: multiple companies joined forces through the Alliance for Open Media (AOM) to create a royalty-free, open-source video coding format, specifically designed for video transmission over the Internet. AOM founding members include Google (who contributed their VP9 technology), Microsoft, Amazon, Apple, Netflix, Facebook and Mozilla, along with classic 'MPEG supporters' such as Cisco and Samsung. The AV1 encoder was built from 'experiments', where each considered tool was added into the reference software along with a toggle to turn the experiment on or off, allowing flexibility during the decision process as to which tools would be used in each of the eventual profiles.
Timeline
An easy point of comparison between the codecs is the timeline. AVC was completed back in May 2003. HEVC was finalized almost 10 years later, in April 2013. The AV1 bitstream was frozen in March 2018, with validation in June of that year and Errata 1 published in January 2019. As of the 130th MPEG meeting in April 2020, VVC and EVC are both at the Final Draft International Standard (FDIS) stage, and both are expected to be ratified this year.
Royalties
The next point of comparison is the painful issue of royalties. Unless you have been living under a rock, you are probably aware that this is a pivotal issue in the codec wars. AVC royalty issues are well resolved, and a known and inexpensive licensing model is in place, but for HEVC the situation is more complex. While HEVC Advance unifies many of the patent holders for HEVC, and is constantly bringing more on board, MPEG LA still represents some others. Velos Media unifies yet more IP holders, and a few remain unaffiliated, taking part in none of these pools. Despite the pools finally publishing reasonable licensing models over the last couple of years (over five years after HEVC was finalized), the industry is for the most part taking a 'once bitten, twice shy' approach to HEVC royalties, with some concern over the possibility of other entities coming out of the woodwork with yet further IP claims.
AV1 was a direct attempt to resolve this royalty mess by creating a royalty-free solution, backed by industry giants, and even creating a legal defense fund to assist smaller companies that may be sued over the technology they contributed. Despite AOM never promising to indemnify against third-party infringement, this seemed to many pretty air-tight. That was until early March, when Sisvel announced a patent pool of 14 companies holding over 1,000 patents which Sisvel claims are essential to the implementation of AV1. About a month later, AOM released a counter-statement declaring its dedication to a royalty-free media ecosystem. Time, and presumably quite a few lawyers, will determine how this particular battle plays out.
VVC initially seemed to be heading down the same IP road as HEVC: according to MPEG regulations, anyone contributing IP to the standard must sign a Fair, Reasonable And Non-Discriminatory (FRAND) licensing commitment, but, as experience shows, that does not guarantee convergence to applicable patent pools. This time, however, the industry has taken action in the form of the Media Coding Industry Forum (MC-IF), an open industry forum established in 2018 with the purpose of furthering the adoption of MPEG standards, initially focusing on VVC. Its goal is to establish them as well-accepted and widely used standards for the benefit of consumers and industry. One of the MC-IF working groups is defining 'sub-profiles', which include either royalty-free tools or tools for which MC-IF is able to serve as a registration authority for all relevant IP licensing. If this effort succeeds, we may yet see royalty-free or royalty-known sub-profiles for VVC.
EVC tackles the royalty issue directly within the standardization process, performed primarily by Samsung, Huawei and Qualcomm, using a combination of two approaches. For EVC-Baseline, only tools which can be shown to be royalty-free are incorporated. This generally means the technologies are 20+ years old and have the publications to prove it. While this may sound like a rather problematic constraint, once you factor in that AVC technology is all 20+ years old, and that a lot of non-IP-infringing know-how has accumulated over these years, one can see how this codec can still significantly exceed AVC's compression efficiency. For EVC-Main, a royalty-known approach has been adopted, whereby any entity contributing IP commits to providing a reasonably priced licensing model within two years of the FDIS, meaning by April 2022.
Technical Features
Now that we have dealt with the elephant in the room, we will highlight some codec features and see how the different codecs compare in this regard. All these codecs use a hybrid block-based coding approach: the frame is split into blocks; a prediction of each block's pixels is formed; a residual is obtained as the difference between the prediction and the actual values; a frequency transform is applied to the residual, yielding coefficients which are then quantized; and finally those coefficients are entropy coded, along with additional data such as the motion vectors used for prediction, resulting in the bitstream. A somewhat simplified diagram of such an encoder is shown in Figure 1.
FIGURE 1: HYBRID BLOCK BASED ENCODER
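To make the pipeline concrete, here is a toy sketch of the predict-transform-quantize step for a single block. Real codecs use integer transform approximations, intra/inter mode decisions and entropy coding on top of this, and the quantization step size here is arbitrary:

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, prediction, qstep):
    """One hybrid-coding step: residual -> 2D DCT -> quantize."""
    residual = block.astype(np.float64) - prediction
    coeffs = dctn(residual, norm="ortho")  # frequency transform of the residual
    return np.round(coeffs / qstep)        # quantized levels (entropy-coded next)

def decode_block(levels, prediction, qstep):
    """Inverse path: dequantize -> inverse DCT -> add the prediction back."""
    return prediction + idctn(levels * qstep, norm="ortho")

# Illustrative round trip on a random 16x16 block with a flat predictor:
block = np.random.randint(0, 256, (16, 16))
prediction = np.full((16, 16), block.mean())
levels = encode_block(block, prediction, qstep=8.0)
reconstructed = decode_block(levels, prediction, qstep=8.0)
```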
The underlying theme of the codec improvements is very much a “more is better” approach. More block sizes and sub-partitioning options, more prediction possibilities, more sizes and types of frequency transforms and more additional tools such as sophisticated in-loop deblocking filters.
Partitioning
We will begin with a look at the block sizes and partitioning schemes supported. The macroblocks (MBs) of AVC are always 16×16; CTUs in HEVC and EVC-Baseline are up to 64×64; while EVC-Main, AV1 and VVC support block sizes of up to 128×128. As block sizes grow larger, they enable efficient encoding of smooth textures at higher and higher resolutions.
Regarding partitioning, while in AVC we had fixed-size macroblocks, HEVC introduced the quad-tree, allowing the Coding Tree Unit to be recursively partitioned into four sub-blocks. The same scheme is supported in EVC-Baseline. VVC added Binary Tree (2-way) and Ternary Tree (3-way) splits to the Quad-Tree, increasing the partitioning flexibility, as illustrated in the example partitioning in Figure 2. EVC-Main also uses a combined QT, BT, TT approach, and in addition has a Split Unit Coding Order feature, which allows it to perform the processing and predictions of the sub-blocks in right-to-left order as well as the usual left-to-right order. AV1 uses a slightly different partitioning approach, which supports up to 10-way splits of each coding block.
Another evolving aspect of partitions is the flexibility in their shape. The ability to split the blocks asymmetrically and along diagonals, can help isolate localized changes and create efficient and accurate partitions. This has two important advantages: The need for fine granularity of sub-partitioning is avoided, and two objects separated by a diagonal edge can be correctly represented without introducing a “staircase” effect. The wedges partitioning introduced in AV1 and the geometric partitioning of VVC both support diagonal partitions between two prediction areas, thus enabling very accurate partitioning.
FIGURE 2: Partitioning example combining QT (blue), TT (green) and BT (red)
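As a toy illustration of how such recursive split decisions are driven, the sketch below quad-splits a square block while its pixel variance stays high. Variance is a crude stand-in for the rate-distortion cost a real encoder minimizes when choosing among QT, BT and TT splits, and the threshold is arbitrary:

```python
import numpy as np

def partition(block, x=0, y=0, min_size=8, var_threshold=400.0):
    """Yield (x, y, width, height) leaf blocks of a recursive quad-tree split."""
    h, w = block.shape
    if h <= min_size or block.var() < var_threshold:
        yield (x, y, w, h)  # smooth (or minimal) block: encode as one unit
        return
    half = h // 2
    for dy in (0, half):
        for dx in (0, half):
            sub = block[dy:dy + half, dx:dx + half]
            yield from partition(sub, x + dx, y + dy, min_size, var_threshold)

# Example: partition a 128x128 block, the largest CTU size of VVC and AV1.
ctu = np.random.randint(0, 256, (128, 128)).astype(np.float64)
leaves = list(partition(ctu))
```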
Prediction
A good quality prediction scheme which minimizes the residual energy is an important tool for increasing compression efficiency. All video codecs from AVC onwards employ both INTRA prediction, where the prediction is performed using pixels already encoded and reconstructed in the current frame, and INTER prediction, using pixels from previously encoded and reconstructed frames.
AVC supports 9 INTRA prediction modes, or directions in which the current block's pixels can be predicted from the pixels adjacent to the block on the left, above and above-right. EVC-Baseline supports only 5 INTRA prediction modes, EVC-Main supports 33, HEVC defines 35, AV1 has 56 and VVC takes the cake with 65 angular predictions. While the "more is better" paradigm may improve compression efficiency, it directly impacts encoding complexity, as the encoder faces a more complex decision when choosing the optimal mode. AV1 and VVC add further sophisticated options for INTRA prediction, such as predicting Chroma from Luma in AV1, and the similar Cross-Component Linear Model prediction of VVC. Another interesting tool for INTRA prediction is INTRA Block Copy (IBC), which allows copying a full block from the already encoded and reconstructed part of the current frame as the predictor for the current block. This mode is particularly beneficial for frames with complex synthetic texture, and is supported in AV1, EVC-Main and VVC. VVC also supports Multiple Reference Lines, which extends the number of pixels near the block used for INTRA prediction.
The differences in INTER prediction are in the number of references used, Motion Vector (MV) resolution and associated sub-pel interpolation filters, supported motion partitioning and prediction modes. A thorough review of the various INTER prediction tools in each codec is well beyond the scope of this comparison, so we will just point out a few of the new features we are particularly fond of.
Overlapped Block Motion Compensation (OBMC), which was first introduced in Annex F of H.263 and in MPEG-4 Part 2 (but not included in any profile), is supported in AV1; though considered for VVC, it was not included in the final draft. This is an excellent tool for reducing those annoying discontinuities at prediction block borders when the blocks on either side use different MVs.
FIGURE 3A: OBMC ILLUSTRATION. Top: regular motion compensation, which creates a discontinuity because two adjacent blocks use different parts of the reference frame for prediction. Bottom: OBMC, with overlap between prediction blocks.
FIGURE 3B: OBMC ILLUSTRATION. Zoom into OBMC at the border between the middle and left blocks shown, showing the averaging of the two predictions at the crossover pixels.
One of the significant limitations of the block-matching motion prediction approach is its failure to represent motion that is not purely horizontal and vertical, such as zoom or rotation. This is addressed by support for warped motion compensation in AV1, and even more thoroughly by the 6 Degrees-Of-Freedom (DOF) affine motion compensation supported in VVC. EVC-Main takes it a step further with 3 affine motion modes: merge, plus both 4-DOF and 6-DOF affine MC.
FIGURE 4: AFFINE MOTION PREDICTION Image credit: Cordula Heithausen – Coding of Higher Order Motion Parameters for Video Compression – ISBN-13: 978-3844057843
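The affine model itself is compact enough to show directly. Here is a minimal Python sketch of the 6-parameter case, where each pixel of the block gets its own motion vector so that a single prediction can represent zoom and rotation as well as translation:

```python
import numpy as np

def affine_motion_field(a, b, c, d, e, f, w, h):
    # Per-pixel motion vectors for a w x h block under the 6-parameter
    # (6-DOF) affine model:
    #     mv_x(x, y) = a*x + b*y + c
    #     mv_y(x, y) = d*x + e*y + f
    # Pure translation is the special case a = b = d = e = 0.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    return a * xs + b * ys + c, d * xs + e * ys + f

# Example: a slight rotation combined with a (3, 1)-pixel translation
mvx, mvy = affine_motion_field(0.0, 0.05, 3.0, -0.05, 0.0, 1.0, 16, 16)
```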
Video codecs also perform MV (Motion Vector) prediction, deriving a predictor from previously coded MV values. This reduces the bits spent on transmitting MVs, which is beneficial at aggressive bitrates and/or when using fine-granularity motion partitions, and it can also make the motion estimation process more efficient. While all five codecs define a process for calculating the MV Predictor (MVP), EVC-Main extends this with a history-based MVP, and VVC takes it further with improved spatial and temporal MV prediction.
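As a concrete example, the basic AVC predictor is the component-wise median of the MVs of the left, above and above-right neighboring blocks; a minimal Python sketch:

```python
def median_mvp(mv_left, mv_above, mv_above_right):
    # Component-wise median of the three neighboring MVs, as in the
    # basic AVC predictor; only the difference between the actual MV
    # and this predictor needs to be transmitted.
    xs = sorted(mv[0] for mv in (mv_left, mv_above, mv_above_right))
    ys = sorted(mv[1] for mv in (mv_left, mv_above, mv_above_right))
    return (xs[1], ys[1])

mvp = median_mvp((4, 0), (6, -2), (5, 1))  # -> (5, 0)
```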
Transforms
The frequency transforms applied to the residual data are another arena for the “more is better” approach. AVC uses 4×4 and 8×8 Discrete Cosine Transforms (DCT), while EVC-Baseline adds more transform sizes, ranging from 2×2 to 64×64. HEVC added the complementary Discrete Sine Transform (DST) and supports multi-size transforms ranging from 4×4 to 32×32. AV1, VVC and EVC-Main all use DCT- and DST-based transforms with a wide range of sizes, including non-square transform kernels.
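To illustrate what this stage does, here is a minimal Python sketch of a transform–quantize–reconstruct round trip using SciPy's floating-point DCT (the codecs themselves specify integer approximations of these transforms):

```python
import numpy as np
from scipy.fft import dctn, idctn

def transform_quantize(residual, qstep):
    # Forward 2-D DCT followed by uniform quantization. The transform
    # compacts most of the residual energy into a few low-frequency
    # coefficients, which is what makes the levels cheap to code.
    return np.round(dctn(residual, norm="ortho") / qstep)

def reconstruct(levels, qstep):
    # Dequantize and invert the transform, as a decoder would.
    return idctn(levels * qstep, norm="ortho")

residual = np.random.default_rng(0).normal(0.0, 4.0, (8, 8))
levels = transform_quantize(residual, qstep=8.0)
recon = reconstruct(levels, qstep=8.0)  # close to residual, not exact
```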
Filtering
In-loop filters make a crucial contribution to the perceptual quality of block-based codecs, removing artifacts created by the separate processing and decisions applied to adjacent blocks. AVC uses a relatively simple adaptive in-loop De-Blocking (DB) filter, as does EVC-Baseline, which uses the filter from H.263 Annex J. HEVC adds a Sample Adaptive Offset (SAO) filter, designed to allow better reconstruction of the original signal amplitudes by applying offsets stored in a lookup table in the bitstream, resulting in increased picture quality and reduced banding and ringing artifacts. VVC uses similar DB and SAO filters, and adds an Adaptive Loop Filter (ALF) to minimize the error between the original and decoded samples. This is done using Wiener-based adaptive filters, with suitable filter coefficients determined by the encoder and explicitly signaled to the decoder. EVC-Main uses an ADvanced Deblocking Filter (ADDB) as well as ALF, and further introduces a Hadamard Transform Domain Filter (HTDF), performed on decoded samples right after block reconstruction using 4 neighboring samples. Wrapping up with AV1: a regular DB filter is used, as well as a Constrained Directional Enhancement Filter (CDEF), which removes ringing and basis noise around sharp edges and is the first use of a directional filter for this purpose in a video codec. AV1 also uses a Loop Restoration filter, for which the filter coefficients are determined by the encoder and signaled to the decoder.
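To give a flavor of the simplest of these filters, here is a minimal Python sketch of deblocking-style smoothing across one vertical block boundary. The threshold logic is our own simplification: the point is that smoothing is applied only where the discontinuity is small enough to be a coding artifact rather than real detail:

```python
import numpy as np

def deblock_vertical_edge(frame, edge_col, threshold=10):
    # Soften the vertical block boundary at column 'edge_col' by pulling
    # the two boundary columns toward their average, but only where the
    # step across the edge is small: a large step is likely a real image
    # edge and must not be smoothed away.
    out = frame.astype(float)
    p = out[:, edge_col - 1].copy()  # last column of the left block
    q = out[:, edge_col].copy()      # first column of the right block
    weak = np.abs(p - q) < threshold
    avg = (p + q) / 2.0
    out[:, edge_col - 1] = np.where(weak, (p + avg) / 2.0, p)
    out[:, edge_col] = np.where(weak, (q + avg) / 2.0, q)
    return out
```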
Entropy Coding
The entropy coding stage varies somewhat among the codecs, partly because Context Adaptive Binary Arithmetic Coding (CABAC) carries associated royalties. AVC offers both Context Adaptive Variable Length Coding (CAVLC) and CABAC modes. HEVC and VVC both use CABAC, with VVC adding some efficiency improvements, such as better initializations that remove the need for a LUT, and increased flexibility in Coefficient Group sizes. AV1 uses non-binary (multi-symbol) arithmetic coding; this means the entropy coding must be performed in two sequential steps, which limits parallelization. EVC-Baseline uses the Binary Arithmetic Coder described in JPEG Annex D combined with run-level symbols, while EVC-Main employs a bit-plane ADvanced Coefficient Coding (ADCC) approach.
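As a toy illustration of the adaptive probability modeling that all of these coders rely on, the Python sketch below keeps per-symbol counts, derives probabilities from them, and reports the ideal code length of -log2(p) bits per symbol. A multi-symbol coder such as AV1's codes each symbol in one such step, while a binary coder must first break it into binary decisions:

```python
import math
from collections import Counter

class AdaptiveModel:
    # Frequency-count probability model over a small symbol alphabet.
    # An arithmetic coder driven by this model spends about -log2(p)
    # bits per symbol, so symbols that keep appearing become cheaper.
    def __init__(self, alphabet):
        self.counts = Counter({s: 1 for s in alphabet})  # +1 smoothing

    def cost_and_update(self, symbol):
        total = sum(self.counts.values())
        bits = -math.log2(self.counts[symbol] / total)
        self.counts[symbol] += 1  # adapt to the statistics seen so far
        return bits

model = AdaptiveModel(range(4))
bits = sum(model.cost_and_update(s) for s in [0, 0, 1, 0, 2, 0, 0])
```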
To wrap up the feature highlights section, we’d like to note some features that are useful in specific scenarios. For example, EVC-Main and VVC support Decoder-side MV Refinement (DMVR), which is beneficial for distributed systems where some of the encoding complexity is offloaded to the decoder. AV1 and VVC both have tools well suited for screen content, such as support for Palette coding, with AV1 also supporting the Paeth prediction used in PNG images. Support for Film Grain Synthesis (FGS), first introduced in HEVC but not included in any profile, is mandatory in the AV1 Professional profile, and is considered a valuable tool for high-quality, low-bitrate compression of grainy films.
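The Paeth predictor mentioned above is simple enough to show in full: among the left, above and above-left neighbors, it picks the one closest to the estimate left + above - above_left. A minimal Python sketch:

```python
def paeth(left, above, above_left):
    # Paeth predictor from PNG, also an intra mode in AV1: pick the
    # neighbor closest to the estimate left + above - above_left.
    # Listing 'left' first reproduces PNG's tie-breaking order.
    estimate = left + above - above_left
    return min((left, above, above_left), key=lambda v: abs(estimate - v))

pred = paeth(100, 110, 95)  # estimate is 115, so 'above' (110) is chosen
```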
Codec Comparison
Compression Efficiency
Probably the most interesting question is how the codecs compare in actual video compression, i.e. what the Compression Efficiency (CE) of each codec is: what bitrate is required to obtain a certain quality, or inversely, what quality will be obtained at a given bitrate. While the question is quite simple and well defined, answering it is anything but. The first challenge is defining the testing points: what content, at what bitrates, in what modes. As a simple example, when screen content coding tools exist, the codec will show more of an advantage on that type of content. Different selections of content, rate control methodologies if used (which are outside the scope of the standards), GOP structures and other configuration parameters all have a significant impact on the obtained results.
Another obstacle on the way to a definitive answer stems from how quality is measured. PSNR is sadly still often used in these comparisons, despite its poor correlation with perceptual quality. But even more sophisticated objective metrics, such as SSIM or VMAF, do not always accurately represent the perceptual quality of the video. Subjective evaluation, on the other hand, is costly, not always practical at scale, and results obtained in one test may not be reproduced when tests are performed with other viewers or in other locations.
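For reference, PSNR itself is trivial to compute, which is part of why it persists; a minimal Python sketch for 8-bit frames:

```python
import numpy as np

def psnr(original, decoded, max_val=255.0):
    # Peak Signal-to-Noise Ratio in dB for 8-bit frames. Higher is
    # better, but it only measures pixel-wise error and can disagree
    # badly with what viewers actually perceive.
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```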
So, while you can find endless comparisons available, slightly different and sometimes entirely contradictory, we will take a more conservative approach, providing estimates based on a cross-section of multiple comparisons in the literature. There seems to be no doubt that among these codecs, AVC has the lowest compression efficiency, while VVC tops the charts. EVC-Baseline seemingly offers compression efficiency about 30% higher than AVC, not far from the 40% improvement attributed to HEVC. AV1 and EVC-Main are close, with the verdict on which is superior depending heavily on who performed the comparison. Both are approximately 5-10% behind VVC in compression efficiency.
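Percentages like these typically come from Bjøntegaard Delta rate (BD-rate) calculations: fit a curve to each codec's rate-quality points and average the horizontal gap between the curves. A minimal Python sketch of the classic cubic-fit variant, assuming at least four (bitrate, PSNR) points per codec:

```python
import numpy as np

def bd_rate(rates_a, psnrs_a, rates_b, psnrs_b):
    # Average bitrate difference (%) of codec B vs. codec A at equal
    # PSNR: fit log-bitrate as a cubic polynomial in quality for each
    # codec, integrate both fits over the overlapping quality range,
    # and convert the mean log-rate gap back to a percentage.
    # Negative output means codec B needs fewer bits.
    fit_a = np.polyfit(psnrs_a, np.log(rates_a), 3)
    fit_b = np.polyfit(psnrs_b, np.log(rates_b), 3)
    lo = max(min(psnrs_a), min(psnrs_b))
    hi = min(max(psnrs_a), max(psnrs_b))
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_b = np.polyval(np.polyint(fit_b), hi) - np.polyval(np.polyint(fit_b), lo)
    return (np.exp((int_b - int_a) / (hi - lo)) - 1) * 100
```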
Computational Complexity
Now, a look at the performance, or computational complexity, of each of the candidates. Again, this comparison is rather naïve, as performance is so heavily dependent on the implementation and testing conditions, rather than on the tools defined by the standard. The ability to parallelize the encoding tasks, the architecture of the processor used for testing, and the content type (low vs. high motion, dark vs. bright) are just a few of the factors that can heavily impact a performance analysis. For example, taking the exact same preset of x264 and running it on the same content with low and high target bitrates can cause a 4x difference in encode runtime. In another example, in the Beamr5 epic face off blog post, the Beamr HEVC encoder is on average 1.6x faster than x265 on the same content at similar quality, and the range of encode FPS across files for each encoder is on the order of 1.5x. Having said all that, what we will try to do here is provide a very coarse, ballpark estimate of the relative computational complexity of each of the reviewed codecs. AVC is definitely the lowest complexity of the bunch, with EVC-Baseline only very slightly more complex. HEVC has higher performance demands for both the encoder and the decoder. VVC has managed to keep decoder complexity almost on par with that of the HEVC decoder, but its encoding complexity is significantly higher, probably the highest of all 5 reviewed codecs. AV1 is also known for its high complexity, with early versions having introduced the unit Frames Per Minute (FPM) for encoding performance, rather than the commonly used Frames Per Second (FPS). Though recent versions have gone a long way toward improving matters, it is still safe to say that AV1 complexity is significantly higher than HEVC, and probably still higher than EVC-Main.
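If you want your own ballpark numbers, the simplest approach is to time encodes of the same clip across encoders. A minimal Python sketch using ffmpeg, assuming builds with libx264, libx265 and libsvtav1 are available; the presets and the clip.mp4 filename are arbitrary placeholders for illustration:

```python
import subprocess
import time

# Encoder/preset pairs to compare; entirely illustrative choices.
ENCODERS = [("libx264", "medium"), ("libx265", "medium"), ("libsvtav1", "8")]

def time_encode(src, codec, preset):
    # Encode to a null sink and return wall-clock seconds.
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", codec,
           "-preset", preset, "-f", "null", "-"]
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

for codec, preset in ENCODERS:
    print(codec, round(time_encode("clip.mp4", codec, preset), 1), "seconds")
```

Remember that a single clip and a single preset per encoder tells you very little; as noted above, content type and target bitrate alone can swing runtimes several-fold.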
Summary
In the table below, we have summarized some of the comparison features which we outlined in this blog post.
The Bottom Line
So, what is the bottom line? Unfortunately, life is getting more complicated, and the era of one or two dominant codecs covering almost the entire industry is no more. Only time will tell which will have the highest market share in 5 years’ time, but one easy assessment is that with AVC’s current market share estimated at around 70%, this one is not going to disappear anytime soon. AV1 is definitely gaining momentum, and with the giants’ backing we expect to see it used a fair bit in online streaming. Regarding the others, it is safe to assume that the improved compression efficiency offered by VVC and EVC-Main, the attractive royalty situation of EVC-Baseline, and the growing number of devices that support HEVC in H/W mean that having to support a plurality of codecs in many video streaming applications is the new reality for all of us.