The proliferation of AI-generated visual content is creating a new market for media optimization services, with companies like Beamr well positioned to help businesses optimize their video content for reduced storage, faster delivery, and better user experiences.
We are living in a brave new world, where any image or video we can imagine is at our fingertips, merely a prompt and an AI-based content-generation engine away. Platforms like Wochit, Synthesia, Wibbitz, and D-ID use AI technology to automate the video creation process, making it almost trivial for businesses to create engaging video content at scale. These platforms allow users to create tailored content quickly and efficiently, with minimal time and cost.
Wochit, for example, offers a library of pre-made templates that users can customize with their own branding and messaging. The platform’s AI technology can also automatically generate videos from text, images, and video clips, making it easier for businesses to create engaging video content without needing specialized video production skills.
However, as businesses increasingly rely on AI-generated content to reach their audiences, and can create a multitude of ‘perfect fit’ videos, storage and bitrates become a significant factor in their operations. When dealing with bandwidth-gobbling video, companies need to ensure that their videos are optimized for fast delivery, high quality, and a great user experience. That’s where Beamr comes in.
Beamr’s technology uses advanced compression algorithms to automatically optimize image and video content for fast delivery over any network or device, without compromising quality. This means you keep the full look and feel of the content and maintain standard compatibility, while reducing file sizes or bitrates, without having to do anything manually. The underlying patented, Emmy Award-winning technology guarantees that perceptual quality is preserved while any unnecessary bits and bytes are removed. This allows businesses to deliver high-quality content that engages their audience and drives results, while also minimizing the impact on network resources and reducing storage and delivery costs.
To demonstrate the synergy between AI-based video content generation and Beamr’s optimization technology, we went to Wochit and created the video showcasing Ferrari shown above. We then applied the Beamr optimization technology and received a perceptually identical optimized video, with the file size down from the original 8.8 MB to 5.4 MB, a saving of nearly 39%.
For our next experiment we took the title of this blog, went to D-ID, and turned the text into a promotional video, using all the default settings. This resulted in the source video shared below.
With an easy drag and drop into the Beamr optimization utility, a completely equivalent video file was obtained, using the same codec, resolution, and perceptual quality, except its size was reduced by 48%.
Image synthesis using AI is also becoming more and more common. Along with already commonplace AI-based image generators such as DALL-E 2, many additional platforms are becoming available, including Midjourney, DreamStudio, and Images.ai.
Feeling the tug of the Land-Down-Under we headed to https://images.ai/prompt/ and requested an image showing ‘a koala eating ice-cream’. The adorable result is shown below on the left. Then we put it through Beamr optimization software and obtained an image with the exact same quality, but reduced from the original 212 KB JPEG, to a mere 49 KB perceptually identical fully standard compliant JPEG image.
Original version | Optimized version
Beamr is also preparing to launch a new SaaS platform that leverages Nvidia’s accelerated video encoding technology, to further speed up the video optimization process. This will allow businesses to optimize their video content even faster than traditional video encoding services, giving them a competitive edge in the rapidly evolving market for AI-generated video content.
For businesses that use Wochit to create their videos, Beamr’s technology can be integrated into the delivery process, ensuring that the videos are optimized for fast delivery and high quality. This allows businesses to stay ahead of the curve in this rapidly evolving market, and keeps their audiences coming back for more. As the demand for AI-generated video content continues to grow, media optimization services like Beamr will become increasingly important for businesses that want to deliver high-quality image and video content that engages their audience and drives results.
2023 is a very exciting year for Beamr. In February, Beamr became a public company (NASDAQ: BMR) on the premise of making our video optimization technology globally available as a SaaS. This month we are already announcing a second milestone for 2023: the release of the Nvidia driver that enables running our technology on the Nvidia platform. This is the result of a two-year joint project, in which Beamr engineers worked alongside the amazing engineering team at Nvidia to ensure that the Beamr solution can be integrated with all Nvidia codecs: AVC, HEVC, and AV1.
The new NVENC driver, just made public, provides an API that allows external control over NVENC, enabling Nvidia partners such as Beamr to tightly integrate with the NVENC hardware encoders for AVC, HEVC, and AV1. Beamr is excited to have been a design partner in the development of this API, and to be the first company to use it to accelerate and reduce the costs of video optimization.
This milestone with Nvidia offers some important benefits. A significant cost reduction is achieved when performing Beamr video optimization on this platform. For example, for 4Kp60 content encoded with advanced codecs, running Beamr video optimization on GPU can cut the cost of video optimization by a factor of 10, compared to running on CPU.
Using the Beamr solution integrated on GPU means that encoding can be performed using the built-in hardware codecs, which offer very fast, high-frame-rate encoding. This means the combined solution can support live, real-time video encoding, which is a new use case for the Beamr video optimization technology.
In addition, Nvidia recently announced their AV1 codec, considered to be the highest quality hardware-accelerated AV1 encoder. In this comparison, Jarred Walton concluded that “From an overall quality and performance perspective, Nvidia’s latest Ada Lovelace NVENC hardware comes out as the winner with AV1 as the codec of choice”. When the new driver is used to combine Beamr video optimization with this excellent AV1 implementation, a very competitive solution is obtained, with video encoding abilities exceeding other AV1 encoders on the market.
So, how does the new driver actually allow the integration of NVENC codecs with Beamr video optimization technology?
Above you can see a high-level illustration of the system flow. The user video is ingested, and for each video frame the encoding parameters are controlled by the Beamr Quality Control block, which instructs NVENC on how to encode the frame to reach the target quality while minimizing bit consumption. The new NVENC API layer is what enables the interactions between the Beamr Quality Control and the encoder to create the reduced-bitrate, quality-optimized video. As part of the effort towards the integrated solution, Beamr also ported its quality measurement IP to GPU and redesigned it to match the performance of NVENC, thus placing the entire solution on the GPU.
Beamr uses the new API to control the encoder and perform optimization which can reduce bitrate of an input video, or of a target encode, while guaranteeing the perceptual quality is preserved, thus creating encodes with the same perceptual quality at lower bitrates or file sizes.
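The per-frame control loop described above can be sketched as follows. All names here (`optimize_stream`, `choose_qp`, `encode_frame`, `observe`) are illustrative stand-ins, not the actual NVENC API or Beamr SDK; the sketch only shows the shape of the interaction: an external quality-control block picks each frame's encoding parameters, and the encoder reports back so the next decision can adapt.

```python
def optimize_stream(frames, encoder, quality_control):
    """Encode each frame at the QP chosen by an external quality-control
    block, reaching the target quality with minimal bit spend.

    `encoder` and `quality_control` are hypothetical stand-ins for the
    NVENC hardware encoder and the Beamr Quality Control block."""
    output = []
    for frame in frames:
        qp = quality_control.choose_qp(frame)          # per-frame decision
        encoded = encoder.encode_frame(frame, qp=qp)   # external QP control
        quality_control.observe(frame, encoded)        # feedback for next frame
        output.append(encoded)
    return output
```

The key design point is that the encoder itself stays a black box: the external API only needs to accept a per-frame QP and expose the encode result.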
The Beamr optimization may also be used for automatic, quality guaranteed codec modernization, where input content can be converted to a modern codec such as AV1, while guaranteeing each frame of the optimized encode is perceptually identical to the source video. This allows for faster migration to modern codecs, for example from AVC to HEVC or AVC to AV1, in an automated, always safe process – with no loss of quality.
In the examples below, the risk of blind codec modernization is clearly visible, showcasing the advantage of using Beamr technology for this task. In these examples, we took AVC sources and encoded them to HEVC, to benefit from the increased compression efficiency offered by the more advanced coding standard. On the test set we used, Beamr reduced the source clips by 50% when converting to perceptually identical HEVC streams. We compare these encodes to the results obtained when performing ‘brute force’ compression to HEVC, using 50% lower bitrates. As these examples make clear, blind conversion, shown on the left, can introduce disturbing artifacts compared to the source, shown in the middle. The Beamr encodes, however, shown on the right, preserve the quality perfectly.
This driver release and the technology enablement it offers, while a significant milestone, is just the beginning. Beamr is now building a new SaaS that will allow a scalable, no-code implementation of its technology for reducing storage and networking costs. This service is planned to be publicly available in Q3 of 2023. In addition, Beamr is looking for design partners who will get early access to the service and help us build the best experience for our customers.
At the same time, Beamr will continue to strengthen relationships with existing users by offering them low-level APIs for enhanced control and specific workflow adaptations.
For more information please contact us at info@beamr.com
We are thrilled to share with you that Beamr has won the Seagate Lyve Innovator of the Year competition!
The competition was organized by Seagate Lyve Innovation Labs, which is a collaboration platform that Seagate utilizes to work with entrepreneurs, startup companies and enterprises to create joint solutions based on the flow of data. Lyve Labs currently operates exclusively in Israel, and plans to open additional Labs around the world in the future. Lyve Labs and its partners explore industry challenges, and work together to develop simple, secure, and efficient ways to move and optimize data.
Optimize data? Well, that’s exactly what we do here at Beamr – optimizing images and videos, by reducing their size as much as possible while retaining full quality. So when we heard that Lyve Labs is holding a competition, seeking the most innovative company in this field – we immediately seized this opportunity and registered!
After an initial screening process of several dozen candidate companies, we were informed that Beamr made it to the finals, where 8 companies were selected to pitch their technology in front of senior Seagate executives.
From the start, it was clear that Lyve Labs was putting on a highly professional event: First, we were asked to prepare slides for a 3-minute company pitch, and invited to a preparation session with Dana Ashkenazi, a senior consultant on innovation, leadership and pitching. Dana gave a one-hour presentation to all companies about how to structure a 3-minute pitch, from opening with a bang to closing with a smile, including some very useful tips on slide content and design. Then, each company got a private 20-minute session with Dana and with Ruti Arazi, who handles business development at the Seagate Israel Innovation Center. In this session we went over our draft pitch, and got some useful suggestions for improvement. We were also asked to prepare a 30-second “informal” video to introduce ourselves and the company.
A week before the actual event, we were invited to record our 3-minute pitches at Ynet Studios in Israel. Ynet is Israel’s leading online news portal, owned by Israel’s largest newspaper, and they operate professional studios that broadcast live TV news programs daily. It was a real treat to record our pitch in these studios, with top-of-the-line cameras, lighting, and even a teleprompter – so memorizing our pitch was unnecessary…
On Monday November 15th we gathered again in the same studios for the live event. 6 top Seagate executives joined us via Zoom: Jeff Fochtman, Seagate SVP of Marketing; BS Teh, Seagate EVP of Global Sales & Sales Operations; KF Chong, Seagate SVP of Global Operations; Ravi Naik, Seagate CIO & SVP of Storage Services; Shanye Hudson, Seagate SVP of Investor Relations and Treasury; and Patricia Frost, Seagate SVP & Chief of HR. Their role was to evaluate the pitches, ask follow-on questions, and assess the innovation of each company.
For each company, the judges first saw the 30-second “informal” video, then watched the 3-minute pre-recorded pitch presentation, and immediately after had 5 minutes to ask live questions. Since Beamr was first to present, I didn’t quite know what was coming, and obviously didn’t see the questions ahead of time. But the Q&A session went pretty well, the 5 minutes passed quickly, and then I went back to the “green room” to watch the pitches and Q&A sessions of the other companies. I must say that all of them were well-prepared, presented their case quite clearly, and bravely handled all questions thrown at them by the judges. I guess that’s the nature of entrepreneurs…
After all the presentations and Q&A sessions were completed, the judges took 15 minutes to consult, and finally we all gathered in the studio, shoulder to shoulder. The atmosphere was very tense, and the organizers told us that even they had no idea who the winner was; they would be notified by the judges at the last minute. Don’t be fooled by the smiles you see in this picture – we were really anxious to hear the judges’ decision at this point…
Finally, they announced the winner – Beamr!!! I was very excited, and stepped up to receive a trophy and an oversized cheque for $10,000. All the others cheered and shook my hand, and I felt very proud that Beamr was the winner!
Right on the heels of the Emmy ceremony that took place earlier this month, and our 50th patent awarded in July, it feels like Beamr is on a roll… I am very proud of the recognition we have recently received: The 50 patents recognizing our IP, the Technology and Engineering Emmy® award recognizing our contribution to the TV industry, and the Seagate Lyve Innovator of the Year 2021 award recognizing the innovative nature of our technology. And I am very proud of the Beamr team for developing this amazing technology!
Below you can watch the video of Beamr’s appearance in the competition.
A few weeks ago Beamr reached a historic milestone, which got everyone in the company excited. It was triggered by a rather formal announcement from the US Patent Office, in their typical “dry” language: “THE APPLICATION IDENTIFIED ABOVE HAS BEEN EXAMINED AND IS ALLOWED FOR ISSUANCE AS A PATENT”. We’ve received such announcements many times before, from the USPTO and from other national patent offices, but this one was special: It meant that the Beamr patent portfolio has now grown to 50 granted patents!
We have always believed that a strong IP portfolio is extremely important for an innovative technology company, and invested a lot of human and capital resources over the years to build it. So we thought that this anniversary would be a good opportunity to reflect back on our IP journey, and share some lessons we learned along the way, which might come in handy to others who are pursuing similar paths.
Starting With Image Optimization
Beamr was established in 2009, and the first technology we developed was for optimizing images – reducing their file size while retaining their subjective quality. In order to verify that subjective quality is preserved, we needed a way to accurately measure it, and since existing quality metrics at the time were not reliable enough (e.g. PSNR, SSIM), we developed our own quality metric, which was specifically tuned to detect the artifacts of block-based compression.
Our first patent applications covered the components of the quality measure itself, and its usage in a system for “recompressing” images or video frames. The system takes a source image or a video frame, compresses it at various compression levels, and then compares the compressed versions to the source. Finally, it selects the compressed version that is smallest in file size, but still retains the full quality of the source, as measured by our quality metric.
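The selection logic described above can be sketched in a few lines. This is only an illustrative sketch: `compress` and `quality_score` are hypothetical stand-ins for a real codec and for a perceptual metric like Beamr's, and the fixed range of levels is an assumption for clarity.

```python
def recompress(source_bytes, compress, quality_score, threshold):
    """Try several compression levels and keep the smallest result that
    still scores as perceptually identical to the source.

    `compress(source, level)` returns compressed bytes;
    `quality_score(source, candidate)` returns a quality score.
    Both are illustrative stand-ins, not Beamr's actual API."""
    best = source_bytes  # fall back to a straight copy if nothing qualifies
    for level in range(1, 11):  # 1 = mildest, 10 = strongest compression
        candidate = compress(source_bytes, level)
        if quality_score(source_bytes, candidate) >= threshold and len(candidate) < len(best):
            best = candidate
    return best
```

Note the fallback: if no compressed version passes the quality bar, the source is copied through unchanged, exactly the behavior the text describes for images that cannot be optimized.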
After these initial patent applications which covered the basic method we were using for optimization, we submitted a few more patent applications covering additional aspects of the optimization process. For example, we found that sometimes increasing the compression level actually increases image quality, and vice versa. This is counter-intuitive, since typically increasing the compression reduces image quality, but it does happen in certain situations. It means that the relationship between quality and compression is not monotonic, which makes finding the optimal compression level quite challenging. So we devised a method to solve this issue of non-monotonicity, and filed a separate patent application for it.
Another issue we wanted to address was the fact that some images could not be optimized – every compression level we tried would result in quality reduction, and eventually we just copied the source image to the output. In order to save CPU cycles, we wanted to refrain from even trying to optimize such images. Therefore, we developed an algorithm which determines whether the source image is “highly compressed” (meaning that it can’t be optimized without compromising quality), based on analyzing the source image itself. And of course – we submitted a patent application on this algorithm as well.
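As a rough illustration of such a pre-screening step, here is a minimal sketch based on a bits-per-pixel heuristic. The threshold and the heuristic itself are assumptions for illustration only; the patented algorithm analyzes the source image itself and is certainly more sophisticated than this.

```python
def is_highly_compressed(file_bytes, width, height, bpp_floor=0.1):
    """Heuristic stand-in for the pre-screening step: if the source already
    spends very few bits per pixel, assume it cannot be optimized further
    and skip the CPU-costly recompression attempts entirely.

    The bits-per-pixel floor is an illustrative assumption, not the
    analysis Beamr's actual algorithm performs."""
    bits_per_pixel = len(file_bytes) * 8 / (width * height)
    return bits_per_pixel < bpp_floor
```

A pipeline would call this first and copy the source straight to the output when it returns true, saving the cost of the trial encodes.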
As we continued to develop the technology, we found that some images required special treatment due to specific content or characteristics of the images. So we filed additional patent applications on algorithms we developed for configuring our quality metric for specific types of images, such as synthetic (computer-generated) images and images with vivid colors (chroma-rich).
Extending to Video Optimization
Optimizing images turned out to be very valuable for improving the workflow of professional photographers, reducing page load time for web services, and improving the UX for mobile photo apps. But with video reaching 80% of total Internet bandwidth, it was clear that we needed to extend our technology to support optimizing full video streams. As our technology evolved, so did our patent portfolio: We filed patent applications on the full system of taking a source video, decoding it, encoding each frame with several candidate compression levels, selecting the optimal compression level for that frame, and moving on to the next frame. We also filed patent applications on extending the quality measure with additional components that were designed specifically for video: For example, a temporal component that measures the difference in the “temporal flow” of two successive frames using different compression levels. Special handling of real or simulated “film grain”, which is widely used in today’s movie and TV productions, was the subject of another patent application.
When integrating our quality measure and control mechanism (which sets the candidate compression levels) with various video encoders, we came to the conclusion that we needed a way to save and reload a “state” of the encoder without modifying the encoder internals, and of course – patented this method as well. Additional patents were filed on a method to optimize video streams on the basis of a GOP (Group of Pictures) rather than a frame, and on a system that improves performance by determining the optimal compression level based on sampled segments instead of optimizing the whole stream.
Embracing Video Encoding
In 2016 Beamr acquired Vanguard Video, the leading provider of software H.264 and HEVC encoders. We integrated our optimization technology into Vanguard Video’s encoders, creating a system that optimized video while encoding it. We call this CABR, and obviously we filed a patent on the integrated system. For more information about CABR, see our blog post “A Deep Dive into CABR”.
With the acquisition of Vanguard, we didn’t just get access to the world’s best SW encoders. We also gained a portfolio of video encoding patents developed by Vanguard Video, which we continued to extend in the years since the acquisition. These patents cover unique algorithms for intra prediction, motion estimation, complexity analysis, fading and scene change analysis, adaptive pre-processing, rate control, transform and block type decisions, film grain estimation and artifact elimination.
In addition to encoding and optimization, we’ve also filed patents on technologies developed for specific products. For example, some of our customers wanted to use our image optimization technology while creating lower-resolution preview images, so we patented a method for fast and high-quality resizing of an image. Another patent application was filed on an efficient method of generating a transport stream, which was used in our Beamr Optimizer and Beamr Transcoder products.
The chart below shows the split of our 50 patents by the type of technology.
Patent Strategy – Whether and Where to File
Our patent portfolio was built to protect our inventions and novel developments, while at the same time establish the validity of our technology. It’s common knowledge that filing for a patent is a time and money consuming endeavor. Therefore, prior to filing each patent application we ask ourselves: Is this a novel solution to an interesting problem? Is it important to us to protect it? Is it sufficiently tangible (and explainable) to be patentable? Only when the answer to all these questions is a resounding yes, we proceed to file a corresponding patent application.
Geographically speaking, you need to consider where you plan to market your products, because that’s where you want your inventions protected. We have always been quite heavily focused on the US market, making that a natural jurisdiction for us. Thus, all our applications were submitted to the US Patent Office (USPTO). In addition, all applications that were invented in Beamr’s Israeli R&D center were also submitted to the Israeli Patent Office (ILPTO). Early on, we also submitted some of the applications in Europe and Japan, as we expanded our sales activities to these markets. However, our experience showed that the additional translation costs (not only of the patent application itself, but also of documents cited by an Office Action to which we needed to respond), as well as the need to pay EU patent fees in each selected country, made this choice less cost effective. Therefore, in recent years we have focused our filings mainly on the US and Israel.
The chart below shows the split of our 50 patents by the country in which they were issued.
Patent Process – How to File
The process which starts with an idea, or even an implemented system based on that idea, and ends in a granted patent – is definitely not a short or easy one.
Many patents start their lifecycle as Provisional Applications. This type of application has several benefits: It doesn’t require writing formal patent claims or an Information Disclosure Statement (IDS), it has a lower filing fee than a regular application, and it establishes a priority date for subsequent patent filings. The next step can be a PCT, which acts as a joint base for submission in various jurisdictions. Then the search report and IDS are performed, followed by filing national applications in the selected jurisdictions. Most of our initial patent applications went through the full process described above, but in some cases, particularly when time was of the essence, we skipped the provisional or PCT steps, and directly filed national applications.
For a national application, the invention needs to be distilled into a set of claims, making sure that they are broad enough to be effective, while constrained enough to be allowable, and that they follow the regulations of the specific jurisdiction regarding dependencies, language etc. This is a delicate process, and at this stage it is important to have a highly experienced patent attorney that knows the ins and outs of filing in different countries. For the past 12 years, since filing our first provisional patent, we were very fortunate to work with several excellent patent attorneys at the Reinhold Cohen Group, one of the leading IP firms in Israel, and we would like to take this opportunity to thank them for accompanying us through our IP journey.
After finalizing the patent claims, text and drawings, and filing the national application, what you need most is – patience… According to the USPTO, the average time between filing a non-provisional patent application and receiving the first response from the USPTO is around 15-16 months, and the total time until final disposition (grant or abandonment) is around 27 months. Add this time to the provisional and PCT process, and you are looking at several years between filing the initial provisional application and receiving the final grant notice. In some cases it’s possible to speed up the process by using the option of a modified examination in one jurisdiction, after the application gained allowance in another jurisdiction.
The chart below shows the number of granted patents Beamr has received in each passing year.
Sometimes, the invention, description and claims are straightforward enough that the examiner is convinced and simply allows the application as filed. However, this is quite a rare occurrence. Usually there is a process of Office Actions – where the examiner sends a written opinion, quoting prior art s/he believes is relevant to the invention and possibly rejecting some or even all the claims based on this prior art. We review the Office Action and decide on the next step: In some cases a simple clarification is required in order to make the novelty of our invention stand out. In others we find that adding some limitation to the claims makes it distinctive over the prior art. We then submit a response to the examiner, which may result either in acceptance or in another Office Action. Occasionally we choose to perform an interview with the examiner to better understand the objections, and discuss modifications that can bring the claims into allowance.
Finally, after what is sometimes a smooth, and sometimes a slightly bumpy route, hopefully a Notice Of Allowance is received. This means that once filing fees are paid – we have another granted patent! In some cases, at this point we decide to proceed with a divisional application, a continuation or continuation in part – which means that we claim additional aspects of the described invention in a follow up application, and then the patent cycle starts once again…
Summary
Receiving our 50th patent was a great opportunity to reflect back on the company’s IP journey over the past 12 years. It was a long and winding road, which will hopefully continue far into the future, with more patent applications, office actions and new grants to come.
Speaking of new grants – as this blog post went to press, we were informed that our 51st patent was granted! This patent covers “Auto-VISTA”, a method of “crowdsourcing” subjective user opinions on video quality, and aggregating the results to obtain meaningful metrics. You can learn more about Auto-VISTA in Episode 34 of The Video Insiders podcast.
AV1, the open source video codec developed by the Alliance for Open Media, is the most efficient open-source encoder available today. AV1’s compression efficiency has been found to be 30% better than VP9, the previous generation open source codec, meaning that AV1 can reach the same quality as VP9 with 30% less bits. Having an efficient codec is especially important now that video consumes over 80% of Internet bandwidth, and the usage of video for both entertainment and business applications is soaring due to social distancing measures.
Beamr’s Emmy® award-winning CABR technology reduces video bitrates by up to 50% while preserving perceptual quality. The technology creates fully-compliant standard video streams, which don’t require any proprietary decoder or add-on on the playback side. We have applied our CABR technology in the past to H.264, HEVC and VP9 codecs, using both software and hardware encoder implementations.
In this blog post we present the results of applying Beamr’s CABR technology to the AV1 codec, by integrating our CABR library with the libaom open source implementation of AV1. This integration results in a further 25-40% reduction in the bitrate of encoded streams, without any visible reduction in subjective quality. The reduced-bitrate streams are of course fully AV1 compatible, and can be viewed with any standard AV1 player.
CABR In Action
Beamr’s CABR (Content Adaptive BitRate) technology is based on our BQM (Beamr Quality Measure) metric, which was developed over 10 years of intensive research, and features very high correlation with subjective quality as judged by humans. BQM is backed by 37 granted patents, and has recently won the 2021 Technology and Engineering Emmy® award from the National Academy of Television Arts & Sciences.
Beamr’s CABR technology and the BQM quality measure can be integrated with any software or hardware video encoder, to create more bitrate-efficient encodes without sacrificing perceptual quality. In the integrated solution, the video encoder encodes each frame with additional compression levels, also known as QP values. The first QP (for the initial encode) is determined by the encoder’s own rate control mechanism, which can be either VBR, CRF or fixed QP. The other QPs (for the candidate encodes) are provided by the CABR library. The BQM quality measure then compares the quality of the initial encoded frame to the quality of the candidate encoded frames, and selects the encoded frame which has the smallest size in bits, but is still perceptually identical to the initial encoded frame. Finally, the selected frame is written to the output stream. Due to our adaptive method of searching for candidate QPs, in most cases a single candidate encode is sufficient to find a near-optimal frame, so the performance penalty is quite manageable.
Integrating Beamr’s CABR module with a video encoder
By applying this process to each and every video frame, the CABR mechanism ensures that each frame fully retains the subjective quality of the initial encode, while bitrate is reduced by up to 50% compared to encoding the videos using the encoders’ regular rate control mechanism.
Beamr’s CABR rate control library is integrated into Beamr 4 and Beamr 5, our software H.264 and HEVC encoder SDKs, and is also available as a standalone library that can be integrated with any software or hardware encoder. Beamr is now implementing BQM in silicon hardware, enabling massive scale content-adaptive encoding of user-generated content, surveillance videos and cloud gaming streams.
CABR Integration with libaom
When we approached the task of integrating our CABR technology with an AV1 encoder, we examined several available open source implementations of AV1, and eventually decided to integrate with libaom, the reference open source implementation of the AV1 encoder, developed by the members of the Alliance for Open Media. libaom was selected due to a good quality-speed tradeoff at the higher quality working points, and a well defined frame encode interface which made the integration more straightforward.
To apply CABR technology to any encoder, the encoder should be able to re-encode the same input frame with different QPs, a process that we call “roll-back”. Fortunately, the libaom AV1 encoder already includes a re-encode loop, designed for the purpose of meeting bitrate constraints. We were able to utilize this mechanism to enable the frame re-encode process needed for CABR.
Another important aspect of CABR integration is that although CABR reduces the actual bitrate relative to the requested “target” bitrate, we need the encoder’s rate control to believe that the target bitrate has actually been reached. Otherwise, it will try to compensate for the bits saved by CABR by increasing the bit allocation of subsequent frames, which would undermine CABR’s bitrate reduction. Therefore, we modified the VBR rate-control feedback to report the bit consumption of the initial encode back to the RC module, instead of the actual bit consumption of the selected output frame.
An additional point of integration between an encoder and the CABR library is that CABR uses “complexity” data from the encoder when calculating the BQM metric. The complexity data is based on the per-block QP and bit consumption reported by the encoder. In order to expose this information, we added code that extracts the QP and bit consumption per block, and sends it to the CABR library.
The current integration of CABR with libaom supports 8 bit encoding, in both fixed QP and single pass VBR modes. 10-bit encoding (including HDR) and dual-pass VBR encoding are already supported with CABR in our own H.264 and HEVC encoders, and can be easily added to our libaom integration as well.
Integration Challenges
Every integration has its challenges, and indeed we encountered several while integrating CABR with libaom. For example, the re-encode loop in libaom runs before deblocking and the other loop filters, so the frame it generates is not the final reconstructed frame. To overcome this, we moved the in-loop filters so that they are applied before the candidate frame quality is evaluated.
Another challenge we encountered was that the CABR complexity data is based on the QP values and bit consumption per 16×16 block, while within the libaom encoder this information is only available for bigger blocks. To resolve this, we had to process the actual data in order to generate the QP and bit consumption at the required resolution.
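The block-data resampling described above can be illustrated with a small sketch. This is not the actual libaom code: it simply shows the idea of taking a larger block's QP and bit consumption and producing entries at 16×16 granularity, with the QP replicated and the bits split evenly as a naive assumption.

```python
def to_16x16_grid(block_size, qp, bits):
    """Distribute one larger block's (qp, bits) to 16x16 granularity.
    Illustrative only: the QP is replicated to each 16x16 sub-block and
    the bit count is split evenly among them."""
    assert block_size % 16 == 0
    n = (block_size // 16) ** 2          # number of 16x16 sub-blocks
    return [(qp, bits / n) for _ in range(n)]
```

A 64×64 block, for example, yields sixteen 16×16 entries that the CABR library can consume when computing the complexity data for BQM.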
The concept of non-display frames, which is unique to VP9 and AV1, also posed a challenge to our integration efforts. The reason is that CABR only compares quality for frames that are actually displayed to the end user. So we had to take this into account when computing the BQM quality measure and calculating the bits per frame.
Finally, while the QP range in H.264 and HEVC is between 0 and 51, in AV1 it is between 0 and 255. We have an algorithm in CABR called “QP Search” which finds the best candidate QPs for each frame, and it was tuned for the QP range of 0-51, since it was originally developed for H.264 and HEVC encoders. We addressed this discrepancy by performing a simple mapping of values, but in the future we may perform some additional fine tuning of the QP Search algorithm in order to better utilize the increased dynamic range.
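A simple mapping of the kind mentioned above could look like the following. This is an assumption for illustration: a plain linear rescaling from the H.264/HEVC QP range (0-51) to the AV1 quantizer index range (0-255); the actual mapping used in CABR may differ.

```python
def map_qp_h264_to_av1(qp):
    """Illustrative linear mapping from the H.264/HEVC QP range (0-51)
    to the AV1 quantizer index range (0-255). Not the exact CABR mapping."""
    assert 0 <= qp <= 51
    return round(qp * 255 / 51)
```

Since 255/51 = 5, each H.264 QP step corresponds to roughly five AV1 quantizer steps, which is why further tuning of the QP Search algorithm could better exploit AV1's finer dynamic range.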
Benchmarking Process
To evaluate the results of Beamr’s CABR integration with the libaom AV1 encoder, we selected 20 clips from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license. The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post.
We encoded the selected video clips with libaomx, our version of libaom integrated with the CABR library. The videos were encoded using libaom cpu-used=9, which is the fastest speed available in libaom, and therefore the most practical in terms of encoding time. We believe that using lower speeds, which provide improved encoding quality, can result in even higher savings.
Each clip was encoded twice: once using the regular VBR rate control without the CABR library, and a second time using the CABR rate control mode. In both cases, we used three target bitrates for each resolution: a high, medium and low bitrate, as specified in the table below.
Target bitrates used in the CABR-AV1 benchmark
Below is the command line we used to encode the files.
aomencx --cabr=<0 or 1> -w <width> -h <height> --fps=<fps>/1 --disable-kf --end-usage=vbr --target-bitrate=<bitrate in kbps> --cpu-used=9 -p 1 -o <outfile>.ivf <inputFIFO>.yuv
After completing the encodes in both rate control modes, we compared the bitrate and subjective quality of the two encodes. We calculated the percentage difference in bitrate between the regular VBR encode and the CABR encode, and visually compared the quality of the clips to determine whether the two encodes are perceptually identical when viewed side by side in motion.
Benchmark Results
The table below shows the VBR and CABR bitrates for each file, and the savings obtained, which is calculated as (VBR bitrate – CABR bitrate) / VBR bitrate. As expected, the savings are higher for high bitrate clips, but still significant even for the lowest bitrates we used. Average savings are 26% for the low bitrates, 33% for the medium bitrates, and 40% for the high bitrates.
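The savings figure used throughout this post follows directly from the formula above; a minimal sketch:

```python
def bitrate_savings(vbr_kbps, cabr_kbps):
    """Savings as defined in the text: (VBR bitrate - CABR bitrate) / VBR bitrate."""
    return (vbr_kbps - cabr_kbps) / vbr_kbps

def average_savings(pairs):
    """Average savings over a list of (vbr_kbps, cabr_kbps) pairs."""
    return sum(bitrate_savings(v, c) for v, c in pairs) / len(pairs)
```

For example, a clip whose VBR encode is 1000 kbps and whose CABR encode is 600 kbps has savings of 0.4, i.e. 40%.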
Note that savings differ significantly across different clips, even when they are encoded at the same resolution and target bitrate. For example, if you look at 1080p clips encoded to the lowest bitrate target (2 Mbps), you will find that some clips have very low savings (less than 3%), while other clips have very high savings (over 60%). This shows the content-adaptive nature of our technology, which is always committed to quality, and reduces the bitrate only in clips and frames where such reduction does not compromise quality.
Also note that the VBR bitrate may differ from the target bitrate. The reason is that the rate control does not always converge to the target bitrate, due to the short length of the clips. But in any case, the savings were calculated between the VBR bitrate and the CABR bitrate.
Savings – Low Bitrates
Savings – Medium Bitrates
Savings – High Bitrates
In addition to calculating the bitrate savings, we also performed subjective quality testing by viewing the videos side by side, using the YUView player software. In these viewings we verified that indeed for all clips, the VBR and CABR encodes are perceptually identical when viewed in motion at 100% zoom. Below are a few screenshots from these side-by-side viewings.
Conclusions
In this blog post we presented the results of integrating Beamr’s Content Adaptive BitRate (CABR) technology with the libaom implementation of the AV1 encoder. Even though AV1 is the most efficient open source encoder available, using CABR technology can reduce AV1 bitrates by a further 25-40% without compromising perceptual quality. The reduced bitrate can provide significant savings in storage and delivery costs, and enable reaching wider audiences with high-quality, high-resolution video content.
Appendix
The VBR and CABR encoded files can be found here. The source files can be downloaded directly from the YouTube UGC Dataset, using the links below.
The attention of Internet users, especially the younger generation, is shifting from professionally-produced entertainment content to user-generated videos and live streams on YouTube, Facebook, Instagram and most recently TikTok. On YouTube, creators upload 500 hours of video every minute, and users watch 1 billion hours of video every day. Storing and delivering this vast amount of content creates significant challenges to operators of user-generated content services. Beamr’s CABR (Content Adaptive BitRate) technology reduces video bitrates by up to 50% compared to regular encodes, while preserving perceptual quality and creating fully-compliant standard video streams that don’t require any proprietary decoder on the playback side. CABR technology can be applied to any existing or future block-based video codec, including AVC, HEVC, VP9, AV1, EVC and VVC.
In this blog post we present the results of a UGC encoding test, where we selected a sample database of videos from YouTube’s UGC dataset, and encoded them both with regular encoding and with CABR technology applied. We compare the bitrates, subjective and objective quality of the encoded streams, and demonstrate the benefits of applying CABR-based encoding to user-generated content.
Beamr CABR Technology
At the heart of Beamr’s CABR (Content-Adaptive BitRate) technology is a patented perceptual quality measure, developed during 10 years of intensive research, which features very high correlation with human (subjective) quality assessment. This correlation has been proven in user testing according to the strict requirements of the ITU BT.500 standard for image quality testing. For more information on Beamr’s quality measure, see our quality measure blog post.
When encoding a frame, Beamr’s encoder first applies a regular rate control mechanism to determine the compression level, which results in an initial encoded frame. Then, the Beamr encoder creates additional candidate encoded frames, each one with a different level of compression, and compares each candidate to the initial encoded frame using the Beamr perceptual quality measure. The candidate frame which has the lowest bitrate, but still meets the quality criteria of being perceptually identical to the initial frame, is selected and written to the output stream.
This process repeats for each video frame, thus ensuring that each frame is encoded to the lowest bitrate, while fully retaining the subjective quality of the target encode. Beamr’s CABR technology results in video streams that are up to 50% lower in bitrate than regular encodes, while retaining the same quality as the full bitrate encodes. The amount of CPU cycles required to produce the CABR encodes is only 20% higher than regular encodes, and the resulting streams are identical to regular encodes in every way except their lower bitrate. CABR technology can also be implemented in silicon for high-volume video encoding use cases such as UGC video clips, live surveillance cameras etc.
For more information about Beamr’s CABR technology, see our CABR Deep Dive blog post.
CABR for UGC
Beamr’s CABR technology is especially suited for User-Generated Content (UGC), due to the high diversity and variability of such content. UGC content is captured on different types of devices, ranging from low-end cellular phones to high-end professional cameras and editing software. The content itself varies from “talking head” selfie videos, to instructional videos shot in a home or classroom, to sporting events and even rock band performances with extreme lighting effects.
Encoding UGC content with a fixed bitrate means that such a bitrate might be too low for “difficult” content, resulting in degraded quality, while it may be too high for “easy” content, resulting in wasted bandwidth. Therefore, content-adaptive encoding is required to ensure that the optimal bitrate is applied to each UGC video clip.
Some UGC services use the Constant Rate Factor (CRF) rate control mode of the open-source x264 video encoder for processing UGC content, in order to ensure a constant quality level while varying the actual bitrate according to the content. However, CRF bases its compression level decisions on heuristics of the input stream, and not on a true perceptual quality measure that compares candidate encodes of a frame. Therefore, even CRF encodes waste bits that are unnecessary for a good viewing experience. Beamr’s CABR technology, which is content-adaptive at the frame level, is perfectly suited to remove these remaining redundancies, and create encodes that are smaller than CRF-based encodes but have the same perceptual quality.
Evaluation Methodology
To evaluate the results of Beamr’s CABR algorithm on UGC content, we used samples from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license, which was created to assist in video compression and quality assessment research. The dataset includes around 1500 source video clips (raw video), with a duration of 20 seconds each. The resolution of the clips ranges from 360p to 4K, and they are divided into 15 different categories such as animation, gaming, how-to, music videos, news, sports, etc.
To create the database used for our evaluation, we randomly selected one clip in each resolution from each category, resulting in a total of 67 different clips (note that not all categories in the YouTube UGC set have clips in all resolutions). The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post. As is typical of user-generated videos, many of the clips suffer from perceptual quality issues in the source, such as blockiness, banding, blurriness, noise and jerky camera movements, which makes them especially difficult to encode using standard video compression techniques.
We encoded the selected video clips using Beamr 4x, Beamr’s H.264 software encoder library, version 5.4. The videos were encoded using speed 3, which is typically used to encode VoD files in high quality. Two rate control modes were used for encoding: The first is CSQ mode, which is similar to x264 CRF mode – this mode aims to provide a Constant Subjective Quality level, and varies the encoded bitrate based on the content to reach that quality level. The second is CSQ-CABR mode, which creates an initial (reference) encode in CSQ mode, and then applies Beamr’s CABR technology to create a reduced-bitrate encode which has the same perceptual quality as the target CSQ encode. In both cases, we used a range of six CSQ values equally spaced from 16 to 31, representing a wide range of subjective video qualities.
After we completed the encodes in both rate control modes, we compared three attributes of the CSQ encodes to the CSQ-CABR encodes:
File Size – to determine the amount of bitrate savings achievable by the CABR-CSQ rate control mode
BD-Rate – to determine how the two rate control modes compare in terms of the objective quality measures PSNR, SSIM and VMAF, computed between each encode and the source (uncompressed) video
Subjective quality – to determine whether the CSQ encode and the CABR-CSQ encode are perceptually identical to each other when viewed side by side in motion.
Results
The table below shows the bitrate savings of CABR-CSQ vs. CSQ for various values of the CSQ parameter. As expected, the savings are higher for low CSQ values, which correlate with higher subjective quality and higher bitrates. As the CSQ increases, quality decreases, bitrate decreases, and the savings of the CABR-CSQ algorithm are decreased as well.
Table 1: Savings by CSQ value
The overall average savings across all clips and all CSQ values is close to 26%. If we average the savings only for the high CSQ values (16-22), which correspond to high quality levels, the average savings are close to 32%. Obviously, saving one quarter or one third of the storage cost, and moreover the CDN delivery cost, can be very significant for UGC service providers.
Another interesting analysis would be to look at how the savings are distributed across specific UGC genres. Table 2 shows the average savings for each of the 15 content categories available on the YouTube UGC Dataset.
Table 2: Savings by Genre
As we can see, simple content such as lyric videos and “how to” videos (where the camera is typically fixed) get relatively higher savings, while more complex content such as gaming (which has a lot of detail) and live music (with many lights, flashes and motion) get lower savings. However, it should be noted that due to the relatively low number of selected clips from each genre (one in each resolution, for a total of 2-5 clips per genre), we cannot draw any firm conclusions from the above table regarding the expected savings for each genre.
Next, we compared the objective quality metrics PSNR, SSIM and VMAF for the CSQ encodes and the CABR-CSQ encodes, by creating a BD-Rate graph for each clip. To create the graph, we computed each metric between the encodes at each CSQ value and the source files, resulting in 6 points for CSQ and 6 points for CABR-CSQ (corresponding to the 6 CSQ values used in both encodes). Below is an example of the VMAF BD-Rate graph comparing CSQ with CABR-CSQ for one of the clips in the lyric video category.
Figure 1: CSQ vs. CSQ-CABR VMAF scores for the 1920×1080 LyricVideo file
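For readers who want to reproduce this kind of comparison, the Bjøntegaard delta-rate between two rate-quality curves can be computed as follows. This is a sketch of the standard BD-Rate method (cubic fit of log-bitrate as a function of the quality score, integrated over the overlapping quality range), not Beamr's internal tooling.

```python
import numpy as np

def bd_rate(rates_ref, metric_ref, rates_test, metric_test):
    """Bjontegaard delta-rate: average bitrate difference (in %) of the test
    curve vs. the reference curve, over their overlapping quality range.
    Each curve is given as bitrates and matching quality scores (e.g. VMAF)."""
    lr_ref = np.log(rates_ref)
    lr_test = np.log(rates_test)
    # Cubic fit of log-rate as a function of the quality metric
    p_ref = np.polyfit(metric_ref, lr_ref, 3)
    p_test = np.polyfit(metric_test, lr_test, 3)
    lo = max(min(metric_ref), min(metric_test))
    hi = min(max(metric_ref), max(metric_test))
    # Integrate both fits over the common quality interval
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100
```

A test curve with the same quality scores at half the bitrates yields a BD-Rate of -50%, i.e. the test encoder needs half the bits for the same quality.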
As we can see, the BD-Rate curve of the CABR-CSQ graph follows the CSQ curve, but each CSQ point on the original graph is moved down and to the left. If we compare, for example, the CSQ 19 point to the CABR-CSQ 19 point, we find that CSQ 19 has a bitrate of around 8 Mbps and a VMAF score of 95, while the CABR-CSQ 19 point has a bitrate of around 4 Mbps, and a VMAF score of 91. However, when both of these files are played side-by-side, we can see that they are perceptually identical to each other (see screenshot from the Beamr View side by side player below). Therefore, the CABR-CSQ 19 encode can be used as a lower-bitrate proxy for the CSQ 19 encode.
Figure 2: Side-by-side comparison in Beamr View of the CSQ 19 vs. CSQ-CABR 19 encode for the 1920×1080 LyricVideo file
Finally, to verify that the CSQ and CABR-CSQ encodes are indeed perceptually identical, we performed subjective quality testing using the Beamr VISTA application. Beamr VISTA enables visually comparing pairs of video sequences played synchronously side by side, with a user interface for indicating the relative subjective quality of the two video sequences (for more information on Beamr VISTA, listen to episode 34 of The Video Insiders podcast). The set of target comparison pairs comprised 78 pairs of 10 second segments of Beamr4x CSQ encodes vs. corresponding Beamr4x CABR-CSQ encodes. 30 test rounds were performed, resulting in 464 valid target pair views (e.g. by users who correctly recognized mildly distorted control pairs), or on average 6 views per pair. The results show that on average, close to 50% of the users selected CABR-CSQ as having lower quality, while a similar percentage of users selected CSQ as having lower quality, therefore we can conclude that the two encodes are perceptually identical with a statistical significance exceeding 95%.
Figure 3: Percentage of users who selected CABR-CSQ as having lower quality per file
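The statistical reasoning behind the subjective test result can be illustrated with an exact binomial test. The blog does not specify the exact statistical procedure used, so this is only an illustration of why a roughly 50/50 split of "which looks worse" votes supports the conclusion that the two encodes are perceptually identical.

```python
from math import comb

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial test against p=0.5: the probability of an
    outcome at least as far from n/2 as k successes in n trials.
    A large p-value means the votes are consistent with random guessing,
    i.e. viewers could not tell the two encodes apart."""
    k = min(k, n - k)                       # fold to the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)
```

With 464 valid views split roughly evenly (e.g. 232 vs. 232), the p-value is 1.0, so there is no evidence of a visible quality difference; a strongly lopsided split (e.g. 100 vs. 364) would be highly significant and would indicate a visible difference.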
Conclusions
In this blog post we presented the results of applying Beamr’s Content Adaptive BitRate (CABR) encoding to a random selection of user-generated clips taken from the YouTube UGC Dataset, across a range of quality (CSQ) values. The CABR encodes had 25% lower bitrate on average than regular encodes, and at high quality values, 32% lower bitrate on average. The Rate-Distortion graph is unaffected by applying CABR technology, and the subjective quality of the CABR encodes is the same as the subjective quality of the regular encodes. By shaving off a quarter of the video bitrate, significant storage and delivery cost savings can be achieved, and the strain on today’s bandwidth-constrained networks can be relieved, for the benefit of all netizens.
Appendix
Below are links to all the source clips used in the Beamr 4x CABR UGC test.
There are several different video codecs available today for video streaming applications, and more will be released this year. This creates some confusion for video services that need to select their codec of choice for delivering content to their users at the best quality and lowest bitrate, while also taking into account the encoding compute requirements. For many years, the choice of video codecs was quite simple: starting with MPEG-2 (H.262), which took over digital TV in the late 90s, through H.263 and MPEG-4 Part 2, which dominated video conferencing early in the millennium, followed by MPEG-4 Part 10, or AVC (H.264), which has enjoyed significant market share for many years now in most video applications and markets, including delivery, conferencing and surveillance. Meanwhile, Google’s natural choice for YouTube was their own video codec, VP9.
While HEVC, ratified in 2013, seemingly offered the next logical step, royalty issues put a major spoke in its wheels. Add to this the concern over increased complexity, and the delay in 4K adoption, which was assumed to be the main use case for HEVC, and you get quite a grim picture. This situation triggered a strong desire in the industry to create an independent, royalty-free codec. Significantly reduced timelines in the release of new video codec standards were thrown onto this fire, and we find ourselves somewhat like Alice in Wonderland: signs leading us forward in various directions – but which do we follow?
Let’s begin by presenting our contenders for the “codec with significant market share in future video applications” competition:
We will not discuss LC-EVC (MPEG-5 Part 2), as it is a codec add-on rather than an alternative stand-alone video codec. If you want to learn more about it, https://lcevc.com/ is a good place to start.
If you are hoping that we will crown a single winner in this article – sorry to disappoint: It is becoming apparent that we are not headed towards a situation of one codec to rule them all. What we will do is provide information, highlight some features of each of the codecs, share some insights and opinions and hopefully help arm you for the ongoing codec wars.
Origin
The first point of comparison we will address is origin: where each codec comes from and what that implies. To date, most widely adopted video codecs have been standards created by the Joint Video Experts Team, combining the efforts of the ITU-T Video Coding Experts Group (VCEG) and the ISO Moving Picture Experts Group (MPEG) to create joint standards. AVC and HEVC were born through this process, which involves clear procedures, from the CfP (Call for Proposals), through teams evaluating the compression efficiency and performance requirements of each proposed tool, up to creating a draft of the proposed standard. A few rounds of editing and fixes yield a final draft, which is ratified to produce the final standard. This process is very well organized and has a long and proven track record of resulting in stable and successful video codecs. AVC, HEVC and VVC are all codecs created in this manner.
The EVC codec is an exception in that it is coming only from MPEG, without the cooperation of ITU-T. This may be related to the ITU VCEG traditionally not being in favor of addressing royalty issues as part of the standardization process, while for the EVC standard, as we will see, this was a point of concern.
Another source of video codecs is individual companies. A particularly successful example is the VP9 codec, developed by Google as a successor to VP8, which was created by On2 Technologies (later acquired by Google). In addition, some companies have tried to push open-source, royalty-free proprietary codecs, such as Daala by Mozilla or Dirac by BBC Research.
A third source of codecs is a consortium or group of companies working independently, outside of official international standards bodies such as ISO or the ITU. AV1 is the perfect example of such a codec: multiple companies joined forces through the Alliance for Open Media (AOM) to create a royalty-free, open-source video coding format specifically designed for video transmission over the Internet. AOM founding members include Google (who contributed their VP9 technology), Microsoft, Amazon, Apple, Netflix, Facebook and Mozilla, along with classic “MPEG supporters” such as Cisco and Samsung. The AV1 encoder was built from ‘experiments’, where each considered tool was added to the reference software along with a toggle to turn the experiment on or off, allowing flexibility during the decision process as to which tools would be used in each of the eventual profiles.
Timeline
An easy point of comparison between the codecs is the timeline. AVC was completed back in May 2003. HEVC was finalized almost 10 years later in April 2013. AV1 bitstream freeze was in March 2018, with validation in June of that year and Errata-1 published in January 2019. As of the 130th MPEG meeting in April 2020, VVC and EVC are both in Final Draft of International Standard (FDIS) stage, and are expected to be ratified this year.
Royalties
The next point of comparison is the painful issue of royalties. Unless you have been living under a rock, you are probably aware that this is a pivotal issue in the codec wars. AVC royalty issues are well resolved and a known, inexpensive licensing model is in place, but for HEVC the situation is more complex. While HEVC Advance unifies many of the patent holders for HEVC, and is constantly bringing more on board, MPEG LA still represents some others. Velos Media unifies yet more IP holders, and a few remain unaffiliated, not taking part in any of these pools. Despite the pools finally publishing reasonable licensing models over the last couple of years (more than five years after HEVC was finalized), the industry is for the most part taking a ‘once bitten, twice shy’ approach to HEVC royalties, with some concern over the possibility of other entities coming out of the woodwork with yet further IP claims.
AV1 was a direct attempt to resolve this royalty mess, by creating a royalty-free solution, backed by industry giants, and even creating a legal defense fund to assist smaller companies that may be sued regarding the technology they contributed. Despite AOM never promising to indemnify against third party infringement, this seemed to many pretty air-tight. That is until in early March Sisvel announced a patent pool of 14 companies that hold over 1000 patents, which Sisvel claim are essential for the implementation of AV1. About a month later, AOM released a counter statement declaring AOM’s dedication to a royalty-free media ecosystem. Time, and presumably quite a few lawyers, will determine how this particular battle plays out.
VVC initially seemed to be heading down the same IP road as HEVC: according to MPEG regulations, anyone contributing IP to the standard must sign a Fair, Reasonable And Non-Discriminatory (FRAND) licensing commitment. But, as experience shows, that does not guarantee convergence to applicable patent pools. This time, however, the industry has taken action in the form of the Media Coding Industry Forum (MC-IF), an open industry forum established in 2018 with the purpose of furthering the adoption of MPEG standards, initially focusing on VVC, and establishing them as well-accepted and widely used standards for the benefit of consumers and industry. One of the MC-IF work groups is defining “sub-profiles”, which include either royalty-free tools or tools for which MC-IF is able to serve as a registration authority for all relevant IP licensing. If this effort succeeds, we may yet see royalty-free or royalty-known sub-profiles for VVC.
EVC tackles the royalty issue directly within the standardization process, performed primarily by Samsung, Huawei and Qualcomm, using a combination of two approaches. For EVC-Baseline, only tools which can be shown to be royalty-free are incorporated. This generally means the technologies are 20+ years old and have the publications to prove it. While this may sound like a rather problematic constraint, once you factor in the fact that AVC technology is all 20+ years old, and that a lot of non-IP-infringing know-how has accumulated over these years, one can see that this codec can still significantly exceed AVC compression efficiency. For EVC-Main, a royalty-known approach has been adopted, where any entity contributing IP is committed to providing a reasonably priced licensing model within two years of the FDIS, meaning by April 2022.
Technical Features
Now that we have dealt with the elephant in the room, we will highlight some codec features and see how the different codecs compare. All of these codecs use a hybrid block-based coding approach. The frame is split into blocks, and for each block a prediction of its pixels is formed. The residual, i.e. the difference between the prediction and the actual values, is then passed through a frequency transform, and the resulting coefficients are quantized. Finally, the quantized coefficients are entropy coded, along with additional data such as the motion vectors used for prediction, to produce the bitstream. A somewhat simplified diagram of such an encoder is shown in FIG 1.
FIGURE 1: HYBRID BLOCK BASED ENCODER
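The hybrid loop described above can be illustrated with a toy transform-quantization round trip. This is a sketch only: it uses a 4×4 orthonormal DCT-II and a single uniform quantizer step, not any real codec's transform or quantization design, and it omits entropy coding entirely.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    T = np.zeros((n, n))
    for k in range(n):
        c = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
        for x in range(n):
            T[k, x] = c * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    return T

def encode_decode_block(block, prediction, qstep):
    """One hybrid-coding round trip for a single square block:
    predict -> residual -> transform -> quantize -> dequantize ->
    inverse transform -> reconstruct."""
    T = dct_matrix(block.shape[0])
    residual = block - prediction            # prediction stage
    coeffs = T @ residual @ T.T              # 2D frequency transform
    levels = np.round(coeffs / qstep)        # quantization (the lossy step)
    recon_res = T.T @ (levels * qstep) @ T   # dequantize + inverse transform
    return prediction + recon_res            # reconstructed block
```

Because the transform is orthonormal, the reconstruction error is bounded by the quantizer step: larger QP values (coarser quantization) mean fewer bits and larger error, which is exactly the trade-off the rate control and CABR mechanisms navigate.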
The underlying theme of the codec improvements is very much a “more is better” approach. More block sizes and sub-partitioning options, more prediction possibilities, more sizes and types of frequency transforms and more additional tools such as sophisticated in-loop deblocking filters.
Partitioning
We will begin with a look at the supported block partitioning schemes. The MBs (macroblocks) of AVC are always 16×16; CTUs in HEVC and EVC-Baseline are up to 64×64; while EVC-Main, AV1 and VVC support block sizes of up to 128×128. As block sizes grow larger, they enable efficient encoding of smooth textures at higher and higher resolutions.
Regarding partitioning, while AVC had fixed-size macroblocks, HEVC introduced the Quad-Tree, allowing the Coding Tree Unit to be recursively partitioned into four sub-blocks. The same scheme is also supported in EVC-Baseline. VVC added Binary Tree (2-way) and Ternary Tree (3-way) splits to the Quad-Tree, increasing partitioning flexibility, as illustrated in the example in FIG 2. EVC-Main also uses a combined QT, BT and TT approach, and in addition has a Split Unit Coding Order feature, which allows the sub-blocks to be processed and predicted in right-to-left order as well as the usual left-to-right order. AV1 uses a slightly different partitioning approach, which supports up to 10-way splits of each coding block.
Another evolving aspect of partitioning is the flexibility of partition shapes. The ability to split blocks asymmetrically and along diagonals can help isolate localized changes and create efficient, accurate partitions. This has two important advantages: the need for fine-granularity sub-partitioning is avoided, and two objects separated by a diagonal edge can be correctly represented without introducing a “staircase” effect. The wedge partitioning introduced in AV1 and the geometric partitioning of VVC both support diagonal partitions between two prediction areas, enabling very accurate partitioning.
FIGURE 2: Partitioning example combining QT (blue), TT (green) and BT (red)
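The recursive quad-tree idea can be sketched as follows. Note the hedge: a real encoder chooses splits by rate-distortion cost, not raw pixel variance; this toy uses variance only to illustrate the recursion.

```python
import numpy as np

def quadtree_partition(block, threshold, min_size=16):
    """Recursively split a square block into four quadrants while its pixel
    variance exceeds a threshold. Returns the list of leaf block sizes.
    Illustrative only: real encoders split based on rate-distortion cost."""
    n = block.shape[0]
    if n <= min_size or np.var(block) <= threshold:
        return [n]                       # leaf: stop splitting here
    half = n // 2
    parts = []
    for r in (0, half):
        for c in (0, half):
            parts += quadtree_partition(block[r:r + half, c:c + half],
                                        threshold, min_size)
    return parts
```

A flat 64×64 block stays a single partition, while a detailed one is split all the way down to sixteen 16×16 leaves, mirroring how larger CTUs pay off on smooth textures and finer splits capture detail.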
Prediction
A good quality prediction scheme which minimizes the residual energy is an important tool for increasing compression efficiency. All video codecs from AVC onwards employ both INTRA prediction, where the prediction is performed using pixels already encoded and reconstructed in the current frame, and INTER prediction, using pixels from previously encoded and reconstructed frames.
AVC supports 9 INTRA prediction modes, or directions in which the current block pixels can be predicted from the pixels adjacent to the block on the left, above and above-right. EVC-Baseline supports only 5 INTRA prediction modes, EVC-Main supports 33, HEVC defines 35, AV1 has 56, and VVC takes the cake with 65 angular predictions. While the “more is better” paradigm may improve compression efficiency, it directly impacts encoding complexity, as the encoder has a more complex decision to make when choosing the optimal mode. AV1 and VVC add further sophisticated options for INTRA prediction, such as predicting Chroma from Luma in AV1, or the similar Cross-Component Linear Model prediction of VVC. Another interesting tool for INTRA prediction is INTRA Block Copy (IBC), which allows copying a full block from the already encoded and reconstructed part of the current frame as the predictor for the current block. This mode is particularly beneficial for frames with complex synthetic texture, and is supported in AV1, EVC-Main and VVC. VVC also supports Multiple Reference Lines, which extends the set of pixels near the block used for INTRA prediction.
The differences in INTER prediction are in the number of references used, Motion Vector (MV) resolution and associated sub-pel interpolation filters, supported motion partitioning and prediction modes. A thorough review of the various INTER prediction tools in each codec is well beyond the scope of this comparison, so we will just point out a few of the new features we are particularly fond of.
Overlapped Block Motion Compensation (OBMC) was first introduced in Annex F of H.263 and in MPEG-4 Part 2, though not included in any profile of either. It is supported in AV1, and while considered for VVC, did not make it into the final draft. This is an excellent tool for reducing those annoying discontinuities at prediction block borders when the blocks on either side use different MVs.
FIGURE 3A: OBMC ILLUSTRATION. On the top is regular Motion Compensation which creates a discontinuity due to two adjacent blocks using different parts of reference frame for prediction, on the bottom OBMC with overlap between prediction blocks
FIGURE 3B: OBMC ILLUSTRATION. Zoom into OBMC for the border between middle and left shown blocks, showing the averaging of the two predictions at the crossover pixels.
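The crossover averaging the figures describe can be sketched as follows, with a simple linear ramp over the overlap region (actual OBMC weighting windows differ per codec; this is only an illustration):

```python
import numpy as np

def obmc_blend(pred_left, pred_right, overlap):
    """Blend two horizontally adjacent motion-compensated predictions over an
    `overlap`-pixel-wide crossover region, fading one into the other instead
    of butting them together at a hard edge."""
    h, w = pred_left.shape
    out = np.empty((h, 2 * w - overlap))
    out[:, :w - overlap] = pred_left[:, :w - overlap]
    out[:, w:] = pred_right[:, overlap:]
    # Linear ramp weights across the overlap: left fades out, right fades in.
    t = (np.arange(overlap) + 0.5) / overlap
    out[:, w - overlap:w] = (1 - t) * pred_left[:, w - overlap:] + t * pred_right[:, :overlap]
    return out

left = np.full((4, 8), 100.0)   # prediction from one MV
right = np.full((4, 8), 140.0)  # neighbor block predicted with a different MV
blended = obmc_blend(left, right, overlap=4)
print(blended[0])  # values step gently from 100 to 140 instead of jumping
```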
One of the significant limitations of the block-matching motion prediction approach is its inability to represent motion that is not purely translational, such as zoom or rotation. This is addressed by the warped motion compensation supported in AV1, and even more thoroughly by the 6 Degrees-Of-Freedom (DOF) Affine Motion Compensation supported in VVC. EVC-Main takes it a step further with 3 affine motion modes: merge, along with both 4-DOF and 6-DOF Affine MC.
FIGURE 4: AFFINE MOTION PREDICTION Image credit: Cordula Heithausen – Coding of Higher Order Motion Parameters for Video Compression – ISBN-13: 978-3844057843
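As a sketch of what the affine model buys, the following computes the per-pixel motion field of a 6-parameter model; a small rotation yields a different MV at every pixel, which a single translational MV per block cannot express (the parameter names are ours):

```python
import numpy as np

def affine_mv_field(a, b, c, d, e, f, w, h):
    """Per-pixel motion vectors under a 6-parameter (6-DOF) affine model:
    mv_x = a*x + b*y + c,  mv_y = d*x + e*y + f.
    Translation-only block matching is the special case a=b=d=e=0."""
    ys, xs = np.mgrid[0:h, 0:w]
    return a * xs + b * ys + c, d * xs + e * ys + f

# A small rotation: pixels far from the origin move more than pixels near it.
theta = 0.05
mvx, mvy = affine_mv_field(np.cos(theta) - 1, -np.sin(theta), 0,
                           np.sin(theta), np.cos(theta) - 1, 0, 16, 16)
print(mvx[0, 0], round(mvx[15, 15], 3))  # 0.0 -0.768
```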
Video codecs also perform MV (Motion Vector) prediction based on previously coded MV values. This reduces the bits spent on MV transmission, which is beneficial at aggressive bitrates and/or when using fine-grained motion partitions, and can also make the motion estimation process more efficient. While all five codecs define a process for calculating the MV Predictor (MVP), EVC-Main extends this with a history-based MVP, and VVC takes it further with improved spatial and temporal MV prediction.
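For example, the classic AVC-style predictor is the component-wise median of three spatial neighbors, and only the small difference (MVD) between the actual MV and the predictor is coded (simplified here: the real derivation has special cases for unavailable neighbors):

```python
def median_mv_predictor(mv_left, mv_above, mv_above_right):
    """AVC-style MV predictor: the component-wise median of three spatial
    neighbors. Only the difference (MVD) between the actual MV and this
    predictor is transmitted, saving bits."""
    xs, ys = zip(mv_left, mv_above, mv_above_right)
    return sorted(xs)[1], sorted(ys)[1]

mvp = median_mv_predictor((4, 0), (6, 2), (5, -1))
actual_mv = (5, 1)
mvd = (actual_mv[0] - mvp[0], actual_mv[1] - mvp[1])
print(mvp, mvd)  # (5, 0) (0, 1) -- a small MVD is cheap to code
```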
Transforms
The frequency transforms applied to the residual data are another arena for the “more is better” approach. AVC uses 4×4 and 8×8 Discrete Cosine Transform (DCT), while EVC-Baseline adds more transform sizes ranging from 2×2 to 64×64. HEVC added the complementary Discrete Sine Transform (DST) and supports multi-size transforms ranging from 4×4 to 32×32. AV1, VVC and EVC-Main all use DCT and DST based transforms with a wide range of sizes including non-square transform kernels.
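The point of these transforms is energy compaction: a smooth residual collapses into a handful of significant coefficients. A small sketch using an orthonormal floating-point DCT-II (the codecs themselves use integer approximations of these kernels):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix, the transform family used (in
    integer-approximated form) by all the codecs discussed here."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

n = 8
D = dct_matrix(n)
# A smooth (ramp) residual block: energy compacts into few low-freq coeffs.
block = np.add.outer(np.linspace(0, 7, n), np.linspace(0, 7, n))
coeffs = D @ block @ D.T
significant = np.sum(np.abs(coeffs) > 1e-6)
print(significant, "of", n * n, "coefficients are non-zero")  # 9 of 64
```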
Filtering
In-loop filters make a crucial contribution to the perceptual quality of block-based codecs, removing artifacts created by the separate processing and decisions applied to adjacent blocks. AVC uses a relatively simple in-loop adaptive De-Blocking (DB) filter, as does EVC-Baseline, which uses the filter from H.263 Annex J. HEVC adds a Sample Adaptive Offset (SAO) filter, designed to allow better reconstruction of the original signal amplitudes by applying offsets stored in a lookup table in the bitstream, resulting in increased picture quality and reduced banding and ringing artifacts. VVC uses similar DB and SAO filters, and adds an Adaptive Loop Filter (ALF) to minimize the error between the original and decoded samples. This is done using Wiener-based adaptive filters, with suitable filter coefficients determined by the encoder and explicitly signaled to the decoder. EVC-Main uses an ADvanced Deblocking Filter (ADDB) as well as ALF, and further introduces a Hadamard Transform Domain Filter (HTDF), performed on decoded samples right after block reconstruction using 4 neighboring samples. Wrapping up with AV1: a regular DB filter is used, as well as a Constrained Directional Enhancement Filter (CDEF), which removes ringing and basis noise around sharp edges and is the first use of a directional filter for this purpose in a video codec. AV1 also uses a Loop Restoration filter, whose coefficients are determined by the encoder and signaled to the decoder.
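The essence of deblocking can be shown with a toy 1-D filter: soften small steps across a block boundary, but leave large steps alone, since those are likely real edges in the content (real DB filters use multiple strengths, more taps, and QP-dependent thresholds; the fixed `alpha` threshold here is our simplification):

```python
import numpy as np

def deblock_edge(p, q, alpha=20):
    """Toy 1-D deblocking: soften the discontinuity between the last pixel of
    block p and the first pixel of block q, but only when the step is small
    enough to be a coding artifact rather than a real edge."""
    step = q[0] - p[-1]
    if abs(step) >= alpha:          # strong edge: likely real content, skip
        return p, q
    p, q = p.copy(), q.copy()
    p[-1] += step / 4               # pull boundary pixels toward each other
    q[0] -= step / 4
    return p, q

left = np.array([90.0, 90.0, 90.0, 92.0])
right = np.array([100.0, 100.0, 100.0, 100.0])   # blocking step of 8 levels
fl, fr = deblock_edge(left, right)
print(fl[-1], fr[0])  # 94.0 98.0 -- the 8-level step shrinks to 4
```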
Entropy Coding
The entropy coding stage varies somewhat among the codecs, partially because Context Adaptive Binary Arithmetic Coding (CABAC) has associated royalties. AVC offers both Context Adaptive Variable Length Coding (CAVLC) and CABAC modes. HEVC and VVC both use CABAC, with VVC adding improvements to increase efficiency, such as better initializations that avoid the need for a LUT, and increased flexibility in Coefficient Group sizes. AV1 uses non-binary (multi-symbol) arithmetic coding, which means the entropy coding must be performed in two sequential steps, limiting parallelization. EVC-Baseline uses the Binary Arithmetic Coder described in JPEG Annex D combined with run-level symbols, while EVC-Main employs a bit-plane ADvanced Coefficient Coding (ADCC) approach.
To wrap up the feature highlights section, we’d like to note some features that are useful for specific scenarios. For example, EVC-Main and VVC support Decoder-side MV Refinement (DMVR), which is beneficial for distributed systems where some of the encoding complexity is offloaded to the decoder. AV1 and VVC both have tools well suited to screen content, such as Palette coding, with AV1 also supporting the Paeth prediction used in PNG images. Film Grain Synthesis (FGS), first introduced in HEVC but not included in any profile, is mandatory in the AV1 Professional profile, and is considered a valuable tool for high-quality, low-bitrate compression of grainy films.
Codec Comparison
Compression Efficiency
Probably the most interesting question is how the codecs compare in actual video compression, i.e. what the Compression Efficiency (CE) of each codec is: what bitrate is required to obtain a certain quality, or inversely, what quality is obtained at a given bitrate. While the question is quite simple and well defined, answering it is anything but. The first challenge is defining the testing points: what content, at what bitrates, in what modes. As a simple example, when screen content coding tools exist, the codec will show more of an advantage on that type of content. Different selections of content, rate control methodologies (which are outside the scope of the standards), GOP structures and other configuration parameters all have a significant impact on the obtained results.
Another obstacle on the way to a definitive answer stems from how to measure the quality. PSNR is sadly often still used in these comparisons, despite its poor correlation with perceptual quality. But even more sophisticated objective metrics, such as SSIM or VMAF, do not always accurately represent the perceptual quality of the video. On the other hand, subjective evaluation is costly, not always practical at scale, and results obtained in one test may not be repeated when tests are performed with other viewers or in other locations.
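For reference, PSNR is trivial to compute, which largely explains its persistence; the example below also shows one of its failure modes, where a spatial shift (often barely noticeable in natural images) is punished far more than clearly visible noise:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB. Easy to compute, which is why it
    persists in codec comparisons despite its poor correlation with what
    viewers actually perceive."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, (64, 64)).astype(float)
noisy = frame + rng.normal(0, 2, frame.shape)   # mild noise everywhere
shifted = np.roll(frame, 1, axis=1)             # a 1-pixel horizontal shift
# The shifted frame scores far worse than the visibly noisy one.
print(round(psnr(frame, noisy), 1), ">", round(psnr(frame, shifted), 1))
```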
So, while you can find endless comparisons, slightly different and sometimes entirely contradictory, we will take a more conservative approach, providing estimates based on a cross-section of multiple comparisons in the literature. There seems to be no doubt that among these codecs, AVC has the lowest compression efficiency, while VVC tops the charts. EVC-Baseline’s compression efficiency is seemingly about 30% higher than AVC’s, not far from the 40% improvement attributed to HEVC. AV1 and EVC-Main are close, with the verdict on which is superior depending heavily on who performed the comparison. Both are approximately 5-10% behind VVC in compression efficiency.
Computational Complexity
Now, a look at the performance, or computational complexity, of each of the candidates. Again, this comparison is rather naïve, as performance depends heavily on the implementation and testing conditions rather than on the tools defined by the standard. The ability to parallelize encoding tasks, the architecture of the processor used for testing, and the content type (low vs. high motion, dark vs. bright) are just a few of the factors that can heavily impact a performance analysis. For example, taking the exact same x264 preset and running it on the same content with low and high target bitrates can cause a 4x difference in encode runtime. In another example, in the Beamr 5 epic face off blog post, the Beamr HEVC encoder is on average 1.6x faster than x265 on the same content at similar quality, and the range of encode FPS across files for each encoder is on the order of 1.5x. Having said all that, we will try to provide a very coarse, ballpark estimate of the relative computational complexity of each of the reviewed codecs. AVC is definitely the lowest complexity of the bunch, with EVC-Baseline only very slightly more complex. HEVC has higher performance demands for both the encoder and decoder. VVC has managed to keep decoder complexity almost on par with HEVC’s, but its encoding complexity is significantly higher, probably the highest of all 5 reviewed codecs. AV1 is also known for its high complexity, with early versions having inspired the unit Frames Per Minute (FPM) for encoding performance, rather than the commonly used Frames Per Second (FPS). Though recent versions have gone a long way toward improving matters, it is still safe to say that its complexity is significantly higher than HEVC’s, and probably still higher than EVC-Main’s.
Summary
In the table below, we have summarized some of the comparison features which we outlined in this blog post.
The Bottom Line
So, what is the bottom line? Unfortunately, life is getting more complicated, and the era of one or two dominant codecs covering almost the entire industry is over. Only time will tell which codec will have the highest market share in 5 years’ time, but one easy assessment is that with AVC’s current market share estimated at around 70%, it is not going to disappear anytime soon. AV1 is definitely gaining momentum, and with the backing of the industry giants we expect to see it used a fair bit in online streaming. As for the others, it is safe to assume that the improved compression efficiency offered by VVC and EVC-Main, the attractive royalty situation of EVC-Baseline, and the growing number of devices that support HEVC in hardware all mean that supporting a plurality of codecs in many video streaming applications is the new reality for all of us.
Post COVID-19, the work-from-home trend will continue, extending the pressure that video traffic puts on the Internet. Even with the EU Commissioner’s call for video services to reduce their traffic by 25%, as Internet traffic patterns shift from corporate networks to mobile, fixed wireless, and broadband networks, the need to reduce video bandwidth will continue beyond COVID-19. Consumers will still demand the highest quality, and the streaming services that meet their expectations while delivering video in as small a footprint as possible will dominate the market. Now is the time for the streaming video industry to play an active role in adopting more efficient codecs and content-adaptive bitrate technology so that streaming video services can ensure a great user experience without disrupting the Internet.
https://youtu.be/ltYNDQkiyl8
The Internet is a shared resource to preserve.
For the video streaming industry, Thursday, March 19th, marked the day of reckoning for runaway bitrates and seemingly never-ending network capacity. On March 19th, Thierry Breton, the European Commissioner for the Internal Market, tweeted, “let’s #SwitchToStandard definition when HD is not necessary.” As a result, most of the best-known US video services, including Facebook and Instagram, agreed to a 25% reduction in bandwidth used for video delivered in Europe, the UK, and Israel, with other countries rumored to follow suit.
We can blame COVID-19 for the strain: closed schools and businesses have led to increased use of video conferencing, streaming video services, and cloud gaming. Verizon reported that total web traffic was up 22% between March 12th and March 19th, while week-over-week usage of streaming video services increased by 12%. These numbers have trended even higher as quarantine and shelter-in-place orders expanded, as evidenced by Cloudflare reporting Internet traffic 40% higher than pre-COVID-19 levels.
The purpose of this article is to provide a framework for how video streaming services may want to think about the Internet post COVID-19 where video streaming services and video-centric applications will need to consider their utilization of the Internet as a shared and not an unlimited resource.
Content-Adaptive Encoding is no longer a nice to have for streaming services.
There are multiple technical and technology options available for reducing video bitrate. The fastest to implement, however, is to drop the resolution of the video. By manipulating the video playlist (called a manifest) that organizes the various resolutions and bitrates the video player can adapt between based on network speed, a video service can achieve immediate savings by merely serving a lighter-weight version of the video: standard definition (SD) instead of high definition (HD). This is the approach most of the complying services have taken, but it is not a sustainable answer, since dropping resolution negatively impacts the video experience.
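As a rough sketch, capping an HLS master playlist at SD amounts to dropping the higher-resolution variants (the attribute parsing here is naive and for illustration only; real manifests need a proper parser, and the filenames are made up):

```python
def cap_manifest(master_manifest, max_height=480):
    """Drop variant streams above `max_height` from an HLS master playlist,
    so players can only adapt up to SD. A quick, if blunt, way to cut
    delivered bitrate without touching the encoders."""
    lines, out, skip_next = master_manifest.splitlines(), [], False
    for line in lines:
        if skip_next:                     # URI line of a dropped variant
            skip_next = False
            continue
        if line.startswith("#EXT-X-STREAM-INF"):
            res = [p for p in line.split(",") if p.startswith("RESOLUTION=")]
            if res and int(res[0].split("x")[1]) > max_height:
                skip_next = True          # drop this tag and its URI line
                continue
        out.append(line)
    return "\n".join(out)

master = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
hd1080.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=842x480
sd480.m3u8"""
print(cap_manifest(master))  # only the 480p variant survives
```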
A more advanced technique known as Content-Adaptive Encoding works by guiding the encoder to adapt the bitrate allocation to the needs of the video content.
Reducing resolution is not what consumers want, and this will make content-adaptive encoding essential for many video encoding workflows. Because content-adaptive encoding solutions require integration, some services relegated them to the “nice to have” list. But now, with sweeping changes to video consumption driving network saturation, services that compete on high visual quality are shifting the priority to “must-have.”
Effective tools and methods to be a good citizen of the Internet.
If we are going to be good citizens of the Internet, we should understand what tools and methods are available to preserve this precious shared resource while delivering suitable UX and visual quality.
Engineering for video encoding is about tradeoffs. The three primary levers are 1) bitrate, 2) resolution, 3) encoder performance. These levers are interconnected and dependent. For example, it’s not possible to achieve high bitrate efficiency at higher resolutions without affecting encoder performance (increasing CPU cycles).
From a video quality perspective, the first two levers are those available to most video encoding engineers, while from an operational point of view, the third lever is what most impacts bitrate and quality.
The tools that we can use to reduce bandwidth include the use of advanced video codecs such as HEVC. HEVC (H.265) provides up to 50% reduction in bitrate at the same quality level as H.264, the current dominant codec used around the world. The other tool available is advanced technology, such as content-adaptive encoding, implemented inside the encoder.
Beamr’s Content-Adaptive Bitrate (CABR) rate-control is an example of advanced technology that brings an additional 20-40% reduction in bitrate. Using HEVC and CABR, a 4K HDR video file can be as small as 10Mbps, a savings of as much as 6Mbps compared to HEVC without CABR. With the promise of a 50% bitrate reduction using HEVC, and over 2 billion devices supporting HEVC decoding in hardware, it’s the obvious thing to do for a video service concerned about the sustainability of the Internet.
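To put rough numbers on this, here is a back-of-the-envelope delivery cost comparison. The 16Mbps baseline is inferred from the 10Mbps + 6Mbps figures above, and the $0.02/GB CDN price is an illustrative assumption, not a quote:

```python
def delivery_cost(bitrate_mbps, hours_viewed, cost_per_gb=0.02):
    """Rough CDN delivery cost for a stream at a given bitrate.
    Mbps / 8 -> MB/s, * 3600 -> MB/hour, * hours / 1000 -> GB delivered."""
    gb = bitrate_mbps / 8 * 3600 * hours_viewed / 1000
    return gb * cost_per_gb

hevc_only = delivery_cost(16, 1_000_000)   # 4K HDR HEVC without CABR (assumed)
hevc_cabr = delivery_cost(10, 1_000_000)   # the same title with CABR, per the text
print(f"${hevc_only:,.0f} vs ${hevc_cabr:,.0f} per million viewing hours")
```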
If a technical integration of a new codec is not possible, the three most popular methods for reducing bitrate are Per-Category, Per-Title, and Per-Frame Encoding optimization.
Per-Category Encoding optimization.
The Per-Category Encoding approach is least practical for premium movies and TV shows, since the range of encoding complexity within a category can vary significantly. Animated videos, for example, are typically easier to compress than video captured from a camera sensor, yet animation techniques are highly diverse, from hand-drawn to 2D to 3D, which makes it challenging to create an encoding ladder that works equally well across all animated content.
Per-Category Encoding is the easiest of all the methods to implement, but also produces the lowest real bitrate reduction because of the variability of scenes. For example, a sports broadcast may include talking head in-studio shots along with fast action gameplay and slow-motion recaps, each requiring different bitrate values to preserve the quality level.
Per-Title Encoding optimization.
Per-Title Encoding received a big boost when Netflix published a blog post explaining their encoding schema that creates a custom encoding ladder for each video file. The system performs a series of test encodes at different CRF levels and resolutions that are analyzed using the Video Multimethod Assessment Fusion (VMAF) quality metric. Netflix uses the scores to identify the best quality resolution at each applicable data rate.
Per-Title Encoding, or some variation of it, can now be found in many video encoding workflows. It’s a great way to rethink the fixed ABR recipes that are a primary source of wasted bandwidth or poor video quality. However, Per-Title Encoding only works well with a smaller library, as it requires extensive computing resources to run the hundreds of trial encodes needed for each title.
Per-Title Encoding helps to reduce bitrate but is limited in its ability since the rudimentary VBR rate-control bounds the encoder QP setting with no additional intelligence.
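A minimal sketch of the per-title selection step: given measured (resolution, bitrate, VMAF) points from trial encodes, pick the best resolution for each ladder rung. The trial numbers below are hypothetical, not real Netflix data:

```python
def per_title_ladder(measurements, targets):
    """Per-Title Encoding selection sketch: from trial encodes measured as
    (resolution, bitrate_kbps, vmaf) tuples, pick for each target bitrate the
    trial that maximizes VMAF at or under that bitrate."""
    ladder = []
    for target in targets:
        candidates = [m for m in measurements if m[1] <= target]
        best = max(candidates, key=lambda m: m[2])   # highest VMAF wins
        ladder.append((target, best[0], best[2]))
    return ladder

# Hypothetical trial-encode results for one title.
trials = [
    ("1920x1080", 4500, 95), ("1920x1080", 2800, 88),
    ("1280x720", 2600, 90), ("1280x720", 1500, 82),
    ("842x480", 1400, 78),
]
for target, res, vmaf in per_title_ladder(trials, [1500, 3000, 5000]):
    print(f"{target} kbps -> {res} (VMAF {vmaf})")
```

Note how, in this hypothetical data, 720p beats 1080p at the 3000 kbps rung — exactly the kind of per-title insight a fixed ABR recipe misses.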
Per-Frame Encoding optimization.
The weakness of a category or title based optimization method is that this approach cannot adapt to the specific needs of the video at the frame level. Only by steering the encoder decisions frame by frame is it possible to achieve the ultimate result of producing high quality with the least number of bits required.
Beamr’s CABR technology is the primary feature of the Beamr 4x and Beamr 5x encoding engines. CABR operates at the frame level to deliver the smallest possible size for each video frame while ensuring the highest overall quality of each frame within the video sequence. This approach avoids the transient quality issues found in other optimization techniques. The Beamr Quality Measure Analyzer has a higher correlation with subjective results than existing quality measures such as PSNR and SSIM. CABR is protected by the majority of Beamr’s 48 granted patents.
To learn more about Beamr’s Content-Adaptive Bitrate technology, you can hear Tamar Shoham, Head of Algorithms at Beamr, explain CABR here.
We must all play our part in preserving the integrity of the Internet.
Just as environmental sustainability is an essential initiative for companies who want to be good citizens of the world, in the COVID-19 world that we are living in, video sustainability is now an equally vital initiative. And this is likely to be unchanged in the future as the work from home and virtual meeting trends continue post COVID-19. Now is the time for the streaming video industry to play an active role in adopting more efficient codecs and content-adaptive bitrate technology so that we can ensure a great user experience without disrupting the Internet.
TL;DR: Beamr CABR operating with the Intel Media SDK hardware encoder powered by Intel GPUs is the perfect video encoding engine for cloud gaming services like Google Stadia. The Intel GPU hardware encoder reaches real-time performance with a power envelope that is 90% less than a CPU-based software solution. When combined with Beamr CABR (Content-Adaptive Bitrate) technology, the required bandwidth for cloud gaming is reduced by as much as 49% while delivering higher quality 65% of the time. Using the Intel hardware encoder combined with Beamr CABR enables players to enjoy a gaming experience competitive with a console, delivered by cloud gaming platforms. Get more information about how CABR works.
The era of cloud gaming.
With the launch of Google Stadia, we have entered a new era in the games industry called cloud gaming. Just as streaming video services opened media and entertainment content to a broader audience by freeing it from the fixed frameworks of terrestrial (over-the-air), cable, and satellite distribution, so too will cloud gaming open gameplay to a larger audience. Besides extending gameplay to virtually anywhere the user has a network-connected device, the ability for a player to access an extensive library of games without needing a specific piece of hardware will push 25.9 million players to cloud gaming platforms by 2023, according to the media research group Kagan.
In addition to opening up gameplay to an “anywhere/anytime” experience, a major user experience benefit of cloud gaming is that players will not necessarily need to purchase a game, but in many cases will be free to access a vast library of their choosing instantaneously. Cloud gaming services promise the quality of a console or PC experience, but without the need to own expensive hardware or perform the configuration and software installation work that comes with it.
The one constraint that could cause cloud gaming to never catch up with the console experience.
With the wholesale transition of video entertainment from traditional broadcast and physical media to streaming distribution, it is not hard to project that the same pattern will occur for games. The difference is that now, unlike the early days of video streaming, when a 3Mbps home Internet connection was “high speed” and few devices could decode and reliably play back H.264 video, even the lowest-cost smartphone can stream video with acceptable quality.
Yet, there is a fundamental constraint that must be overcome for cloud gaming to reach its full market potential, and that is the bandwidth required to deliver a competitive video experience at 1080p60 or 4kp60 resolution. To better understand the bandwidth squeeze that is unique to cloud gaming, let’s examine the data and signal flow.
In FIGURE 1 we see the cloud gaming architecture moves compute-intensive operations, like the graphics rendering engine, to the cloud.
FIGURE 1
Shifting the compute-intensive functions to the cloud removes device capability as a bottleneck. However, because the video rendering and encoding functions are no longer local to the user, the video stream needs to be delivered over the network with latency in the tens of milliseconds, and at a framerate double the 24, 25, or 30 frames per second typical of entertainment video. Additionally, video game resolutions need to be HD, with 4K preferable, and HDR is an increasingly important capability for many AAA game titles.
None of these requirements is impossible to meet, except that the need for fast encoding forces the encoder into a mode that makes it difficult to produce high quality at small stream sizes. Without the added time needed to create B-frames, and without the benefit of a look-ahead buffer, producing high quality at low bitrate is not possible. This is why cloud gaming services require a significantly higher bitrate than traditional video-on-demand streaming services.
Beamr has been innovating in the area of performance, allowing us to encode H.264 and HEVC in software with breathtaking speed, even when running our most advanced Content-Adaptive Bitrate (CABR) rate-control. For video applications where a single encode can serve hundreds of thousands or even millions of users, the compute required to do this in software is easy to justify, given the tremendous benefits of lower bitrate and higher quality. But in an application like cloud gaming, where a video encoder is matched 1:1 to every user, the computing cost of a pure software approach makes it uneconomical. The answer is a hardware encoder controlled by software running a content-adaptive optimization process that delivers the additional bitrate savings needed.
FIGURE 2 illustrates the required Google Stadia bitrates.
FIGURE 2
The answer is to leverage hardware and software.
The Intel Media SDK and GPU engines occupy a well-established position in the market, with many video services relying on its included HEVC hardware encoder for real-time encoding. However, using the VBR rate-control only, there is a limit to the quality available when bitrate efficiency is essential. The advantage of Beamr’s next-generation rate-control technology, CABR (Content-Adaptive Bitrate), combined with Intel GPUs, is the secret to delivering bitrate efficiency and quality, in real-time, with 90% less power than software alone.
In verified testing, Beamr has shown that the Intel Media SDK hardware encoder controlled by CABR produces the same perceptual quality as VBR encodes, with a confidence level greater than 95%. Using CABR has a meaningful impact on user experience: 65% of the time, the player will perceive better quality at the same bandwidth, even while the gaming platform achieves up to a 49% reduction in the bandwidth required for the same quality level.
Watch Beamr Founder Sharon Carmel present Beamr CABR integrated with Intel Gen 11 hardware encoder at Intel Experience Day October 29, 2019 in Moscow.
Proof of performance.
As an image science company, Beamr is committed to proving all performance claims. For this reason, the industry recognizes that all technology, products, and solutions carrying the Beamr name represent the pinnacle of quality. Accordingly, it was insufficient to integrate CABR with the Intel Media SDK without being able to prove that the original quality of the stream is always preserved and that the user experience is improved. Testing compared corresponding 10-second segments extracted from clips created with the Intel hardware encoder using VBR, and clips encoded with the Intel hardware encoder using the integrated Beamr CABR rate-control.
The only way to test perceptual quality is with subjective techniques. We used a process similar to forced-choice double stimulus (FCDS), closely approximating the ITU BT.500 method. Using the Beamr Auto-VISTA framework, we recruited anonymous viewers from Amazon Mechanical Turk, where each viewer was shown corresponding segment pairs and asked to select which video had lower quality. The VBR and CABR encoded files were placed at random on the left and right sides. Validation pairs with visible inserted artifacts were used to verify each user’s capabilities, and only results from users who correctly answered all four validation pairs were incorporated into the analysis. Viewers had up to five attempts to view each pair before making a decision. Each viewer watched 20 segment pairs, consisting of sixteen actual CABR vs. VBR encodes and four validation pairs.
Games used for testing were CSGO, Fallout, and GTA5. To reflect realistic bitrates, we tested only the middle four of the six bitrates provided, since the bitrate of the top layer was very high and the quality of the bottom layer was very low. The four bitrates tested were spaced one JND (just noticeable difference) apart. Each target test pair was viewed 13 to 21 times by valid users, with a total of 800 target pair viewings, or about 17 viewings per pair on average. The total number of valid test sessions was 50, completed by more than 40 unique viewers.
Peeling back the data, you will notice that the per-pair statistical distribution is quite symmetrical above and below 50%. Given the per-pair sample sizes, this is no surprise; human perception varies. The overall results comprise 800 views of 48 pairs, which makes the statistical certainty higher, indicating that CABR is not compromising perceptual quality.
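A simple way to quantify “indistinguishable from a coin flip” for a single pair is a two-sided sign test. The 9-of-17 split below is an illustrative example in the 13-to-21-viewings range described above, not an actual test result:

```python
from math import comb

def binom_p_value(worse_votes, total_votes):
    """Two-sided sign-test p-value for 'viewers prefer one encode over the
    other', under the null hypothesis that both look the same (p = 0.5)."""
    k = min(worse_votes, total_votes - worse_votes)
    tail = sum(comb(total_votes, i) for i in range(k + 1)) / 2 ** total_votes
    return min(1.0, 2 * tail)

# Illustrative pair: 9 of 17 viewers called the CABR encode lower quality.
# That split is statistically indistinguishable from a coin flip, i.e. the
# two encodes are perceptually equivalent for this pair.
print(binom_p_value(9, 17))   # 1.0
# A unanimous 17-of-17 verdict, by contrast, would be highly significant.
print(binom_p_value(17, 17) < 0.001)   # True
```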
FIGURE 4 shows CABR encodes had the same perceptual quality as VBR and with a confidence level of more than 95%.
FIGURE 4
Better quality, lower bitrate.
Beamr CABR encoded streams offer higher quality when compared subjectively to an equivalent VBR encode, while delivering bitrate savings of up to 49%. The benefits of CABR for cloud gaming, or any live streaming service, are better quality, greater bandwidth savings, and reduced storage cost. For the files we tested, the aggregated metrics were as follows:
65% of the time, users will experience better quality for a given bandwidth.
40% bandwidth savings on average across all three titles (GTA5 had a savings of 49%).
30% overall storage savings.
FIGURES 5, 6, and 7 illustrate, for the three video samples used, that for a given user bandwidth, CABR provides higher quality. To interpret the charts: where VBR is blue, CABR is black (higher quality), and where VBR is turquoise, CABR is blue.
FIGURE 5
FIGURE 6
FIGURE 7
Conclusion.
Beamr CABR controlling the Intel Media SDK hardware encoder is the perfect video encoding engine for cloud gaming services like Google Stadia. The Beamr CABR rate-control and optimization process works with all Intel codecs, including AVC, HEVC, VP9, and AV1. All bitstreams produced by the Intel + Beamr CABR solution are fully standard-compliant and work with every player in the field today. Beamr CABR is proven and protected by 46 International patents, meaning there is no other solution that can reduce bitrate by as much as 49% while working in real-time using a closed-loop perceptually aligned quality measure to guarantee the original quality.
The single most important technical hurdle for anyone building or operating a cloud gaming service or platform is the bandwidth consumption required to deliver a player experience on par with the console. Now, with Intel + Beamr CABR, the ideal solution is here; one that can reach the performance and density needed for cloud gaming at scale, so that more players can enjoy a premium gaming experience. Streaming video upended the media and entertainment business, with the rise of Netflix, Hulu, Amazon Prime Video, Disney+, Apple TV Plus, and dozens of other tier-one streaming services. In the same way, cloud gaming will create new service platforms, gaming experiences, and business models.
To experience the power of Beamr CABR controlling the Intel hardware encoder, send an email to info@beamr.com.
It has been two years since we published a comparison of the two leading HEVC software encoder SDKs, Beamr 5 and x265. In this article you will learn how Beamr’s AVC and HEVC software codec SDKs have further widened the computing performance gap over x264 and x265 for live broadcast-quality streaming.
Why Performance Matters
With the performance of our AVC (Beamr 4) and HEVC (Beamr 5) software encoders improving several orders of magnitude over the 2017 results, it felt like the right time to refresh our benchmarks, this time with real-time operational data.
It’s no secret that x264 and x265 have benefited greatly, as open-source projects, from having thousands of developers working on the code. This is what makes x264 and x265 a very high bar to beat. Yet even with so many bright and talented engineers donating tens of thousands of development hours to the code base, the architectural constraints of how these encoders were built limit the performance on multicore processors as you will see in the data below.
Creative solutions have been developed that enable live encoding workflows to be built on open-source encoders. But it’s no secret that they come with inherent flaws: they are overly compute-intensive, and they suffer quality issues as a result of not being able to encode a full ABR stack, or even a single 4K profile, on a single machine.
The reason this matters is that the resolutions video services deliver continue to increase. And as a result of exploding consumer demand, video is consuming network bandwidth to the point that Cisco projects that by 2022, 82% of all Internet traffic will be video.
Cisco’s Visual Networking Index projects that by 2022, SD-resolution video will comprise just 23.4% of Internet video traffic, down from the 60.2% it represented in 2017. What used to be the middle quality tier, 480p (SD), has become the lowest rung of the ABR ladder for many video distributors.
1080p (HD) will make up 56.2% of Internet video traffic by 2022, up from 36.1% in 2017. And if you thought the resolution expansion would end with HD, Cisco projects that by 2022, 4K (UHD) will comprise 20.3% of all Internet-delivered video.
Live video services are projected to expand 15x between 2017 and 2022, meaning that within the next three years, live streams will comprise 17.1% of all Internet video traffic.
These trends demonstrate the industry’s need to prepare for this shift to higher resolution video and real-time delivery with software encoding solutions that can meet the requirement for live, broadcast-quality 4K.
Blazing Software Performance on Multicore Processors
The Beamr 5 software encoder utilizes an advanced thread management architecture. This is a key part of how we achieve such a significant speed advantage over x265 at the same quality level.
x265 works by creating software threads and adding them to a thread pool where each task must wait its turn. In contrast, Beamr 5 tracks all the serial dependencies involved with the video coding tasks it must perform, and it creates small micro-tasks which are efficiently distributed across all of the CPU cores in the system. This allows the Beamr codec to utilize each available core at almost 100% capacity.
All tasks added to the Beamr codec thread pool may be executed immediately so that no hardware thread is wasted on tasks where the data is not yet available. Interestingly, under certain conditions, x265 can appear to have higher CPU utilization. But, this utilization includes software threads which are not doing any useful work. This means they are “active” but not processing data that is required for the encoding process.
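To make the contrast concrete, here is a minimal, hypothetical sketch (in Python, not Beamr’s actual implementation) of a dependency-tracking micro-task scheduler: a task is submitted to the pool only once every prerequisite has completed, so no worker thread is ever occupied by a task whose input data is unavailable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class MicroTaskGraph:
    """Illustrative only: dispatch a micro-task the moment its
    dependencies have finished, keeping every worker busy with
    tasks whose data is actually ready."""

    def __init__(self):
        self._funcs = {}
        self._deps = {}        # task -> prerequisites not yet finished
        self._dependents = {}  # task -> tasks that wait on it
        self._lock = threading.Lock()

    def add(self, name, func, after=()):
        self._funcs[name] = func
        self._deps[name] = set(after)
        for dep in after:
            self._dependents.setdefault(dep, []).append(name)

    def run(self, workers=4):
        results = {}
        done = threading.Event()
        remaining = [len(self._funcs)]

        def _run(pool, name):
            results[name] = self._funcs[name]()
            ready = []
            with self._lock:
                remaining[0] -= 1
                for child in self._dependents.get(name, ()):
                    self._deps[child].discard(name)
                    if not self._deps[child]:
                        ready.append(child)  # all inputs now available
                if remaining[0] == 0:
                    done.set()
            for child in ready:
                pool.submit(_run, pool, child)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            for name, deps in list(self._deps.items()):
                if not deps:  # no prerequisites: runnable immediately
                    pool.submit(_run, pool, name)
            done.wait()
        return results
```

In a FIFO pool, every task is queued up front and a worker picking up a task may find its inputs missing; here, nothing enters the pool until it can make immediate progress, which is the essence of the near-100% core utilization described above.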
Adding to the Beamr encoders’ thread efficiency, we have implemented patented algorithms for more effective and efficient video encoding, including a fast motion estimation process and a heuristic early-termination algorithm that enables the encoder to reach a targeted quality using fewer compute resources (cycles). Furthermore, Beamr encoders utilize the latest AVX-512 SIMD instruction set to squeeze even more performance out of advanced CPUs.
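As an illustration of the early-termination idea (the function, candidate ordering, and threshold here are hypothetical, not Beamr’s patented algorithm), a mode-decision loop can stop searching as soon as a candidate’s cost is already good enough, rather than exhaustively evaluating every option:

```python
# Hypothetical sketch: candidates are tried in order of prior
# likelihood, and the search terminates early once a candidate's
# rate-distortion cost falls at or below a "good enough" threshold,
# saving the cycles an exhaustive search would spend.
def choose_mode(candidates, cost_fn, good_enough):
    best_mode, best_cost, evaluated = None, float("inf"), 0
    for mode in candidates:
        evaluated += 1
        cost = cost_fn(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost <= good_enough:  # early termination
            break
    return best_mode, best_cost, evaluated
```

When the most likely candidate is tried first and accepted, only one evaluation is paid instead of one per candidate, which is where the cycle savings come from.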
The end result of the numerous optimizations found in the Beamr 4 (AVC) and Beamr 5 (HEVC) software encoders is that they operate nearly twice as fast as x264 and x265 at the same quality, with similar settings and tool utilization.
Video streaming services can benefit from this performance advantage in many ways, such as higher density (more channels per server), which reduces operational costs. To illustrate what this performance advantage can do for you, consider that at the top end, Beamr 5 can encode 4K 10-bit video at 60 FPS in real time using just 9 Intel Xeon Scalable cores, whereas x265 is unable to achieve this level of performance with any number of computing cores (at least on a single machine). And, as a result of being twice as efficient, Beamr 4 and Beamr 5 can deliver higher quality within the same computing envelope as x264 and x265.
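The density claim above is simple arithmetic; a quick sketch, using the core counts from this article:

```python
# Back-of-the-envelope density math: one live 4Kp60 10-bit HEVC
# channel needs ~9 cores, and the c5.24xlarge test machine exposes
# 48 physical cores.
def channels_per_server(total_cores, cores_per_channel):
    channels = total_cores // cores_per_channel
    spare = total_cores - channels * cores_per_channel
    return channels, spare

channels, spare = channels_per_server(48, 9)
print(channels, spare)  # 5 real-time 4K channels, 3 cores to spare
```

Those spare cores remain available for packaging, muxing, or monitoring tasks that live workflows also need to run on the same host.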
The Test Results
For our test to be as real-world as possible, we devised two methodologies. In the first, we measured the compute performance of an HEVC ABR stack, operating both Beamr 5 and x265 at live speed. In the second, our team measured the number of simultaneous live streams: at 1080p, comparing Beamr 4 with x264 and Beamr 5 with x265; and at 4K, comparing Beamr 5 with x265. All tests were run on a single machine.
Live HEVC ABR Stack: Number of ABR Profiles (Channels)
This test was designed to find the maximum number of full ABR channels which can be encoded live by Beamr 5 and x265 on an AWS EC2 c5.24xlarge instance.
Each AVC channel comprises 4 layers of 8-bit 60 FPS video starting from 1080p, and each HEVC channel comprises either 4 layers of 10-bit 60 FPS video (starting from 1080p) or 5 layers of 10-bit 60 FPS video (starting from 4K).
Live HEVC ABR Stack Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable Cascade Lake @ 3.6 GHz
48 cores, 96 threads
Presets:
Beamr 5: INSANELY_FAST
x265: ultrafast
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all layers): 60 FPS
Encoded Bit Depth (all layers): 10-bit
Encoded Resolutions and Bitrates:
4Kp60@18000 Kbps (only in 4K ABR stack)
1080p60@3750 Kbps
720p60@2500 Kbps
576p60@1250 Kbps
360p60@625 Kbps
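For reference, the ladder above can be expressed as data a transcoder wrapper might consume (the exact pixel dimensions for the 576p and 360p rungs are assumed 16:9 values, and the field names are illustrative, not a Beamr API):

```python
# The HEVC ABR ladder from this test as data. NOTE: 576p/360p pixel
# dimensions are assumed 16:9 values; field names are illustrative.
HEVC_ABR_LADDER = [
    {"resolution": "3840x2160", "fps": 60, "bitrate_kbps": 18000,
     "only_4k_stack": True},  # present only in the 4K ABR stack
    {"resolution": "1920x1080", "fps": 60, "bitrate_kbps": 3750},
    {"resolution": "1280x720",  "fps": 60, "bitrate_kbps": 2500},
    {"resolution": "1024x576",  "fps": 60, "bitrate_kbps": 1250},
    {"resolution": "640x360",   "fps": 60, "bitrate_kbps": 625},
]

def total_stack_bitrate(ladder, include_4k=True):
    """Aggregate bitrate one full ABR stack must sustain."""
    return sum(rung["bitrate_kbps"] for rung in ladder
               if include_4k or not rung.get("only_4k_stack"))
```

By this accounting, a 1080p stack totals 8,125 Kbps per channel and a 4K stack 26,125 Kbps per channel.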
Live HEVC ABR Stack Test – RESULTS
NOTES:
(1) When encoding 2 full ABR stacks with Beamr 5, 25% of the CPU is unused and available for other tasks.
(2) x265 cannot encode even a single 4K ABR stack channel at 60 FPS. The maximum FPS for the 4K layer of a single 4K ABR stack channel using x265 is 35 FPS.
Live AVC & HEVC Single-Resolution: Number of Channels (1080p & 4K)
In this test, we set out to discover the maximum number of single-resolution 4K and HD channels that can be encoded live by Beamr 4 and Beamr 5, as compared with x264 and x265, on a c5.24xlarge instance. As with the Live ABR Channels test, the quality of the two encoders, as measured by PSNR, SSIM, and VMAF, was always found to be equal, and in some cases better with Beamr 4 and Beamr 5 (see the “Quality Comparisons” section below).
Live AVC Beamr 4 vs. x264 Channels Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable Cascade Lake @ 3.6 GHz
48 cores, 96 threads
Speeds / Presets:
Beamr 4: speed 3
x264: preset medium
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all channels): 60 FPS
Encoded Bit Depth (all channels): 8-bit
Channel Resolutions and Bitrates:
1080p60@5000 Kbps
Live AVC Beamr 4 vs. x264 Channels Test – RESULTS
Live HEVC Beamr 5 vs. x265 Channels Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable Cascade Lake @ 3.6 GHz
48 cores, 96 threads
Speeds / Presets:
Beamr 5: INSANELY_FAST
x265: ultrafast
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all channels): 60 FPS
Encoded Bit Depth (all channels): 10-bit
Channel Resolutions and Bitrates:
4Kp60@18000 Kbps
1080p60@3750 Kbps
Live HEVC Beamr 5 vs. x265 Channels Test – RESULTS
NOTES:
(1) x265 was unable to reach 60 FPS for a single 4K channel, achieving just 35 FPS at comparable quality.
Quality Comparisons (PSNR, SSIM, VMAF)
Beamr 5 vs. x265
NOTES:
As previously referenced, x265 was unable to reach 4Kp60, so PSNR, SSIM, and VMAF scores could not be calculated; hence the ‘N/A’ designation in the 3840×2160 cells.
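As a reminder of what these scores measure, PSNR, the simplest of the three metrics, is derived from the mean squared error against the source frame. A minimal sketch follows; note that for the 10-bit content used in these tests, max_val would be 1023 rather than the 8-bit default of 255.

```python
import math

# PSNR in dB, computed from mean squared error against the source
# samples. Higher is better; identical frames give infinite PSNR.
def psnr(original, encoded, max_val=255.0):
    mse = sum((o - e) ** 2 for o, e in zip(original, encoded)) / len(original)
    if mse == 0:
        return float("inf")  # lossless reproduction
    return 10.0 * math.log10(max_val ** 2 / mse)
```

SSIM and VMAF are more perceptually oriented (structural similarity and a learned fusion of elementary metrics, respectively), which is why comparisons like the one above report all three.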
Video engineers are universally focused on the three pillars of video encoding: computing efficiency (performance), bitrate efficiency, and quality. Even as new tool sets have advanced each of these pillars, it’s well known that tradeoffs among them remain.
On one hand, bitrate efficiency requires tools that sap performance; on the other, reaching a performance (speed) target means giving up tools that could improve quality. As a result, many video encoding practitioners have adapted to these tradeoffs and simply accept them for what they are. Now, there is a solution…
The impact of adopting Beamr 4 for AVC and Beamr 5 for HEVC transcends a TCO calculation. With Beamr’s high-performance software encoders, services can achieve bitrate efficiency and performance, all without sacrificing video quality.
The use of Beamr 4 and Beamr 5 opens the door to an improved user experience through higher resolutions and frame rates, which means it is now possible for everyone to stream higher quality video. As the competitive landscape for video delivery services continues to evolve, never has the need been greater for an AVC and HEVC codec implementation that delivers on all three pillars: performance, bitrate efficiency, and quality. With the performance data presented above, it should be clear that Beamr 4 and Beamr 5 continue to be the codec implementations to beat.