AI cabr GPU NVENC Video Encoding Video Quality Video Trends

Is the Future of Video Processing Destined for GPU?

Posted on March 17, 2025by Sharon Carmel

My Journey Through the Evolution of Video Processing: From Low-Quality Streaming to HD and 4K Becoming a Commodity, and Now the AI-Powered Video Revolution

Digital video has been my primary focus for the past three decades. I have built software codecs, designed ASICs, and now optimize GPU encoders with advanced GPU software.

My journey in video processing has been transformative, starting with low-resolution streaming, advancing into HD and 4K, as they shift from a rare event to an everyday expectation. And now, we stand at the next frontier—AI is redefining how we create, deliver, and experience video like never before.

My journey into this field began with the introduction of QuickTime 1.0 in 1991, when I was in my 20s. It looked to me like magic — a compressed movie playing smoothly on a single-speed CD-ROM (150 KB/s, 1.2 Mbps). At the time, I had no understanding of video encoding, but I was fascinated. At that moment I knew this is the field I wish to dive into.

Apple QuickTime Version 1.0 Demo

Chapter 1: The Challenge of Streaming Video with Low-Resolution, Low-Quality Videos

The early days of streaming, in the mid 90s, were characterized by low-resolution video, low frame rates (12-15 fps), and low bitrates — 28.8 kbps, 33 kbps, or 56 kbps — two to three orders of magnitude 100x – 1000x lower bitrate than today’s standards. This was the reality of digital video in 1996 and the years followed.

By 1996, I was one of 4 co-founders of Emblaze – We developed a vector-based graphic tool called “Emblaze Creator” – think of it as Adobe Flash before Adobe Flash.

We soon realized we needed video support. We started by downloading videos in the background. Obviously, the longer the video was, the more time it took to download, which was frustrating to wait for. So we limited the videos to just 30 seconds.

Early solutions, like RealNetworks and VideoNet, required dedicated video servers — an expensive and complex infrastructure. It seemed to me like a very long and costly journey to streaming enablement.

Adding video to our offerings quickly was crucial for our company’s survival, so we persistently tackled this challenge. I remember the nights spent experimenting and exploring solutions, but all paths seemed to converge on the RealNetworks approach, which we couldn’t adopt in the short term.

We had to find a way to solve the challenge of streaming video efficiently for very low bandwidth. And while it was hard to stream files, you could slice them. So in 1997, I came up with an idea and worked with my team at Emblaze on the following solution:

Take a video file and divide it into numbered slices.
Create an index file with the order of the slices, and place it on a standard HTTP server.
The player will read that index file and pull the segments from a web server in the order as in the index file.

Just to make it more real, here is the patent we submitted in 1998, and granted in 2002:

But that was not enough, why not create time synchronized slices, so the player will be able to pull the optimal chucks based on the specific bandwidth characteristics when playing the files?

The player will read the index file from the server and choose a level to read, decide on a slice, and based on the bitrate move up and down the bitrate ladder.

If that reminds you of HLS – then it was HLS many years before HLS was out.

We demonstrated this live with EarthLink at the Easter Egg Roll at the White House in 1998. Our systems were made of H.263 and then H.264 encoders, and a patented streaming protocol. We had a track with 10 Compaq workstations running 8 cameras that day.

When you build a streaming solution, you need a player. Without it, all that effort is meaningless. At Emblaze, we had a Java-based player that required no installation—a major advantage at the time.

Back then, mobile video was in its infancy, and we saw an opportunity. Feature phones simply couldn’t play video, but the Nokia Communicator 9110 could. It had everything—a grayscale screen, a 33MHz 32-bit CPU, and wireless data access—a powerhouse by late ‘90s standards.

In 1999, I demonstrated a software video decoder running on the Nokia 9110 to Samsung Mobile CEO. This was a game-changer—it proved that video streaming on mobile devices was possible. Samsung, being a leader in CDMA 2000, wanted to showcase this capability at the 2000 Olympics and needed working prototypes.

Samsung challenged us to build a mobile ASIC capable of decoding streaming video on just 100mW of power. We delivered. The solution was announced at the Olympics, and by 2001, it was in mass production.

This phone featured The Emblaze Multimedia Application Co-Processor, working alongside the baseband chip to enable seamless video playback over CDMA 2000 networks—a groundbreaking achievement at the time.

Chapter 2: HD Becomes the Standard, 4K HDR Becomes Common

HD television was introduced in the U.S. during the second half of the 90s, but it wasn’t until 2003 that satellite and cable providers really started broadcasting in HD.

I still remember 2003, staying at the Mandarin Oriental Hotel in NYC, where I had a 30-inch LCD screen with HD broadcasting. Standing close to the screen, taking in the crisp detail, was an eye-opening moment—the clarity, the colors, the sharpness. It was a huge leap forward from standard definition, and definitely better than DVDs.

But even then, it felt like just the beginning. HD was here, but it wasn’t everywhere yet. It took a few more years for Netflix to introduce streaming.

Beamr is Born

In early 2008, the startup I led, which focused on online backup, was acquired. By the end of the year, I found myself out of work. And so, I sent an email to Steve Jobs, pointing out that Time Machine’s performance was lacking, and that I believed I could help fix it. That email led to a meeting in Cupertino with the head of MobileMe—what we now know as iCloud.

That visit to Apple in early 2009 was fascinating. I learned that storing iPhone photos was becoming an enormous challenge. The sheer volume of images was straining Apple’s data centers, and they were running into power limitations just to keep up with demand.

With this realization, Beamr was born!

The question that intrigued us was: Can we make images smaller, while making sure they look exactly the same?

After about one year of research, we ended up founding Beamr instead of becoming a part of MobileMe. And the leader of this – the brains behind our technology that is here today.

During the first year of Beamr, we explored this idea. And we came out with our first product called JPEGmini, which does exactly that. This was achieved through the amazing innovation of our wonderful CTO, Tamar Shoham.

JPEGmini is a wonderful tool, and hundreds of thousands of content creators around the world use it.

After optimizing photos, we wanted to take on video compression. That’s when we developed our gravity defier—CABR, Content Adaptive BitRate technology. This quality-driven process can cut every high-quality video by 30% to 50% while preserving every frame’s visual integrity.

But our innovation comes with challenges:

Lightning-fast encoding without CABR, but with CABR it is slower and can’t run live at 4Kp60.
Running CABR is more expensive than non-CABR encoding.

In the year 2018, we came to the conclusion that we needed a hardware acceleration solution – to improve our density, our speed and the cost of processing.

We started by integrating with Intel GPUs, and it worked very well. We even demoed it at Intel Experience Day in 2019.

We had wonderful relationships with Intel and they had a good video encoding engine. We invested about two years of effort, and it did not materialize as an Intel GPU for the Data Center didn’t happen – a wasted opportunity.

Then, we thought of developing our own chip:

Its power will be a function of CPU or GPU
We will be able to put four 8Kp60 CABR chips on a single PCI card (for AC/HEVC and AV1).
It will cost less than a GPU and have 3X density.

Here’s a slide that shows that we were serious. We also started a discussion about raising funds to build that chip using 12nm technology.

But then, we looked at our plan and wondered: does this chip support the needs of the future?

How would you innovate on this platform?
What if you would like to run smarter algorithms or a new version of CABR?
Our design included programmable parts for customization. We even thought of adding GPU cores – but who is going to develop for it?

This was a key moment in 2020, when we understood that innovation is so fast that every silicon generation takes at least two years to build and that is too slow.

There is a scale that VPU solutions will be more efficient than GPU, but that cannot compete with the current pace of change. It may come that even the biggest social networks will abandon VPUs due to the need for AI and video to work together.

Chapter 3: GPUs and the Future of Video Processing

By 2021, NVIDIA invited us to bring CABR to GPUs. This was a three-year journey, requiring a complete rewrite of our technology for NVENC. NVIDIA fully supported us, integrating CABR into all encoding modes across AVC, HEVC, and AV1.

In May 2023, the first driver was out: NVENC SDK 12.1!

At the same time, Beamr went public on NASDAQ (under the ticker BMR), on the premise of a high-quality large-scale video encoding platform enabled on NVIDIA GPUs.

Since September 2024. Beamr CABR is running LIVE video optimization on NVIDIA GPUs at 4Kp60 across 3 codecs AVC, HEVC and AV1. It is 10X faster at 1/10 of the cost for AVC, and the ratio for HEVC is double – and you can double that again for AV1.

All of our challenges for bringing CABR to the masses are solved.

But the story doesn’t end here.

What we didn’t fully anticipate was how AI-driven innovation is transforming the way we interact with video, and the opportunities are even greater than we imagined, thanks to the shift to GPUs.

Let me give you a couple of examples:

In the last Olympics, I was watching windsurfing, and on-screen, I saw a real-time overlay showing the planned routes of each surfer, the wind speed and forward tactics, and the predictions on how they would converge at the finish line.

It was seamless, intuitive, and AI-driven—a perfect example of how AI enriches the viewing experience.

Or think about social media: AI plays a huge role in processing video behind the scenes. As videos are uploaded, VPUs (Video Processing Units) handle encoding, while AI algorithms simultaneously analyze content—deciding whether it’s appropriate, identifying trends, and determining who should see it.

But the processes used by many businesses are slow and inefficient. For every AI-powered video workflow, you need:

Load the video.
Decode it.
Process it (either for AI analysis or encoding).
Sync and converge the process.

Traditionally, these steps happened separately, often with significant latency.

But on a GPU?

Single load, single decode, shared memory buffer.
AI and video processing run in parallel.
Everything is synced and optimized.

And just like that—you’re done. It’s faster, more efficient, and more cost-effective. This is the winning architecture for the future of AI and video at scale.

Want to learn more? Start a free trial with Beamr Cloud or Talk to us!

AI AOM AV1 cabr NVIDIA Oracle Cloud Infrastructure (OCI)

Beamr Now Offering Oracle Cloud Infrastructure Customers 30% Faster Video Optimization

Posted on June 23, 2024by Tamar Shoham

Beamr’s Content Adaptive Bit Rate solution enables significantly decreasing video file size or bitrates without changing the video resolution or compromising perceptual quality. Since the optimized file is fully standard compliant, it can be used in your workflow seamlessly, whatever your use case, be it video streaming, playback or even part of an AI workflow.

Beamr first launched Beamr cloud earlier this year, and we are now super excited to announce that our valued partnership with Oracle Cloud Infrastructure (OCI) is enabling us to offer to OCI customers more features and better performance.

The performance improvements are due in part to the availability of the powerful NVIDIA L40S GPUs on OCI. In preliminary testing we found that running our video encoding workflows can be up to 30% faster when using these cards, than when running on the cards we currently use in the Beamr Cloud solution.

This was derived from testing AVC and HEVC NVENC driven encodes for a set of nine 1080p classic test clips with eight different configurations, and comparing encoding wall times on an A10G vs. a L40S GPU. Speedup factors of up to 55% were observed, with an average just above 30%. The full test data is available here.

Another exciting feature about these cards is that they support AV1 encoding, which means Beamr Cloud will now offer to turn your videos into optimized AV1 encodes, offering even higher bitrate/file size savings.

What’s the fuss about AV1?

In order to store and transmit video, substantial compression is needed. From the very earliest efforts to standardize video compression in the 90s, there has been a constant effort to create video compression standards offering increasing efficiency – meaning that the same video quality can be achieved with smaller files or lower bitrates.

As shown in the schematic illustration below, AV1 has come a long way in improving over H.264/AVC, the most widely adopted standard today, despite being 20 years old. However, the increased compression efficiency is not free – the computational complexity of newer codecs is also significantly higher, motivating the adoption of hardware accelerated encoding options.

With the demand and need for Video AI workflows continuing to rise, the ability to perform fully automatic, fast, efficient, optimized video encoding is an important enabler.

The Beamr GPU powered video compression and optimization occur within the GPU on OCI, right at the heart of these AI workflows, making them extremely well placed to offer benefits to such workflows. We have previously shown in a number of case studies that there is no negative impact on inference or training results when using the optimized files – making the integration of this optimization process into AI workflows a natural choice for cost savvy developers.

AOM AV1 Broadcast cabr Content-Adaptive Live NVIDIA

Real-time Video Optimization with Beamr CABR and NVIDIA Holoscan for Media

Posted on April 15, 2024by Dan Julius

This year at the NAB Show 2024 in Las Vegas, we are excited to demonstrate our Content-Adaptive Bitrate (CABR) technology on the NVIDIA Holoscan for Media platform. By implementing CABR as a GStreamer plugin, we have, for the first time, made bitrate optimization of live video streams easily achievable in the cloud or premise.

Building on the NVIDIA DeepStream software development kit, which can extends GStreamer’s capabilities, significantly reduced the amount of code required to develop the Holoscan for Media based application. Using DeepStream components for real-time video processing and NMOS (Networked Media Open Specifications) signaling, we were able to keep our focus on the CABR technology and video processing.

The NVIDIA DeepStream SDK provides an excellent framework for developers to build and customize dynamic video processing pipelines. DeepStream provides pipeline components that make it very simple to build and deploy live video processing pipelines that utilize the hardware decoders and encoders available on all NVIDIA GPUs.

Beamr CABR dynamically adjusts video bitrate in real-time, optimizing quality and bandwidth use. It reduces data transmission without compromising video quality, making the video streaming more efficient. Recently we released our GPU implementation which uses the NVIDIA NVENC, encoder, providing significantly higher performance compared to previous solutions.

Taking our GPU implementation for CABR to the next level, we have built a GStreamer Plugin. With our GStreamer Plugin, users can now easily and seamlessly incorporate the CABR solution into their existing DeepStream pipelines as a simple drop-in replacement to their current encoder component.

Holoscan For Media

A GStreamer Pipeline Example

To illustrate the simplicity of using CABR, consider a simple DeepStream transcoding pipeline that reads and writes from files.

Simple DeepStream Pipeline:

gst-launch-1.0 -v \
  filesrc location="video.mp4" ! decodebin ! nvvideoconvert ! queue \
  nvv4l2av1enc bitrate=4500 ! mp4mux ! filesink location="output.mp4"

By simply replacing the nvv4l2av1enc component with our CABR component, the encoding bitrate is adapted in real-time, according to the content, ensuring optimal bitrate usage for each frame, without any loss of perceptual quality.

CABR-Enhanced DeepStream Pipeline:

gst-launch-1.0 -v \
  filesrc location="video.mp4" ! decodebin ! nvvideoconvert ! queue \
  beamrcabvav1 bitrate=4500 ! mp4mux ! filesink location="output_cabr.mp4"

Similarly, we can replace the encoder component used in a live streaming pipeline with the CABR component to optimize live video streams, dynamically adjusting the output bitrate and offering up to a 50% reduction in data usage without sacrificing video quality.

Simple DeepStream Pipeline:

gst-launch-1.0 -v \
  rtmpsrc location=rtmp://someurl live=1 ! decodebin ! queue ! \ 
  nvvideoconvert ! queue ! nvv4l2av1enc bitrate=3500 ! \
  av1parse ! rtpav1pay mtu=1300 ! srtsink uri=srt://:8888

CABR-Enhanced DeepStream Pipeline:

gst-launch-1.0 -v \
  rtmpsrc location=rtmp://someurl live=1 ! decodebin ! queue ! \
  nvvideoconvert ! queue ! beamrcabrav1 bitrate=3500 ! \
  av1parse ! rtpav1pay mtu=1300 ! srtsink uri=srt://:8888

The Broad Horizons of CABR Integration in Live Media

Beamr CABR, demonstrated using NVIDIA Holoscan for Media at NAB show, marks just the beginning. This technology is an ideal fit for applications running on NVIDIA RTX GPU-powered accelerated computing and sets a new standard for video encoding.

Lowering the video bitrate reduces the required bandwidth when ingesting video to the cloud, creating new possibilities where high resolution or quality were previously costly or not even possible. Similarly, reduced bitrate when encoding on the cloud allows for streaming of higher quality videos at lower cost.

From file-based encoding to streaming services — the potential use cases are diverse, and the integration has never before been so simple. Together, let’s step into the future of media
streaming, where quality and efficiency coexist without compromise.

cabr Cloud Gaming Codec Performance Content-Adaptive Intel Gen 11 Graphics Intel Media SDK

How To Cut Cloud Gaming Bitrates In Half So That Twice As Many Users Can Play

Posted on November 19, 2019by Dror Gill

TL;DR: Beamr CABR operating with the Intel Media SDK hardware encoder powered by Intel GPUs is the perfect video encoding engine for cloud gaming services like Google Stadia. The Intel GPU hardware encoder reaches real-time performance with a power envelope that is 90% less than a CPU based software solution. When combined with Beamr CABR (Content-Adaptive Bitrate) technology, the required bandwidth for cloud gaming is reduced by as much as 49% while delivering higher quality 65% of the time. Using the Intel hardware encoder combined with Beamr CABR enables players to enjoy a gaming experience that is competitive to a console and able to be streamed by cloud gaming platforms. Get more information about how CABR works.

The era of cloud gaming.

With the launch of Google Stadia, we have entered a new era in the games industry called cloud gaming. Just as streaming video services opened media and entertainment content to a broader audience by freeing it from the fixed frameworks of terrestrial (over-the-air), cable, and satellite distribution, so to will cloud gaming open gameplay to a larger audience. Besides extending gameplay to virtually anywhere the user has a network-connected device, the ability for a player to access an extensive library of games without needing to use a specific piece of hardware will push 25.9 million players to cloud gaming platforms by 2023, according to the media research group Kagan.

In addition to opening up gameplay to an “anywhere/anytime” experience. A major user experience benefit of cloud gaming is that players will not necessarily need to purchase a game, but in many cases will be free to access a vast library of their choosing instantaneously. Cloud gaming services promise the quality of a console or PC experience, but without the need to own expensive hardware and the configuration and software installation work that comes with that.

The one constraint that could cause cloud gaming to never catch up with the console experience.

With the wholesale transition of video entertainment content from traditional broadcast and physical media to streaming distribution, it is not hard to project the same pattern will occur for games. Except now, unlike the early days of video streaming where a 3Mbps home Internet connection was “high speed,” and the number of devices able to decode and reliably play back H.264 video was limited, even the lowest cost smartphone can stream video with acceptable quality.

Yet, there is a fundamental constraint that must be overcome for cloud gaming to reach its full market potential, and that is the bandwidth required to deliver a competitive video experience at 1080p60 or 4kp60 resolution. To better understand the bandwidth squeeze that is unique to cloud gaming, let’s examine the data and signal flow.

In FIGURE 1 we see the cloud gaming architecture moves compute-intensive operations, like the graphics rendering engine, to the cloud.

Shifting the compute-intensive function to the cloud eliminates device technical capability from being a bottleneck. However, as a result of the video rendering and encoding function not being local to the user, it means the video stream needs to be delivered over the network, with latency in the tens of milliseconds. And, at a framerate that is double the entertainment video frame rate of 24, 25, or 30 frames per second. Additionally, video game resolutions need to be HD with 4K preferable. Also, HDR is an increasingly important capability for many AAA game titles.

None of these requirements are impossible to meet, except as a result of needing fast encoding speed, the encoder must be operated in a mode that makes it difficult to produce high-quality and with small stream size. Because of the added time needed for the encoder to create B frames, and without the benefit of a look-ahead buffer, producing high quality with low bitrate is not possible. Hence why cloud gaming services require a significantly higher bitrate than what is possible with traditional video on demand streaming video services.

Beamr has been innovating in the area of performance, allowing us to encode H.264 and HEVC in software with breathtaking speed, even when running our most advanced Content-Adaptive Bitrate (CABR) rate-control. For video applications where a single encoder can serve hundreds of thousands or even millions of users, the compute requirement to do this in software, given the tremendous benefits of lower bitrate and higher quality, makes it easy to justify. But, in an application like cloud gaming, where the video encoder is matched 1:1 to every user, the computing cost to do this in software makes it uneconomical. The answer is to use a hardware encoder controlled by software, and running a content-adaptive optimization process which can deliver the additional bitrate savings needed.

FIGURE 2 illustrates the required Google Stadia bitrates.

The answer is to leverage hardware and software.

The Intel Media SDK and GPU engines occupy a well-established position in the market, with many video services relying on its included HEVC hardware encoder for real-time encoding. However, using the VBR rate-control only, there is a limit to the quality available when bitrate efficiency is essential. The advantage of Beamr’s next-generation rate-control technology, CABR (Content-Adaptive Bitrate), combined with Intel GPUs, is the secret to delivering bitrate efficiency and quality, in real-time, with 90% less power than software alone.

In verified testing, Beamr has shown that the Intel Media SDK hardware encoder controlled by CABR will produce the same perceptual quality as VBR encodes, with a confidence level greater than 95%. Using CABR gives a meaningful impact on user experience. 65% of the time, the player will perceive better quality at the same bandwidth, even while the gaming platform experiences up to a 49% reduction in the bandwidth required to provide the same quality level.

Watch Beamr Founder Sharon Carmel present Beamr CABR integrated with Intel Gen 11 hardware encoder at Intel Experience Day October 29, 2019 in Moscow.

Proof of performance.

As an image science company, Beamr is committed to proof of performance with all claims. For this reason, the industry recognizes that all technology, products, and solutions which carry the Beamr name, represent the pinnacle of quality. For this reason, it was insufficient to integrate CABR with the Intel Media SDK without being able to prove that the original quality of the stream is always preserved and that the user experience is improved. Testing comprised corresponding 10-second segments extracted from clips created with the Intel hardware encoder using VBR, and clips encoded using the Intel hardware encoder but with the integrated Beamr CABR rate-control.

The only way to test perceptual quality is with subjective techniques. We used a process similar to forced-choice double stimulus (FCDS), and closely approximating the ITU BT.500 method. Using the Beamr Auto-VISTA framework, we recruited anonymous viewers from Amazon Mechanical Turk where each viewer was shown corresponding segment pairs and asked to select which video had lower quality. The VBR and CABR encoded files were placed at random on the left and right sides. Validation pairs were used to verify the user’s capabilities with visible artifacts inserted, and only test results for users who correctly answered all four validation pairs were incorporated into the analysis. The viewers had up to five attempts to view the pairs before making a decision. Each viewer watched 20 segment pairs consisting of sixteen actual CABR, and VBR encodes, and four validation pairs.

Games used for testing were: CSGO, Fallout, and GTA5. To reflect realistic bitrates, we only tested the middle four bitrates out of the six bitrates provided. This was because the bitrate for the top layer was very high, and the bottom layer quality was very low. The four bitrates tested were spaced one JND (just noticeable difference) apart. Each target test pair was viewed 13 to 21 times by valid users, with a total of 800 target pair viewings, or about 17 viewings per pair on average. The total number of valid test sessions were 50, completed by more than 40 unique viewers.

Peeling back the data, you will notice that the per-pair statistical distribution is quite symmetrical above and below 50%. With the sampling base, this phenomenon is no surprise; human perception varies. The overall results had 800 views of 48 pairs, which make the statistical certainty higher, indicating that CABR is not compromising perceptual quality.

FIGURE 4 shows CABR encodes had the same perceptual quality as VBR and with a confidence level of more than 95%.

Better quality, lower bitrate.

Beamr CABR encoded streams offer higher quality when compared subjectively to a VBR equivalent encode, while offering a bitrate savings of up to 49%. Benefits of CABR for cloud gaming or any live streaming service, are quantified by better quality, greater bandwidth savings, and a reduction in storage cost. For the files that we tested, the aggregated metrics were as follows:

65% of the time, users will experience better quality for a given bandwidth.
40% bandwidth savings on average across all three titles (GTA5 had a savings of 49%).
30% overall storage savings.

FIGURE 5, 6, and 7 illustrate for the three video samples used that for a given User Bandwidth, CABR provides higher quality. You will interpret the chart by observing that where VBR is blue, CABR is BLACK (higher quality), and where VBR is turquoise, CABR is BLUE.

Conclusion.

Beamr CABR controlling the Intel Media SDK hardware encoder is the perfect video encoding engine for cloud gaming services like Google Stadia. The Beamr CABR rate-control and optimization process works with all Intel codecs, including AVC, HEVC, VP9, and AV1. All bitstreams produced by the Intel + Beamr CABR solution are fully standard-compliant and work with every player in the field today. Beamr CABR is proven and protected by 46 International patents, meaning there is no other solution that can reduce bitrate by as much as 49% while working in real-time using a closed-loop perceptually aligned quality measure to guarantee the original quality.

The single most important technical hurdle for anyone building or operating a cloud gaming service or platform is the bandwidth consumption required to deliver a player experience on par with the console. Now, with Intel + Beamr CABR, the ideal solution is here; one that can reach the performance and density needed for cloud gaming at scale, so that more players can enjoy a premium gaming experience. Streaming video upended the media and entertainment business, with the rise of Netflix, Hulu, Amazon Prime Video, Disney+, Apple TV Plus, and dozens of other tier-one streaming services. In the same way, cloud gaming will create new service platforms, gaming experiences, and business models.

To experience the power of Beamr CABR controlling the Intel hardware encoder, send an email to info@beamr.com.

cabr Content-Adaptive quality measure

The Patented Visual Quality Measure that was Designed to Drive Higher Compression Efficiency

Posted on September 11, 2019by Tamar Shoham

At the heart of Beamr’s closed-loop content-adaptive encoding solution (CABR) is a patented quality measure. This measure compares the perceptual quality of each candidate encoded frame to the initial encoded frame. The quality measure guarantees that when the bitrate is reduced the perceptual quality of the target encode is preserved. In contrast to general video quality measures – which aim to quantify any difference between video streams resulting from bit errors, noise, blurring, change of resolution, etc. – Beamr’s quality measure was developed for a very specific task. It reliably and quickly quantifies the perceptual quality loss introduced in a video frame due to artifacts of block-based video encoding. In this blog post, we present the components of our patented video quality measure, as shown in Figure 1.

Pre-analysis

Before determining the quality of an encoded frame, the quality measure component performs some pre-analysis on the source and initial encoded frames to extract data used in the quality measure calculation and to collect information used to configure the quality measure. The analysis consists of two parts, where part I of the analysis is performed on the source frame and part II of the analysis is performed on an initial encoded frame.

beamr closed loop perceptual quality measure functional block diagram

Figure 1. A block diagram of the video quality measure used in Beamr’s CABR engine

The goal of part I of the pre-analysis is to characterize the content, the frame, and areas of interest within a given frame. In this phase, we can determine whether the frame has skin and face areas, rich chroma information typical of 3D animation, or highly localized movement with static background, found in cell animation content. The algorithms used are designed for low CPU overhead. For example, our facial detection algorithm applies a full detection mechanism at scene changes and a unique, low complexity adaptive-tracking mechanism in other frames. For skin detection, we use an AdaBoost classifier, which we trained on a marked dataset we created. The classifier uses YUV pixel values and 4×4 Luma variance values input. At this stage, we also calculate the edge map which we employ in the Edge-Loss-Factor score component described below.

Part II of the pre-analysis is used to analyze the characteristics of the frame after the initial encoding. In this phase, we may determine if the frame has grain and estimate the amount of grain, and use it to configure the quality measure calculation. We also collect information about the complexity of each block, which is indicated, for example, by the bit usage and block quantization level used to encode each block. At this stage, we also calculate the density of local textures in each block or area of the frame, which is used for the texture preservation score component described below.

Quality Measure Process and Components

The quality measure evaluates the quality of a target frame when compared to a reference frame. In the context of CABR, the reference frame is the initial encoded frame and the target frame is the candidate frame of a specific iteration. After performing the two phases of the pre-analysis, we proceed to the actual quality measure calculation, which is described next.

Tiling

After completing the two phases of the pre-analysis stage, each of the reference and target frames is partitioned into corresponding tiles. The location and dimensions of these tiles are adapted according to the frame resolution and other frame characteristics. For example, we will use smaller tiles in a frame which has highly localized motion. Tiles are also sometimes partitioned further into sub-tiles, for at least some of the quality measure components. A quality metric score is calculated for each tile, and these per-tile scores are perceptually pooled to obtain a frame quality score.

The quality score for each tile is calculated as a weighted geometric average of the values calculated for each quality measure component. The components include a local similarity component which determines a pixel-wise difference, an added artifactual edges component, a texture distortion component, an edge loss factor, and a temporal component. We now provide a brief review of these five components of Beamr’s quality measure.

Local Similarity

The local similarity component evaluates the level of similarity between pixels at the same position in the reference and target tiles. This component is somewhat similar to PSNR, but uses adaptive sub-tiling, pooling, and thresholding, to provide results that are more perceptually oriented than regular PSNR. In some cases, such as when pre-analysis determined that the frame contains rich chroma content, the calculation of pixel similarity for chroma planes is also included in this component, but in most cases, only luma is used. For each sub-tile, regular PSNR is calculated. To give greater weight to low-quality sub-tiles, which are located in tiles that have far superior quality, we perform the pooling using only values which are below a threshold that depends on the lowest sub-tile PSNR values. This can happen when there are changes only in a small area, even just a few pixels. We then scale the pooled value using a factor which is adapted according to the level of brightness in the tile, since distortion in dark areas is more perceptually disturbing than in bright areas. Finally, we clip the local similarity component score so that it lies in the range [0,1], where 1 indicates that the target and reference tiles are perceptually identical.

Added Artifactual Edges (AAE)

The Added Artifactual Edges score component evaluates additional blockiness introduced in the target tile compared to reference tile. Blockiness in video coding is a well-known artifact introduced by the independent encoding done on each block. Many previous attempts have been made to avoid this blockiness artifact, mainly using de-blocking filters which are integral parts of modern video encoders such as AVC and HEVC. However, our focus in the AAE component is to quantify the extent of this artifact rather than eliminate it. Since we are interested only in the added blockiness in the target frame relative to the reference frame, we evaluate this component of the quality measure on the difference between the target and reference frames. For each horizontal and vertical coding block boundary in the difference block, we evaluate the change or gradient across the coding block border and compare it to the local gradient within the coding block on either side. For example, for AVC encoding this is done along the 16×16 grid of the full-frame. We apply soft thresholding to the blockiness value, using adaptive threshold values, adapted according to information from the pre-analysis stage. For example, in an area recognized as skin, where human vision is more sensitive to artifacts, we will use tighter thresholds so that mild blockiness artifacts are more heavily penalized. These calculations result in an AAE scores map, containing values in the range of [0, 1] for each horizontal and vertical block border point. We average the values per block border, and then average these per-block-border average values, excluding or giving low weight to block borders with no added blockiness. The value is then scaled according to the percent of extremely disturbing blockiness artifacts, i.e. cases where the original blockiness value prior to thresholding was very high, and finally is clipped to the range [0,1] with 1 indicating no added artifactual edges in the target tile relative to the reference tile.

Texture Distortion

The texture distortion score component quantifies how well texture is preserved in the target tile. Most block-based codecs, including AVC and HEVC, use a frequency transform such as DCT and perform quantization of the transform coefficients, usually applying more aggressive quantization to the high-frequency components. This can cause two different textural artifacts. The first artifact is a loss of texture detail, or over-smoothing, due to loss of energy in high-frequency coefficients. The second artifact is known as “ringing,” and is characterized by the noise around edges or sharp changes in the image. Both these artifacts cause a change in the local variance of the pixel values: over-smoothing causes a decrease in pixel variance, while added ringing or other high-frequency noise, causes an increase in pixel variance. Therefore, we measure the local deviation, in corresponding blocks in the reference and target frame tiles, and compare their values. This process yields a texture tile score in the range [0,1] with 1 indicating no visible texture distortion in the target image tile.

Temporal consistency

The temporal score component evaluates the preservation of temporal flow in the target video sequence compared to the temporal flow in the reference video sequence. This is the only component of the quality measure that also requires the preceding target and reference frames to be leveraged. In this component, we measure two kinds of changes: “new” information introduced in the reference frame which is missing in the target frame, and “new” information in target frame where there was no “new” information in the reference frame. In this context, “new” information refers to information that exists in the current frame but doesn’t exist in the preceding frame. We calculate the Sum of Absolute Differences (SAD) between each co-located 8×8 block in the reference frame and the preceding reference frame, and the SAD between each co-located 8×8 block in the target frame and the preceding target frame. The local (8×8) score is derived from the relation between these two SAD values, and also according to the value of the reference SAD, which indicates whether the block is dynamic or static in nature. Figure 2 illustrates the value of the local score for different combinations of the reference and target SAD values. After all local temporal scores are calculated, they are pooled to obtain a tile temporal score component in the range [0,1].

Figure 2. local temporal score as a function of reference SAD and target SAD values

Edge Loss Factor (ELF)

The Edge Loss Factor score component reflects how well edges in the reference image are preserved in the target image. This component uses the input image edge map, generated during part I of the pre-analysis. In part II of the pre-analysis, the strength of the edge at each edge point in the reference frame is calculated, as the most substantial absolute difference between the edge pixel value and its 8 closest neighbors. We can optionally discard pixels which are considered false edges, by comparing the reference frame edge strength of the pixel to a threshold, which can be adapted, for example, to be higher in a frame which contains film grain. Once values for all edge pixels have been accumulated the final value is scaled to provide an ELF tile score component, in the range [0,1] with 1 indicating perfect edge preservation.

Combining the Score Components

The five tile score components described above are combined into a tile score using weighted geometric averaging, where the weights can be adapted according to the codec used or according to the pre-analysis stage. For example, in codecs with good in-loop deblocking filters we can lower the weight of the blockiness component, while in frames with high levels of film grain (as determined by the pre-analysis stage) we can reduce the weight of the texture distortion component.

Tile Pooling

In the final step of the frame quality score calculation, the tile scores are perceptually pooled to yield a single frame score value. The perceptual pooling uses weights which are dependent on importance (derived from the pre-analysis stages, such as the presence of face and/or skin in the tile), and on the complexity of blocks in the tile compared to average complexity of the frame. The weights are also dependent on tile score values – we give more weight to low scoring tiles, in the same way, human viewers are drawn to quality drops even if they occur in isolated areas.

Score Configurator

The score configurator block is used to configure the calculations for different use cases. For example, in implementations where latency or performance are tightly bounded, the configurator can apply a fast score calculation which skips some of the stages of pre-analysis and uses a somewhat reduced complexity score. To still guarantee a perceptually identical result, the score calculated in this fast mode can be scaled or compensated to account for the slightly lower perceptual accuracy, and this scaling may in some cases slightly reduce savings.

To learn more about CABR, continue reading “A Deep Dive into CABR, Beamr’s Content-Adaptive Rate Control.”

Authors: Dror Gill & Tamar Shoham

Beamr 5 cabr Content-Adaptive

A Deep Dive into CABR, Beamr’s Content-Adaptive Rate Control

Posted on September 11, 2019by Tamar Shoham

Going Inside Beamr’s Frame-Level Content-Adaptive Rate Control for Video Coding

When it comes to video, the tradeoff between quality and bitrate is an ongoing dance. Content producers want to maximize quality for viewers, while storage and delivery costs drive the need to reduce bitrate as much as possible. Content-adaptive encoding addresses this challenge, by striving to reach the “optimal” bitrate for each unique piece of content, be it a full clip or a single scene. Our CABR technology takes it a step further by adapting the encoding at the frame level. CABR is a closed-loop content-adaptive rate control mechanism enabling video encoders to lower the bitrate of their encode, while simultaneously preserving the perceptual quality of the higher bitrate encode. As a low-complexity solution, CABR also works for live or real-time encoding.

All Eyes are on Video

According to Grand View Research, the global video streaming market is expected to grow at a CAGR of 19.6% from 2019 to 2025. This shift, fueled by the increasing popularity of direct-to-consumer streaming services such as Netflix, Amazon and Hulu, the growth of video on social media networks and user-generated video platforms such as Facebook and YouTube, and other applications like online education & video surveillance, has all eyes on video workflows. Therefore, efficient video encoding, in terms of encoding and delivery costs, and meeting the viewer’s rising quality expectations, are at the forefront of video service provider’s minds. Beamr’s CABR solution can reduce bitrates without compromising quality while keeping a low computational overhead to enhance video services.

Comparing Content-Adaptive Encoding Solutions

Instead of using fixed encoding parameters, content-adaptive encoding configures the video encoder according to the content of the video clip to reach the optimal tradeoff between bitrate and quality. Various content-adaptive encoding techniques have been used in the past to provide a better user experience with reduced delivery costs. Some of them have been entirely manual, where encoding parameters are hand-tuned for each content category and sometimes, like in the case of high-volume Blu-ray titles, at the scene level. Manual content-adaptive techniques are restricted in the sense that they can’t be scaled, and they don’t provide granularity lower than the scene level.

Other techniques, such as those used by YouTube and Netflix, use “brute force” encoding of each title by applying a wide range of encoding parameters, and then by employing rate-distortion models or machine learning techniques, try to select the best parameters for each title or scene. This approach requires a lot of CPU resources since many full encodes are performed on each title, at different resolutions and bitrates. Such techniques are suitable for diverse content libraries that are limited in size, such as premium content including TV series and movies. These methods do not apply well to vast repositories of videos such as user-generated content, and are not applicable to live encoding.

Beamr’s CABR solution is different from the techniques described above in that it works in a closed-loop and adapts the encoding per frame. The video encoder first encodes a frame using a configuration based on its regular rate control mechanism, resulting in an initial encode. Then, Beamr’s CABR rate control instructs the encoder to encode the same frame again with various values of encoding parameters, creating candidate encodes. Using a patented perceptual quality measure, each candidate encode is compared with the initial encode, and then the best candidate is selected and placed in the output stream. The best candidate is the one that has the lowest bitrate but still has the same perceptual quality as the initial encode.

Taking Advantage of Beamr’s CABR Rate Control

In order for Beamr’s CABR technology to encode video to the minimal bitrate and still retain the perceptual quality of a higher bitrate encode, it compresses each video frame to the maximum extent that provides the same visual quality when the video is viewed in motion. Figure 1 shows a block diagram of an encoding solution which incorporates CABR technology.

Figure 1 – A block diagram of the CABR encoding solution

An integrated CABR encoding solution consists of a video encoder and the CABR rate control engine. The CABR engine is comprised of the CABR control module responsible for managing the optimization process and a module which evaluates video quality.

As seen in Figure 2, the CABR encoding process consists of multiple steps. Some of these steps are performed once for each encoding session, some are performed once for each frame, and some are performed for each iteration of candidate frame encoding. When starting a content-adaptive encoding session, the CABR engine and the encoder are initialized. At this stage, we set system-level parameters such as the maximum number of iterations per frame. Then, for each frame, the encoder rate control module selects the frame types by applying its internal logic.

Figure 2. A block diagram of a video encoder incorporating Content Adaptive Bit-Rate encoding.

The encoder provides the CABR engine with each original input frame for pre-analysis within the quality measure calculator. The encoder performs an initial encode of the frame, using its own logic for bit allocation, motion estimation, mode selections, Quantization Parameters (QPs), etc. After encoding the frame, the encoder provides the CABR engine with the reconstructed frame corresponding to this initially encoded frame, along with some side information – such as the frame size in bits and the QP selected for each MacroBlock or Coding Tree Unit (CTU).

In each iteration, the CABR control module first decides if the frame should be re-encoded at all. This is done, for example, according to the frame type, the bit consumption of the frame, the quality of previous frames or iterations, and according to the maximum number of iterations set for the frame. In some cases, the CABR control module may decide not to re-encode a frame at all – in that case, the initial encoded frame becomes the output frame, and the encoder continues to the next frame. When the CABR control module decides to re-encode, the CABR engine provides the encoder with modified encoding parameters, for example, a proposed average QP for the frame, or the difference from the QP used for the initial encode. Note that the QP or delta QP values are an average value, and QP modulation for each encoding block can still be performed by the encoder. In more sophisticated implementations a QP map of value per encoding block may be provided, as well as additional encoder configuration parameters.

The encoder performs a re-encode of the frame with the modified parameters. Note that this re-encode is not a full encode, since it can utilize many encoding decisions from the initial encode. In fact, the encoder may perform only re-quantization of the frame, reusing all previous motion vectors and mode decisions. Then, the encoder provides the CABR engine with the reconstructed re-encoded frame, which becomes one of the candidate frames. The quality measure module then calculates the quality of the candidate re-encoded frame relative to the initially encoded frame, and this quality score, along with the bit consumption reported by the encoder is provided to the CABR control module, which again determines if the frame should be re-encoded. When that is the case, the CABR control module sets the encoding parameters for the next iteration, and the above process is repeated. If the control module decides that the search for the optimal frame parameters is complete, it indicates which frame, among all previously encoded versions of this frame, should be used in the output video stream. Note that the encoder rate control module receives its feedback from the initial encode of the current frame, and in this way the initial encode of the next frames (which determines the target quality of the bitstream) is not affected.

The CABR engine can operate in either a serial iterative approach or a parallel approach. In the serial approach, the results from previous iterations can be used to select the QP value for the next iteration. In the parallel approach, all candidate QP values are provided simultaneously and encodes are done in parallel – which reduces latency.

Integrating the CABR Engine with Software & Hardware Encoders

Beamr has integrated the CABR engine into its AVC software encoder, Beamr 4, and into its HEVC software encoder, Beamr 5. However, the CABR engine can be integrated with any software or hardware video encoder, supporting any block-based video standard such as MPEG-2, AVC, HEVC, EVC, VVC, VP9, and AV1.

To integrate the CABR engine with a video encoder, the encoder should support several requirements. First and foremost, the encoder should be able to re-encode an input frame (that has already been encoded) with several different encoding parameters (such as QP values), and save the “state” of each of these encodes, including the initial encode. The reason for saving the state is that when the CABR control module selects one of the candidate frame encodes (or the initial encode) as the one to use in the output stream, the encoder’s state should correspond to the state it was right after encoding that candidate frame. Encoders that support multi-threaded operation and hardware encoders typically have this capability, since each frame encode is performed by a stateless unit.

Second, the encoder should support an interface to provide the reconstructed frame and the per block QP and bit consumption information for the encoded frame. To improve compute performance, we also recommend that the encoder supports a partial re-encode mode, where information related to motion estimation, partitioning and mode decisions found in the initial encode can be re-used for re-encoding without being computed again, and only the quantization and entropy coding stages are repeated for each candidate encode. This results in a minimal encoding efficiency drop for the optimized encoding result, with significant speed-up compared to full re-encode. As described above, we recommend that the encoder will use the initial encoded data (QPs, compressed size, etc.) for its Rate Control state update. However, the selected frame and accompanying data must be used for reference frames and other reference data, such as temporal MV predictors, as it is the only data available in the bitstream for decoding.

When integrating with hardware encoders that support parallel encoding with no increase in latency, we recommend using the parallel search approach where multiple QP values per frame are evaluated simultaneously. If the hardware encoder can perform parallel partial encodes (for example, re-quantization and entropy coding only), while all parallel encodes use the analysis stage of the initial encode, such as motion estimation and mode decisions, better CPU performance will be achieved.

Sample Results

Below, we provide two sample results of the CABR engine, when integrated with Beamr 5, Beamr’s HEVC software encoder, each illustrating different aspects of CABR.

For the first example, we encoded various 4K 24 FPS source clips to a target bitrate of 10 Mbps. Sample frames from each of the clips can be seen in Figure 3. The clips vary in their content complexity: “Crowd Run” has very high complexity since it has great detail and very significant motion of the runners. “StEM” has medium complexity, with some video compression challenges such as different lighting conditions and reasonably high film grain. Finally, a promotional clip of JPEGmini by Beamr has low complexity due to relatively low motion and simple scenes.

Figure 3. Sample frames from the test clips. top: crowd-run, bottom left: StEM bottom right: JPEGmini.

We encoded 500 frames from each clip to a target bitrate of 10 Mbps, using the VBR mode of the Beamr 5 HEVC encoder, which performs regular encoding, and using the CABR mode, which creates a lower bit-rate, perceptually identical stream. For the high complexity clip “Crowd Run,” where providing excellent quality at such an aggressive bitrate is very challenging, CABR reduced the bitrate by only 3%. For the intermediate complexity clip “StEM,” bitrate savings were higher and reached 17%. For the lowest complexity clip “JPEGmini,” CABR reduced the bitrate by a staggering 45%, while still obtaining excellent quality which matches the quality of the 10 Mbps VBR encode. This extensive range of bitrate reduction percentage demonstrates the fully automatic content-adaptive nature of CABR-enhanced encoder, which reaches a different final bitrate, according to the content complexity.

The second example uses a 500 frame 1080p 24 FPS clip from the well-known “Tears Of Steel” movie by the Blender open movie project. The same clip was encoded using the VBR and CABR modes of the Beamr 5 HEVC software encoder, with three target bitrates: 1.5, 3 and 5 Mbps. Savings, in this case, were 13% for the lowest bitrate resulting in a 1.4 Mbps encode, 44% for the intermediate bitrate resulting in an encode of 1.8 Mbps, and 62% for the highest bitrate, resulting in a 2 Mbps encode. Figures 4 and 5 show sample frames from the encoded clips with VBR encoding on the left vs. CABR encoding on the right. The top two images are from encodes to a bitrate of 5 Mbps, while the bottom two were taken from the 1.5 Mbps encodes. As can be seen here, both 5 Mbps target encodes preserve the details, such as the texture of the bottom lip or the two hairs on the forehead above the right eye, while in the lower bitrate encodes these details are somewhat blurred. This is the reason that when starting from different target bitrates, CABR does not converge to the same bitrate. We also see, however, that the more generous the initial encoding, generally the more savings can be obtained. This example shows that CABR adapts not only to the content complexity, but also to the quality of the target encode, and preserves perceptual quality in motion while offering significant savings.

Figure 4. A sample from the “Tears of Steel” 1080p 24 FPS encode to 5 Mbps (top) and 1.5 Mbps (bottom), encoded in VBR mode (left) and CABR mode (right)

Figure 5. Closer view of the face in Figure 4, showing detail of lips and forehead from the encode to 5 Mbps (top) and 1.5 Mbps (bottom), encoded in VBR mode (left) and CABR mode (right).

To learn how our CABR solution leverages our patented quality measure, continue to “The patented visual quality measure that makes all the difference.”

Authors: Dror Gill & Tamar Shoham