Is the Future of Video Processing Destined for GPU?

My Journey Through the Evolution of Video Processing: From Low-Quality Streaming to HD and 4K Becoming a Commodity, and Now the AI-Powered Video Revolution

Digital video has been my primary focus for the past three decades. I have built software codecs, designed ASICs, and now optimize video encoders running on GPUs.

My journey in video processing has been transformative: it started with low-resolution streaming, advanced into HD and 4K as they shifted from a rare event to an everyday expectation, and now we stand at the next frontier, where AI is redefining how we create, deliver, and experience video.

My journey into this field began with the introduction of QuickTime 1.0 in 1991, when I was in my 20s. It looked to me like magic: a compressed movie playing smoothly off a single-speed CD-ROM (150 KB/s, or 1.2 Mbps). At the time, I had no understanding of video encoding, but I was fascinated. At that moment I knew this was the field I wanted to dive into.

Apple QuickTime Version 1.0 Demo

Chapter 1: The Challenge of Streaming Video with Low-Resolution, Low-Quality Videos

The early days of streaming, in the mid 90s, were characterized by low-resolution video, low frame rates (12-15 fps), and low bitrates of 28.8, 33.6, or 56 kbps – two to three orders of magnitude (100x-1000x) lower than today’s standards. This was the reality of digital video in 1996 and the years that followed.

By 1996, I was one of four co-founders of Emblaze, where we developed a vector-based graphics tool called “Emblaze Creator” – think of it as Adobe Flash before Adobe Flash.

We soon realized we needed video support. We started by downloading videos in the background; naturally, the longer the video, the longer the download, and the wait was frustrating. So we limited videos to just 30 seconds.

Early solutions, like RealNetworks and VideoNet, required dedicated video servers — an expensive and complex infrastructure. It seemed to me like a very long and costly journey to streaming enablement. 

Adding video to our offerings quickly was crucial for our company’s survival, so we persistently tackled this challenge. I remember the nights spent experimenting and exploring solutions, but all paths seemed to converge on the RealNetworks approach, which we couldn’t adopt in the short term.

We had to find a way to solve the challenge of streaming video efficiently for very low bandwidth. And while it was hard to stream files, you could slice them. So in 1997, I came up with an idea and worked with my team at Emblaze on the following solution:

  1. Take a video file and divide it into numbered slices. 
  2. Create an index file with the order of the slices, and place it on a standard HTTP server.
  3. The player will read that index file and pull the segments from a web server in the order given in the index file.

Just to make it more real, here is the patent we filed in 1998, which was granted in 2002:

But that was not enough. Why not create time-synchronized slices, so the player could pull the optimal chunks based on the specific bandwidth characteristics it measures while playing the file?

The player reads the index file from the server, chooses a level, pulls a slice, and, based on the measured bitrate, moves up and down the bitrate ladder.

If that reminds you of HLS – then it was HLS many years before HLS was out.
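
To make the mechanics concrete, here is a minimal sketch of such a player loop. This is illustrative Python, not the original Emblaze code: the server layout, index format, ladder values, and the play() stub are all hypothetical.

import time
import urllib.request

BASE = "http://server.example/stream"   # hypothetical HTTP server layout
LADDER = [56, 33, 28]                   # bitrate rungs in kbps, highest first

def fetch(url):
    t0 = time.time()
    data = urllib.request.urlopen(url).read()
    kbps = len(data) * 8 / 1000 / max(time.time() - t0, 1e-6)
    return data, kbps                   # payload plus the bandwidth just measured

def play(chunk):
    pass                                # stand-in for the actual decoder/renderer

index, _ = fetch(f"{BASE}/index.txt")   # lists the numbered, time-aligned slices
num_slices = len(index.splitlines())

rung = len(LADDER) - 1                  # start cautiously, at the lowest bitrate
for n in range(num_slices):
    chunk, measured = fetch(f"{BASE}/{LADDER[rung]}kbps/slice{n:04d}.bin")
    play(chunk)
    # Move up the ladder when bandwidth comfortably exceeds the next rung,
    # move down when it cannot sustain the current one.
    if rung > 0 and measured > 1.5 * LADDER[rung - 1]:
        rung -= 1
    elif measured < LADDER[rung]:
        rung = min(rung + 1, len(LADDER) - 1)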

We demonstrated this live with EarthLink at the Easter Egg Roll at the White House in 1998. Our system was built from H.263 (and later H.264) encoders and a patented streaming protocol. We had a rack of 10 Compaq workstations running 8 cameras that day.

When you build a streaming solution, you need a player. Without it, all that effort is meaningless. At Emblaze, we had a Java-based player that required no installation—a major advantage at the time.

Back then, mobile video was in its infancy, and we saw an opportunity. Feature phones simply couldn’t play video, but the Nokia Communicator 9110 could. It had everything—a grayscale screen, a 33MHz 32-bit CPU, and wireless data access—a powerhouse by late ‘90s standards.

In 1999, I demonstrated a software video decoder running on the Nokia 9110 to Samsung Mobile’s CEO. This was a game-changer: it proved that video streaming on mobile devices was possible. Samsung, being a leader in CDMA 2000, wanted to showcase this capability at the 2000 Olympics and needed working prototypes.

Samsung challenged us to build a mobile ASIC capable of decoding streaming video on just 100mW of power. We delivered. The solution was announced at the Olympics, and by 2001, it was in mass production.

This phone featured the Emblaze Multimedia Application Co-Processor, working alongside the baseband chip to enable seamless video playback over CDMA 2000 networks – a groundbreaking achievement at the time.

Chapter 2: HD Becomes the Standard, 4K HDR Becomes Common 

HD television was introduced in the U.S. during the second half of the 90s, but it wasn’t until 2003 that satellite and cable providers really started broadcasting in HD.

I still remember 2003, staying at the Mandarin Oriental Hotel in NYC, where I had a 30-inch LCD screen with HD broadcasting. Standing close to the screen, taking in the crisp detail, was an eye-opening moment—the clarity, the colors, the sharpness. It was a huge leap forward from standard definition, and definitely better than DVDs.

But even then, it felt like just the beginning. HD was here, but it wasn’t everywhere yet. It took a few more years for Netflix to introduce streaming.

Beamr is Born

In early 2008, the startup I led, which focused on online backup, was acquired. By the end of the year, I found myself out of work. And so, I sent an email to Steve Jobs, pointing out that Time Machine’s performance was lacking, and that I believed I could help fix it. That email led to a meeting in Cupertino with the head of MobileMe—what we now know as iCloud.

That visit to Apple in early 2009 was fascinating. I learned that storing iPhone photos was becoming an enormous challenge. The sheer volume of images was straining Apple’s data centers, and they were running into power limitations just to keep up with demand.

With this realization, Beamr was born!

The question that intrigued us was: Can we make images smaller, while making sure they look exactly the same? 

After about a year of research, we ended up founding Beamr instead of becoming part of MobileMe.

During Beamr’s first year, we explored this idea and came out with our first product, JPEGmini, which does exactly that. It was achieved through the amazing innovation of our wonderful CTO, Tamar Shoham, the brains behind our technology to this day.

JPEGmini is a wonderful tool, and hundreds of thousands of content creators around the world use it. 

After optimizing photos, we wanted to take on video compression. That’s when we developed our gravity defier: CABR, Content Adaptive BitRate technology. This quality-driven process can cut the bitrate of high-quality video by 30% to 50% while preserving every frame’s visual integrity.

But our innovation comes with challenges:

  1. Encoding without CABR is lightning fast; with CABR it is slower and can’t run live at 4Kp60.
  2. Running CABR is more expensive than non-CABR encoding.

In 2018, we came to the conclusion that we needed a hardware acceleration solution to improve our density, our speed, and the cost of processing.

We started by integrating with Intel GPUs, and it worked very well. We even demoed it at Intel Experience Day in 2019.

We had a wonderful relationship with Intel, and they had a good video encoding engine. We invested about two years of effort, but it did not materialize: an Intel GPU for the data center never happened – a wasted opportunity.

Then, we thought of developing our own chip:

  • Its power will be a fraction of a CPU’s or GPU’s.
  • We will be able to put four 8Kp60 CABR chips on a single PCI card (for AVC/HEVC and AV1).
  • It will cost less than a GPU and have 3X the density.

Here’s a slide that shows that we were serious. We also started a discussion about raising funds to build that chip using 12nm technology.

But then, we looked at our plan and wondered: does this chip support the needs of the future? 

  • How would you innovate on this platform? 
  • What if you would like to run smarter algorithms or a new version of CABR?
  • Our design included programmable parts for customization. We even thought of adding GPU cores – but who is going to develop for it? 

This was a key moment in 2020, when we understood that innovation moves so fast that every silicon generation, which takes at least two years to build, is simply too slow.

There is a scale at which VPU solutions are more efficient than GPUs, but they cannot keep up with the current pace of change. It may well be that even the biggest social networks will abandon VPUs because of the need for AI and video to work together.

Chapter 3: GPUs and the Future of Video Processing

In 2021, NVIDIA invited us to bring CABR to its GPUs. This was a three-year journey, requiring a complete rewrite of our technology for NVENC. NVIDIA fully supported us, integrating CABR into all encoding modes across AVC, HEVC, and AV1.

In May 2023, the first driver was out: NVENC SDK 12.1!

At the same time, Beamr went public on NASDAQ (under the ticker BMR), on the premise of a high-quality large-scale video encoding platform enabled on NVIDIA GPUs.

Since September 2024, Beamr CABR has been running LIVE video optimization on NVIDIA GPUs at 4Kp60 across three codecs: AVC, HEVC, and AV1. It is 10X faster at 1/10 of the cost for AVC; the ratio doubles for HEVC, and doubles again for AV1.

All of our challenges for bringing CABR to the masses are solved.

But the story doesn’t end here.

What we didn’t fully anticipate was how AI-driven innovation is transforming the way we interact with video, and the opportunities are even greater than we imagined, thanks to the shift to GPUs.

Let me give you a couple of examples:

In the last Olympics, I was watching windsurfing, and on-screen I saw a real-time overlay showing each surfer’s planned route, the wind speed, forward tactics, and predictions of how they would converge at the finish line.

It was seamless, intuitive, and AI-driven—a perfect example of how AI enriches the viewing experience.

Or think about social media: AI plays a huge role in processing video behind the scenes. As videos are uploaded, VPUs (Video Processing Units) handle encoding, while AI algorithms simultaneously analyze content—deciding whether it’s appropriate, identifying trends, and determining who should see it.

But the processes used by many businesses are slow and inefficient. Every AI-powered video workflow needs to:

  1. Load the video.
  2. Decode it.
  3. Process it (either for AI analysis or encoding).
  4. Sync and converge the process.

Traditionally, these steps happened separately, often with significant latency.

But on a GPU?

  • Single load, single decode, shared memory buffer.
  • AI and video processing run in parallel.
  • Everything is synced and optimized.

And just like that—you’re done. It’s faster, more efficient, and more cost-effective. This is the winning architecture for the future of AI and video at scale.
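
Here is a schematic sketch of that single-decode, shared-buffer pattern, using CUDA streams via PyTorch. It is illustrative only: gpu_decode, detector, and nvenc_encode are hypothetical stubs standing in for a real demux/decode/infer/encode stack (such as DeepStream or NVDEC/NVENC bindings), and it assumes a CUDA-capable machine.

import torch

dev = "cuda"

# Hypothetical stubs, not a real decode/encode API.
def gpu_decode(packet):  return torch.rand(3, 1080, 1920, device=dev)  # frame stays on the GPU
def detector(frame):     return frame.mean()           # placeholder "AI analysis"
def nvenc_encode(frame): return (frame * 255).byte()   # placeholder "encode"

ai_stream, enc_stream = torch.cuda.Stream(), torch.cuda.Stream()

for packet in range(10):                       # pretend demuxed packets
    frame = gpu_decode(packet)                 # single load, single decode
    # Both consumers read the same device buffer; nothing is copied to the host.
    with torch.cuda.stream(ai_stream):
        labels = detector(frame)               # AI path
    with torch.cuda.stream(enc_stream):
        bitstream = nvenc_encode(frame)        # encode path, running in parallel
    torch.cuda.synchronize()                   # sync and converge both paths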

Want to learn more? Start a free trial with Beamr Cloud or Talk to us!

Beamr Now Offering Oracle Cloud Infrastructure Customers 30% Faster Video Optimization

Beamr’s Content Adaptive Bit Rate solution enables significantly decreasing video file size or bitrates without changing the video resolution or compromising perceptual quality. Since the optimized file is fully standard compliant, it can be used in your workflow seamlessly, whatever your use case, be it video streaming, playback or even part of an AI workflow.

Beamr first launched Beamr Cloud earlier this year, and we are now super excited to announce that our valued partnership with Oracle Cloud Infrastructure (OCI) enables us to offer OCI customers more features and better performance.

The performance improvements are due in part to the availability of the powerful NVIDIA L40S GPUs on OCI. In preliminary testing we found that our video encoding workflows can run up to 30% faster on these cards than on the cards we currently use in the Beamr Cloud solution.

This was derived from testing AVC and HEVC NVENC-driven encodes for a set of nine 1080p classic test clips with eight different configurations, comparing encoding wall times on an A10G vs. an L40S GPU. Speedup factors of up to 55% were observed, with an average just above 30%. The full test data is available here.

Another exciting feature of these cards is that they support AV1 encoding, which means Beamr Cloud will now offer to turn your videos into optimized AV1 encodes, delivering even higher bitrate and file-size savings.

What’s the fuss about AV1?

In order to store and transmit video, substantial compression is needed. From the very earliest efforts to standardize video compression in the 90s, there has been a constant effort to create video compression standards offering increasing efficiency – meaning that the same video quality can be achieved with smaller files or lower bitrates.

As shown in the schematic illustration below, AV1 has come a long way in improving over H.264/AVC, which remains the most widely adopted standard today despite being 20 years old. However, the increased compression efficiency is not free: the computational complexity of newer codecs is also significantly higher, motivating the adoption of hardware-accelerated encoding options.

With the demand and need for Video AI workflows continuing to rise, the ability to perform fully automatic, fast, efficient, optimized video encoding is an important enabler.

Beamr’s GPU-powered video compression and optimization runs within the GPU on OCI, right at the heart of these AI workflows, making it extremely well placed to benefit them. We have previously shown in a number of case studies that using the optimized files has no negative impact on inference or training results, making the integration of this optimization process into AI workflows a natural choice for cost-savvy developers.

Real-time Video Optimization with Beamr CABR and NVIDIA Holoscan for Media

This year at the NAB Show 2024 in Las Vegas, we are excited to demonstrate our Content-Adaptive Bitrate (CABR) technology on the NVIDIA Holoscan for Media platform. By implementing CABR as a GStreamer plugin, we have, for the first time, made bitrate optimization of live video streams easily achievable in the cloud or on premises.

Building on the NVIDIA DeepStream software development kit, which extends GStreamer’s capabilities, significantly reduced the amount of code required to develop the Holoscan for Media based application. Using DeepStream components for real-time video processing and NMOS (Networked Media Open Specifications) signaling, we were able to keep our focus on the CABR technology and video processing.

The NVIDIA DeepStream SDK provides an excellent framework for developers to build and customize dynamic video processing pipelines. DeepStream provides pipeline components that make it very simple to build and deploy live video processing pipelines that utilize the hardware decoders and encoders available on all NVIDIA GPUs.

Beamr CABR dynamically adjusts video bitrate in real-time, optimizing quality and bandwidth use. It reduces data transmission without compromising video quality, making video streaming more efficient. Recently we released our GPU implementation, which uses the NVIDIA NVENC encoder, providing significantly higher performance than previous solutions.

Taking our GPU implementation for CABR to the next level, we have built a GStreamer Plugin. With our GStreamer Plugin, users can now easily and seamlessly incorporate the CABR solution into their existing DeepStream pipelines as a simple drop-in replacement to their current encoder component.

Holoscan For Media


A GStreamer Pipeline Example

To illustrate the simplicity of using CABR, consider a simple DeepStream transcoding pipeline that reads and writes from files.


Simple DeepStream Pipeline:
gst-launch-1.0 -v \
  filesrc location="video.mp4" ! decodebin ! nvvideoconvert ! queue \
  nvv4l2av1enc bitrate=4500 ! mp4mux ! filesink location="output.mp4"

By simply replacing the nvv4l2av1enc component with our CABR component, the encoding bitrate is adapted in real-time, according to the content, ensuring optimal bitrate usage for each frame, without any loss of perceptual quality.


CABR-Enhanced DeepStream Pipeline:
gst-launch-1.0 -v \
  filesrc location="video.mp4" ! decodebin ! nvvideoconvert ! queue \
  beamrcabvav1 bitrate=4500 ! mp4mux ! filesink location="output_cabr.mp4"


Similarly, we can replace the encoder component used in a live streaming pipeline with the CABR component to optimize live video streams, dynamically adjusting the output bitrate and offering up to a 50% reduction in data usage without sacrificing video quality.


Simple DeepStream Pipeline:
gst-launch-1.0 -v \
  rtmpsrc location=rtmp://someurl live=1 ! decodebin ! queue ! \
  nvvideoconvert ! queue ! nvv4l2av1enc bitrate=3500 ! \
  av1parse ! rtpav1pay mtu=1300 ! srtsink uri=srt://:8888

CABR-Enhanced DeepStream Pipeline:
gst-launch-1.0 -v \
  rtmpsrc location=rtmp://someurl live=1 ! decodebin ! queue ! \
  nvvideoconvert ! queue ! beamrcabrav1 bitrate=3500 ! \
  av1parse ! rtpav1pay mtu=1300 ! srtsink uri=srt://:8888


The Broad Horizons of CABR Integration in Live Media

Beamr CABR, demonstrated using NVIDIA Holoscan for Media at NAB show, marks just the beginning. This technology is an ideal fit for applications running on NVIDIA RTX GPU-powered accelerated computing and sets a new standard for video encoding.

Lowering the video bitrate reduces the required bandwidth when ingesting video to the cloud, creating new possibilities where high resolution or quality were previously costly or not even possible. Similarly, reduced bitrate when encoding on the cloud allows for streaming of higher quality videos at lower cost.

From file-based encoding to streaming services – the potential use cases are diverse, and the integration has never before been so simple. Together, let’s step into the future of media streaming, where quality and efficiency coexist without compromise.

Beamr Tech boosts Video Machine Learning: Taking a look at training

Introduction

Machine learning for video is an expanding field, garnering vast interest, with generative AI for video picking up speed. However, there are significant pain points for these technologies, such as storage and bandwidth bottlenecks when dealing with video content, as well as training and inferencing speeds.

In the following case study, we show that training an AI network for action recognition using video files compressed and optimized through Beamr Content-Adaptive Bitrate technology (CABR), produces results that are as good as training the network with the original, larger files. The ability to use significantly smaller video files can accelerate machine learning (ML) training and inferencing.

Motivation

Beamr’s CABR enables significantly decreasing video file size without changing the video resolution, compression format, or file format, and without compromising perceptual quality. It is therefore a great candidate for resolving file size issues and bandwidth bottlenecks in the context of ML for video.

In a previous case study we looked at the task of people detection in video using pre-trained models. In this case study we cover the more challenging task of training a neural network for action recognition in video, comparing the outcome when using source vs optimized files. 

We will start by describing the problem we targeted, and then provide the classifier architecture used. We will continue with details on the data sets used and their optimization results, followed by the experiment results, concluding with directions for future work.

Action recognition task

When setting the scope for this case study, it was essential for us to define a test case that makes full use of the fact that the content is video, as opposed to still images. We therefore selected a task that requires the temporal element of video: action recognition. Looking at individual frames, it is not possible to differentiate between frames captured during walking and running, or between someone jumping or dancing; a sequence of frames is required, which is why this was our task of choice.

Target data set

For the fine-tuning step we collected a set of 160 free-to-use, user-generated video clips, downloaded from the Pexels and Envato stock-video websites. The videos were downloaded in 720p resolution, using the websites’ default settings. We selected videos that belong to one of four action classes: running, martial arts, dancing, and rope jumping.

In order to use these in the selected architecture, they needed to be cropped to a square input. This was done by manually marking a region of interest (ROI) in each clip and performing the crop using OpenCV, with the corresponding OpenH264 encoding at its default configuration and settings.
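
The crop step might look roughly like this. This is an illustrative OpenCV sketch; the file names, ROI coordinates, and fourcc choice are assumptions, not the exact settings used in the study.

import cv2

def crop_clip(src, dst, x, y, size):
    # Crop the manually marked square ROI from every frame and re-encode.
    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS)
    # "avc1" selects H.264 output, handled by OpenCV's bundled OpenH264
    # encoder with its default configuration.
    out = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"avc1"), fps, (size, size))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(frame[y:y + size, x:x + size])
    cap.release()
    out.release()

crop_clip("running_001.mp4", "running_001_crop.mp4", x=320, y=0, size=720)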

We first performed optimization of the original clip set using the Beamr cloud optimization SaaS, obtaining an average reduction of 24%. This is beneficial when storing the test set for future use and possibly performing other manipulations on it. However, for our test we wanted to compress the set of cropped videos that were actually used for the training process. Applying the same optimization to these files, created by OpenCV, yielded a whopping 67% average reduction.

Architecture

We selected an encoder-decoder architecture, which is commonly used for classification of video and other time-series inputs. For the encoder we used ResNet-152 pre-trained on ImageNet, followed by 3 fully connected layers with sizes of 1024, 768, and 512. For the decoder we used an LSTM followed by 2 fully connected layers consisting of 512 and 256 neurons.
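
A minimal PyTorch sketch of this encoder-decoder follows. The LSTM hidden size, activations, and pooling details are our assumptions; the post does not specify them.

import torch
import torch.nn as nn
from torchvision import models

class CNNEncoder(nn.Module):
    # ResNet-152 (ImageNet weights) followed by FC layers of 1024, 768 and 512.
    def __init__(self):
        super().__init__()
        resnet = models.resnet152(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop classifier
        self.fc = nn.Sequential(
            nn.Linear(resnet.fc.in_features, 1024), nn.ReLU(),
            nn.Linear(1024, 768), nn.ReLU(),
            nn.Linear(768, 512),
        )

    def forward(self, clip):                       # clip: (batch, time, 3, 224, 224)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))  # CNN applied per frame
        return self.fc(feats.flatten(1)).view(b, t, 512)

class LSTMDecoder(nn.Module):
    # LSTM over the frame embeddings, then FC layers of 512 and 256 neurons.
    def __init__(self, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(512, 512, batch_first=True)  # hidden size assumed
        self.head = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, seq):
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])               # classify from the last time step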

Pre-training

We performed initial training of the network using the UCF-101 dataset, which consists of 13,320 video clips at 240p resolution, classified into 101 possible action classes. The data was split so that 85% of the files were used for training and 15% for validation.

These videos were resized to 224×224 before being fed into the classifier. Training used a batch size of 24 and ran for 35 epochs. For the error function we used cross-entropy loss, a popular choice for classifier training. We selected the Adaptive Moment Estimation (Adam) optimizer with a learning rate of 1e-3, as it mitigates the local-minima, overshoot, and oscillation problems caused by fixed learning rates when updating network parameters. This setup yielded 83% accuracy on the validation set.
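
Using the sketch above, the pre-training loop would look roughly like this; train_loader is a hypothetical DataLoader yielding batches of 24 resized clips with their labels.

import torch
import torch.nn.functional as F

encoder, decoder = CNNEncoder(), LSTMDecoder(num_classes=101)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)      # Adam with lr 1e-3, as described

for epoch in range(35):                            # 35 pre-training epochs
    for clips, labels in train_loader:             # (24, T, 3, 224, 224) batches
        logits = decoder(encoder(clips))
        loss = F.cross_entropy(logits, labels)     # cross-entropy loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()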

Training

We performed fine tuning of the pre-trained network described above to learn the target data set.

The training was performed on 2.05 GB of cropped videos, and on 0.67 GB of cropped & optimized videos, with 76% of the files used for training and 24% for validation.

Due to the higher resolution of the input in the target data set, the fine-tuning training used a batch size of only 4. We ran 30 epochs, though we generally achieved convergence already at 9-10 epochs. Again we used cross-entropy loss and the Adam optimizer with a learning rate of 1e-3.

Due to the relatively small sample size used here, a difference in one or two classifications can alter results, so we repeated the training process 10 times for each case in order to obtain confidence in the results. The obtained accuracy results for the 10 testing rounds on each of the non-optimized and optimized video sets are presented in the following table.

|                      | Minimum Accuracy | Average Accuracy | Maximum Accuracy |
|----------------------|------------------|------------------|------------------|
| Non-Optimized Videos | 56%              | 65%              | 69%              |
| Optimized Videos     | 64%              | 67%              | 75%              |

To further verify the results we collected a set of 48 additional clips, and tested these independently on each of the trained classifiers. Below we provide the full cross matrix of maximum and mean accuracy obtained for the various cases. 

|                          | Tested on Non-Optimized | Tested on Optimized |
|--------------------------|-------------------------|---------------------|
| Trained on Non-Optimized | 65% max, 53% mean       | 62% max, 50% mean   |
| Trained on Optimized     | 62% max, 50% mean       | 65% max, 50% mean   |

Summary & Future work

The results shared above confirm that training a neural network with significantly smaller video files, optimized by Beamr’s CABR, has no negative impact on the training process. In this experiment we even saw a slight benefit from training with the optimized files. However, it is unclear whether this is a significant conclusion, and we intend to investigate it further. We also see that cross testing/training yields similar results across the different cases.

This test was an initial, rather small scale experiment. We are planning to expand this to larger scale testing, including distributed training setups in the cloud using GPU clusters, where we expect to see further benefits from the reduced sizes of the files used.

This research is part of our ongoing quest to accelerate adoption and increase the accessibility of machine learning and deep learning for video, as well as video analysis solutions.

Beamr CABR Poised to Boost Vision AI

By reducing video size but not perceptual quality, Beamr’s Content Adaptive Bit Rate optimized encoding can make video used for vision AI easier to handle, thus reducing workflow complexity


Written by: Tamar Shoham, Timofei Gladyshev


Motivation 

Machine learning (ML) for video processing is a field which is expanding at a fast pace and presents significant untapped potential. Video is an incredibly rich sensor with large storage and bandwidth requirements, making vision AI both a high-value problem to solve and one incredibly well suited to AI and ML.

Beamr’s Content Adaptive Bit Rate solution (CABR) can significantly decrease video file size without changing the video resolution, compression or file format, or compromising perceptual quality. It therefore interested us to examine how CABR can be used to help cut down the sizes of video used in the context of ML.

In this case study, we focus on the relatively simple task of people detection in video. We made use of the NVIDIA DeepStream SDK, a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio and image understanding. Using this SDK is a natural choice for Beamr as an NVIDIA Metropolis partner.

In the following we describe the test setup, the data set used, the tests performed, and the results obtained. We then present some conclusions and directions for future work.

Test Setup

In this case study, we limited ourselves to comparing detection results on source and reduced-size files by using pre-trained models, making it possible to use unlabeled data. 

We collected a set of 9 User-Generated Content, or UGC, video clips, captured on a few different iPhone models. To these we added some clips downloaded from the Pexels free stock-video website. All test clips are in the mp4 or mov file format, containing AVC/H.264 encoded video, with resolutions ranging from 480p to full HD and 4K, and durations ranging from 10 seconds to 1 minute. Further details on the test files can be found in Annex A.

These 14 source files were then optimized using Beamr’s storage optimization solution to obtain files that were reduced in size by 9 – 73%, with an average reduction of 40%. As mentioned above, this optimization results in output files which retain the same coding and file formats and the same resolution and perceptual quality. The goal of this case study is to show that these reduced-size, optimized files also provide aligned ML results. 

For this test, we used the NVIDIA DeepStream SDK with the PeopleNet-ResNet34 detector, and calculated the mAP between the detections on the pairs of source and optimized files for an IoU threshold of 0.5.
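
For reference, the IoU criterion used for matching detections works as follows. This is a plain-Python sketch for illustration; the actual evaluation code is not shown in this post.

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

# A detection on the optimized file counts as matching a detection on the
# source file when their IoU is at least 0.5.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # ~0.14, below the 0.5 threshold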

Results

We found that for files with predictions that align with actual people, the mAP is very high, showing that true detection results are indeed unaffected by replacing the source file with the smaller, easier-to-transfer, optimized file. 

An example showing how well they align is provided in Figure 1. This test clip resulted in a mAP[0.5] value of 0.98.

Figure 1: Detections for pexels_video_2160p_2 frame 305 using PeopleNet-ResNet34, source on top, with optimized (54% smaller) below

As the PeopleNet-ResNet34 model was developed specifically for people detection, it has quite stable results, and overall showed high mAP values with a median mAP value of 0.94. 

When testing some other models, we did notice that in cases where the detections were unstable, the source and optimized files sometimes created different false positives. It is important to note that because we did not have labeled data, or a ground truth, such detection errors that occur out of sync have a double impact on the mAP value calculated between the detections on the source file and those on the optimized file. This results in lower values than the mAP expected when calculating detections against labeled data.

We also noticed cases where there is a detection flicker, with the person being detected only in some of the frames where they appear. This flicker is not always synchronized between the source and optimized clips, resulting once again in an ‘accumulated’ or double error in the mAP calculated between them. An example of this is shown in Figure 2, for a clip with a mAP[0.5] value of 0.92.

Figure 2a: Detections for frames 1170 from the clip pexels_musicians.mp4 using PeopleNet-ResNet34, source on the left and optimized (44% smaller) on the right. Note detection on the left of the stairs, present only in the source file.
Figure 2b: same for frame 1171, with no detection in either
Figure 2c: frame 1172, detected in both
Figure 2d: frame 1173, detected only in the optimized file

Summary

The experiments described above show that CABR can be applied to videos that undergo ML tasks such as object detection. We showed that when detections are stable, almost identical results will be obtained for the source and optimized clips. The advantages of reducing storage size and transmission bandwidth by using the optimized files make this possibility particularly attractive. 

Another possible use for CABR in the context of ML stems from the finding that for unstable detection results, CABR may have some impact on false positives or mis-detects. In this context, it would be interesting to view it as a possible permutation on labeled data to increase training set size. In future work, we will investigate the further potential benefits obtained when CABR is incorporated at the training stage and expand the experiments to include more model types and ML tasks. 

This research is all part of our ongoing quest to accelerate adoption and increase the accessibility of video ML/DL and video analysis solutions.

Annex A – test files

Below are details on the test files used in the above experiments. All the files and detection results are available here.

| #  | Filename                             | Source     | Bitrate | Dims W×H  | FPS  | Duration [sec] | Saved by CABR |
|----|--------------------------------------|------------|---------|-----------|------|----------------|---------------|
| 1  | IMG_0226                             | iPhone 3GS | 3.61 M  | 640×480   | 15   | 36.33          | 35%           |
| 2  | IMG_0236                             | iPhone 3GS | 3.56 M  | 640×480   | 30   | 50.45          | 20%           |
| 3  | IMG_0749                             | iPhone 5   | 17.0 M  | 1920×1080 | 29.9 | 11.23          | 34%           |
| 4  | IMG_3316                             | iPhone 4S  | 21.9 M  | 1920×1080 | 29.9 | 9.35           | 26%           |
| 5  | IMG_5288                             | iPhone SE  | 14.9 M  | 1920×1080 | 29.9 | 43.40          | 29%           |
| 6  | IMG_5713                             | iPhone 5c  | 16.3 M  | 1080×1920 | 29.9 | 48.88          | 73%           |
| 7  | IMG_7314                             | iPhone 7   | 15.5 M  | 1920×1080 | 29.9 | 54.53          | 50%           |
| 8  | IMG_7324                             | iPhone 7   | 15.8 M  | 1920×1080 | 29.9 | 16.43          | 39%           |
| 9  | IMG_7369                             | iPhone 6   | 17.9 M  | 1080×1920 | 29.9 | 10.23          | 30%           |
| 10 | pexels_musicians                     | pexels     | 10.7 M  | 1920×1080 | 24   | 60.0           | 44%           |
| 11 | pexels_video_1080p                   | pexels     | 4.4 M   | 1920×1080 | 25   | 12.56          | 63%           |
| 12 | pexels_video_2160p                   | pexels     | 12.2 M  | 3840×2160 | 25   | 15.2           | 49%           |
| 13 | pexels_video_2160p_2                 | pexels     | 15.2 M  | 3840×2160 | 25   | 15.84          | 54%           |
| 14 | pexels_video_of_people_walking_1080p | pexels     | 3.5 M   | 1920×1080 | 23.9 | 19.19          | 58%           |

Table A1: Test files used, with the per file savings

Beamr Celebrates 50 Granted Patents

Introduction

A few weeks ago Beamr reached a historic milestone, which got everyone in the company excited. It was triggered by a rather formal announcement from the US Patent Office, in their typical “dry” language: “THE APPLICATION IDENTIFIED ABOVE HAS BEEN EXAMINED AND IS ALLOWED FOR ISSUANCE AS A PATENT”. We’ve received such announcements many times before, from the USPTO and from other national patent offices, but this one was special: It meant that the Beamr patent portfolio has now grown to 50 granted patents! 

We have always believed that a strong IP portfolio is extremely important for an innovative technology company, and invested a lot of human and capital resources over the years to build it. So we thought that this anniversary would be a good opportunity to reflect back on our IP journey, and share some lessons we learned along the way, which might come in handy to others who are pursuing similar paths.

Starting With Image Optimization

Beamr was established in 2009, and the first technology we developed was for optimizing images – reducing their file size while retaining their subjective quality. In order to verify that subjective quality is preserved, we needed a way to accurately measure it, and since existing quality metrics at the time were not reliable enough (e.g. PSNR, SSIM), we developed our own quality metric, which was specifically tuned to detect the artifacts of block-based compression. 

Our first patent applications covered the components of the quality measure itself, and its usage in a system for “recompressing” images or video frames. The system takes a source image or a video frame, compresses it at various compression levels, and then compares the compressed versions to the source. Finally, it selects the compressed version that is smallest in file size, but still retains the full quality of the source, as measured by our quality metric. 

After these initial patent applications, which covered the basic method we were using for optimization, we submitted a few more patent applications covering additional aspects of the optimization process. For example, we found that sometimes when you increase the compression level, the quality of the image actually increases, and vice versa. This is counter-intuitive, since typically increasing compression reduces image quality, but it does happen in certain situations. It means that the relationship between quality and compression is not “monotonic”, which makes finding the optimal compression level quite challenging. So we devised a method to solve this issue of non-monotonicity, and filed a separate patent application for it.

Another issue we wanted to address was the fact that some images could not be optimized – every compression level we tried would result in quality reduction, and eventually we just copied the source image to the output. In order to save CPU cycles, we wanted to refrain from even trying to optimize such images. Therefore, we developed an algorithm which determines whether the source image is “highly compressed” (meaning that it can’t be optimized without compromising quality), based on analyzing the source image itself. And of course – we submitted a patent application on this algorithm as well.

As we continued to develop the technology, we found that some images required special treatment due to specific content or characteristics of the images. So we filed additional patent applications on algorithms we developed for configuring our quality metric for specific types of images, such as synthetic (computer-generated) images and images with vivid colors (chroma-rich).

Extending to Video Optimization

Optimizing images turned out to be very valuable for improving the workflow of professional photographers, reducing page load time for web services, and improving the UX for mobile photo apps. But with video reaching 80% of total Internet bandwidth, it was clear that we needed to extend our technology to support optimizing full video streams. As our technology evolved, so did our patent portfolio: We filed patent applications on the full system of taking a source video, decoding it, encoding each frame with several candidate compression levels, selecting the optimal compression level for that frame, and moving on to the next frame. We also filed patent applications on extending the quality measure with additional components that were designed specifically for video: For example, a temporal component that measures the difference in the “temporal flow” of two successive frames using different compression levels. Special handling of real or simulated “film grain”, which is widely used in today’s movie and TV productions, was the subject of another patent application. 

When integrating our quality measure and control mechanism (which sets the candidate compression levels) with various video encoders, we came to the conclusion that we needed a way to save and reload a “state” of the encoder without modifying the encoder internals, and of course – patented this method as well. Additional patents were filed on a method to optimize video streams on the basis of a GOP (Group of Pictures) rather than a frame, and on a system that improves performance by determining the optimal compression level based on sampled segments instead of optimizing the whole stream. 

Embracing Video Encoding

In 2016 Beamr acquired Vanguard Video, the leading provider of software H.264 and HEVC encoders. We integrated our optimization technology into Vanguard Video’s encoders, creating a system that optimized video while encoding it. We call this CABR, and obviously we filed a patent on the integrated system. For more information about CABR, see our blog post “A Deep Dive into CABR”. 

With the acquisition of Vanguard, we didn’t just get access to the world’s best SW encoders. We also gained a portfolio of video encoding patents developed by Vanguard Video, which we continued to extend in the years since the acquisition. These patents cover unique algorithms for intra prediction, motion estimation, complexity analysis, fading and scene change analysis, adaptive pre-processing, rate control, transform and block type decisions, film grain estimation and artifact elimination.

In addition to encoding and optimization, we’ve also filed patents on technologies developed for specific products. For example, some of our customers wanted to use our image optimization technology while creating lower-resolution preview images, so we patented a method for fast and high-quality resizing of an image. Another patent application was filed on an efficient method of generating a transport stream, which was used in our Beamr Optimizer and Beamr Transcoder products. 

The chart below shows the split of our 50 patents by the type of technology.

Patent Strategy – Whether and Where to File

Our patent portfolio was built to protect our inventions and novel developments, while at the same time establish the validity of our technology. It’s common knowledge that filing for a patent is a time and money consuming endeavor. Therefore, prior to filing each patent application we ask ourselves: Is this a novel solution to an interesting problem? Is it important to us to protect it? Is it sufficiently tangible (and explainable) to be patentable? Only when the answer to all these questions is a resounding yes, we proceed to file a corresponding patent application. 

Geographically speaking, you need to consider where you plan to market your products, because that’s where you want your inventions protected. We have always been quite heavily focused on the US market, making that a natural jurisdiction for us. Thus, all our applications were submitted to the US Patent Office (USPTO). In addition, all applications that were invented in Beamr’s Israeli R&D center were also submitted to the Israeli Patent Office (ILPTO). Early on, we also submitted some of the applications in Europe and Japan, as we expanded our sales activities to these markets. However, our experience showed that the additional translation costs (not only of the patent application itself, but also of documents cited by an Office Action to which we needed to respond), as well as the need to pay EU patent fees in each selected country, made this choice less cost effective. Therefore, in recent years we have focused our filings mainly on the US and Israel. 

The chart below shows the split of our 50 patents by the country in which they were issued.

Patent Process – How to File

The process which starts with an idea, or even an implemented system based on that idea, and ends in a granted patent – is definitely not a short or easy one.

Many patents start their lifecycle as Provisional Applications. This type of application has several benefits: It doesn’t require writing formal patent claims or an Information Disclosure Statement (IDS), it has a lower filing fee than a regular application, and it establishes a priority date for subsequent patent filings. The next step can be a PCT, which acts as a joint base for submission in various jurisdictions. Then the search report and IDS are performed, followed by filing national applications in the selected jurisdictions. Most of our initial patent applications went through the full process described above, but in some cases, particularly when time was of the essence, we skipped the provisional or PCT steps, and directly filed national applications. 

For a national application, the invention needs to be distilled into a set of claims, making sure that they are broad enough to be effective, while constrained enough to be allowable, and that they follow the regulations of the specific jurisdiction regarding dependencies, language etc. This is a delicate process, and at this stage it is important to have a highly experienced patent attorney that knows the ins and outs of filing in different countries. For the past 12 years, since filing our first provisional patent, we were very fortunate to work with several excellent patent attorneys at the Reinhold Cohen Group, one of the leading IP firms in Israel, and we would like to take this opportunity to thank them for accompanying us through our IP journey.

After finalizing the patent claims, text and drawings, and filing the national application, what you need most is – patience… According to the USPTO, the average time between filing a non-provisional patent application and receiving the first response from the USPTO is around 15-16 months, and the total time until final disposition (grant or abandonment) is around 27 months. Add this time to the provisional and PCT process, and you are looking at several years between filing the initial provisional application and receiving the final grant notice. In some cases it’s possible to speed up the process by using the option of a modified examination in one jurisdiction, after the application gained allowance in another jurisdiction.

The chart below shows the number of granted patents Beamr has received in each passing year.

Sometimes, the invention, description and claims are straightforward enough that the examiner is convinced and simply allows the application as filed. However, this is quite a rare occurrence. Usually there is a process of Office Actions – where the examiner sends a written opinion, quoting prior art s/he believes is relevant to the invention and possibly rejecting some or even all the claims based on this prior art. We review the Office Action and decide on the next step: In some cases a simple clarification is required in order to make the novelty of our invention stand out. In others we find that adding some limitation to the claims makes it distinctive over the prior art. We then submit a response to the examiner, which may result either in acceptance or in another Office Action. Occasionally we choose to perform an interview with the examiner to better understand the objections, and discuss modifications that can bring the claims into allowance.

Finally, after what is sometimes a smooth, and sometimes a slightly bumpy route, hopefully a Notice Of Allowance is received. This means that once filing fees are paid – we have another granted patent! In some cases, at this point we decide to proceed with a divisional application, a continuation or continuation in part – which means that we claim additional aspects of the described invention in a follow up application, and then the patent cycle starts once again…

Summary

Receiving our 50th patent was a great opportunity to reflect back on the company’s IP journey over the past 12 years. It was a long and winding road, which will hopefully continue far into the future, with more patent applications, office actions and new grants to come.

Speaking of new grants – as this blog post went to press, we were informed that our 51st patent was granted! This patent covers “Auto-VISTA”, a method of “crowdsourcing” subjective user opinions on video quality, and aggregating the results to obtain meaningful metrics. You can learn more about Auto-VISTA in Episode 34 of The Video Insiders podcast.

Adding Beamr’s Frame-Level Content-Adaptive Rate Control to the AV1 Encoder

Introduction

AV1, the open-source video codec developed by the Alliance for Open Media, is the most efficient open-source codec available today. AV1’s compression efficiency has been found to be 30% better than VP9, the previous-generation open-source codec, meaning that AV1 can reach the same quality as VP9 with 30% fewer bits. Having an efficient codec is especially important now that video consumes over 80% of Internet bandwidth, and the usage of video for both entertainment and business applications is soaring due to social distancing measures.

Beamr’s Emmy® award-winning CABR technology reduces video bitrates by up to 50% while preserving perceptual quality. The technology creates fully-compliant standard video streams, which don’t require any proprietary decoder or add-on on the playback side. We have applied our CABR technology in the past to H.264, HEVC and VP9 codecs, using both software and hardware encoder implementations. 

In this blog post we present the results of applying Beamr’s CABR technology to the AV1 codec, by integrating our CABR library with the libaom open source implementation of AV1. This integration results in a further 25-40% reduction in the bitrate of encoded streams, without any visible reduction in subjective quality. The reduced-bitrate streams are of course fully AV1 compatible, and can be viewed with any standard AV1 player.

CABR In Action

Beamr’s CABR (Content Adaptive BitRate) technology is based on our BQM (Beamr Quality Measure) metric, which was developed over 10 years of intensive research and features very high correlation with subjective quality as judged by human viewers. BQM is backed by 37 granted patents, and recently won the 2021 Technology and Engineering Emmy® award from the National Academy of Television Arts & Sciences.

Beamr’s CABR technology and the BQM quality measure can be integrated with any software or hardware video encoder, to create more bitrate-efficient encodes without sacrificing perceptual quality. In the integrated solution, the video encoder encodes each frame with additional compression levels, also known as QP values. The first QP (for the initial encode) is determined by the encoder’s own rate control mechanism, which can be either VBR, CRF or fixed QP. The other QPs (for the candidate encodes) are provided by the CABR library. The BQM quality measure then compares the quality of the initial encoded frame to the quality of the candidate encoded frames, and selects the encoded frame which has the smallest size in bits, but is still perceptually identical to the initial encoded frame. Finally, the selected frame is written to the output stream. Due to our adaptive method of searching for candidate QPs, in most cases a single candidate encode is sufficient to find a near-optimal frame, so the performance penalty is quite manageable.
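
In pseudocode terms, the per-frame loop described above looks roughly like this. encode_frame, candidate_qps, and bqm_identical are hypothetical placeholders for illustration, not the actual Beamr SDK API.

def cabr_encode_frame(frame, rc_qp):
    initial = encode_frame(frame, qp=rc_qp)     # QP chosen by the encoder's own RC
    best = initial
    for qp in candidate_qps(rc_qp):             # adaptive search; usually one candidate
        candidate = encode_frame(frame, qp=qp)  # re-encode the same frame
        # Keep the smallest candidate that BQM judges perceptually identical
        # to the initial encode.
        if bqm_identical(initial, candidate) and candidate.size < best.size:
            best = candidate
    return best                                 # written to the output stream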

Integrating Beamr’s CABR module with a video encoder

By applying this process to each and every video frame, the CABR mechanism ensures that each frame fully retains the subjective quality of the initial encode, while bitrate is reduced by up to 50% compared to encoding the videos using the encoders’ regular rate control mechanism.

Beamr’s CABR rate control library is integrated into Beamr 4 and Beamr 5, our software H.264 and HEVC encoder SDKs, and is also available as a standalone library that can be integrated with any software or hardware encoder. Beamr is now implementing BQM in silicon hardware, enabling massive scale content-adaptive encoding of user-generated content, surveillance videos and cloud gaming streams. 

CABR Integration with libaom

When we approached the task of integrating our CABR technology with an AV1 encoder, we examined several available open source implementations of AV1, and eventually decided to integrate with libaom, the reference open source implementation of the AV1 encoder, developed by the members of the Alliance for Open Media. libaom was selected due to its good quality-speed tradeoff at the higher-quality working points, and a well-defined frame-encode interface which made the integration more straightforward.

To apply CABR technology to any encoder, the encoder should be able to re-encode the same input frame with different QPs, a process that we call “roll-back”. Fortunately, the libaom AV1 encoder already includes a re-encode loop, designed for the purpose of meeting bitrate constraints. We were able to utilize this mechanism to enable the frame re-encode process needed for CABR. 

Another important aspect of CABR integration is that although CABR reduces the actual bitrate relative to the requested “target” bitrate, we need the encoder’s rate control to believe that the target bitrate has actually been reached. Otherwise, it will try to compensate for the bits saved by CABR, by increasing bit allocation in subsequent frames, and this will undermine the process of CABR’s bitrate reduction. Therefore, we have modified the VBR rate-control feedback, reporting the bit-consumption of the initial encode back to the RC module, instead of the actual bit consumption of the selected output frame. 

An additional point of integration between an encoder and the CABR library is that CABR uses “complexity” data from the encoder when calculating the BQM metric. The complexity data is based on the per-block QP and bit consumption reported by the encoder. In order to expose this information, we added code that extracts the QP and bit consumption per block, and sends it to the CABR library.

The current integration of CABR with libaom supports 8 bit encoding, in both fixed QP and single pass VBR modes. 10-bit encoding (including HDR) and dual-pass VBR encoding are already supported with CABR in our own H.264 and HEVC encoders, and can be easily added to our libaom integration as well. 

Integration Challenges

Every integration has its challenges, and indeed we encountered several of them while integrating CABR with libaom. For example, the re-encode loop in libaom runs prior to the deblocking and other loop-filters, so the frame it generates is not the final reconstructed frame. To overcome this issue, we moved the in-loop filters and applied them prior to evaluating the candidate frame quality.

Another challenge we encountered was that the CABR complexity data is based on the QP values and bit consumption per 16×16 block, while within the libaom encoder this information is only available for bigger blocks. To resolve this, we had to process the actual data in order to generate the QP and bit consumption at the required resolution.

The concept of non-display frames, which is unique to VP9 and AV1, also posed a challenge to our integration efforts. The reason is that CABR only compares quality for frames that are actually displayed to the end user. So we had to take this into account when computing the BQM quality measure and calculating the bits per frame.

Finally, while the QP range in H.264 and HEVC is between 0 and 51, in AV1 it is between 0 and 255. We have an algorithm in CABR called “QP Search” which finds the best candidate QPs for each frame, and it was tuned for the QP range of 0-51, since it was originally developed for H.264 and HEVC encoders. We addressed this discrepancy by performing a simple mapping of values, but in the future we may perform some additional fine tuning of the QP Search algorithm in order to better utilize the increased dynamic range.
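
A simple linear mapping is enough to translate between the two ranges; this one-liner is illustrative only, as the post does not disclose Beamr's exact mapping.

def h264_qp_to_av1(qp):
    # Map an H.264/HEVC QP in 0-51 onto the AV1 range 0-255 (255/51 = 5 exactly).
    return qp * 5

assert h264_qp_to_av1(51) == 255 and h264_qp_to_av1(26) == 130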

Benchmarking Process

To evaluate the results of Beamr’s CABR integration with the libaom AV1 encoder, we selected 20 clips from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license. The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post. 

We encoded the selected video clips with libaomx, our version of libaom integrated with the CABR library. The videos were encoded using libaom cpu-used=9, which is the fastest speed available in libaom, and therefore the most practical in terms of encoding time. We believe that using lower speeds, which provide improved encoding quality, can result in even higher savings. 

Each clip was encoded twice: once using the regular VBR rate control without the CABR library, and a second time using the CABR rate control mode. In both cases, we used 3 target bitrates for each resolution: A high, medium and low bitrate, as specified in the table below.

Target bitrates used in the CABR-AV1 benchmark

Below is the command line we used to encode the files.

aomencx --cabr=<0 or 1> -w <width> -h <height> --fps=<fps>/1 --disable-kf --end-usage=vbr --target-bitrate=<bitrate in kbps> --cpu-used=9 -p 1 -o <outfile>.ivf <inputFIFO>.yuv

After we completed the encodes in both rate control modes, we compared the bitrate and subjective quality of both encodes. We calculated the % of difference in bitrate between the regular VBR encode and the CABR encode, and visually compared the quality of the clips to determine whether both encodes are perceptually identical to each other when viewed side by side in motion. 

Benchmark Results

The table below shows the VBR and CABR bitrates for each file, and the savings obtained, which is calculated as (VBR bitrate – CABR bitrate) / VBR bitrate. As expected, the savings are higher for high bitrate clips, but still significant even for the lowest bitrates we used. Average savings are 26% for the low bitrates, 33% for the medium bitrates, and 40% for the high bitrates. 

Note that savings differ significantly across different clips, even when they are encoded at the same resolution and target bitrate. For example, if you look at 1080p clips encoded to the lowest bitrate target (2 Mbps), you will find that some clips have very low savings (less than 3%), while other clips have very high savings (over 60%). This shows the content-adaptive nature of our technology, which is always committed to quality, and reduces the bitrate only in clips and frames where such reduction does not compromise quality. 

Also note that the VBR bitrate may differ from the target bitrate. The reason is that the rate control does not always converge to the target bitrate, due to the short length of the clips. But in any case, the savings were calculated between the VBR bitrate and the CABR bitrate.

Savings – Low Bitrates
Savings – Medium Bitrates
Savings – High Bitrates

In addition to calculating the bitrate savings, we also performed subjective quality testing by viewing the videos side by side, using the YUView player software. In these viewings we verified that indeed for all clips, the VBR and CABR encodes are perceptually identical when viewed in motion at 100% zoom. Below are a few screenshots from these side-by-side viewings. 

Conclusions

In this blog post we presented the results of integrating Beamr’s Content Adaptive BitRate (CABR) technology with the libaom implementation of the AV1 encoder. Even though AV1 is the most efficient open source encoder available, using CABR technology can reduce AV1 bitrates by a further 25-40% without compromising perceptual quality. The reduced bitrate can provide significant savings in storage and delivery costs, and enable reaching wider audiences with high-quality, high-resolution video content.

Appendix

The VBR and CABR encoded files can be found here.
The source files can be downloaded directly from the YouTube UGC Dataset, using the links below. 

720P/Animation_720P-620f.mkv 

1080P/Animation_1080P-3dbf.mkv 

1080P/Gaming_1080P-6dc6.mkv 

720P/HowTo_720P-37d0.mkv 

1080P/HowTo_1080P-64f7.mkv 

1080P/Lecture_1080P-0c8a.mkv 

720P/LiveMusic_720P-66df.mkv 

1080P/LiveMusic_1080P-14af.mkv 

720P/NewsClip_720P-7745.mkv 

720P/NewsClip_720P-6016.mkv 

1080P/NewsClip_1080P-5b53.mkv 

720P/Sports_720P-5bfd.mkv 

720P/Sports_720P-531c.mkv 

1080P/Sports_1080P-15d1.mkv 

2160P/Sports_2160P-1b70.mkv 

1080P/TelevisionClip_1080P-39e3.mkv 

1080P/TelevisionClip_1080P-5e68.mkv 

1080P/TelevisionClip_1080P-68c6.mkv 

720P/VerticalVideo_720P-4ca7.mkv 

1080P/VerticalVideo_1080P-3a9b.mkv 

Optimizing Bitrates of User-generated Videos with Beamr CABR

Introduction

The attention of Internet users, especially the younger generation, is shifting from professionally-produced entertainment content to user-generated videos and live streams on YouTube, Facebook, Instagram and most recently TikTok. On YouTube, creators upload 500 hours of video every minute, and users watch 1 billion hours of video every day. Storing and delivering this vast amount of content creates significant challenges for operators of user-generated content services. Beamr’s CABR (Content Adaptive BitRate) technology reduces video bitrates by up to 50% compared to regular encodes, while preserving perceptual quality and creating fully-compliant standard video streams that don’t require any proprietary decoder on the playback side. CABR technology can be applied to any existing or future block-based video codec, including AVC, HEVC, VP9, AV1, EVC and VVC.

In this blog post we present the results of a UGC encoding test, where we selected a sample database of videos from YouTube’s UGC dataset, and encoded them both with regular encoding and with CABR technology applied. We compare the bitrates, subjective and objective quality of the encoded streams, and demonstrate the benefits of applying CABR-based encoding to user-generated content. 

Beamr CABR Technology

At the heart of Beamr’s CABR (Content-Adaptive BitRate) technology is a patented perceptual quality measure, developed during 10 years of intensive research, which features very high correlation with human (subjective) quality assessment. This correlation has been proven in user testing according to the strict requirements of the ITU BT.500 standard for image quality testing. For more information on Beamr’s quality measure, see our quality measure blog post.

When encoding a frame, Beamr’s encoder first applies a regular rate control mechanism to determine the compression level, which results in an initial encoded frame. Then, the Beamr encoder creates additional candidate encoded frames, each one with a different level of compression, and compares each candidate to the initial encoded frame using the Beamr perceptual quality measure. The candidate frame which has the lowest bitrate, but still meets the quality criteria of being perceptually identical to the initial frame, is selected and written to the output stream. 

This process repeats for each video frame, ensuring that every frame is encoded to the lowest bitrate that fully retains the subjective quality of the target encode. Beamr’s CABR technology results in video streams that are up to 50% lower in bitrate than regular encodes, while retaining the same quality as the full bitrate encodes. The number of CPU cycles required to produce the CABR encodes is only about 20% higher than for regular encodes, and the resulting streams are identical to regular encodes in every way except their lower bitrate. CABR technology can also be implemented in silicon for high-volume video encoding use cases such as UGC video clips, live surveillance cameras, etc.
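In outline, the per-frame decision described above can be sketched as follows. This is a simplified illustration of the published description, not Beamr’s actual implementation; the encoder and quality-measure interfaces are hypothetical placeholders:

def encode_frame_cabr(frame, encoder, quality_measure, candidate_qps):
    # 1. Encode the frame normally to obtain the initial (reference) encode.
    initial = encoder.encode(frame)
    best = initial
    # 2. Create candidate encodes at more aggressive compression levels.
    for qp in candidate_qps:
        candidate = encoder.encode(frame, qp=qp)
        # 3. Keep the smallest candidate that is still perceptually
        #    identical to the initial encode, per the quality measure.
        if (candidate.size_bits < best.size_bits and
                quality_measure.perceptually_identical(candidate, initial)):
            best = candidate
    # 4. The selected frame is written to the output stream.
    return best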

For more information about Beamr’s CABR technology, see our CABR Deep Dive blog post. 

CABR for UGC

Beamr’s CABR technology is especially suited for User-Generated Content (UGC), due to the high diversity and variability of such content. UGC is captured on devices ranging from low-end cellular phones to high-end professional cameras and editing software. The content itself varies from “talking head” selfie videos, to instructional videos shot in a home or classroom, to sporting events and even rock band performances with extreme lighting effects.

Encoding UGC content with a fixed bitrate means that such a bitrate might be too low for “difficult” content, resulting in degraded quality, while it may be too high for “easy” content, resulting in wasted bandwidth. Therefore, content-adaptive encoding is required to ensure that the optimal bitrate is applied to each UGC video clip. 

Some UGC services process their content with the Constant Rate Factor (CRF) rate control mode of the open-source x264 video encoder, in order to ensure a constant quality level while varying the actual bitrate according to the content. However, CRF bases its compression level decisions on heuristics of the input stream, not on a true perceptual quality measure that compares candidate encodes of a frame. Therefore, even CRF encodes waste bits that are unnecessary for a good viewing experience. Beamr’s CABR technology, which is content-adaptive at the frame level, is perfectly suited to remove these remaining redundancies, and create encodes that are smaller than CRF-based encodes but have the same perceptual quality.
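For reference, a typical CRF encode with x264 looks along these lines (the CRF value and file names are illustrative):

x264 --crf 23 --preset slow -o output.264 input.y4m

A lower CRF value means higher quality and a higher bitrate; the encoder picks whatever bitrate is needed to hold the requested quality level roughly constant across the clip.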

Evaluation Methodology

To evaluate the results of Beamr’s CABR algorithm on UGC content, we used samples from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube and distributed under the Creative Commons license, which was created to assist in video compression and quality assessment research. The dataset includes around 1500 source video clips (raw video), with a duration of 20 seconds each. The resolution of the clips ranges from 360p to 4K, and they are divided into 15 different categories such as animation, gaming, how-to, music videos, news, sports, etc.

To create the database used for our evaluation, we randomly selected one clip in each resolution from each category, resulting in a total of 67 different clips (note that not all categories in the YouTube UGC set have clips in all resolutions). The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post. As is typical of user-generated content, many of the videos suffer from perceptual quality issues in the source, such as blockiness, banding, blurriness, noise, and jerky camera movements, which makes them especially difficult to encode using standard video compression techniques.

We encoded the selected video clips using Beamr 4x, Beamr’s H.264 software encoder library, version 5.4. The videos were encoded using speed 3, which is typically used to encode VoD files in high quality. Two rate control modes were used for encoding. The first is CSQ mode, which is similar to x264’s CRF mode: it aims to provide a Constant Subjective Quality level, and varies the encoded bitrate based on the content to reach that quality level. The second is CABR-CSQ mode, which creates an initial (reference) encode in CSQ mode, and then applies Beamr’s CABR technology to create a reduced-bitrate encode with the same perceptual quality as the target CSQ encode. In both cases, we used six CSQ values equally spaced from 16 to 31 (16, 19, 22, 25, 28, 31), representing a wide range of subjective video qualities.

After we completed the encodes in both rate control modes, we compared three attributes of the CSQ encodes to the CABR-CSQ encodes:

  1. File Size – to determine the amount of bitrate savings achievable by the CABR-CSQ rate control mode
  2. BD-Rate – to determine how the two rate control modes compare in terms of the objective quality measures PSNR, SSIM and VMAF, computed between each encode and the source (uncompressed) video
  3. Subjective quality – to determine whether the CSQ encode and the CABR-CSQ encode are perceptually identical to each other when viewed side by side in motion. 

Results

The table below shows the bitrate savings of CABR-CSQ vs. CSQ for various values of the CSQ parameter. As expected, the savings are higher for low CSQ values, which correspond to higher subjective quality and higher bitrates. As the CSQ value increases, quality decreases, bitrate decreases, and the savings of the CABR-CSQ algorithm decrease as well.

Table 1: Savings by CSQ value

The overall average savings across all clips and all CSQ values is close to 26%. If we average the savings only for the lower CSQ values (16-22), which correspond to high quality levels, the average savings are close to 32%. Saving a quarter to a third of the storage cost, and more importantly of the CDN delivery cost, can be very significant for UGC service providers.

It is also interesting to look at how the savings are distributed across specific UGC genres. Table 2 shows the average savings for each of the 15 content categories in the YouTube UGC Dataset.

Table 2: Savings by Genre

As we can see, simple content such as lyric videos and “how to” videos (where the camera is typically fixed) get relatively higher savings, while more complex content such as gaming (which has a lot of detail) and live music (with many lights, flashes and motion) get lower savings. However, it should be noted that due to the relatively low number of selected clips from each genre (one in each resolution, for a total of 2-5 clips per genre), we cannot draw any firm conclusions from the above table regarding the expected savings for each genre. 

Next, we compared the objective quality metrics PSNR, SSIM and VMAF for the CSQ encodes and the CABR-CSQ encodes, by creating a BD-Rate graph for each clip. To create the graph, we computed each metric between the encodes at each CSQ value and the source files, resulting in 6 points for CSQ and 6 points for CABR-CSQ (corresponding to the 6 CSQ values used in both encodes). An example VMAF BD-Rate graph, comparing CSQ with CABR-CSQ for one of the clips in the lyric video category, is shown in Figure 1 below.
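For readers who want to reproduce this kind of comparison, the standard Bjontegaard-delta (BD-Rate) calculation fits a polynomial through the quality vs. log-bitrate points of each encode and integrates the gap between the two curves. Below is a minimal numpy sketch under those standard assumptions (our internal tooling may differ):

import numpy as np

def bd_rate(rates_a, scores_a, rates_b, scores_b):
    # Average % bitrate difference of encoder B vs. encoder A at equal
    # quality, from matching (bitrate, metric score) points per encoder.
    la, lb = np.log(rates_a), np.log(rates_b)
    # Fit cubic polynomials of log-rate as a function of the metric score
    pa = np.polyfit(scores_a, la, 3)
    pb = np.polyfit(scores_b, lb, 3)
    # Integrate both fits over the overlapping quality range
    lo = max(min(scores_a), min(scores_b))
    hi = min(max(scores_a), max(scores_b))
    ia = np.polyval(np.polyint(pa), [lo, hi])
    ib = np.polyval(np.polyint(pb), [lo, hi])
    avg_diff = ((ib[1] - ib[0]) - (ia[1] - ia[0])) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative = B saves bitrate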

Figure 1: CSQ vs. CABR-CSQ VMAF scores for the 1920×1080 LyricVideo file

As we can see, the BD-Rate curve of the CABR-CSQ encodes follows the CSQ curve, but each CSQ point on the original graph is moved down and to the left. If we compare, for example, the CSQ 19 point to the CABR-CSQ 19 point, we find that CSQ 19 has a bitrate of around 8 Mbps and a VMAF score of 95, while CABR-CSQ 19 has a bitrate of around 4 Mbps and a VMAF score of 91. However, when both of these files are played side by side, we can see that they are perceptually identical to each other (see the screenshot from the Beamr View side-by-side player below). Therefore, the CABR-CSQ 19 encode can be used as a lower-bitrate proxy for the CSQ 19 encode.

Figure 2: Side-by-side comparison in Beamr View of the CSQ 19 vs. CABR-CSQ 19 encode for the 1920×1080 LyricVideo file

Finally, to verify that the CSQ and CABR-CSQ encodes are indeed perceptually identical, we performed subjective quality testing using the Beamr VISTA application. Beamr VISTA enables visually comparing pairs of video sequences played synchronously side by side, with a user interface for indicating the relative subjective quality of the two sequences (for more information on Beamr VISTA, listen to episode 34 of The Video Insiders podcast). The set of target comparison pairs comprised 78 pairs of 10-second segments of Beamr 4x CSQ encodes vs. the corresponding Beamr 4x CABR-CSQ encodes. 30 test rounds were performed, resulting in 464 valid target pair views (i.e., views by users who correctly recognized the mildly distorted control pairs), or an average of 6 views per pair. The results show that on average, close to 50% of the users selected CABR-CSQ as having lower quality, while a similar percentage selected CSQ as having lower quality. We can therefore conclude that the two encodes are perceptually identical, with a statistical significance exceeding 95%.
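The statistical reasoning above can be checked with a simple two-sided binomial test on the pair selections: under the null hypothesis that the two encodes are indistinguishable, each selection is a fair coin flip. A scipy sketch (the counts below are illustrative, not the raw study data):

from scipy.stats import binomtest

# Of n valid target-pair views, k selected CABR-CSQ as lower quality.
n, k = 464, 230  # illustrative counts close to the reported ~50% split
result = binomtest(k, n, p=0.5)
print(result.pvalue)  # a large p-value = no detectable quality preference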

Figure 3: Percentage of users who selected CABR-CSQ as having lower quality per file

Conclusions

In this blog post we presented the results of applying Beamr’s Content Adaptive BitRate (CABR) encoding to a random selection of user-generated clips taken from the YouTube UGC Dataset, across a range of quality (CSQ) values. The CABR encodes had 25% lower bitrate on average than regular encodes, and at high quality values, 32% lower bitrate on average. The Rate-Distortion curve is essentially unchanged by applying CABR technology, and the subjective quality of the CABR encodes is the same as that of the regular encodes. By shaving off a quarter of the video bitrate, significant storage and delivery cost savings can be achieved, and the strain on today’s bandwidth-constrained networks can be relieved, for the benefit of all netizens.

Appendix

Below are links to all the source clips used in the Beamr 4x CABR UGC test.

Animation: Animation 360p Animation 480p Animation 720p Animation 1080p Animation 2160p

CoverSong: CoverSong 360p CoverSong 480p CoverSong 720p CoverSong 1080p

Gaming: Gaming 360p Gaming 480p Gaming 720p Gaming 1080p Gaming 2160p

HDR: HDR 1080p HDR 2160p

HowTo: HowTo 360p HowTo 480p HowTo 720p HowTo 1080p

Lecture:  Lecture 360p Lecture 480p Lecture 720p Lecture 1080p

LiveMusic: LiveMusic 360p LiveMusic 480p LiveMusic 720p LiveMusic 1080p

LyricVideo: LyricVideo 360p LyricVideo 480p LyricVideo 720p LyricVideo 1080p

MusicVideo: MusicVideo 360p MusicVideo 480p MusicVideo 720p MusicVideo 1080p

NewsClip:  NewsClip 360p NewsClip 480p NewsClip 720p NewsClip 1080p

Sports:  Sports 360p Sports 480p Sports 720p Sports 1080p

TelevisionClip:  TelevisionClip 360p TelevisionClip 480p TelevisionClip 720p TelevisionClip 1080p

VR:  VR 720p VR 1080p VR 2160p

VerticalVideo:  VerticalVideo 360p VerticalVideo 480p VerticalVideo 720p VerticalVideo 1080p VerticalVideo 2160p

Vlog:  Vlog 360p Vlog 480p Vlog 720p Vlog 1080p Vlog 2160p

How to deal with the tension of video on the mobile network – Part 1

Last week, the Internet erupted in furor over Verizon’s alleged “throttling” of video streaming services on their mobile network. With a quick glance at the headlines, and to the uninitiated, this could be perceived as an example of a wireless company taking its market dominance too far. Most commenters were quick to pontificate, calling Verizon’s “interference” a violation of net neutrality.

But this article isn’t about the argument for, or against, network neutrality. Instead, let’s examine the tension that exists as a result of the rapid increase in video consumption on mobile devices for the OTT and video streaming industry. Let’s explore why T-Mobile, Verizon, and others that have yet to come forward feel the need to reduce the size of the video files streamed across their networks.

Cisco reports that by 2021, 82% of all Internet traffic will be video, and according to Ericsson, mobile video is set to grow just as fast, with 75% of the data flowing over mobile networks being video by 2022. This growth means that by 2021 the average user will consume a whopping 8.9 GB of data every month, as reported by BGR. These data points reveal why escalating consumption of video by wireless subscribers is creating tension in the ecosystem.

So what are the wireless operators trying to achieve by reducing the bitrates of video that is being delivered on their network?

Many mobile service operators offer their own entertainment video service packages, which means they are free to deliver the content in the quality that is consistent with their service level positioning. For some, this may be low to medium quality, but most viewers won’t settle for anything short of medium to high quality.

Many mobile operators carry both their own video services and third-party content. AT&T, for example, delivers its own DirecTV Now alongside video from Netflix. AT&T is free to modify DirecTV Now’s encoded files to the maximum extent in order to achieve a perfect blend of quality and low bitrate, while for premium services like Netflix, the video packets cannot be touched due to DRM and the widespread adoption of HTTPS encryption. The point is, mobile carriers don’t always control the formats or quality of the video they carry over the network, and for this reason every content owner and video distributor should have an equal interest in pre-packaging (optimizing) their content for the highest quality and smallest file size possible.

As consumers grow more savvy to the differences in video and service quality between content services, many are becoming less willing to compromise. After all, you don’t invest in a top-of-the-line phone with an AMOLED screen to watch blocky low-resolution video. Yet, because of the way services deliver content to mobile devices, in some cases the consumer never gets to experience the full quality of the device’s screen.

We see this point accentuated when a mobile network operator implements technology designed to reduce resolution, or lower video complexity, in order to achieve a reduced bandwidth target. Attempts are made to preserve the original video quality as much as possible, but it stands to reason that if you start with 1080p (full HD) and reduce the resolution to 480p (standard definition), the customer experience will suffer. Currently, the way bandwidth is being reduced on mobile networks is best described as a brute-force method: in scenarios where mobile operators force 480p, the bitrate is reduced at the expense of resolution. But is this the best approach? Let’s take a look.

Beamr published a case study with findings from M-GO where our optimization solution helped to reduce buffering events by up to 50%, and improved stream start times by as much as 20%. These are impressive achievements, and indicative of the value of optimizing video for the smallest size possible, provided the original quality is retained.

A recent study, “Bit rate and business model,” published by Akamai in conjunction with Sensum, also supports the findings of M-GO and of Conviva’s Viewer Experience Report. In the Akamai/Sensum study, the human reaction to quality was measured, and the researchers found that three out of four participants would stop using a service after even a few re-buffering events.

For the study, viewers were split into two groups, with one group exposed only to a lower-resolution (quality) stream that contained at least one stream interruption (re-buffering event). This group was 20% less likely to associate a positive word with the viewing experience than viewers who watched the higher-quality, full-resolution stream that played smoothly without buffering (resolutions displayed were up to 4K). Accordingly, lower-quality streams led to a 16% increase in negative emotions, while higher-quality streams led to a 20% increase in emotional engagement.

There are those who claim “you can’t see 4K”, or use phrases like “smarter pixels, not more pixels.” Given the complexity of the human visual system and its interconnection with our brain, the Akamai study shows that our physiological systems are able to detect differences between higher and lower resolutions. These disruptions were validated by changes in the viewers’ eye movements, breathing patterns, and increased perspiration.

Balancing the needs of the network, video distributor, and consumer.

  • Consumers expect content at their fingertips, and they also expect the total cost of the content and the service needed to deliver it, to be affordable.
  • Service providers are driven by the need to deliver higher quality video to increase viewer engagement.
  • Mobile network operators welcome with open arms any application that drives more demand for their product (data), yet they face the challenge of dealing with this expanding traffic, which is beginning to outstrip customers’ willingness to pay.

Delivering content over the Internet is not free, as some assume. Since the streaming video distributor pays the CDN by the size of the package, i.e. gigabytes delivered, they are able to exploit the massive network investments made by mobile operators. Meanwhile, they (or more specifically their end-customers) carry the expectation that the capacity needed to deliver their videos will always be available. Thus, a network operator must invest ahead of revenues, on the promise that growth will meet the investment.

All of this can be summed up by this simple statement, “If you don’t take care of the bandwidth, someone else will.”

Video codecs are evolutionary, with each progressive codec being more efficient than the last. The current standard is H.264, and though this codec delivers amazing quality with reasonable performance and bitrate reduction, it’s built on a standard that is now fourteen years old. As even entry-level mobile phones now support 1080p, video encoding engineers are running into the problem that H.264 cannot reach the quality they need below 3 Mbps. Some distributors do push their H.264 bitrates lower than 3 Mbps for 1080p, but in doing so they must be willing to introduce noticeable artifacts. So the question is: how do we get to 2 Mbps or lower, but with the same quality as 3-4 Mbps, and at the original resolution?

Enter HEVC.

With Apple’s recent announcement of HEVC support across as many as 400 million devices with hardware decoding, content owners should be looking seriously at adopting HEVC in order to realize the 40% reduction in bitrate over H.264 that Apple is reporting. But how exactly can HEVC bring relief to an overburdened mobile network?

It can be argued that once HEVC reaches broad adoption, today’s situation of bitrates being higher than we’d like will no longer exist. After all, if you could flip a switch and reduce all the video traffic on the network by 40% with a more efficient compression scheme (HEVC), then it’s quite possible that we’ll push the bandwidth crunch out for another 3-5 years.

But this thinking is more related to fairytales and unicorns than real life. For one thing, video encoding workflows and networks do not function like light switches. Not only does it take time to integrate and test new technology, but the bigger issue is that video consumption and advanced entertainment experiences, like VR, AR, and 360, will consume the new white space as quickly as it becomes available, bringing us back to where we are today.

Meeting the bandwidth challenge will require working together.

In the above scenario, the distributor and the network share responsibility, each playing their role in guaranteeing that quality remains high while bits are not wasted. For those who are wondering: inefficient encoding methods and dated codecs such as H.264 fall into the “wasting bits” category.

The Internet is a shared resource and whether it stays under some modicum of government regulation, or becomes open again, it’s critical for all members of the ecosystem to recognize that the network is not of infinite capacity and those using it to distribute video should respect this by taking the following steps:

  1. Adopt HEVC across all platforms and resolutions. This step alone will yield up to a 40% reduction over your current H.264 bandwidths.
  2. Implement advanced content-adaptive technologies such as Beamr CABR (Content-Adaptive BitRate), which can reduce video bitrates by an additional 30-50% beyond the 40% that HEVC affords (see the arithmetic sketch after this list).
  3. Adopt just-in-time encoding that allows real-time, dynamic control of bitrate based on the needs of the viewing device and network conditions. Intel and Beamr have partnered to offer an ultra-high-density, low-cost HEVC 4K live 10-bit encoding solution using the E3 platform with the Iris Pro P580 graphics accelerator.
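As a rough illustration of how steps 1 and 2 compound, assuming Apple’s reported 40% for HEVC and a mid-range 40% for CABR (actual savings vary by content):

# Illustrative arithmetic only; the percentages are assumptions
h264_kbps = 3000                     # a typical 1080p H.264 stream
hevc_kbps = h264_kbps * (1 - 0.40)   # step 1: HEVC, ~40% below H.264
cabr_kbps = hevc_kbps * (1 - 0.40)   # step 2: CABR, a further 30-50%
print(hevc_kbps, cabr_kbps)          # 1800.0 1080.0, ~64% below H.264

This is how a 3 Mbps H.264 stream can land at roughly 1 Mbps at the same perceptual quality, answering the 2 Mbps question raised above.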

In conclusion.

  • With or without network neutrality, reducing video bandwidth will be a perpetual need for the foreseeable future. Whether to delay capex investment, to meet competitive pressure on video quality, or simply to increase profitability and decrease opex, the benefits of always delivering the smallest file and stream sizes possible are easy to model.
  • The current method of brute-forcing lower resolutions, or transcoding to a reduced frame rate, is not sustainable, as consumers expect the original experience to be delivered. The technical solutions implemented must deliver high quality and be ready for next-generation entertainment experiences. At the same time, if you don’t work to trim the fat from your video files, someone else may do it, and it will most certainly be at the expense of video quality and user experience.
  • HEVC and Beamr CABR represent the state of the art in high quality video encoding and bitrate reduction (optimization) without compromise.

If you’d like to learn more, keep an eye out for part two in this series, or take a moment to read this relevant article: It’s Unreasonable to Expect ISP’s Alone to Finance OTT Traffic

In the meantime, you can download our VP9 vs. HEVC white paper, learn how to encode content for the future, or contact us at sales@beamr.com to talk further.