With its numerous advantages, AV1 is now supported on about 60% of devices and in all major web browsers. To accelerate its adoption, Beamr has introduced an easy, automated upgrade to the codec at the forefront of today's video technology.
Four years ago we explored the different video codecs, analyzing their strengths and weaknesses, and took a look at current and predicted market share. While it is gratifying to see that many of our predictions were fairly accurate, that is accompanied by some degree of disappointment: although AV1's strengths are well known in the industry, a significant change in the adoption of new codecs has yet to materialize.
The bottom line of the 2020 post was: “Only time will tell which will have the highest market share in 5 years’ time, but one easy assessment is that with AVC current market share estimated at around 70%, this one is not going to disappear anytime soon. AV1 is definitely gaining momentum, and with the giants backing we expect to see it used a fair bit in online streaming. “
Indeed we are living in a multi-codec reality, where AVC still accounts for, by far, the largest percentage of video content, but adoption of AV1 is starting to increase with large players such as Netflix and YouTube incorporating it into their workflows, and many others using it for specific high value use cases.
Thus, we are faced with a mixture of the still dominant AVC, HEVC (serving primarily UHD and HDR use cases), AV1, and additional codecs such as VP9 and VVC, which are used in quite small amounts.
The Untapped Potential of AV1
So while AV1 adoption is increasing, there is still significant untapped potential. One cause of the slower-than-hoped rollout of AV1 is the obstacle facing the adoption of any new standard: reaching a critical mass of hardware decoding support on edge devices.
While coverage for AVC and HEVC is very extensive, for AV1 that has only recently become the case, with support across an estimated 60% of devices and all major web browsers, complemented by the efficient software decoding offered by dav1d.
Another obstacle AV1 faces involves the practicalities of deployment. While there is extensive knowledge, within the industry and available online, of how best to configure AVC encoding, and what presets and encoding parameters work well for which use cases – there is no such equivalent knowledge for AV1. Thus, in order to deploy it, extensive research is needed by those who intend to use it.
Additionally, AV1 encoding is complicated, requiring much higher processing power for software encoding. In a world that is constantly trying to cut costs and use lower-power solutions, this can pose a problem. Even at the fastest settings, software AV1 encoding is still significantly slower than AVC encoding at typical speeds. This is a strong motivator to upgrade to AV1 using hardware-accelerated solutions (learn more about Beamr's solution to the challenge).
The upcoming codec possibilities are also a deterrent for some. With AV2 in the works, VVC finalized and gaining some traction, and various groups working on AI-based encoding solutions, there will always be players waiting for 'the next big thing' rather than switching out codecs twice.
In a world where JPEG, a 30+ year old standard, is still used in over 70% of websites and is the most popular format on the web for photographic content, it is no surprise that adoption of new video codecs is taking time.
While a multi-codec reality is probably here to stay, we can at least hope that when we revisit this topic in a blog a few years down the line, the balance between deployed codecs leans more towards higher-efficiency codecs like AV1, yielding the best bitrate-quality options for the video world.
This year at IBC 2024 in Amsterdam, we are excited to demonstrate Live 4K p60 optimized streaming with our Content-Adaptive Bitrate (CABR) technology on NVIDIA Holoscan for Media, a software-defined, AI-enabled platform that allows live video pipelines to run on the same infrastructure as AI. Using the CABR GStreamer plugin, premiered at the NAB Show earlier this year, we now support live, quality-driven optimized streaming for 4Kp60 video content.
It is no secret that savvy viewers have come to expect the high-quality experience of 4K Ultra-High-Definition streamed at 60 frames per second for premium events. What started as a drizzle a few years back has become the high-end norm for recent events such as the 2024 Olympics, where techies were sharing insights on where it could be accessed.
Given that 4K means a whopping four times the pixels of full HD resolution, keeping up with live encoding of 4K at 60 fps can be quite challenging, and can also result in bitrates that are too high to manage.
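To put rough numbers on that, here is a quick back-of-the-envelope sketch of raw (pre-compression) data rates for 8-bit 4:2:0 video, where each pixel averages 1.5 bytes:

```python
# Back-of-the-envelope arithmetic: why live 4Kp60 is demanding.
def raw_rate_gbps(width: int, height: int, fps: float,
                  bytes_per_pixel: float = 1.5) -> float:
    """Raw (uncompressed) data rate in gigabits per second,
    assuming 8-bit 4:2:0 sampling (1.5 bytes per pixel on average)."""
    return width * height * fps * bytes_per_pixel * 8 / 1e9

hd  = raw_rate_gbps(1920, 1080, 60)   # ~1.49 Gbps
uhd = raw_rate_gbps(3840, 2160, 60)   # ~5.97 Gbps, four times the pixels

print(f"1080p60 raw: {hd:.2f} Gbps, 4Kp60 raw: {uhd:.2f} Gbps ({uhd/hd:.0f}x)")
```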
One possible solution for broadcasters is to encode and transmit at 1080p and rely on the constantly improving upscalers available on TVs to provide the 4K experience, but this of course means they cannot control the user experience. A better solution is to have a platform that is super fast, and can create live 4Kp60 encodes, which combine excellent quality with an optimization process that minimizes the required bitrate for transmission.
Comparison of 4K Live video before and after optimization
Beamr CABR on Holoscan for Media offers exactly that, by combining the fast data buses and easy-to-use architecture of Holoscan for Media with Beamr hardware-accelerated, quality-driven optimized AV1 encoding. Together, it is possible to stream super efficient, 4K, lower bitrate encodes at top notch quality.
Content Adaptive Bitrate encoding, or CABR, is Beamr’s patented and award-winning technology that uses a quality measure to select the best candidate with the lowest bitrate and the same perceptual quality as a reference frame. In other words, users can enjoy 30-50% lower bitrate, faster delivery of files or live video streams and improved user experience – all with exactly the same quality as the original video.
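The details of the quality measure and the encoder integration are proprietary, but the per-frame control loop can be sketched conceptually as follows. In this sketch, encode_frame and perceptual_quality are hypothetical stand-ins, not Beamr APIs:

```python
# Conceptual sketch only: encode_frame() and perceptual_quality() are
# hypothetical stand-ins for an encoder hook and for a perceptual quality
# measure such as Beamr's BQM; they are not Beamr APIs.

def encode_frame(frame, qp: int) -> bytes:
    raise NotImplementedError  # per-frame encode at the given quantizer

def perceptual_quality(candidate: bytes, reference: bytes) -> float:
    raise NotImplementedError  # 1.0 would mean perceptually identical

QUALITY_THRESHOLD = 0.99  # illustrative "perceptually identical" cutoff

def cabr_select(frame, reference_qp: int, candidate_qps: list[int]) -> bytes:
    """Return the smallest candidate encode of `frame` that still matches
    the reference encode's perceptual quality; fall back to the reference."""
    reference = encode_frame(frame, qp=reference_qp)
    best = reference
    for qp in sorted(candidate_qps, reverse=True):  # higher QP = fewer bits
        candidate = encode_frame(frame, qp=qp)
        if (len(candidate) < len(best)
                and perceptual_quality(candidate, reference) >= QUALITY_THRESHOLD):
            best = candidate
    return best
```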
To achieve aggressive bitrates that are feasible for broadcasting live events, we configure the system to use AV1 encoding. The advanced AV1 format has been around since 2018, but its full potential has yet to be realized by many players in the video arena. AV1 raises the bar significantly compared to previous modern codecs such as AVC (H.264) and HEVC (H.265) in terms of efficiency, performance on GPUs, and high quality for real-time video. Combined with CABR, AV1 offers even more. According to our tests, AV1 can reduce data by 50% compared to AVC and by 30% compared to HEVC. We have also shown that CABR-optimized AV1 is beneficial for machine learning tasks.
Putting all three of these technologies together, namely deploying Holoscan for Media with the Beamr CABR solution inside, which in turn is using NVIDIA’s hardware-accelerated AV1 encoder, provides a platform that offers spectacular benefits. With the rise in demand for high-quality live streaming at high resolution, high fps and manageable bitrates, while keeping an eye on the encoding costs – this solution is definitely an interesting prospect for companies looking to boost their streaming workflows.
Machine learning for video is an expanding field, garnering vast interest, with generative AI for video picking up speed. However, these technologies face significant pain points, such as storage and bandwidth bottlenecks when dealing with video content, as well as training and inference speeds.
In the following case study, we show that training an AI network for action recognition using video files compressed and optimized through Beamr Content-Adaptive Bitrate technology (CABR), produces results that are as good as training the network with the original, larger files. The ability to use significantly smaller video files can accelerate machine learning (ML) training and inferencing.
Motivation
Beamr's CABR enables significantly decreasing video file size without changing the video's resolution, compression format, or file format, and without compromising perceptual quality. It is therefore a great candidate for resolving file size issues and bandwidth bottlenecks in the context of ML for video.
In a previous case study we looked at the task of people detection in video using pre-trained models. In this case study we cover the more challenging task of training a neural network for action recognition in video, comparing the outcome when using source vs optimized files.
We start by describing the problem we targeted, then describe the classifier architecture used. We continue with details on the data sets used and their optimization results, followed by the experiment results, and conclude with directions for future work.
Action recognition task
When setting the scope for this case study it was essential to us to define a test case that makes full use of the fact that the content is video, as opposed to image. Therefore we selected a task which requires the temporal element of video to perform the classification – action recognition. In viewing individual frames it is not possible to differentiate between frames captured during walking and running, or between someone jumping or dancing. For this a sequence of frames is required, which is why this was our task of choice.
Target data set
For the fine-tuning step we collected a set of 160 free-to-use, user-generated content video clips, downloaded from the Pexels and Envato stock-video websites. The videos were downloaded at 720p resolution, using the websites' default settings. We selected videos belonging to one of four action classes: running, martial arts, dancing, and rope jumping.
To use these in the selected architecture, they needed to be cropped to a square input. This was done by manually marking a region of interest (ROI) in each clip and performing the crop using OpenCV, with the corresponding OpenH264 encoding at default configuration and settings.
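For illustration, the crop step can be sketched roughly as follows. The ROI coordinates and file names here are hypothetical, and the exact FourCC depends on the OpenCV build (OpenCV uses its bundled OpenH264 encoder for H.264 output):

```python
import cv2

def crop_to_square(src_path: str, dst_path: str, x: int, y: int, side: int) -> None:
    """Crop a manually marked square ROI from each frame and re-encode.
    (x, y) is the ROI's top-left corner; `side` is the square's edge length."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    fourcc = cv2.VideoWriter_fourcc(*"avc1")  # H.264 via OpenH264, default settings
    out = cv2.VideoWriter(dst_path, fourcc, fps, (side, side))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(frame[y:y + side, x:x + side])  # rows = y, cols = x
    cap.release()
    out.release()

# Hypothetical usage: ROI coordinates were marked manually per clip.
crop_to_square("running_01.mp4", "running_01_cropped.mp4", x=300, y=0, side=720)
```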
We first optimized the original clip set using the Beamr cloud optimization SaaS, obtaining an average reduction of 24%. This is beneficial when storing the test set for future use and possibly performing other manipulations on it. However, for our test we wanted to compress the set of cropped videos actually used for the training process. Applying the same optimization to these files, created by OpenCV, yielded a whopping 67% average reduction.
Architecture
We selected an encoder-decoder architecture, which is commonly used for classification of video and other time-series inputs. For the encoder we used ResNet-152 pre-trained on ImageNet, followed by 3 fully connected layers with sizes of 1024, 768 and 512. For the decoder we used an LSTM followed by 2 fully connected layers of 512 and 256 neurons.
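A sketch of this architecture in PyTorch might look as follows; the LSTM hidden size, activations, and final projection layer are our assumptions, as they are not fully specified above:

```python
import torch
import torch.nn as nn
from torchvision import models

class ActionClassifier(nn.Module):
    """CNN encoder (ResNet-152) + LSTM decoder for clip-level classification."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop final FC
        self.encoder_fc = nn.Sequential(            # 3 FC layers: 1024, 768, 512
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 768), nn.ReLU(),
            nn.Linear(768, 512),
        )
        self.lstm = nn.LSTM(input_size=512, hidden_size=512,  # hidden size assumed
                            batch_first=True)
        self.decoder_fc = nn.Sequential(            # 2 FC layers: 512, 256
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),            # class projection (assumed)
        )

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, 224, 224)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).flatten(1)  # (b*t, 2048)
        feats = self.encoder_fc(feats).view(b, t, -1)          # (b, t, 512)
        out, _ = self.lstm(feats)
        return self.decoder_fc(out[:, -1])                     # last time step
```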
Pre-training
We performed initial training of the network using the UCF-101 dataset which consists of 13,320 video clips, at a resolution of 240p, classified into 101 possible action classes. The data was split so that 85% of the files were used for the training and 15% for validation.
These videos were resized to 224 x 224 before being fed into the classifier. Training used a batch size of 24 and ran for 35 epochs. For the error function we used cross-entropy loss, a popular choice for classifier training. The Adaptive Moment Estimation (Adam) optimizer with a learning rate of 1e-3 was selected, as it mitigates the local minima, overshoot, and oscillation problems caused by fixed learning rates during parameter updates. This setup yielded 83% accuracy on the validation set.
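In code, this pre-training recipe reduces to a standard PyTorch loop, roughly as sketched below (data loading and transforms omitted):

```python
import torch
from torch.utils.data import DataLoader

def evaluate(model, dataset, device) -> float:
    """Top-1 accuracy over a dataset."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for clips, labels in DataLoader(dataset, batch_size=24):
            preds = model(clips.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

def pretrain(model, train_set, val_set, device="cuda"):
    """Pre-training as described: batch size 24, 35 epochs,
    cross-entropy loss, Adam with learning rate 1e-3."""
    model = model.to(device)
    loader = DataLoader(train_set, batch_size=24, shuffle=True, num_workers=4)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(35):
        model.train()
        for clips, labels in loader:
            clips, labels = clips.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(clips), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: val accuracy {evaluate(model, val_set, device):.1%}")
```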
Training
We performed fine-tuning of the pre-trained network described above to learn the target data set.
The training was performed on 2.05 GB of cropped videos, and on 0.67 GB of cropped & optimized videos, with 76% of the files used for training and 24% for validation.
Due to the higher resolution of the input in the target data set, fine-tuning was done with a batch size of only 4. We ran 30 epochs, though convergence was generally reached within 9-10 epochs. Again we used cross-entropy loss and the Adam optimizer with a learning rate of 1e-3.
Due to the relatively small sample size used here, a difference in one or two classifications can alter results, so we repeated the training process 10 times for each case in order to obtain confidence in the results. The obtained accuracy results for the 10 testing rounds on each of the non-optimized and optimized video sets are presented in the following table.
| | Minimum Accuracy | Average Accuracy | Maximum Accuracy |
|---|---|---|---|
| Non-Optimized Videos | 56% | 65% | 69% |
| Optimized Videos | 64% | 67% | 75% |
To further verify the results we collected a set of 48 additional clips, and tested these independently on each of the trained classifiers. Below we provide the full cross matrix of maximum and mean accuracy obtained for the various cases.
| (max, mean accuracy) | Tested on Non-Optimized | Tested on Optimized |
|---|---|---|
| Trained on Non-Optimized | 65%, 53% | 62%, 50% |
| Trained on Optimized | 62%, 50% | 65%, 50% |
Summary & Future work
The results shared above confirm that training a neural network with significantly smaller video files, optimized by Beamr's CABR, has no negative impact on the training process. In this experiment we even saw a slight benefit from training on optimized files; however, it is unclear whether this is a significant conclusion, and we intend to investigate it further. We also see that cross testing/training gives similar results across the different cases.
This test was an initial, rather small scale experiment. We are planning to expand this to larger scale testing, including distributed training setups in the cloud using GPU clusters, where we expect to see further benefits from the reduced sizes of the files used.
This research is part of our ongoing quest to accelerate adoption and increase the accessibility of machine learning and deep learning for video, as well as video analysis solutions.
By reducing video size but not perceptual quality, Beamr's Content Adaptive Bit Rate optimized encoding can make video used for vision AI easier to handle, thus reducing workflow complexity.
Written by: Tamar Shoham, Timofei Gladyshev
Motivation
Machine learning (ML) for video processing is a field expanding at a fast pace, with significant untapped potential. Video is an incredibly rich sensor with large storage and bandwidth requirements, making vision AI a high-value problem to solve and one well suited to AI and ML.
Beamr's Content Adaptive Bit Rate solution (CABR) can significantly decrease video file size without changing the video's resolution, compression or file format, and without compromising perceptual quality. It therefore interested us to examine how the Beamr CABR solution can help cut down the size of video used in the context of ML.
In this case study, we focus on the relatively simple task of people detection in video. We made use of the NVIDIA DeepStream SDK, a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio and image understanding. Using this SDK is a natural choice for Beamr as an NVIDIA Metropolis partner.
In the following we describe the test setup, the data set used, the tests performed, and the results obtained. We then present some conclusions and directions for future work.
Test Setup
In this case study, we limited ourselves to comparing detection results on source and reduced-size files by using pre-trained models, making it possible to use unlabeled data.
We collected a set of nine User-Generated Content (UGC) video clips, captured on a few different iPhone models. To these we added some clips downloaded from the Pexels free stock-video website. All test clips are in the mp4 or mov file format, containing AVC/H.264 encoded video, with resolutions ranging from 480p to full HD and 4K, and durations ranging from 10 seconds to 1 minute. Further details on the test files can be found in Annex A.
These 14 source files were then optimized using Beamr's storage optimization solution, yielding files reduced in size by 9% to 73%, with an average reduction of 40%. As mentioned above, this optimization produces output files that retain the same coding and file formats, the same resolution, and the same perceptual quality. The goal of this case study is to show that these reduced-size, optimized files also yield aligned ML results.
For this test, we used the NVIDIA DeepStream SDK with the PeopleNet-ResNet34 detector. We calculated the mean Average Precision (mAP) between the detections on each pair of source and optimized files, using an Intersection over Union (IoU) threshold of 0.5.
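For reference, the overlap criterion at the core of this comparison is the standard IoU computation; a minimal version is sketched below, with the mAP aggregation on top of it following the usual detection-evaluation recipe:

```python
def iou(box_a, box_b) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two detections "match" when IoU >= 0.5; with no labeled ground truth,
# the detections on the source file play the role of the reference set.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # 0.333...
```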
Results
We found that for files with predictions that align with actual people, the mAP is very high, showing that true detection results are indeed unaffected by replacing the source file with the smaller, easier-to-transfer, optimized file.
An example showing how well they align is provided in Figure 1. This test clip resulted in a mAP[0.5] value of 0.98.
Figure 1: Detections for pexels_video_2160p_2 frame 305 using PeopleNet-ResNet34, source on top, with optimized (54% smaller) below
As the PeopleNet-ResNet34 model was developed specifically for people detection, it has quite stable results, and overall showed high mAP values with a median mAP value of 0.94.
When testing some other models, we noticed that in cases where detections were unstable, the source and optimized files sometimes produced different false positives. It is important to note that because we had no labeled data, or ground truth, such detection errors occurring out of sync have a double impact on the mAP value calculated between the detections on the source file and those on the optimized file. This yields lower values than the mAP expected when comparing detections against labeled data.
We also noticed cases of detection flicker, where a person is detected in only some of the frames in which they appear. This flicker is not always synchronized between the source and optimized clips, resulting once again in an 'accumulated', or double, error in the mAP calculated between them. An example is shown in Figure 2, for a clip with a mAP[0.5] value of 0.92.
Figure 2a: Detections for frame 1170 from the clip pexels_musicians.mp4 using PeopleNet-ResNet34, source on the left and optimized (44% smaller) on the right. Note the detection on the left of the stairs, present only in the source file.
Figure 2b: Same for frame 1171, with no detection in either.
Figure 2c: Frame 1172, detected in both.
Figure 2d: Frame 1173, detected only in the optimized file.
Summary
The experiments described above show that CABR can be applied to videos that undergo ML tasks such as object detection. We showed that when detections are stable, almost identical results will be obtained for the source and optimized clips. The advantages of reducing storage size and transmission bandwidth by using the optimized files make this possibility particularly attractive.
Another possible use for CABR in the context of ML stems from the finding that for unstable detection results, CABR may have some impact on false positives or mis-detects. In this context, it would be interesting to view it as a possible permutation on labeled data to increase training set size. In future work, we will investigate the further potential benefits obtained when CABR is incorporated at the training stage and expand the experiments to include more model types and ML tasks.
This research is all part of our ongoing quest to accelerate adoption and increase the accessibility of video ML/DL and video analysis solutions.
Annex A – test files
Below are details on the test files used in the above experiments. All the files and detection results are available here.
| # | Filename | Source | Bitrate [Mbps] | Dims WxH | FPS | Duration [sec] | Saved by CABR |
|---|---|---|---|---|---|---|---|
| 1 | IMG_0226 | iPhone 3GS | 3.61 | 640×480 | 15 | 36.33 | 35% |
| 2 | IMG_0236 | iPhone 3GS | 3.56 | 640×480 | 30 | 50.45 | 20% |
| 3 | IMG_0749 | iPhone 5 | 17.0 | 1920×1080 | 29.9 | 11.23 | 34% |
| 4 | IMG_3316 | iPhone 4S | 21.9 | 1920×1080 | 29.9 | 9.35 | 26% |
| 5 | IMG_5288 | iPhone SE | 14.9 | 1920×1080 | 29.9 | 43.40 | 29% |
| 6 | IMG_5713 | iPhone 5c | 16.3 | 1080×1920 | 29.9 | 48.88 | 73% |
| 7 | IMG_7314 | iPhone 7 | 15.5 | 1920×1080 | 29.9 | 54.53 | 50% |
| 8 | IMG_7324 | iPhone 7 | 15.8 | 1920×1080 | 29.9 | 16.43 | 39% |
| 9 | IMG_7369 | iPhone 6 | 17.9 | 1080×1920 | 29.9 | 10.23 | 30% |
| 10 | pexels_musicians | pexels | 10.7 | 1920×1080 | 24 | 60.0 | 44% |
| 11 | pexels_video_1080p | pexels | 4.4 | 1920×1080 | 25 | 12.56 | 63% |
| 12 | pexels_video_2160p | pexels | 12.2 | 3840×2160 | 25 | 15.24 | 9% |
| 13 | pexels_video_2160p_2 | pexels | 15.2 | 3840×2160 | 25 | 15.84 | 54% |
| 14 | pexels_video_of_people_walking_1080p | pexels | 3.5 | 1920×1080 | 23.9 | 19.19 | 58% |
Table A1: Test files used, with the per file savings
My Journey Through the Evolution of Video Processing: From Low-Quality Streaming to HD and 4K Becoming a Commodity, and Now the AI-Powered Video Revolution
Digital video has been my primary focus for the past three decades. I have built software codecs, designed ASICs, and now optimize GPU encoders with advanced GPU software.
My journey in video processing has been transformative, starting with low-resolution streaming and advancing into HD and 4K as they shifted from a rare event to an everyday expectation. And now, we stand at the next frontier—AI is redefining how we create, deliver, and experience video like never before.
My journey into this field began with the introduction of QuickTime 1.0 in 1991, when I was in my 20s. It looked to me like magic — a compressed movie playing smoothly from a single-speed CD-ROM (150 KB/s, 1.2 Mbps). At the time, I had no understanding of video encoding, but I was fascinated. At that moment I knew this was the field I wished to dive into.
Apple QuickTime Version 1.0 Demo
Chapter 1: The Challenge of Streaming Video with Low-Resolution, Low-Quality Videos
The early days of streaming, in the mid 90s, were characterized by low-resolution video, low frame rates (12-15 fps), and low bitrates of 28.8, 33, or 56 kbps — two to three orders of magnitude (100x to 1000x) lower than today's standards. This was the reality of digital video in 1996 and the years that followed.
By 1996, I was one of four co-founders of Emblaze. We developed a vector-based graphics tool called "Emblaze Creator" – think of it as Adobe Flash before Adobe Flash.
We soon realized we needed video support. We started by downloading videos in the background. Obviously, the longer the video was, the more time it took to download, which was frustrating to wait for. So we limited the videos to just 30 seconds.
Early solutions, like RealNetworks and VideoNet, required dedicated video servers — an expensive and complex infrastructure. It seemed to me like a very long and costly journey to streaming enablement.
Adding video to our offerings quickly was crucial for our company’s survival, so we persistently tackled this challenge. I remember the nights spent experimenting and exploring solutions, but all paths seemed to converge on the RealNetworks approach, which we couldn’t adopt in the short term.
We had to find a way to solve the challenge of streaming video efficiently for very low bandwidth. And while it was hard to stream files, you could slice them. So in 1997, I came up with an idea and worked with my team at Emblaze on the following solution:
Take a video file and divide it into numbered slices.
Create an index file with the order of the slices, and place it on a standard HTTP server.
The player will read that index file and pull the segments from a web server in the order given in the index file.
Just to make it more real, here is the patent we submitted in 1998, and granted in 2002:
But that was not enough. Why not create time-synchronized slices, so the player could pull the optimal chunks based on the specific bandwidth characteristics while playing the files?
The player would read the index file from the server, choose a level and a slice, and move up and down the bitrate ladder based on the measured bitrate.
If that reminds you of HLS – then it was HLS many years before HLS was out.
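In today's terms, the idea maps to just a few lines of code. The sketch below is a toy reconstruction, not the original Emblaze implementation; the index format, URLs, and player hooks are all illustrative:

```python
import urllib.request

# Hypothetical player hooks, stand-ins for a real decoder and bandwidth
# estimator (not the original Emblaze implementation).
def decode_and_render(data: bytes) -> None: ...
def update_bandwidth_estimate(current_kbps: float) -> float: return current_kbps

def play(index_url: str, measured_kbps: float) -> None:
    """Toy adaptive fetcher over plain HTTP: read an index of time-aligned
    slices at several bitrates, then pull each slice at the highest level
    that fits the measured bandwidth. Illustrative index format:
    one 'slice_number,bitrate_kbps,url' per line."""
    lines = urllib.request.urlopen(index_url).read().decode().splitlines()
    slices: dict[int, list[tuple[float, str]]] = {}
    for line in lines:
        num, kbps, url = line.split(",")
        slices.setdefault(int(num), []).append((float(kbps), url))

    for num in sorted(slices):                    # play slices in index order
        ladder = sorted(slices[num])              # bitrate ladder, low to high
        fitting = [level for level in ladder if level[0] <= measured_kbps]
        _, url = fitting[-1] if fitting else ladder[0]  # fall back to lowest
        decode_and_render(urllib.request.urlopen(url).read())
        measured_kbps = update_bandwidth_estimate(measured_kbps)
```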
We demonstrated this live with EarthLink at the Easter Egg Roll at the White House in 1998. Our systems were built from H.263 (and later H.264) encoders and a patented streaming protocol. We had a truck with 10 Compaq workstations running 8 cameras that day.
When you build a streaming solution, you need a player. Without it, all that effort is meaningless. At Emblaze, we had a Java-based player that required no installation—a major advantage at the time.
Back then, mobile video was in its infancy, and we saw an opportunity. Feature phones simply couldn’t play video, but the Nokia Communicator 9110 could. It had everything—a grayscale screen, a 33MHz 32-bit CPU, and wireless data access—a powerhouse by late ‘90s standards.
In 1999, I demonstrated a software video decoder running on the Nokia 9110 to the CEO of Samsung Mobile. This was a game-changer: it proved that video streaming on mobile devices was possible. Samsung, a leader in CDMA 2000, wanted to showcase this capability at the 2000 Olympics and needed working prototypes.
Samsung challenged us to build a mobile ASIC capable of decoding streaming video on just 100mW of power. We delivered. The solution was announced at the Olympics, and by 2001, it was in mass production.
This phone featured the Emblaze Multimedia Application Co-Processor, working alongside the baseband chip to enable seamless video playback over CDMA 2000 networks—a groundbreaking achievement at the time.
Chapter 2: HD Becomes the Standard, 4K HDR Becomes Common
HD television was introduced in the U.S. during the second half of the 90s, but it wasn’t until 2003 that satellite and cable providers really started broadcasting in HD.
I still remember 2003, staying at the Mandarin Oriental Hotel in NYC, where I had a 30-inch LCD screen with HD broadcasting. Standing close to the screen, taking in the crisp detail, was an eye-opening moment—the clarity, the colors, the sharpness. It was a huge leap forward from standard definition, and definitely better than DVDs.
But even then, it felt like just the beginning. HD was here, but it wasn’t everywhere yet. It took a few more years for Netflix to introduce streaming.
Beamr is Born
In early 2008, the startup I led, which focused on online backup, was acquired. By the end of the year, I found myself out of work. And so, I sent an email to Steve Jobs, pointing out that Time Machine’s performance was lacking, and that I believed I could help fix it. That email led to a meeting in Cupertino with the head of MobileMe—what we now know as iCloud.
That visit to Apple in early 2009 was fascinating. I learned that storing iPhone photos was becoming an enormous challenge. The sheer volume of images was straining Apple’s data centers, and they were running into power limitations just to keep up with demand.
With this realization, Beamr was born!
The question that intrigued us was: Can we make images smaller, while making sure they look exactly the same?
After about one year of research, we ended up founding Beamr instead of becoming a part of MobileMe. During Beamr's first year, we explored this idea and came out with our first product, JPEGmini, which does exactly that. This was achieved through the amazing innovation of our wonderful CTO, Tamar Shoham, the brains behind the technology that is here today.
JPEGmini is a wonderful tool, and hundreds of thousands of content creators around the world use it.
After optimizing photos, we wanted to take on video compression. That’s when we developed our gravity defier—CABR, Content Adaptive BitRate technology. This quality-driven process can cut every high-quality video by 30% to 50% while preserving every frame’s visual integrity.
But our innovation comes with challenges:
Encoding without CABR is lightning fast, but with CABR it is slower and could not run live at 4Kp60.
Running CABR is more expensive than non-CABR encoding.
In 2018, we came to the conclusion that we needed a hardware acceleration solution to improve our density, our speed, and the cost of processing.
We started by integrating with Intel GPUs, and it worked very well. We even demoed it at Intel Experience Day in 2019.
We had a wonderful relationship with Intel, and they had a good video encoding engine. We invested about two years of effort, but it did not materialize, as an Intel GPU for the data center never happened – a wasted opportunity.
Then, we thought of developing our own chip:
Its power consumption would be a fraction of a CPU's or GPU's.
We would be able to put four 8Kp60 CABR chips on a single PCI card (for AVC/HEVC and AV1).
It would cost less than a GPU and have 3X the density.
Here’s a slide that shows that we were serious. We also started a discussion about raising funds to build that chip using 12nm technology.
But then, we looked at our plan and wondered: does this chip support the needs of the future?
How would you innovate on this platform?
What if you would like to run smarter algorithms or a new version of CABR?
Our design included programmable parts for customization. We even thought of adding GPU cores – but who is going to develop for it?
This was a key moment in 2020, when we understood that innovation moves so fast that a silicon generation, which takes at least two years to build, is simply too slow.
There is a scale at which VPU solutions are more efficient than GPUs, but they cannot compete with the current pace of change. It may even come to pass that the biggest social networks abandon VPUs, given the need for AI and video to work together.
Chapter 3: GPUs and the Future of Video Processing
By 2021, NVIDIA invited us to bring CABR to GPUs. This was a three-year journey, requiring a complete rewrite of our technology for NVENC. NVIDIA fully supported us, integrating CABR into all encoding modes across AVC, HEVC, and AV1.
In May 2023, the first driver was out: NVENC SDK 12.1!
At the same time, Beamr went public on NASDAQ (under the ticker BMR), on the premise of a high-quality large-scale video encoding platform enabled on NVIDIA GPUs.
Since September 2024, Beamr CABR has been running LIVE video optimization on NVIDIA GPUs at 4Kp60 across three codecs: AVC, HEVC, and AV1. It is 10X faster at 1/10 of the cost for AVC; the ratio doubles for HEVC, and doubles again for AV1.
All of our challenges for bringing CABR to the masses are solved.
But the story doesn’t end here.
What we didn’t fully anticipate was how AI-driven innovation is transforming the way we interact with video, and the opportunities are even greater than we imagined, thanks to the shift to GPUs.
Let me give you a couple of examples:
In the last Olympics, I was watching windsurfing, and on-screen, I saw a real-time overlay showing the planned routes of each surfer, the wind speed and forward tactics, and the predictions on how they would converge at the finish line.
It was seamless, intuitive, and AI-driven—a perfect example of how AI enriches the viewing experience.
Or think about social media: AI plays a huge role in processing video behind the scenes. As videos are uploaded, VPUs (Video Processing Units) handle encoding, while AI algorithms simultaneously analyze content—deciding whether it’s appropriate, identifying trends, and determining who should see it.
But the processes used by many businesses are slow and inefficient. Every AI-powered video workflow needs to:
Load the video.
Decode it.
Process it (either for AI analysis or encoding).
Sync and converge the process.
Traditionally, these steps happened separately, often with significant latency.
But on a GPU?
Single load, single decode, shared memory buffer.
AI and video processing run in parallel.
Everything is synced and optimized.
And just like that—you’re done. It’s faster, more efficient, and more cost-effective. This is the winning architecture for the future of AI and video at scale.
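As a rough illustration of that dataflow (not Beamr's implementation), here is a sketch using CUDA streams in PyTorch, with hypothetical stand-ins for the decoder, model, and encoder:

```python
import torch

# Hypothetical stand-ins for a GPU decoder, AI model, and encoder; the
# point of the sketch is the dataflow, not these specific APIs.
def gpu_decode_frames(path: str):
    raise NotImplementedError  # yields decoded frames resident in GPU memory

def run_inference(frame: torch.Tensor):
    raise NotImplementedError  # AI analysis on the decoded frame

def encode_frame(frame: torch.Tensor):
    raise NotImplementedError  # hardware-accelerated encode of the same frame

infer_stream = torch.cuda.Stream()
encode_stream = torch.cuda.Stream()

def process(path: str) -> None:
    """Single load, single decode: each frame stays in a shared GPU buffer
    while the AI and encoding branches run on parallel CUDA streams."""
    for frame in gpu_decode_frames(path):
        with torch.cuda.stream(infer_stream):
            labels = run_inference(frame)      # AI analysis branch
        with torch.cuda.stream(encode_stream):
            packet = encode_frame(frame)       # encoding branch, same buffer
        torch.cuda.synchronize()               # sync and converge per frame
```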
Now available: hardware-accelerated, unsupervised codec modernization to AV1 for increased-efficiency video AI workflows
AV1, the new kid on the block of video encoders, is definitely starting to gain traction due to its high compression efficiency and increasing adoption on browsers and end devices. As we mentioned in our previous blog, H/W accelerated AV1 encoding is a particularly attractive prospect due to the combination of increased efficiency and light-speed performance. H/W accelerated codec modernization – using Beamr's Content Adaptive Bit-Rate (CABR) video optimization process running with NVIDIA video encoding – allows for a fast, fully automatic upgrade of legacy encodes to perceptually identical AV1 encodes.
Codec modernization is essentially the ability to get double the benefit – both the increased compression efficiency of codecs such as AV1, and the bitrate efficiencies of Beamr's perceptually driven optimization. Over the years we have consistently validated that Beamr CABR technology creates optimized files that are perceptually identical, meaning they look the same to the human eye. Having demonstrated that visual quality is indeed preserved, in this blog post we continue to explore how Beamr's optimization lends itself to AI-based workflows.
In our previous case studies, we looked at how the reduced-bitrate, optimized videos behave in Machine Learning (ML) tasks such as face detection and action recognition training. We showed that results using optimized AVC and HEVC encodes are stable, despite significant file size reductions: an average reduction of 24% on the source files, and a 3x decrease in the size of the cropped AVC-encoded files created by OpenCV.
Now we add codec modernization to the mix, which reduces the sizes of the cropped encodes even further. The AV1-encoded files are smaller by a factor of 4, while still providing very similar training and inference results, as shown by the maximum (and average) accuracy obtained in the different experiments, presented in the following table:
| (max (mean) accuracy) | Tested on AVC | Tested on optimized AV1 |
|---|---|---|
| Trained on AVC | 67.5% (53%) | 66.4% (53%) |
| Trained on optimized AV1 | 66.4% (52%) | 64.8% (53.5%) |
Next we decided to ramp up the fun factor and play around with some cool AI applications. Using the open-source Face Fusion project, we took 10 source AVC videos and an image containing our target face, and proceeded to swap the faces in the source videos with our target person. While this is a fun experiment in itself, imagine how much easier it becomes when the source videos are reduced by a factor of 4, with the results looking just the same.
Below is an example showing a frame from the source video, the target face image, and side by side comparison of the video with the replaced or fused face – when using the original AVC encode (on the left) or the AV1 optimized by Beamr (on the right), looking just as good:
We are just starting to scratch the surface on how Beamr’s technology and offerings, including codec modernization to AV1, can help make AI workflows more efficient without compromising quality or accuracy. We are excited to be on this journey and will continue to explore and add on to the synergies between video optimization, video modernization and video AI solutions.
Beamr’s Content Adaptive Bit Rate solution enables significantly decreasing video file size or bitrates without changing the video resolution or compromising perceptual quality. Since the optimized file is fully standard compliant, it can be used in your workflow seamlessly, whatever your use case, be it video streaming, playback or even part of an AI workflow.
Beamr first launched Beamr Cloud earlier this year, and we are now super excited to announce that our valued partnership with Oracle Cloud Infrastructure (OCI) enables us to offer OCI customers more features and better performance.
The performance improvements are due in part to the availability of the powerful NVIDIA L40S GPUs on OCI. In preliminary testing we found that our video encoding workflows can run up to 30% faster on these cards than on the cards we currently use in the Beamr Cloud solution.
This was derived from testing AVC and HEVC NVENC-driven encodes of nine classic 1080p test clips in eight different configurations, comparing encoding wall times on an A10G vs. an L40S GPU. Speedup factors of up to 55% were observed, with an average just above 30%. The full test data is available here.
Another exciting feature of these cards is that they support AV1 encoding, which means Beamr Cloud can now turn your videos into optimized AV1 encodes, offering even higher bitrate and file size savings.
What’s the fuss about AV1?
In order to store and transmit video, substantial compression is needed. Since the very earliest efforts to standardize video compression in the 90s, there has been a constant push to create video compression standards offering ever-increasing efficiency, meaning the same video quality can be achieved with smaller files or lower bitrates.
As shown in the schematic illustration below, AV1 has come a long way in improving over H.264/AVC, which remains the most widely adopted standard today despite being 20 years old. The increased compression efficiency is not free, however: the computational complexity of newer codecs is also significantly higher, motivating the adoption of hardware-accelerated encoding options.
With the demand and need for Video AI workflows continuing to rise, the ability to perform fully automatic, fast, efficient, optimized video encoding is an important enabler.
Beamr's GPU-powered video compression and optimization runs within the GPU on OCI, right at the heart of these AI workflows, making it extremely well placed to benefit them. We have previously shown in a number of case studies that there is no negative impact on inference or training results when using the optimized files, making the integration of this optimization process into AI workflows a natural choice for cost-savvy developers.
This year at the NAB Show 2024 in Las Vegas, we are excited to demonstrate our Content-Adaptive Bitrate (CABR) technology on the NVIDIA Holoscan for Media platform. By implementing CABR as a GStreamer plugin, we have, for the first time, made bitrate optimization of live video streams easily achievable in the cloud or on premises.
Building on the NVIDIA DeepStream software development kit, which extends GStreamer's capabilities, significantly reduced the amount of code required to develop the Holoscan for Media based application. Using DeepStream components for real-time video processing and NMOS (Networked Media Open Specifications) signaling, we were able to keep our focus on the CABR technology and video processing.
The NVIDIA DeepStream SDK provides an excellent framework for developers to build and customize dynamic video processing pipelines. DeepStream provides pipeline components that make it very simple to build and deploy live video processing pipelines that utilize the hardware decoders and encoders available on all NVIDIA GPUs.
Beamr CABR dynamically adjusts video bitrate in real time, optimizing quality and bandwidth use. It reduces data transmission without compromising video quality, making video streaming more efficient. We recently released our GPU implementation, which uses the NVIDIA NVENC encoder, providing significantly higher performance compared to previous solutions.
Taking our GPU implementation for CABR to the next level, we have built a GStreamer Plugin. With our GStreamer Plugin, users can now easily and seamlessly incorporate the CABR solution into their existing DeepStream pipelines as a simple drop-in replacement to their current encoder component.
Holoscan For Media
A GStreamer Pipeline Example
To illustrate the simplicity of using CABR, consider a simple DeepStream transcoding pipeline that reads from and writes to files, as sketched below.
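A minimal file-to-file version of such a pipeline, using GStreamer's Python bindings, might look like this sketch. The file names are illustrative, and depending on the GStreamer and DeepStream versions a parser element (e.g. av1parse) may be needed before the muxer:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Illustrative transcode: decode an H.264 file on the GPU, re-encode to AV1.
pipeline = Gst.parse_launch(
    "filesrc location=input.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! "
    "nvv4l2av1enc ! matroskamux ! filesink location=output.mkv"
)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```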
By simply replacing the nvv4l2av1enc component with our CABR component, the encoding bitrate is adapted in real-time, according to the content, ensuring optimal bitrate usage for each frame, without any loss of perceptual quality.
Similarly, we can replace the encoder component used in a live streaming pipeline with the CABR component to optimize live video streams, dynamically adjusting the output bitrate and offering up to a 50% reduction in data usage without sacrificing video quality.
The Broad Horizons of CABR Integration in Live Media
Beamr CABR, demonstrated using NVIDIA Holoscan for Media at NAB show, marks just the beginning. This technology is an ideal fit for applications running on NVIDIA RTX GPU-powered accelerated computing and sets a new standard for video encoding.
Lowering the video bitrate reduces the required bandwidth when ingesting video to the cloud, creating new possibilities where high resolution or quality were previously costly or not even possible. Similarly, reduced bitrate when encoding on the cloud allows for streaming of higher quality videos at lower cost.
From file-based encoding to streaming services — the potential use cases are diverse, and the integration has never before been so simple. Together, let’s step into the future of media streaming, where quality and efficiency coexist without compromise.
High Quality, High Scale, Low Cost video processing – Beamr Cloud is bringing technology that used to be exclusive to tech giants like Google and Meta – to everyone.
Here’s all you need to know about Beamr Cloud
Video rules the world
Video is the most dominant digital content type today. From high-res smartphones to IoT-enabled cameras, videos are being produced at a faster pace than ever, and at increasingly higher resolutions.
These days, videos are even created without cameras or human intervention, thanks to the expanding use of generative AI, artificial intelligence (AI), and machine learning (ML). All that, on top of the growing markets of streaming (OTT) and user-generated content.
Video storage in the cloud alone is already a huge market that is expected to grow to $13.5 billion in the next year. Beamr Cloud started on Amazon’s AWS, and we plan to deliver our services to additional cloud platforms.
But Video is hard and complicated
Video is booming, but handling it is challenging and complicated.
Video files are huge, and they pile up in enormous libraries and repositories. Due to their size, among other reasons, storing videos is costly, moving them around is expensive, and processing them according to your specific needs takes significant compute resources.
To find the right balance between business needs like cost, quality, user experience, and speed, you must be an expert. Otherwise the costs will be too high and you'll leave money on the table.
Yet getting to this sweet spot is very complicated. Without getting too technical, let’s just briefly explain why: when you customize a video for a specific application, you have to control several key parameters, and balance the desired video quality, the minimal bandwidth required, and the compute intensity. In addition, this balancing act varies for different types of video, such as photorealistic, gaming, or animation content. Each type has different bandwidth requirements when delivered over the network.
Some companies handle videos well
The fact is that tech giants have the knowledge, money and engineering power to build efficient video workflows. They even developed specialized chips for this purpose, like Mount Shasta for Meta and VPU (Video Processing Unit) for Google.
But good luck getting your hands on the giants’ tech. Their video solutions are exclusive to them, and even if you manage to access them, you’ll probably find that they are tailored to their own specific needs and may not meet the quality, bandwidth, and cost criteria for your application.
Disrupting the Video World
Beamr is poised to disrupt the video world with automatic, efficient, and scalable video processing oriented to the booming markets of artificial intelligence (AI), user-generated content, autonomous cars, online video editing and podcast platforms, and more.
Beamr Cloud automatically strikes the balance between quality, bandwidth, compute power and costs – and does that at scale. Here’s how it works:
Simply connect your repository in the AWS platform – and start optimizing and modernizing your video files right away, while enjoying the most advanced technology.
Choose between essential workflows or create specific presets according to your needs.
Optimize videos while upgrading them to the newest standards: HEVC and soon AV1 – the emerging video format backed by tech giants.
Beamr Cloud requires no code, and you can also use our API to further customize your video solutions.
Easy & Safe Codec Modernization with Beamr using Nvidia GPUs
Following a decade where AVC/H.264 was the clear ruler of the video encoding world, the last years have seen many video coding options battling to conquer the video arena. For some insights on the race between modern coding standards you can check out our corresponding blog post.
Today we want to share how easy it can be to upgrade your content to a new and improved codec in a fast, fully automatic process that guarantees the visual quality of the content will not be harmed. This makes the switchover to newer encoders a smooth, easy, and low-cost process that can help accelerate the adoption of new standards such as HEVC and AV1. When this transformation is done by combining Beamr's technology with the Nvidia NVENC encoder, using their recently released APIs, it becomes particularly cutting edge, enjoying the benefits of the leading solution in hardware AV1 encoding.
The benefit of switching to more modern codecs lies, of course, in the higher compression efficiency they offer. While the extent of improvement depends heavily on the actual content, bitrates, and encoders used, HEVC is considered to offer gains of 30%-50% over AVC, meaning that for the same quality you can spend up to 50% fewer bits. For AV1 this increase is generally a bit higher. As more and more on-device support is added for these newer codecs, the advantage of using them to reduce both storage and bandwidth is clear.
Generally speaking, performing such codec modernization involves some non-trivial steps.
First, you need to get access to the modern encoder you want to use, and know enough about it in order to configure the encoder correctly for your needs. Then you can proceed to encoding using one of the following approaches.
The first approach is to perform bitrate-driven encoding. One possibility is to use conservative bitrates, in which case the potential reduction in size will not be achieved. Another is to set target bitrates that reflect the expected savings, in which case there is a risk of losing quality. For example, in an experimental test of files converted from their AVC source to HEVC, we found that on average a bitrate reduction of 50% could be obtained using the Beamr CABR codec modernization approach. However, when the same files were all brute-force encoded to HEVC at a 50% reduced bitrate, using the same encoder and configuration, the quality took a hit for some of the files.
This example shows the full AVC source frame on top, with the transcodes to HEVC below it. Note the distortion in the blind HEVC encode, shown on the left, compared to the true-to-source video transformed with CABR on the right.
The second approach is to perform the transcode using a quality-driven encode, for instance using the constant QP (Quantization Parameter) or CRF (Constant Rate Factor) encoding modes with conservative values, which will in all likelihood preserve the quality. However, in this case you are likely to unnecessarily "blow up" some of your files to much higher bitrates. For example, for the UGC content shown below, transcoding to HEVC using a software encoder with CRF set to 21 almost doubled the file size. Both approaches are illustrated in the sketch that follows.
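To make these two approaches concrete, here is roughly what they look like with a stock software HEVC encoder driven through ffmpeg; the file names and target values are illustrative:

```python
import subprocess

# Approach 1: bitrate-driven HEVC encode at an assumed 50% of the source
# bitrate; the savings are fixed up front, so quality may suffer on
# demanding content.
subprocess.run(["ffmpeg", "-i", "source_avc.mp4",
                "-c:v", "libx265", "-b:v", "2500k", "-c:a", "copy",
                "bitrate_driven_hevc.mp4"], check=True)

# Approach 2: quality-driven encode with a conservative CRF; quality is
# preserved, but some files may balloon to much higher bitrates.
subprocess.run(["ffmpeg", "-i", "source_avc.mp4",
                "-c:v", "libx265", "-crf", "21", "-c:a", "copy",
                "crf_driven_hevc.mp4"], check=True)
```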
Yet another approach is to use a trial-and-error encode process for each file, or even each scene, manually verifying that a good target encoding setup was selected which minimizes the bitrate while preserving the quality. This is of course an expensive and cumbersome process, and entirely unscalable.
By using Beamr CABR this is all done for you under the hood, in a fully automatic process, which makes optimized choices for each and every frame in your video, selecting the lowest bitrate that will still perfectly preserve the source visual quality. When performed using the Nvidia NVENC SDK with interfaces to Beamr’s CABR technology, this transformation is significantly accelerated and becomes even more cost effective.
The codec modernization flow is demonstrated for AVC-to-HEVC conversion in the above high-level block diagram. As shown, the CABR controller interacts with NVENC, Nvidia's hardware video encoder, using the new APIs Nvidia created for this purpose. At the heart of the CABR controller lies Beamr's quality measure, BQM, a unique, patented, Emmy award winning perceptual video quality measure. BQM has now been adapted and ported to the Nvidia GPU platform, resulting in significant acceleration of the optimization process.
The Beamr optimization technology can be used not only for codec modernization, but also to reduce the bitrate of an input video, or of a target encode, while guaranteeing that perceptual quality is preserved, thus creating encodes with the same perceptual quality at lower bitrates or file sizes. In every usage of the Beamr CABR solution, size or bitrate is reduced as much as possible while each frame of the optimized encode is guaranteed to be perceptually identical to the reference. The codec modernization use case is particularly exciting, as it puts the ability to migrate to more efficient and sophisticated codecs, previously exercised primarily by video experts, into the hands of any user with video content.
For more information please contact us at info@beamr.com