Is the Future of Video Processing Destined for GPU?

My Journey Through the Evolution of Video Processing: From Low-Quality Streaming to HD and 4K Becoming a Commodity, and Now the AI-Powered Video Revolution

Digital video has been my primary focus for the past three decades. I have built software codecs, designed ASICs, and now optimize video encoders with advanced GPU software.

My journey in video processing has been transformative: it started with low-resolution streaming and advanced into HD and 4K as they shifted from rare events to everyday expectations. Now we stand at the next frontier, where AI is redefining how we create, deliver, and experience video.

My journey into this field began with the introduction of QuickTime 1.0 in 1991, when I was in my 20s. It looked to me like magic: a compressed movie playing smoothly from a single-speed CD-ROM (150 KB/s, or 1.2 Mbps). At the time, I had no understanding of video encoding, but I was fascinated. At that moment, I knew this was the field I wanted to dive into.

Apple QuickTime Version 1.0 Demo

Chapter 1: The Challenge of Streaming Video with Low-Resolution, Low-Quality Videos

The early days of streaming, in the mid-90s, were characterized by low-resolution video, low frame rates (12-15 fps), and low bitrates of 28.8, 33.6, or 56 kbps, two to three orders of magnitude (100x to 1000x) lower than today's standards. This was the reality of digital video in 1996 and the years that followed.

By 1996, I was one of four co-founders of Emblaze. We developed a vector-based graphics tool called “Emblaze Creator”; think of it as Adobe Flash before Adobe Flash.

We soon realized we needed video support. We started by downloading videos in the background. Obviously, the longer the video, the longer the download, which was frustrating to wait for, so we limited videos to just 30 seconds.

Early solutions, like RealNetworks and VideoNet, required dedicated video servers, an expensive and complex infrastructure. It looked to me like a very long and costly road to streaming enablement.

Adding video to our offerings quickly was crucial for our company’s survival, so we persistently tackled this challenge. I remember the nights spent experimenting and exploring solutions, but all paths seemed to converge on the RealNetworks approach, which we couldn’t adopt in the short term.

We had to find a way to solve the challenge of streaming video efficiently for very low bandwidth. And while it was hard to stream files, you could slice them. So in 1997, I came up with an idea and worked with my team at Emblaze on the following solution:

  1. Take a video file and divide it into numbered slices. 
  2. Create an index file with the order of the slices, and place it on a standard HTTP server.
  3. The player reads that index file and pulls the segments from the web server in the order given in the index file (see the sketch below).
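Here is a minimal sketch, in modern Python, of what such a player loop looks like; the index format (one slice filename per line) and the URLs are hypothetical, invented purely for illustration:

```python
# Sketch of the 1997 slicing idea: pull an index file from a plain HTTP
# server, then fetch the numbered slices in order. The index format and
# server URL are hypothetical.
from urllib.request import urlopen

BASE = "http://example.com/video/"  # any standard HTTP server

def feed_decoder(data: bytes) -> None:
    ...  # placeholder: decode and render the slice

def play(index_name: str) -> None:
    # 1. Pull the index file listing the numbered slices in playback order.
    with urlopen(BASE + index_name) as resp:
        segments = resp.read().decode().splitlines()
    # 2. Fetch each slice over plain HTTP and hand it to the decoder.
    for name in segments:
        with urlopen(BASE + name) as seg:
            feed_decoder(seg.read())
```

No dedicated video server is involved: the whole trick is that ordinary HTTP GET requests do the streaming.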

Just to make it more tangible, here is the patent we filed in 1998, granted in 2002:

But that was not enough. Why not create time-synchronized slices at multiple bitrates, so the player could pull the optimal chunks based on the actual bandwidth conditions while playing the file?

The player reads the index file from the server, chooses a level, fetches a slice, and, based on the measured bitrate, moves up and down the bitrate ladder, as in the sketch below.
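A sketch of that adaptive part, under the same hypothetical setup: the player times each download and uses the measured throughput to pick the next level on the ladder.

```python
# Sketch of bitrate-ladder switching: time each slice download and pick
# the highest level the measured throughput can sustain. The three-level
# ladder and URL layout are hypothetical.
import time
from urllib.request import urlopen

LADDER = [(56_000, "low"), (128_000, "mid"), (300_000, "high")]  # (bps, dir)

def pick_level(measured_bps: float) -> str:
    """Highest-bitrate level that fits the measured bandwidth with headroom."""
    chosen = LADDER[0][1]
    for bitrate, path in LADDER:
        if bitrate * 1.5 <= measured_bps:  # keep 50% headroom
            chosen = path
    return chosen

def fetch_slice(n: int, measured_bps: float) -> tuple[bytes, float]:
    """Fetch time-synchronized slice n; return it plus a fresh bandwidth estimate."""
    path = pick_level(measured_bps)
    start = time.monotonic()
    with urlopen(f"http://example.com/video/{path}/{n:05d}.seg") as seg:
        data = seg.read()
    elapsed = max(time.monotonic() - start, 1e-6)
    return data, len(data) * 8 / elapsed
```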

If that reminds you of HLS, that is because it was HLS many years before HLS existed.

We demonstrated this live with EarthLink at the Easter Egg Roll at the White House in 1998. Our systems were built from H.263 (and later H.264) encoders and a patented streaming protocol. That day we had a rack of 10 Compaq workstations running 8 cameras.

When you build a streaming solution, you need a player. Without it, all that effort is meaningless. At Emblaze, we had a Java-based player that required no installation—a major advantage at the time.

Back then, mobile video was in its infancy, and we saw an opportunity. Feature phones simply couldn't play video, but the Nokia Communicator 9110 could. It had everything: a grayscale screen, a 33 MHz 32-bit CPU, and wireless data access. A powerhouse by late-'90s standards.

In 1999, I demonstrated a software video decoder running on the Nokia 9110 to the CEO of Samsung Mobile. This was a game-changer: it proved that video streaming on mobile devices was possible. Samsung, a leader in CDMA2000, wanted to showcase this capability at the 2000 Olympics and needed working prototypes.

Samsung challenged us to build a mobile ASIC capable of decoding streaming video on just 100 mW of power. We delivered. The solution was announced at the Olympics, and by 2001 it was in mass production.

The phone featured the Emblaze Multimedia Application Co-Processor, which worked alongside the baseband chip to enable seamless video playback over CDMA2000 networks, a groundbreaking achievement at the time.

Chapter 2: HD Becomes the Standard, 4K HDR Becomes Common 

HD television was introduced in the U.S. during the second half of the 90s, but it wasn’t until 2003 that satellite and cable providers really started broadcasting in HD.

I still remember 2003, staying at the Mandarin Oriental Hotel in NYC, where I had a 30-inch LCD screen with HD broadcasting. Standing close to the screen, taking in the crisp detail, was an eye-opening moment—the clarity, the colors, the sharpness. It was a huge leap forward from standard definition, and definitely better than DVDs.

But even then, it felt like just the beginning. HD was here, but it wasn’t everywhere yet. It took a few more years for Netflix to introduce streaming.

Beamr is Born

In early 2008, the startup I led, which focused on online backup, was acquired. By the end of the year, I found myself out of work. And so, I sent an email to Steve Jobs, pointing out that Time Machine’s performance was lacking, and that I believed I could help fix it. That email led to a meeting in Cupertino with the head of MobileMe—what we now know as iCloud.

That visit to Apple in early 2009 was fascinating. I learned that storing iPhone photos was becoming an enormous challenge. The sheer volume of images was straining Apple’s data centers, and they were running into power limitations just to keep up with demand.

With this realization, Beamr was born!

The question that intrigued us was: Can we make images smaller, while making sure they look exactly the same? 

After about a year of research, we ended up founding Beamr instead of becoming part of MobileMe.

During Beamr's first year, we explored this idea and came out with our first product, JPEGmini, which does exactly that. It was achieved through the amazing innovation of our wonderful CTO, Tamar Shoham, the brains behind our technology to this day.

JPEGmini is a wonderful tool, and hundreds of thousands of content creators around the world use it. 

After optimizing photos, we wanted to take on video compression. That’s when we developed our gravity defier—CABR, Content Adaptive BitRate technology. This quality-driven process can cut every high-quality video by 30% to 50% while preserving every frame’s visual integrity.

But our innovation comes with challenges:

  1. Encoding without CABR is lightning fast, but with CABR it is slower and could not run live at 4Kp60.
  2. Running CABR is more expensive than non-CABR encoding.

In 2018, we came to the conclusion that we needed a hardware acceleration solution to improve our density, our speed, and our processing cost.

We started by integrating with Intel GPUs, and it worked very well. We even demoed it at Intel Experience Day in 2019.

We had a wonderful relationship with Intel, and they had a good video encoding engine. We invested about two years of effort, but it did not materialize: an Intel GPU for the data center never happened. A wasted opportunity.

Then, we thought of developing our own chip:

  • Its power would be a fraction of that of a CPU or GPU.
  • We would be able to put four 8Kp60 CABR chips on a single PCIe card (for AVC, HEVC, and AV1).
  • It would cost less than a GPU and offer 3x the density.

Here's a slide that shows we were serious. We also started discussions about raising funds to build that chip in 12 nm technology.

But then, we looked at our plan and wondered: does this chip support the needs of the future? 

  • How would you innovate on this platform? 
  • What if you would like to run smarter algorithms or a new version of CABR?
  • Our design included programmable parts for customization. We even thought of adding GPU cores, but who was going to develop for it?

This was a key moment in 2020, when we understood that innovation moves so fast that a silicon generation, which takes at least two years to build, is simply too slow.

There is a scale at which VPU solutions are more efficient than GPUs, but they cannot keep up with the current pace of change. It may well be that even the biggest social networks will abandon VPUs because AI and video need to work together.

Chapter 3: GPUs and the Future of Video Processing

In 2021, NVIDIA invited us to bring CABR to its GPUs. What followed was a three-year journey that required a complete rewrite of our technology for NVENC. NVIDIA fully supported us, integrating CABR into all encoding modes across AVC, HEVC, and AV1.

In May 2023, the first driver was out: NVENC SDK 12.1!

At the same time, Beamr went public on NASDAQ (under the ticker BMR), on the premise of a high-quality large-scale video encoding platform enabled on NVIDIA GPUs.

Since September 2024, Beamr CABR has been running live video optimization on NVIDIA GPUs at 4Kp60 across three codecs: AVC, HEVC, and AV1. For AVC it is 10x faster at 1/10 of the cost; the ratio doubles for HEVC, and doubles again for AV1.

All of our challenges for bringing CABR to the masses are solved.

But the story doesn’t end here.

What we didn’t fully anticipate was how AI-driven innovation is transforming the way we interact with video, and the opportunities are even greater than we imagined, thanks to the shift to GPUs.

Let me give you a couple of examples:

In the last Olympics, I was watching windsurfing, and on-screen I saw a real-time overlay showing each surfer's planned route, the wind speed, their tactics, and predictions of how they would converge at the finish line.

It was seamless, intuitive, and AI-driven—a perfect example of how AI enriches the viewing experience.

Or think about social media: AI plays a huge role in processing video behind the scenes. As videos are uploaded, VPUs (Video Processing Units) handle encoding, while AI algorithms simultaneously analyze content—deciding whether it’s appropriate, identifying trends, and determining who should see it.

But the processes used by many businesses are slow and inefficient. Every AI-powered video workflow needs to:

  1. Load the video.
  2. Decode it.
  3. Process it (either for AI analysis or encoding).
  4. Sync and converge the results.

Traditionally, these steps happened separately, often with significant latency.

But on a GPU?

  • Single load, single decode, shared memory buffer.
  • AI and video processing run in parallel.
  • Everything is synced and optimized.

And just like that—you’re done. It’s faster, more efficient, and more cost-effective. This is the winning architecture for the future of AI and video at scale.
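Here is a minimal sketch of that single-load, single-decode architecture. Only torch is real here; decode_next, detect, encode, and route are hypothetical stand-ins for NVDEC decode, a vision model, NVENC encode, and downstream business logic.

```python
# Sketch of the single-load / single-decode GPU architecture: each decoded
# frame lives in one GPU buffer shared by the AI model and the encoder.
import torch

def decode_next(path: str):
    """Yield decoded frames as tensors already resident in GPU memory."""
    yield from ()  # placeholder for an NVDEC-backed decoder

def detect(frame): ...      # placeholder vision model
def encode(frame): ...      # placeholder NVENC-style encode
def route(detections): ...  # placeholder moderation / trend logic

def process(video_path: str) -> None:
    infer_stream = torch.cuda.Stream()
    encode_stream = torch.cuda.Stream()
    for frame in decode_next(video_path):   # one load, one decode
        with torch.cuda.stream(infer_stream):
            detections = detect(frame)      # AI analysis on the shared buffer
        with torch.cuda.stream(encode_stream):
            encode(frame)                   # encoding runs in parallel
        torch.cuda.synchronize()            # sync and converge
        route(detections)
```

The design point is simply that the frame never leaves GPU memory between decode, inference, and encode, which is where the latency and cost savings come from.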

Want to learn more? Start a free trial with Beamr Cloud or Talk to us!

Beamr CABR Poised to Boost Vision AI

By reducing video size without reducing perceptual quality, Beamr's Content Adaptive BitRate (CABR) optimized encoding can make video used for vision AI easier to handle, reducing workflow complexity.


Written by: Tamar Shoham, Timofei Gladyshev


Motivation 

Machine learning (ML) for video processing is a field that is expanding at a fast pace and presents significant untapped potential. Video is an incredibly rich sensor with large storage and bandwidth requirements, which makes vision AI a high-value problem to solve and one especially well suited to AI and ML.

Beamr's Content Adaptive BitRate (CABR) solution can significantly decrease video file size without changing the video's resolution, compression format, or file format, and without compromising perceptual quality. We were therefore interested in examining how CABR can help cut down the size of videos used in the context of ML.

In this case study, we focus on the relatively simple task of people detection in video. We made use of the NVIDIA DeepStream SDK, a complete streaming analytics toolkit, based on GStreamer, for AI-based multi-sensor processing and video, audio, and image understanding. Using this SDK is a natural choice for Beamr as an NVIDIA Metropolis partner.
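For reference, a DeepStream detection pipeline of this kind can be expressed as a single GStreamer launch string. The sketch below uses standard DeepStream element names, but the input clip and the nvinfer configuration path are placeholders, not our exact test harness.

```python
# Sketch of a DeepStream people-detection pipeline driven from Python.
# Element names (nvv4l2decoder, nvstreammux, nvinfer) follow DeepStream
# conventions; clip.mp4 and the PeopleNet config path are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "filesrc location=clip.mp4 ! qtdemux ! h264parse ! nvv4l2decoder "
    "! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 "
    "! nvinfer config-file-path=peoplenet_config.txt "
    "! fakesink"
)
pipeline.set_state(Gst.State.PLAYING)
# Block until the clip ends (or an error occurs), then tear down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```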

In the following, we describe the test setup, the data set used, the tests performed, and the results obtained. We then present some conclusions and directions for future work.

Test Setup

In this case study, we limited ourselves to comparing detection results on source and reduced-size files using pre-trained models, which made it possible to use unlabeled data.

We collected a set of 9 User-Generated Content (UGC) video clips, captured on a few different iPhone models. To these we added some clips downloaded from the Pexels free stock videos website. All test clips are in the mp4 or mov file format and contain AVC/H.264 encoded video, with resolutions ranging from 480p to full HD and 4K, and durations ranging from 10 seconds to 1 minute. Further details on the test files can be found in Annex A.

These 14 source files were then optimized using Beamr's storage optimization solution, yielding files reduced in size by 9% to 73%, with an average reduction of 40%. As mentioned above, this optimization produces output files that retain the same coding and file formats, the same resolution, and the same perceptual quality. The goal of this case study is to show that these reduced-size, optimized files also yield aligned ML results.

For this test, we used the NVIDIA DeepStream SDK with the PeopleNet-ResNet34 detector, and calculated the mAP between the detections on each pair of source and optimized files at an IoU threshold of 0.5.
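For clarity, here is a minimal sketch of the per-frame matching underlying such a comparison, with detections on the optimized clip scored against those on the source clip (treated as the reference) at IoU ≥ 0.5. It illustrates the principle rather than reproducing our exact evaluation code.

```python
# Per-frame matching behind the source-vs-optimized mAP comparison:
# optimized-clip detections are matched greedily against source-clip
# detections (treated as reference) at an IoU threshold of 0.5.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / ((area_a + area_b - inter) or 1)

def match_frame(src_boxes, opt_boxes, thr=0.5):
    """Return (true positives, false positives, false negatives) for a frame."""
    unmatched = list(src_boxes)
    tp = 0
    for box in opt_boxes:
        best = max(unmatched, key=lambda s: iou(s, box), default=None)
        if best is not None and iou(best, box) >= thr:
            unmatched.remove(best)  # each reference box matches at most once
            tp += 1
    return tp, len(opt_boxes) - tp, len(unmatched)
```

Note that with this setup any detection error that occurs on only one side of the pair counts against the score, which is why out-of-sync errors are penalized twice, as discussed in the Results section.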

Results

We found that for files with predictions that align with actual people, the mAP is very high, showing that true detection results are indeed unaffected by replacing the source file with the smaller, easier-to-transfer, optimized file. 

An example showing how well they align is provided in Figure 1. This test clip resulted in a mAP[0.5] value of 0.98.

Figure 1: Detections for pexels_video_2160p_2 frame 305 using PeopleNet-ResNet34; source on top, optimized (54% smaller) below.

As the PeopleNet-ResNet34 model was developed specifically for people detection, its results are quite stable, and overall it showed high mAP values, with a median of 0.94.

When testing some other models, we did notice that in cases where the detections were unstable, the source and optimized files sometimes produced different false positives. It is important to note that because we did not have labeled data, or ground truth, such detection errors occurring out of sync have a double impact on the mAP calculated between the detections on the source file and those on the optimized file. This yields poorer results than the mAP values expected when comparing detections against labeled data.

We also noticed cases of detection flicker, with a person being detected in only some of the frames where they appear. This flicker is not always synchronized between the source and optimized clips, resulting once again in an 'accumulated', or double, error in the mAP calculated between them. An example is shown in Figure 2, for a clip with a mAP[0.5] value of 0.92.

Figure 2a: Detections for frame 1170 of the clip pexels_musicians.mp4 using PeopleNet-ResNet34, source on the left and optimized (44% smaller) on the right. Note the detection to the left of the stairs, present only in the source file.
Figure 2b: The same for frame 1171, with no detection in either.
Figure 2c: Frame 1172, detected in both.
Figure 2d: Frame 1173, detected only in the optimized file.

Summary

The experiments described above show that CABR can be applied to videos that undergo ML tasks such as object detection. We showed that when detections are stable, almost identical results are obtained for the source and optimized clips. The savings in storage size and transmission bandwidth offered by the optimized files make this possibility particularly attractive.

Another possible use for CABR in the context of ML stems from the finding that, for unstable detection results, CABR may have some impact on false positives or misdetections. In this context, it would be interesting to view it as a possible perturbation of labeled data to increase training-set size. In future work, we will investigate the further benefits obtained when CABR is incorporated at the training stage, and expand the experiments to include more model types and ML tasks.

This research is all part of our ongoing quest to accelerate adoption and increase the accessibility of video ML/DL and video analysis solutions.

Annex A – test files

Below are details on the test files used in the above experiments. All the files and detection results are available here.

| # | Filename | Source | Bitrate [Mbps] | Dims WxH | FPS | Duration [sec] | Saved by CABR |
|---|----------|--------|----------------|----------|-----|----------------|---------------|
| 1 | IMG_0226 | iPhone 3GS | 3.61 | 640×480 | 15 | 36.33 | 35% |
| 2 | IMG_0236 | iPhone 3GS | 3.56 | 640×480 | 30 | 50.45 | 20% |
| 3 | IMG_0749 | iPhone 5 | 17.0 | 1920×1080 | 29.9 | 11.23 | 34% |
| 4 | IMG_3316 | iPhone 4S | 21.9 | 1920×1080 | 29.9 | 9.35 | 26% |
| 5 | IMG_5288 | iPhone SE | 14.9 | 1920×1080 | 29.9 | 43.40 | 29% |
| 6 | IMG_5713 | iPhone 5c | 16.3 | 1080×1920 | 29.9 | 48.88 | 73% |
| 7 | IMG_7314 | iPhone 7 | 15.5 | 1920×1080 | 29.9 | 54.53 | 50% |
| 8 | IMG_7324 | iPhone 7 | 15.8 | 1920×1080 | 29.9 | 16.43 | 39% |
| 9 | IMG_7369 | iPhone 6 | 17.9 | 1080×1920 | 29.9 | 10.23 | 30% |
| 10 | pexels_musicians | pexels | 10.7 | 1920×1080 | 24 | 60.0 | 44% |
| 11 | pexels_video_1080p | pexels | 4.4 | 1920×1080 | 25 | 12.56 | 63% |
| 12 | pexels_video_2160p | pexels | 12.2 | 3840×2160 | 25 | 15.2 | 49% |
| 13 | pexels_video_2160p_2 | pexels | 15.2 | 3840×2160 | 25 | 15.84 | 54% |
| 14 | pexels_video_of_people_walking_1080p | pexels | 3.5 | 1920×1080 | 23.9 | 19.19 | 58% |

Table A1: Test files used, with the per file savings

Beamr teams with NVIDIA to accelerate Beamr technology on NVIDIA GPUs

2023 is a very exciting year for Beamr. In February, Beamr became a public company (NASDAQ: BMR) on the premise of making our video optimization technology globally available as a SaaS. This month we are already announcing a second milestone for 2023: the release of the NVIDIA driver that enables running our technology on the NVIDIA platform. This is the result of a two-year joint project in which Beamr engineers worked alongside the amazing engineering team at NVIDIA to ensure that the Beamr solution can be integrated with all NVIDIA codecs: AVC, HEVC, and AV1.

The new NVENC driver, just made public, provides an API that allows external control over NVENC, enabling NVIDIA partners such as Beamr to integrate tightly with the NVENC hardware encoders for AVC, HEVC, and AV1. Beamr is excited to have been a design partner in the development of this API, and to be the first company to use it to accelerate, and cut the cost of, video optimization.

This milestone with NVIDIA offers some important benefits. A significant cost reduction is achieved when performing Beamr video optimization on this platform: for example, for 4Kp60 content encoded with advanced codecs, running Beamr video optimization on GPU can cut costs by a factor of 10 compared to running on CPU.

Using the Beamr solution integrated on the GPU means that encoding can be performed with the built-in hardware codecs, which offer very fast, high-frame-rate encoding. The combined solution can therefore support live and real-time video encoding, a new use case for Beamr's video optimization technology.

In addition, NVIDIA recently announced its AV1 codec, considered to be the highest-quality hardware-accelerated AV1 encoder. In this comparison, Jarred Walton concluded that "From an overall quality and performance perspective, Nvidia's latest Ada Lovelace NVENC hardware comes out as the winner with AV1 as the codec of choice". Using the new driver to combine Beamr video optimization with this excellent AV1 implementation yields a very competitive solution, with video encoding abilities exceeding other AV1 encoders on the market.

So, how does the new driver actually allow the integration of NVENC codecs with Beamr video optimization technology? 

Above is a high-level illustration of the system flow. The user's video is ingested, and for each video frame the encoding parameters are controlled by the Beamr Quality Control block, which instructs NVENC on how to encode the frame so as to reach the target quality while minimizing bit consumption. The new NVENC API layer is what enables the interaction between the Beamr Quality Control block and the encoder, creating the reduced-bitrate, target-optimized video. As part of the effort toward the integrated solution, Beamr also ported its quality measurement IP to the GPU and redesigned it to match the performance of NVENC, thus placing the entire solution on the GPU.
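In rough pseudocode terms, the per-frame control loop behaves like the sketch below. Here encode_frame and perceptual_quality are hypothetical stand-ins for the NVENC API layer and Beamr's quality measure; the real interfaces are not shown, and the QP sweep is a simplification of the actual control logic.

```python
# Sketch of per-frame, quality-controlled optimization: find the smallest
# encode of a frame that still passes the perceptual-quality bar.
# encode_frame and perceptual_quality are hypothetical stand-ins for the
# NVENC API layer and Beamr's quality measure.

QUALITY_TARGET = 0.95  # illustrative threshold on a 0..1 quality scale

def encode_frame(frame, qp: int) -> bytes: ...
def perceptual_quality(frame, candidate: bytes) -> float: ...

def optimize_frame(frame, initial_qp: int, qp_max: int = 51) -> bytes:
    best = encode_frame(frame, qp=initial_qp)     # baseline encode
    for qp in range(initial_qp + 1, qp_max + 1):  # push bitrate down
        candidate = encode_frame(frame, qp=qp)
        if perceptual_quality(frame, candidate) < QUALITY_TARGET:
            break                                 # quality bar violated: stop
        best = candidate                          # smaller and still faithful
    return best
```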

Beamr uses the new API to control the encoder and perform optimization that can reduce the bitrate of an input video, or of a target encode, while guaranteeing that perceptual quality is preserved, thus creating encodes with the same perceptual quality at lower bitrates or file sizes.

Beamr optimization may also be used for automatic, quality-guaranteed codec modernization, where input content is converted to a modern codec such as AV1 while guaranteeing that each frame of the optimized encode is perceptually identical to the source video. This allows faster migration to modern codecs, for example from AVC to HEVC or from AVC to AV1, in an automated, always-safe process with no loss of quality.

In the examples below, the risk of blind codec modernization is clearly visible, showcasing the advantage of using Beamr technology for this task. In these examples, we took AVC sources and encoded them to HEVC to benefit from the increased compression efficiency of the more advanced coding standard. On our test set, Beamr reduced the source clips by 50% when converting them to perceptually identical HEVC streams. We compare these encodes to the results of 'brute force' compression to HEVC using 50% lower bitrates. As the examples make clear, blind conversion (shown on the left) can introduce disturbing artifacts compared to the source (shown in the middle), while the Beamr encodes (shown on the right) preserve the quality perfectly.

This driver release, and the technology enablement it offers, is a significant milestone, but it is just the beginning. Beamr is now building a new SaaS that will allow a scalable, no-code implementation of its technology for reducing storage and networking costs. The service is planned to be publicly available in Q3 2023. In addition, Beamr is looking for design partners who will get early access to the service and help us build the best experience for our customers.

At the same time, Beamr will continue to strengthen relationships with existing users by offering them low-level APIs for enhanced control and specific workflow adaptations.

For more information, please contact us at info@beamr.com.