2023 is a very exciting year for Beamr. In February Beamr became a public company on NASDAQ:BMR on the premise of making our video optimization technology globally available as a SaaS. This month we are already announcing a second milestone for 2023: Release of the Nvidia driver that enables running our technology on the Nvidia platform. This is a result of a 2 year joint project, where Beamr engineers worked alongside the amazing engineering team at Nvidia to ensure that the Beamr solution can be integrated with all Nvidia codecs – AVC, HEVC and AV1.
The new NVENC driver, just now made public, provides an API that allows external control over NVENC, enabling Nvidia partners such as Beamr to tightly integrate with the NVENC H/W encoders for AVC, HEVC and AV1. Beamr is excited to have been a design partner for development of this API and to be the first company that uses it, to accelerate and reduce costs of video optimization.
This milestone with Nvidia offers some important benefits. A significant cost reduction is achieved when performing Beamr video optimization using this platform. For example, for 4Kp60 encoded with advanced codecs, when using the Beamr video optimization on GPU the costs of video optimization can be cut by a factor of x10, compared to running on CPU.
Using the Beamr solution integrated on GPU means that the encoding can be performed using the built in H/W codecs, which offer very fast, high frame rate, encoding. This means the combined solution can support live and real time video encoding which is a new use case for the Beamr video optimization technology.
In addition, Nvidia recently announced their AV1 codec, considered to be the highest quality AV1 HW accelerated encoder. In this comparison Jarred Walton concluded that “”From an overall quality and performance perspective, Nvidia’s latest Ada Lovelace NVENC hardware comes out as the winner with AV1 as the codec of choice”. When using the new driver to combine the Beamr video optimization with this excellent AV1 implementation, a very competitive solution is obtained, with video encoding abilities exceeding other AV1 encoders on the market.
So, how does the new driver actually allow the integration of NVENC codecs with Beamr video optimization technology?
Above you can see a high level illustration of the system flow. The user video is ingested, and for each video frame the encoding parameters are controlled by the Beamr Quality Control block instructing NVENC on how to encode the frame, to reach the target quality while minimizing bit consumption. The New NVENC API layer is what enables the interactions between the Beamr Quality Control and the encoder to create the reduced bitrate, target optimized video. As part of the efforts towards the integrated solution, Beamr also ported its quality measurement IP to GPU and redesigned it to match the performance of NVENC, thus placing the entire solution on the GPU.
Beamr uses the new API to control the encoder and perform optimization which can reduce bitrate of an input video, or of a target encode, while guaranteeing the perceptual quality is preserved, thus creating encodes with the same perceptual quality at lower bitrates or file sizes.
The Beamr optimization may also be used for automatic, quality guaranteed codec modernization, where input content can be converted to a modern codec such as AV1, while guaranteeing each frame of the optimized encode is perceptually identical to the source video. This allows for faster migration to modern codecs, for example from AVC to HEVC or AVC to AV1, in an automated, always safe process – with no loss of quality.
In the below examples the risk of blind codec modernization is clearly visible, showcasing the advantage of using Beamr technology for this task. In these examples, we took AVC sources and encoded them to HEVC, to benefit from the increased compression efficiency offered by the more advanced coding standard. On the test set we used, Beamr reduced the source clips by 50% when converting to perceptually identical HEVC streams. We compare these encodes to the results obtained when performing ‘brute force’ compression to HEVC, using 50% lower bitrates. As is clear in these examples, using the blind conversion, shown on the left, can introduce disturbing artifacts compared to the source, shown in the middle. The Beamr encodes however, shown on the right, preserve the quality perfectly.
This driver release and the technology enablement it offers, while a significant milestone, is just the beginning. Beamr is now building a new SaaS that will allow a scalable, no code, implementation of its technology for reducing storage and networking costs. This service is planned to be publicly available in Q3 of 2023. In addition Beamr is looking for design partners that will get early access to its service and help us build the best experiences for our customers.
At the same time Beamr will continue to strengthen relationships with existing users by offering them low level API’s for enhanced controls and specific workflow adaptations.
For more information please contact us at info@beamr.com
A few weeks ago Beamr reached a historic milestone, which got everyone in the company excited. It was triggered by a rather formal announcement from the US Patent Office, in their typical “dry” language: “THE APPLICATION IDENTIFIED ABOVE HAS BEEN EXAMINED AND IS ALLOWED FOR ISSUANCE AS A PATENT”. We’ve received such announcements many times before, from the USPTO and from other national patent offices, but this one was special: It meant that the Beamr patent portfolio has now grown to 50 granted patents!
We have always believed that a strong IP portfolio is extremely important for an innovative technology company, and invested a lot of human and capital resources over the years to build it. So we thought that this anniversary would be a good opportunity to reflect back on our IP journey, and share some lessons we learned along the way, which might come in handy to others who are pursuing similar paths.
Starting With Image Optimization
Beamr was established in 2009, and the first technology we developed was for optimizing images – reducing their file size while retaining their subjective quality. In order to verify that subjective quality is preserved, we needed a way to accurately measure it, and since existing quality metrics at the time were not reliable enough (e.g. PSNR, SSIM), we developed our own quality metric, which was specifically tuned to detect the artifacts of block-based compression.
Our first patent applications covered the components of the quality measure itself, and its usage in a system for “recompressing” images or video frames. The system takes a source image or a video frame, compresses it at various compression levels, and then compares the compressed versions to the source. Finally, it selects the compressed version that is smallest in file size, but still retains the full quality of the source, as measured by our quality metric.
After these initial patent applications which covered the basic method we were using for optimization, we submitted a few more patent applications which covered additional aspects of the optimization process. For example, we found that sometimes when you increase the compression level, the quality of the image increases, and vice versa. This is counter-intuitive, since typically increasing the compression reduces image quality, but it does happen in certain situations. It means that the relationship between quality and compression is not “monotonic”, which makes finding the optimal compression level quite challenging. So we devised a method to solve this issue of non-monotony, and filed a separate patent application for it.
Another issue we wanted to address was the fact that some images could not be optimized – every compression level we tried would result in quality reduction, and eventually we just copied the source image to the output. In order to save CPU cycles, we wanted to refrain from even trying to optimize such images. Therefore, we developed an algorithm which determines whether the source image is “highly compressed” (meaning that it can’t be optimized without compromising quality), based on analyzing the source image itself. And of course – we submitted a patent application on this algorithm as well.
As we continued to develop the technology, we found that some images required special treatment due to specific content or characteristics of the images. So we filed additional patent applications on algorithms we developed for configuring our quality metric for specific types of images, such as synthetic (computer-generated) images and images with vivid colors (chroma-rich).
Extending to Video Optimization
Optimizing images turned out to be very valuable for improving the workflow of professional photographers, reducing page load time for web services, and improving the UX for mobile photo apps. But with video reaching 80% of total Internet bandwidth, it was clear that we needed to extend our technology to support optimizing full video streams. As our technology evolved, so did our patent portfolio: We filed patent applications on the full system of taking a source video, decoding it, encoding each frame with several candidate compression levels, selecting the optimal compression level for that frame, and moving on to the next frame. We also filed patent applications on extending the quality measure with additional components that were designed specifically for video: For example, a temporal component that measures the difference in the “temporal flow” of two successive frames using different compression levels. Special handling of real or simulated “film grain”, which is widely used in today’s movie and TV productions, was the subject of another patent application.
When integrating our quality measure and control mechanism (which sets the candidate compression levels) with various video encoders, we came to the conclusion that we needed a way to save and reload a “state” of the encoder without modifying the encoder internals, and of course – patented this method as well. Additional patents were filed on a method to optimize video streams on the basis of a GOP (Group of Pictures) rather than a frame, and on a system that improves performance by determining the optimal compression level based on sampled segments instead of optimizing the whole stream.
Embracing Video Encoding
In 2016 Beamr acquired Vanguard Video, the leading provider of software H.264 and HEVC encoders. We integrated our optimization technology into Vanguard Video’s encoders, creating a system that optimized video while encoding it. We call this CABR, and obviously we filed a patent on the integrated system. For more information about CABR, see our blog post “A Deep Dive into CABR”.
With the acquisition of Vanguard, we didn’t just get access to the world’s best SW encoders. We also gained a portfolio of video encoding patents developed by Vanguard Video, which we continued to extend in the years since the acquisition. These patents cover unique algorithms for intra prediction, motion estimation, complexity analysis, fading and scene change analysis, adaptive pre-processing, rate control, transform and block type decisions, film grain estimation and artifact elimination.
In addition to encoding and optimization, we’ve also filed patents on technologies developed for specific products. For example, some of our customers wanted to use our image optimization technology while creating lower-resolution preview images, so we patented a method for fast and high-quality resizing of an image. Another patent application was filed on an efficient method of generating a transport stream, which was used in our Beamr Optimizer and Beamr Transcoder products.
The chart below shows the split of our 50 patents by the type of technology.
Patent Strategy – Whether and Where to File
Our patent portfolio was built to protect our inventions and novel developments, while at the same time establish the validity of our technology. It’s common knowledge that filing for a patent is a time and money consuming endeavor. Therefore, prior to filing each patent application we ask ourselves: Is this a novel solution to an interesting problem? Is it important to us to protect it? Is it sufficiently tangible (and explainable) to be patentable? Only when the answer to all these questions is a resounding yes, we proceed to file a corresponding patent application.
Geographically speaking, you need to consider where you plan to market your products, because that’s where you want your inventions protected. We have always been quite heavily focused on the US market, making that a natural jurisdiction for us. Thus, all our applications were submitted to the US Patent Office (USPTO). In addition, all applications that were invented in Beamr’s Israeli R&D center were also submitted to the Israeli Patent Office (ILPTO). Early on, we also submitted some of the applications in Europe and Japan, as we expanded our sales activities to these markets. However, our experience showed that the additional translation costs (not only of the patent application itself, but also of documents cited by an Office Action to which we needed to respond), as well as the need to pay EU patent fees in each selected country, made this choice less cost effective. Therefore, in recent years we have focused our filings mainly on the US and Israel.
The chart below shows the split of our 50 patents by the country in which they were issued.
Patent Process – How to File
The process which starts with an idea, or even an implemented system based on that idea, and ends in a granted patent – is definitely not a short or easy one.
Many patents start their lifecycle as Provisional Applications. This type of application has several benefits: It doesn’t require writing formal patent claims or an Information Disclosure Statement (IDS), it has a lower filing fee than a regular application, and it establishes a priority date for subsequent patent filings. The next step can be a PCT, which acts as a joint base for submission in various jurisdictions. Then the search report and IDS are performed, followed by filing national applications in the selected jurisdictions. Most of our initial patent applications went through the full process described above, but in some cases, particularly when time was of the essence, we skipped the provisional or PCT steps, and directly filed national applications.
For a national application, the invention needs to be distilled into a set of claims, making sure that they are broad enough to be effective, while constrained enough to be allowable, and that they follow the regulations of the specific jurisdiction regarding dependencies, language etc. This is a delicate process, and at this stage it is important to have a highly experienced patent attorney that knows the ins and outs of filing in different countries. For the past 12 years, since filing our first provisional patent, we were very fortunate to work with several excellent patent attorneys at the Reinhold Cohen Group, one of the leading IP firms in Israel, and we would like to take this opportunity to thank them for accompanying us through our IP journey.
After finalizing the patent claims, text and drawings, and filing the national application, what you need most is – patience… According to the USPTO, the average time between filing a non-provisional patent application and receiving the first response from the USPTO is around 15-16 months, and the total time until final disposition (grant or abandonment) is around 27 months. Add this time to the provisional and PCT process, and you are looking at several years between filing the initial provisional application and receiving the final grant notice. In some cases it’s possible to speed up the process by using the option of a modified examination in one jurisdiction, after the application gained allowance in another jurisdiction.
The chart below shows the number of granted patents Beamr has received in each passing year.
Sometimes, the invention, description and claims are straightforward enough that the examiner is convinced and simply allows the application as filed. However, this is quite a rare occurrence. Usually there is a process of Office Actions – where the examiner sends a written opinion, quoting prior art s/he believes is relevant to the invention and possibly rejecting some or even all the claims based on this prior art. We review the Office Action and decide on the next step: In some cases a simple clarification is required in order to make the novelty of our invention stand out. In others we find that adding some limitation to the claims makes it distinctive over the prior art. We then submit a response to the examiner, which may result either in acceptance or in another Office Action. Occasionally we choose to perform an interview with the examiner to better understand the objections, and discuss modifications that can bring the claims into allowance.
Finally, after what is sometimes a smooth, and sometimes a slightly bumpy route, hopefully a Notice Of Allowance is received. This means that once filing fees are paid – we have another granted patent! In some cases, at this point we decide to proceed with a divisional application, a continuation or continuation in part – which means that we claim additional aspects of the described invention in a follow up application, and then the patent cycle starts once again…
Summary
Receiving our 50th patent was a great opportunity to reflect back on the company’s IP journey over the past 12 years. It was a long and winding road, which will hopefully continue far into the future, with more patent applications, office actions and new grants to come.
Speaking of new grants – as this blog post went to press, we were informed that our 51st patent was granted! This patent covers “Auto-VISTA”, a method of “crowdsourcing” subjective user opinions on video quality, and aggregating the results to obtain meaningful metrics. You can learn more about Auto-VISTA in Episode 34 of The Video Insiders podcast.
The attention of Internet users, especially the younger generation, is shifting from professionally-produced entertainment content to user-generated videos and live streams on YouTube, Facebook, Instagram and most recently TikTok. On YouTube, creators upload 500 hours of video every minute, and users watch 1 billion hours of video every day. Storing and delivering this vast amount of content creates significant challenges to operators of user-generated content services. Beamr’s CABR (Content Adaptive BitRate) technology reduces video bitrates by up to 50% compared to regular encodes, while preserving perceptual quality and creating fully-compliant standard video streams that don’t require any proprietary decoder on the playback side. CABR technology can be applied to any existing or future block-based video codec, including AVC, HEVC, VP9, AV1, EVC and VVC.
In this blog post we present the results of a UGC encoding test, where we selected a sample database of videos from YouTube’s UGC dataset, and encoded them both with regular encoding and with CABR technology applied. We compare the bitrates, subjective and objective quality of the encoded streams, and demonstrate the benefits of applying CABR-based encoding to user-generated content.
Beamr CABR Technology
At the heart of Beamr’s CABR (Content-Adaptive BitRate) technology is a patented perceptual quality measure, developed during 10 years of intensive research, which features very high correlation with human (subjective) quality assessment. This correlation has been proven in user testing according to the strict requirements of the ITU BT.500 standard for image quality testing. For more information on Beamr’s quality measure, see our quality measure blog post.
When encoding a frame, Beamr’s encoder first applies a regular rate control mechanism to determine the compression level, which results in an initial encoded frame. Then, the Beamr encoder creates additional candidate encoded frames, each one with a different level of compression, and compares each candidate to the initial encoded frame using the Beamr perceptual quality measure. The candidate frame which has the lowest bitrate, but still meets the quality criteria of being perceptually identical to the initial frame, is selected and written to the output stream.
This process repeats for each video frame, thus ensuring that each frame is encoded to the lowest bitrate, while fully retaining the subjective quality of the target encode. Beamr’s CABR technology results in video streams that are up to 50% lower in bitrate than regular encodes, while retaining the same quality as the full bitrate encodes. The amount of CPU cycles required to produce the CABR encodes is only 20% higher than regular encodes, and the resulting streams are identical to regular encodes in every way except their lower bitrate. CABR technology can also be implemented in silicon for high-volume video encoding use cases such as UGC video clips, live surveillance cameras etc.
For more information about Beamr’s CABR technology, see our CABR Deep Dive blog post.
CABR for UGC
Beamr’s CABR technology is especially suited for User-Generated Content (UGC), due to the high diversity and variability of such content. UGC content is captured on different types of devices, ranging from low-end cellular phones to high-end professional cameras and editing software. The content itself varies from “talking head” selfie videos, to instructional videos shot in a home or classroom, to sporting events and even rock band performances with extreme lighting effects.
Encoding UGC content with a fixed bitrate means that such a bitrate might be too low for “difficult” content, resulting in degraded quality, while it may be too high for “easy” content, resulting in wasted bandwidth. Therefore, content-adaptive encoding is required to ensure that the optimal bitrate is applied to each UGC video clip.
Some UGC services use the Constant Rate Factor (CRF) rate control mode of the open-source x264 video encoder for processing UGC content, in order to ensure a constant quality level while varying the actual bitrate according to the content. However, CRF bases its compression level decisions on heuristics of the input stream, and not on a true perceptual quality measure that compares candidate encodes of a frame. Therefore, even CRF encodes waste bits that are unnecessary for a good viewing experience. Beamr’s CABR technology, which is content-adaptive at the frame level, is perfectly suited to remove these remaining redundancies, and create encodes that are smaller than CRF-based encodes but have the same perceptual quality.
Evaluation Methodology
To evaluate the results of Beamr’s CABR algorithm on UGC content, we used samples from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license, which was created to assist in video compression and quality assessment research. The dataset includes around 1500 source video clips (raw video), with a duration of 20 seconds each. The resolution of the clips ranges from 360p to 4K, and they are divided into 15 different categories such as animation, gaming, how-to, music videos, news, sports, etc.
To create the database used for our evaluation, we randomly selected one clip in each resolution from each category, resulting in a total of 67 different clips (note that not all categories in the YouTube UGC set have clips in all resolutions). The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post. As typical user-generated videos, many of the videos suffer from perceptual quality issues in the source, such as blockiness, banding, blurriness, noise, jerky camera movements, etc. which makes them specifically difficult to encode using standard video compression techniques.
We encoded the selected video clips using Beamr 4x, Beamr’s H.264 software encoder library, version 5.4. The videos were encoded using speed 3, which is typically used to encode VoD files in high quality. Two rate control modes were used for encoding: The first is CSQ mode, which is similar to x264 CRF mode – this mode aims to provide a Constant Subjective Quality level, and varies the encoded bitrate based on the content to reach that quality level. The second is CSQ-CABR mode, which creates an initial (reference) encode in CSQ mode, and then applies Beamr’s CABR technology to create a reduced-bitrate encode which has the same perceptual quality as the target CSQ encode. In both cases, we used a range of six CSQ values equally spaced from 16 to 31, representing a wide range of subjective video qualities.
After we completed the encodes in both rate control modes, we compared three attributes of the CSQ encodes to the CSQ-CABR encodes:
File Size – to determine the amount of bitrate savings achievable by the CABR-CSQ rate control mode
BD-Rate – to determine how the two rate control modes compare in terms of the objective quality measures PSNR, SSIM and VMAF, computed between each encode and the source (uncompressed) video
Subjective quality – to determine whether the CSQ encode and the CABR-CSQ encode are perceptually identical to each other when viewed side by side in motion.
Results
The table below shows the bitrate savings of CABR-CSQ vs. CSQ for various values of the CSQ parameter. As expected, the savings are higher for low CSQ values, which correlate with higher subjective quality and higher bitrates. As the CSQ increases, quality decreases, bitrate decreases, and the savings of the CABR-CSQ algorithm are decreased as well.
Table 1: Savings by CSQ value
The overall average savings across all clips and all CSQ values is close to 26%. If we average the savings only for the high CSQ values (16-22), which correspond to high quality levels, the average savings are close to 32%. Obviously, saving one quarter or one third of the storage cost, and moreover the CDN delivery cost, can be very significant for UGC service providers.
Another interesting analysis would be to look at how the savings are distributed across specific UGC genres. Table 2 shows the average savings for each of the 15 content categories available on the YouTube UGC Dataset.
Table 2: Savings by Genre
As we can see, simple content such as lyric videos and “how to” videos (where the camera is typically fixed) get relatively higher savings, while more complex content such as gaming (which has a lot of detail) and live music (with many lights, flashes and motion) get lower savings. However, it should be noted that due to the relatively low number of selected clips from each genre (one in each resolution, for a total of 2-5 clips per genre), we cannot draw any firm conclusions from the above table regarding the expected savings for each genre.
Next, we compared the objective quality metrics PSNR, SSIM and VMAF for the CSQ encodes and the CABR-CSQ encodes, by creating a BD-Rate graph for each clip. To create the graph, we computed each metric between the encodes at each CSQ value and the source files, resulting in 6 points for CSQ and 6 points for CABR-CSQ (corresponding to the 6 CSQ values used in both encodes). Below is an example of the VMAF BD-Rate graph comparing CSQ with CABR-CSQ for one of clips in the lyric video category.
Figure 1: CSQ vs. CSQ-CABR VMAF scores for the 1920×1080 LyricVIdeo file
As we can see, the BD-Rate curve of the CABR-CSQ graph follows the CSQ curve, but each CSQ point on the original graph is moved down and to the left. If we compare, for example, the CSQ 19 point to the CABR-CSQ 19 point, we find that CSQ 19 has a bitrate of around 8 Mbps and a VMAF score of 95, while the CABR-CSQ 19 point has a bitrate of around 4 Mbps, and a VMAF score of 91. However, when both of these files are played side-by-side, we can see that they are perceptually identical to each other (see screenshot from the Beamr View side by side player below). Therefore, the CABR-CSQ 19 encode can be used as a lower-bitrate proxy for the CSQ 19 encode.
Figure 2: Side-by-side comparison in Beamr View of CSQ 19 vs. CSQ-CABR 19 encode for the 1920×1080 LyricVIdeo file
Finally, to verify that the CSQ and CABR-CSQ encodes are indeed perceptually identical, we performed subjective quality testing using the Beamr VISTA application. Beamr VISTA enables visually comparing pairs of video sequences played synchronously side by side, with a user interface for indicating the relative subjective quality of the two video sequences (for more information on Beamr VISTA, listen to episode 34 of The Video Insiders podcast). The set of target comparison pairs comprised 78 pairs of 10 second segments of Beamr4x CSQ encodes vs. corresponding Beamr4x CABR-CSQ encodes. 30 test rounds were performed, resulting in 464 valid target pair views (e.g. by users who correctly recognized mildly distorted control pairs), or on average 6 views per pair. The results show that on average, close to 50% of the users selected CABR-CSQ as having lower quality, while a similar percentage of users selected CSQ as having lower quality, therefore we can conclude that the two encodes are perceptually identical with a statistical significance exceeding 95%.
Figure 3: Percentage of users who selected CABR-CSQ as having lower quality per file
Conclusions
In this blog post we presented the results of applying Beamr’s Content Adaptive BitRate (CABR) encoding to a random selection of user-generated clips taken from the YouTube UGC Dataset, across a range of quality (CSQ) values. The CABR encodes had 25% lower bitrate on average than regular encodes, and at high quality values, 32% lower bitrate on average. The Rate-Distortion graph is unaffected by applying CABR technology, and the subjective quality of the CABR encodes is the same as the subjective quality of the regular encodes. By shaving off a quarter of the video bitrate, significant storage and delivery cost savings can be achieved, and the strain on today’s bandwidth-constrained networks can be relieved, for the benefit of all netizens.
Appendix
Below are links to all the source clips used in the Beamr 4x CABR UGC test.