We are thrilled to share with you that Beamr has won the Seagate Lyve Innovator of the Year competition!
The competition was organized by Seagate Lyve Innovation Labs, a collaboration platform Seagate uses to work with entrepreneurs, startups and enterprises on joint solutions built around the flow of data. Lyve Labs currently operates exclusively in Israel, with plans to open additional labs around the world. Lyve Labs and its partners explore industry challenges, and work together to develop simple, secure, and efficient ways to move and optimize data.
Optimize data? Well, that’s exactly what we do here at Beamr – optimizing images and videos by reducing their size as much as possible while retaining full quality. So when we heard that Lyve Labs was holding a competition seeking the most innovative company in this field, we immediately seized the opportunity and registered!
After an initial screening of several dozen candidate companies, we were informed that Beamr had made it to the finals, where 8 companies were selected to pitch their technology in front of senior Seagate executives.
From the start, it was clear that Lyve Labs was putting on a highly professional event: first, we were asked to prepare slides for a 3-minute company pitch, and invited to a preparation session with Dana Ashkenazi, a senior consultant on innovation, leadership and pitching. Dana gave a one-hour presentation to all companies on how to structure a 3-minute pitch, from opening with a bang to closing with a smile, including some very useful tips on slide content and design. Then, each company got a private 20-minute session with Dana and with Ruti Arazi, who handles business development at the Seagate Israel Innovation Center. In this session we went over our draft pitch and got some useful suggestions for improvement. We were also asked to prepare a 30-second “informal” video to introduce ourselves and the company.
A week before the actual event, we were invited to record our 3-minute pitches at Ynet Studios in Israel. Ynet is Israel’s leading online news portal, owned by Israel’s largest newspaper, and they operate professional studios that broadcast live TV news programs daily. It was a real treat to record our pitch in these studios, with top-of-the-line cameras, lighting, and even a teleprompter – so memorizing our pitch was unnecessary…
On Monday November 15th we gathered again in the same studios for the live event. Six top Seagate executives joined us via Zoom: Jeff Fochtman, Seagate SVP of Marketing; BS Teh, Seagate EVP of Global Sales & Sales Operations; KF Chong, Seagate SVP of Global Operations; Ravi Naik, Seagate CIO & SVP of Storage Services; Shanye Hudson, Seagate SVP of Investor Relations and Treasury; and Patricia Frost, Seagate SVP & Chief of HR. Their role was to evaluate the pitches, ask follow-up questions, and assess the innovation of each company.
For each company, the judges first watched the 30-second “informal” video, then the 3-minute pre-recorded pitch, and immediately after had 5 minutes to ask live questions. Since Beamr was first to present, I didn’t quite know what was coming, and obviously didn’t see the questions ahead of time. But the Q&A session went pretty well, the 5 minutes passed quickly, and then I went back to the “green room” to watch the pitches and Q&A sessions of the other companies. I must say that all of them were well prepared, presented their case quite clearly, and bravely handled all questions thrown at them by the judges. I guess that’s the nature of entrepreneurs…
After all the presentations and Q&A sessions were completed, the judges took 15 minutes to consult, and finally we all gathered in the studio, shoulder to shoulder. The atmosphere was very tense, and the organizers told us that they had no idea who the winner was – they would be notified by the judges at the last minute. Don’t be fooled by the smiles you see in this picture – we were really anxious to hear the judges’ decision at this point…
Finally, they announced the winner – Beamr!!! I was very excited, and stepped up to receive a trophy and an oversized cheque for $10,000. All the others cheered and shook my hand, and I felt very proud that Beamr was the winner!
Right on the heels of the Emmy ceremony that took place earlier this month, and our 50th patent awarded in July, it feels like Beamr is on a roll… I am very proud of the recognition we have recently received: The 50 patents recognizing our IP, the Technology and Engineering Emmy® award recognizing our contribution to the TV industry, and the Seagate Lyve Innovator of the Year 2021 award recognizing the innovative nature of our technology. And I am very proud of the Beamr team for developing this amazing technology!
Below you can watch the video of Beamr’s appearance in the competition.
A few weeks ago Beamr reached a historic milestone, which got everyone in the company excited. It was triggered by a rather formal announcement from the US Patent Office, in their typical “dry” language: “THE APPLICATION IDENTIFIED ABOVE HAS BEEN EXAMINED AND IS ALLOWED FOR ISSUANCE AS A PATENT”. We’ve received such announcements many times before, from the USPTO and from other national patent offices, but this one was special: It meant that the Beamr patent portfolio has now grown to 50 granted patents!
We have always believed that a strong IP portfolio is extremely important for an innovative technology company, and we have invested a lot of human and capital resources over the years to build it. So we thought this milestone would be a good opportunity to reflect on our IP journey and share some lessons we learned along the way, which might come in handy for others pursuing similar paths.
Starting With Image Optimization
Beamr was established in 2009, and the first technology we developed was for optimizing images – reducing their file size while retaining their subjective quality. In order to verify that subjective quality is preserved, we needed a way to accurately measure it, and since existing quality metrics at the time were not reliable enough (e.g. PSNR, SSIM), we developed our own quality metric, which was specifically tuned to detect the artifacts of block-based compression.
Our first patent applications covered the components of the quality measure itself, and its usage in a system for “recompressing” images or video frames. The system takes a source image or a video frame, compresses it at various compression levels, and then compares the compressed versions to the source. Finally, it selects the compressed version that is smallest in file size, but still retains the full quality of the source, as measured by our quality metric.
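The selection logic described above can be sketched in a few lines of Python. This is a toy illustration only, not Beamr’s actual implementation: the compressor and quality metric are passed in as callables, and the demo below stands in zlib (lossless, so quality is trivially perfect) for a real image codec and perceptual metric.

```python
import zlib

def recompress(source: bytes, levels, compress, quality, threshold):
    """Return the smallest candidate whose measured quality stays at or
    above `threshold`; fall back to the source unchanged otherwise."""
    best = None
    for level in levels:
        candidate = compress(source, level)
        if quality(source, candidate) >= threshold:
            if best is None or len(candidate) < len(best):
                best = candidate
    return best if best is not None else source

# Toy stand-ins: zlib as the "codec"; lossless, so quality is always 1.0.
data = b"a" * 1000
out = recompress(data, levels=range(1, 10),
                 compress=lambda d, lvl: zlib.compress(d, lvl),
                 quality=lambda src, cand: 1.0,
                 threshold=0.95)
```

With a real lossy codec and perceptual metric, the `quality` callable would reject candidates whose artifacts become visible, which is exactly what keeps the output perceptually identical to the source.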
After these initial patent applications, which covered the basic method we were using for optimization, we submitted a few more applications covering additional aspects of the optimization process. For example, we found that sometimes increasing the compression level actually increases image quality, and vice versa. This is counter-intuitive, since typically increasing the compression reduces image quality, but it does happen in certain situations. It means that the relationship between quality and compression is not monotonic, which makes finding the optimal compression level quite challenging. So we devised a method to solve this issue of non-monotonicity, and filed a separate patent application for it.
Another issue we wanted to address was the fact that some images could not be optimized – every compression level we tried would result in quality reduction, and eventually we just copied the source image to the output. In order to save CPU cycles, we wanted to refrain from even trying to optimize such images. Therefore, we developed an algorithm which determines whether the source image is “highly compressed” (meaning that it can’t be optimized without compromising quality), based on analyzing the source image itself. And of course – we submitted a patent application on this algorithm as well.
As we continued to develop the technology, we found that some images required special treatment due to specific content or characteristics of the images. So we filed additional patent applications on algorithms we developed for configuring our quality metric for specific types of images, such as synthetic (computer-generated) images and images with vivid colors (chroma-rich).
Extending to Video Optimization
Optimizing images turned out to be very valuable for improving the workflow of professional photographers, reducing page load time for web services, and improving the UX for mobile photo apps. But with video reaching 80% of total Internet bandwidth, it was clear that we needed to extend our technology to support optimizing full video streams. As our technology evolved, so did our patent portfolio: We filed patent applications on the full system of taking a source video, decoding it, encoding each frame with several candidate compression levels, selecting the optimal compression level for that frame, and moving on to the next frame. We also filed patent applications on extending the quality measure with additional components that were designed specifically for video: For example, a temporal component that measures the difference in the “temporal flow” of two successive frames using different compression levels. Special handling of real or simulated “film grain”, which is widely used in today’s movie and TV productions, was the subject of another patent application.
When integrating our quality measure and control mechanism (which sets the candidate compression levels) with various video encoders, we came to the conclusion that we needed a way to save and reload a “state” of the encoder without modifying the encoder internals, and of course – patented this method as well. Additional patents were filed on a method to optimize video streams on the basis of a GOP (Group of Pictures) rather than a frame, and on a system that improves performance by determining the optimal compression level based on sampled segments instead of optimizing the whole stream.
Embracing Video Encoding
In 2016 Beamr acquired Vanguard Video, the leading provider of software H.264 and HEVC encoders. We integrated our optimization technology into Vanguard Video’s encoders, creating a system that optimized video while encoding it. We call this CABR, and obviously we filed a patent on the integrated system. For more information about CABR, see our blog post “A Deep Dive into CABR”.
With the acquisition of Vanguard, we didn’t just get access to the world’s best SW encoders. We also gained a portfolio of video encoding patents developed by Vanguard Video, which we continued to extend in the years since the acquisition. These patents cover unique algorithms for intra prediction, motion estimation, complexity analysis, fading and scene change analysis, adaptive pre-processing, rate control, transform and block type decisions, film grain estimation and artifact elimination.
In addition to encoding and optimization, we’ve also filed patents on technologies developed for specific products. For example, some of our customers wanted to use our image optimization technology while creating lower-resolution preview images, so we patented a method for fast and high-quality resizing of an image. Another patent application was filed on an efficient method of generating a transport stream, which was used in our Beamr Optimizer and Beamr Transcoder products.
The chart below shows the split of our 50 patents by the type of technology.
Patent Strategy – Whether and Where to File
Our patent portfolio was built to protect our inventions and novel developments, while at the same time establishing the validity of our technology. It’s common knowledge that filing for a patent is a time- and money-consuming endeavor. Therefore, prior to filing each patent application we ask ourselves: Is this a novel solution to an interesting problem? Is it important to us to protect it? Is it sufficiently tangible (and explainable) to be patentable? Only when the answer to all these questions is a resounding yes do we proceed to file a corresponding patent application.
Geographically speaking, you need to consider where you plan to market your products, because that’s where you want your inventions protected. We have always been quite heavily focused on the US market, making that a natural jurisdiction for us. Thus, all our applications were submitted to the US Patent Office (USPTO). In addition, all applications that were invented in Beamr’s Israeli R&D center were also submitted to the Israeli Patent Office (ILPTO). Early on, we also submitted some of the applications in Europe and Japan, as we expanded our sales activities to these markets. However, our experience showed that the additional translation costs (not only of the patent application itself, but also of documents cited by an Office Action to which we needed to respond), as well as the need to pay EU patent fees in each selected country, made this choice less cost effective. Therefore, in recent years we have focused our filings mainly on the US and Israel.
The chart below shows the split of our 50 patents by the country in which they were issued.
Patent Process – How to File
The process which starts with an idea, or even an implemented system based on that idea, and ends in a granted patent – is definitely not a short or easy one.
Many patents start their lifecycle as Provisional Applications. This type of application has several benefits: it doesn’t require writing formal patent claims or an Information Disclosure Statement (IDS), it has a lower filing fee than a regular application, and it establishes a priority date for subsequent patent filings. The next step can be a PCT application, which acts as a joint base for submission in various jurisdictions. Then the international search report is produced and the IDS is filed, followed by filing national applications in the selected jurisdictions. Most of our initial patent applications went through the full process described above, but in some cases, particularly when time was of the essence, we skipped the provisional or PCT steps and directly filed national applications.
For a national application, the invention needs to be distilled into a set of claims, making sure that they are broad enough to be effective, while constrained enough to be allowable, and that they follow the regulations of the specific jurisdiction regarding dependencies, language etc. This is a delicate process, and at this stage it is important to have a highly experienced patent attorney that knows the ins and outs of filing in different countries. For the past 12 years, since filing our first provisional patent, we were very fortunate to work with several excellent patent attorneys at the Reinhold Cohen Group, one of the leading IP firms in Israel, and we would like to take this opportunity to thank them for accompanying us through our IP journey.
After finalizing the patent claims, text and drawings, and filing the national application, what you need most is – patience… According to the USPTO, the average time between filing a non-provisional patent application and receiving the first response from the USPTO is around 15-16 months, and the total time until final disposition (grant or abandonment) is around 27 months. Add this time to the provisional and PCT process, and you are looking at several years between filing the initial provisional application and receiving the final grant notice. In some cases it’s possible to speed up the process by using the option of a modified examination in one jurisdiction, after the application gained allowance in another jurisdiction.
The chart below shows the number of granted patents Beamr has received in each passing year.
Sometimes, the invention, description and claims are straightforward enough that the examiner is convinced and simply allows the application as filed. However, this is quite a rare occurrence. Usually there is a process of Office Actions, where the examiner sends a written opinion, quoting prior art they believe is relevant to the invention and possibly rejecting some or even all of the claims based on it. We review the Office Action and decide on the next step: in some cases a simple clarification is required to make the novelty of our invention stand out; in others, we find that adding some limitation to the claims makes it distinctive over the prior art. We then submit a response to the examiner, which may result either in acceptance or in another Office Action. Occasionally we choose to conduct an interview with the examiner to better understand the objections, and discuss modifications that can bring the claims to allowance.
Finally, after what is sometimes a smooth, and sometimes a slightly bumpy route, hopefully a Notice Of Allowance is received. This means that once filing fees are paid – we have another granted patent! In some cases, at this point we decide to proceed with a divisional application, a continuation or continuation in part – which means that we claim additional aspects of the described invention in a follow up application, and then the patent cycle starts once again…
Summary
Receiving our 50th patent was a great opportunity to reflect on the company’s IP journey over the past 12 years. It was a long and winding road, which will hopefully continue far into the future, with more patent applications, Office Actions and new grants to come.
Speaking of new grants – as this blog post went to press, we were informed that our 51st patent was granted! This patent covers “Auto-VISTA”, a method of “crowdsourcing” subjective user opinions on video quality, and aggregating the results to obtain meaningful metrics. You can learn more about Auto-VISTA in Episode 34 of The Video Insiders podcast.
AV1, the open-source video codec developed by the Alliance for Open Media, is the most efficient open-source codec available today. AV1’s compression efficiency has been found to be 30% better than that of VP9, the previous-generation open-source codec, meaning that AV1 can reach the same quality as VP9 with 30% fewer bits. Having an efficient codec is especially important now that video consumes over 80% of Internet bandwidth, and the usage of video for both entertainment and business applications is soaring due to social distancing measures.
Beamr’s Emmy® award-winning CABR technology reduces video bitrates by up to 50% while preserving perceptual quality. The technology creates fully-compliant standard video streams, which don’t require any proprietary decoder or add-on on the playback side. We have applied our CABR technology in the past to H.264, HEVC and VP9 codecs, using both software and hardware encoder implementations.
In this blog post we present the results of applying Beamr’s CABR technology to the AV1 codec, by integrating our CABR library with the libaom open source implementation of AV1. This integration results in a further 25-40% reduction in the bitrate of encoded streams, without any visible reduction in subjective quality. The reduced-bitrate streams are of course fully AV1 compatible, and can be viewed with any standard AV1 player.
CABR In Action
Beamr’s CABR (Content Adaptive BitRate) technology is based on our BQM (Beamr Quality Measure) metric, which was developed over 10 years of intensive research, and features very high correlation with subjective quality as judged by humans. BQM is backed by 37 granted patents, and has recently won the 2021 Technology and Engineering Emmy® award from the National Academy of Television Arts & Sciences.
Beamr’s CABR technology and the BQM quality measure can be integrated with any software or hardware video encoder, to create more bitrate-efficient encodes without sacrificing perceptual quality. In the integrated solution, the video encoder encodes each frame with additional compression levels, also known as QP values. The first QP (for the initial encode) is determined by the encoder’s own rate control mechanism, which can be either VBR, CRF or fixed QP. The other QPs (for the candidate encodes) are provided by the CABR library. The BQM quality measure then compares the quality of the initial encoded frame to the quality of the candidate encoded frames, and selects the encoded frame which has the smallest size in bits, but is still perceptually identical to the initial encoded frame. Finally, the selected frame is written to the output stream. Due to our adaptive method of searching for candidate QPs, in most cases a single candidate encode is sufficient to find a near-optimal frame, so the performance penalty is quite manageable.
Integrating Beamr’s CABR module with a video encoder
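The per-frame flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Beamr’s actual API: `encode` and `bqm` are hypothetical callables standing in for the video encoder and the BQM quality measure, and the toy demo simulates an encoder whose output shrinks as QP rises.

```python
def cabr_encode_frame(frame, encoder_qp, candidate_qps, encode, bqm, threshold):
    """One CABR-style iteration for a single frame (sketch).
    The initial encode uses the rate control's own QP; candidates are
    kept only if they are both smaller and perceptually identical to
    the initial encode per the quality measure."""
    initial = encode(frame, encoder_qp)   # rate control's chosen QP
    chosen = initial
    for qp in candidate_qps:              # usually a single candidate suffices
        candidate = encode(frame, qp)
        if len(candidate) < len(chosen) and bqm(initial, candidate) >= threshold:
            chosen = candidate
    return chosen

# Toy demo: "encoding" just truncates, and "quality" passes while the
# candidate is at least half the initial size. Purely illustrative.
frame = bytes(range(256)) * 4
encode = lambda f, qp: bytes(f[: len(f) // qp])
bqm = lambda ref, cand: 1.0 if len(cand) * 2 >= len(ref) else 0.0
out = cabr_encode_frame(frame, 1, [2, 4], encode, bqm, 0.95)
```

In the real system the comparison is driven by BQM rather than a size heuristic, and the adaptive QP search means the candidate list is typically just one entry long, which keeps the performance cost manageable.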
By applying this process to each and every video frame, the CABR mechanism ensures that each frame fully retains the subjective quality of the initial encode, while bitrate is reduced by up to 50% compared to encoding the videos using the encoders’ regular rate control mechanism.
Beamr’s CABR rate control library is integrated into Beamr 4 and Beamr 5, our software H.264 and HEVC encoder SDKs, and is also available as a standalone library that can be integrated with any software or hardware encoder. Beamr is now implementing BQM in silicon hardware, enabling massive scale content-adaptive encoding of user-generated content, surveillance videos and cloud gaming streams.
CABR Integration with libaom
When we approached the task of integrating our CABR technology with an AV1 encoder, we examined several available open-source implementations of AV1, and eventually decided to integrate with libaom, the reference open-source implementation of the AV1 encoder, developed by the members of the Alliance for Open Media. libaom was selected due to its good quality-speed tradeoff at the higher-quality working points, and its well-defined frame encode interface, which made the integration more straightforward.
To apply CABR technology to any encoder, the encoder should be able to re-encode the same input frame with different QPs, a process that we call “roll-back”. Fortunately, the libaom AV1 encoder already includes a re-encode loop, designed for the purpose of meeting bitrate constraints. We were able to utilize this mechanism to enable the frame re-encode process needed for CABR.
Another important aspect of CABR integration is that although CABR reduces the actual bitrate relative to the requested “target” bitrate, we need the encoder’s rate control to believe that the target bitrate has actually been reached. Otherwise, it will try to compensate for the bits saved by CABR, by increasing bit allocation in subsequent frames, and this will undermine the process of CABR’s bitrate reduction. Therefore, we have modified the VBR rate-control feedback, reporting the bit-consumption of the initial encode back to the RC module, instead of the actual bit consumption of the selected output frame.
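The feedback override described above amounts to keeping two counters: what the rate control is told, and what actually hit the stream. The sketch below is illustrative only; the field names are hypothetical and do not correspond to libaom’s actual RC state.

```python
def report_rc_feedback(rc_state: dict, initial_bits: int, selected_bits: int) -> dict:
    """Feed the rate control the *initial* encode's bit count so it
    believes the target bitrate was consumed; the savings from the
    smaller selected frame stay invisible to the RC, so it will not
    inflate bit allocation for subsequent frames.
    (Sketch only -- field names are illustrative, not libaom's.)"""
    rc_state["bits_reported"] += initial_bits   # what the RC is told
    rc_state["bits_actual"] += selected_bits    # what was really written
    return rc_state

rc = {"bits_reported": 0, "bits_actual": 0}
report_rc_feedback(rc, initial_bits=1000, selected_bits=600)
```

The gap between the two counters accumulated over a whole encode is exactly the bitrate saving CABR delivers.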
An additional point of integration between an encoder and the CABR library is that CABR uses “complexity” data from the encoder when calculating the BQM metric. The complexity data is based on the per-block QP and bit consumption reported by the encoder. In order to expose this information, we added code that extracts the QP and bit consumption per block, and sends it to the CABR library.
The current integration of CABR with libaom supports 8 bit encoding, in both fixed QP and single pass VBR modes. 10-bit encoding (including HDR) and dual-pass VBR encoding are already supported with CABR in our own H.264 and HEVC encoders, and can be easily added to our libaom integration as well.
Integration Challenges
Every integration has its challenges, and indeed we encountered several while integrating CABR with libaom. For example, the re-encode loop in libaom runs prior to deblocking and the other loop filters, so the frame it generates is not the final reconstructed frame. To overcome this issue, we moved the in-loop filters and applied them prior to evaluating the candidate frame’s quality.
Another challenge we encountered was that the CABR complexity data is based on the QP values and bit consumption per 16×16 block, while within the libaom encoder this information is only available for bigger blocks. To resolve this, we had to process the actual data in order to generate the QP and bit consumption at the required resolution.
The concept of non-display frames, which is unique to VP9 and AV1, also posed a challenge to our integration efforts. The reason is that CABR only compares quality for frames that are actually displayed to the end user. So we had to take this into account when computing the BQM quality measure and calculating the bits per frame.
Finally, while the QP range in H.264 and HEVC is between 0 and 51, in AV1 it is between 0 and 255. We have an algorithm in CABR called “QP Search” which finds the best candidate QPs for each frame, and it was tuned for the QP range of 0-51, since it was originally developed for H.264 and HEVC encoders. We addressed this discrepancy by performing a simple mapping of values, but in the future we may perform some additional fine tuning of the QP Search algorithm in order to better utilize the increased dynamic range.
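A simple linear mapping between the two QP scales looks like the sketch below. Note this is an illustration of the straightforward approach described above, not Beamr’s exact mapping; as the text notes, the two scales are not perceptually equivalent, which is why further tuning of the QP search may pay off.

```python
def map_qp_avc_to_av1(qp_avc: int) -> int:
    """Linearly scale an H.264/HEVC QP (0-51) onto AV1's 0-255 range.
    A first approximation only: the scales differ in semantics
    (AV1's quantizer index is not a log-scale QP like AVC/HEVC's)."""
    if not 0 <= qp_avc <= 51:
        raise ValueError("H.264/HEVC QP must be in 0..51")
    return round(qp_avc * 255 / 51)
```

For example, the endpoints map to each other (0 → 0, 51 → 255), while mid-range values scale proportionally.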
Benchmarking Process
To evaluate the results of Beamr’s CABR integration with the libaom AV1 encoder, we selected 20 clips from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license. The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post.
We encoded the selected video clips with libaomx, our version of libaom integrated with the CABR library. The videos were encoded using libaom cpu-used=9, which is the fastest speed available in libaom, and therefore the most practical in terms of encoding time. We believe that using lower speeds, which provide improved encoding quality, can result in even higher savings.
Each clip was encoded twice: once using the regular VBR rate control without the CABR library, and a second time using the CABR rate control mode. In both cases, we used 3 target bitrates for each resolution: a high, medium and low bitrate, as specified in the table below.
Target bitrates used in the CABR-AV1 benchmark
Below is the command line we used to encode the files.
aomencx --cabr=<0 or 1> -w <width> -h <height> --fps=<fps>/1 --disable-kf --end-usage=vbr --target-bitrate=<bitrate in kbps> --cpu-used=9 -p 1 -o <outfile>.ivf <inputFIFO>.yuv
After we completed the encodes in both rate control modes, we compared the bitrate and subjective quality of both encodes. We calculated the % of difference in bitrate between the regular VBR encode and the CABR encode, and visually compared the quality of the clips to determine whether both encodes are perceptually identical to each other when viewed side by side in motion.
Benchmark Results
The table below shows the VBR and CABR bitrates for each file, and the savings obtained, calculated as (VBR bitrate – CABR bitrate) / VBR bitrate. As expected, the savings are higher for high-bitrate clips, but still significant even for the lowest bitrates we used. Average savings are 26% for the low bitrates, 33% for the medium bitrates, and 40% for the high bitrates.
Note that savings differ significantly across different clips, even when they are encoded at the same resolution and target bitrate. For example, if you look at 1080p clips encoded to the lowest bitrate target (2 Mbps), you will find that some clips have very low savings (less than 3%), while other clips have very high savings (over 60%). This shows the content-adaptive nature of our technology, which is always committed to quality, and reduces the bitrate only in clips and frames where such reduction does not compromise quality.
Also note that the VBR bitrate may differ from the target bitrate. The reason is that the rate control does not always converge to the target bitrate, due to the short length of the clips. But in any case, the savings were calculated between the VBR bitrate and the CABR bitrate.
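The savings figures in the tables are computed with the formula stated above, which can be expressed directly in code (a trivial helper, shown here only to make the definition unambiguous):

```python
def bitrate_savings(vbr_kbps: float, cabr_kbps: float) -> float:
    """Savings as defined in this benchmark:
    (VBR bitrate - CABR bitrate) / VBR bitrate, as a percentage.
    Note it is computed against the *achieved* VBR bitrate,
    not the target bitrate."""
    return 100.0 * (vbr_kbps - cabr_kbps) / vbr_kbps

# e.g. a 2 Mbps VBR encode reduced to 1.2 Mbps by CABR:
savings = bitrate_savings(2000, 1200)   # 40% savings
```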
Savings – Low Bitrates
Savings – Medium Bitrates
Savings – High Bitrates
In addition to calculating the bitrate savings, we also performed subjective quality testing by viewing the videos side by side, using the YUView player software. In these viewings we verified that indeed for all clips, the VBR and CABR encodes are perceptually identical when viewed in motion at 100% zoom. Below are a few screenshots from these side-by-side viewings.
Conclusions
In this blog post we presented the results of integrating Beamr’s Content Adaptive BitRate (CABR) technology with the libaom implementation of the AV1 encoder. Even though AV1 is the most efficient open source encoder available, using CABR technology can reduce AV1 bitrates by a further 25-40% without compromising perceptual quality. The reduced bitrate can provide significant savings in storage and delivery costs, and enable reaching wider audiences with high-quality, high-resolution video content.
Appendix
The VBR and CABR encoded files can be found here. The source files can be downloaded directly from the YouTube UGC Dataset, using the links below.
The attention of Internet users, especially the younger generation, is shifting from professionally-produced entertainment content to user-generated videos and live streams on YouTube, Facebook, Instagram and most recently TikTok. On YouTube, creators upload 500 hours of video every minute, and users watch 1 billion hours of video every day. Storing and delivering this vast amount of content creates significant challenges to operators of user-generated content services. Beamr’s CABR (Content Adaptive BitRate) technology reduces video bitrates by up to 50% compared to regular encodes, while preserving perceptual quality and creating fully-compliant standard video streams that don’t require any proprietary decoder on the playback side. CABR technology can be applied to any existing or future block-based video codec, including AVC, HEVC, VP9, AV1, EVC and VVC.
In this blog post we present the results of a UGC encoding test, where we selected a sample database of videos from YouTube’s UGC dataset, and encoded them both with regular encoding and with CABR technology applied. We compare the bitrates, subjective and objective quality of the encoded streams, and demonstrate the benefits of applying CABR-based encoding to user-generated content.
Beamr CABR Technology
At the heart of Beamr’s CABR (Content-Adaptive BitRate) technology is a patented perceptual quality measure, developed during 10 years of intensive research, which features very high correlation with human (subjective) quality assessment. This correlation has been proven in user testing according to the strict requirements of the ITU BT.500 standard for image quality testing. For more information on Beamr’s quality measure, see our quality measure blog post.
When encoding a frame, Beamr’s encoder first applies a regular rate control mechanism to determine the compression level, which results in an initial encoded frame. Then, the Beamr encoder creates additional candidate encoded frames, each one with a different level of compression, and compares each candidate to the initial encoded frame using the Beamr perceptual quality measure. The candidate frame which has the lowest bitrate, but still meets the quality criteria of being perceptually identical to the initial frame, is selected and written to the output stream.
This process repeats for each video frame, thus ensuring that each frame is encoded to the lowest bitrate while fully retaining the subjective quality of the initial encode. Beamr’s CABR technology results in video streams that are up to 50% lower in bitrate than regular encodes, while retaining the same quality as the full-bitrate encodes. The number of CPU cycles required to produce the CABR encodes is only 20% higher than for regular encodes, and the resulting streams are identical to regular encodes in every way except their lower bitrate. CABR technology can also be implemented in silicon for high-volume video encoding use cases such as UGC video clips, live surveillance cameras etc.
For more information about Beamr’s CABR technology, see our CABR Deep Dive blog post.
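To make the frame-level loop concrete, here is a minimal Python sketch of the candidate-selection idea described above. The function and parameter names (`encode_frame`, `perceptually_identical`, the candidate levels) are illustrative placeholders, not Beamr’s actual API:

```python
def cabr_encode_frame(frame, rate_control, candidate_levels,
                      encode_frame, perceptually_identical):
    """One CABR iteration: produce an initial encode with the regular
    rate control, then keep the smallest candidate encode that is still
    perceptually identical to it."""
    initial = encode_frame(frame, rate_control.compression_level(frame))
    best = initial
    for level in candidate_levels:
        candidate = encode_frame(frame, level)
        # prefer the candidate only if it is both smaller and passes
        # the perceptual quality criterion against the initial frame
        if (len(candidate.bits) < len(best.bits)
                and perceptually_identical(candidate, initial)):
            best = candidate
    return best
```

The selected frame is then written to the output stream, and the loop repeats for the next frame.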
CABR for UGC
Beamr’s CABR technology is especially suited for User-Generated Content (UGC), due to the high diversity and variability of such content. UGC content is captured on different types of devices, ranging from low-end cellular phones to high-end professional cameras and editing software. The content itself varies from “talking head” selfie videos, to instructional videos shot in a home or classroom, to sporting events and even rock band performances with extreme lighting effects.
Encoding UGC content with a fixed bitrate means that such a bitrate might be too low for “difficult” content, resulting in degraded quality, while it may be too high for “easy” content, resulting in wasted bandwidth. Therefore, content-adaptive encoding is required to ensure that the optimal bitrate is applied to each UGC video clip.
Some UGC services use the Constant Rate Factor (CRF) rate control mode of the open-source x264 video encoder for processing UGC content, in order to ensure a constant quality level while varying the actual bitrate according to the content. However, CRF bases its compression level decisions on heuristics of the input stream, and not on a true perceptual quality measure that compares candidate encodes of a frame. Therefore, even CRF encodes waste bits that are unnecessary for a good viewing experience. Beamr’s CABR technology, which is content-adaptive at the frame level, is perfectly suited to remove these remaining redundancies, and create encodes that are smaller than CRF-based encodes but have the same perceptual quality.
Evaluation Methodology
To evaluate the results of Beamr’s CABR algorithm on UGC content, we used samples from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license, which was created to assist in video compression and quality assessment research. The dataset includes around 1500 source video clips (raw video), with a duration of 20 seconds each. The resolution of the clips ranges from 360p to 4K, and they are divided into 15 different categories such as animation, gaming, how-to, music videos, news, sports, etc.
To create the database used for our evaluation, we randomly selected one clip in each resolution from each category, resulting in a total of 67 different clips (note that not all categories in the YouTube UGC set have clips in all resolutions). The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post. As is typical of user-generated videos, many of them suffer from perceptual quality issues in the source, such as blockiness, banding, blurriness, noise, jerky camera movements, etc., which makes them especially difficult to encode using standard video compression techniques.
We encoded the selected video clips using Beamr 4x, Beamr’s H.264 software encoder library, version 5.4. The videos were encoded using speed 3, which is typically used to encode VoD files in high quality. Two rate control modes were used for encoding: The first is CSQ mode, which is similar to x264 CRF mode – this mode aims to provide a Constant Subjective Quality level, and varies the encoded bitrate based on the content to reach that quality level. The second is CABR-CSQ mode, which creates an initial (reference) encode in CSQ mode, and then applies Beamr’s CABR technology to create a reduced-bitrate encode which has the same perceptual quality as the target CSQ encode. In both cases, we used a range of six CSQ values equally spaced from 16 to 31, representing a wide range of subjective video qualities.
After we completed the encodes in both rate control modes, we compared three attributes of the CSQ encodes to the CABR-CSQ encodes:
File Size – to determine the amount of bitrate savings achievable by the CABR-CSQ rate control mode
BD-Rate – to determine how the two rate control modes compare in terms of the objective quality measures PSNR, SSIM and VMAF, computed between each encode and the source (uncompressed) video
Subjective quality – to determine whether the CSQ encode and the CABR-CSQ encode are perceptually identical to each other when viewed side by side in motion.
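For readers unfamiliar with BD-Rate, it expresses the average bitrate difference between two encoders at equal quality. The following is a generic sketch of the computation, using piecewise-linear interpolation of log-bitrate over the quality score (a simplified stand-in for the standard Bjøntegaard curve fit, not our internal tooling):

```python
from math import log, exp

def bd_rate(rate_anchor, q_anchor, rate_test, q_test):
    """Average % bitrate difference of the test encodes vs. the anchor
    at equal quality: interpolate log-rate over the overlapping quality
    range for both curves and average the gap."""
    def interp(q, qs, lrs):
        # piecewise-linear interpolation of log-rate at quality q
        for (q0, l0), (q1, l1) in zip(zip(qs, lrs), zip(qs[1:], lrs[1:])):
            if q0 <= q <= q1:
                return l0 + (l1 - l0) * (q - q0) / (q1 - q0)
        raise ValueError("quality outside measured range")
    la = [log(r) for r in rate_anchor]
    lt = [log(r) for r in rate_test]
    lo = max(min(q_anchor), min(q_test))   # overlapping quality range
    hi = min(max(q_anchor), max(q_test))
    n = 200
    gaps = [interp(lo + (hi - lo) * i / n, q_test, lt)
            - interp(lo + (hi - lo) * i / n, q_anchor, la)
            for i in range(n + 1)]
    return (exp(sum(gaps) / len(gaps)) - 1) * 100  # negative = savings
```

For example, a test curve at exactly half the anchor bitrate across the whole quality range yields a BD-Rate of -50%.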
Results
The table below shows the bitrate savings of CABR-CSQ vs. CSQ for various values of the CSQ parameter. As expected, the savings are higher for low CSQ values, which correlate with higher subjective quality and higher bitrates. As the CSQ increases, quality decreases, bitrate decreases, and the savings of the CABR-CSQ algorithm are decreased as well.
Table 1: Savings by CSQ value
The overall average savings across all clips and all CSQ values is close to 26%. If we average the savings only for the lower CSQ values (16-22), which correspond to high quality levels, the average savings are close to 32%. Obviously, saving one quarter or one third of the storage cost, and even more so the CDN delivery cost, can be very significant for UGC service providers.
Another interesting analysis would be to look at how the savings are distributed across specific UGC genres. Table 2 shows the average savings for each of the 15 content categories available on the YouTube UGC Dataset.
Table 2: Savings by Genre
As we can see, simple content such as lyric videos and “how to” videos (where the camera is typically fixed) get relatively higher savings, while more complex content such as gaming (which has a lot of detail) and live music (with many lights, flashes and motion) get lower savings. However, it should be noted that due to the relatively low number of selected clips from each genre (one in each resolution, for a total of 2-5 clips per genre), we cannot draw any firm conclusions from the above table regarding the expected savings for each genre.
Next, we compared the objective quality metrics PSNR, SSIM and VMAF for the CSQ encodes and the CABR-CSQ encodes, by creating a BD-Rate graph for each clip. To create the graph, we computed each metric between the encodes at each CSQ value and the source files, resulting in 6 points for CSQ and 6 points for CABR-CSQ (corresponding to the 6 CSQ values used in both encodes). Below is an example of the VMAF BD-Rate graph comparing CSQ with CABR-CSQ for one of the clips in the lyric video category.
Figure 1: CSQ vs. CABR-CSQ VMAF scores for the 1920×1080 LyricVideo file
As we can see, the BD-Rate curve of the CABR-CSQ graph follows the CSQ curve, but each CSQ point on the original graph is moved down and to the left. If we compare, for example, the CSQ 19 point to the CABR-CSQ 19 point, we find that CSQ 19 has a bitrate of around 8 Mbps and a VMAF score of 95, while the CABR-CSQ 19 point has a bitrate of around 4 Mbps, and a VMAF score of 91. However, when both of these files are played side-by-side, we can see that they are perceptually identical to each other (see screenshot from the Beamr View side by side player below). Therefore, the CABR-CSQ 19 encode can be used as a lower-bitrate proxy for the CSQ 19 encode.
Figure 2: Side-by-side comparison in Beamr View of the CSQ 19 vs. CABR-CSQ 19 encode for the 1920×1080 LyricVideo file
Finally, to verify that the CSQ and CABR-CSQ encodes are indeed perceptually identical, we performed subjective quality testing using the Beamr VISTA application. Beamr VISTA enables visually comparing pairs of video sequences played synchronously side by side, with a user interface for indicating the relative subjective quality of the two video sequences (for more information on Beamr VISTA, listen to episode 34 of The Video Insiders podcast). The set of target comparison pairs comprised 78 pairs of 10 second segments of Beamr4x CSQ encodes vs. corresponding Beamr4x CABR-CSQ encodes. 30 test rounds were performed, resulting in 464 valid target pair views (i.e. views by users who correctly recognized mildly distorted control pairs), or on average 6 views per pair. The results show that on average, close to 50% of the users selected CABR-CSQ as having lower quality, while a similar percentage of users selected CSQ as having lower quality. Therefore, we can conclude that the two encodes are perceptually identical, with a statistical significance exceeding 95%.
Figure 3: Percentage of users who selected CABR-CSQ as having lower quality per file
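The statistical reasoning behind the “perceptually identical” conclusion can be illustrated with a standard two-sided binomial test: under the null hypothesis that the two encodes are indistinguishable, forced choices should split 50/50. Here is a generic sketch (standard statistics, not the exact analysis pipeline used in the test):

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided binomial test against a fair 50/50 split:
    probability, under the null hypothesis, of an outcome at least
    as extreme as k 'CABR looked worse' votes out of n views."""
    pmf = [comb(n, i) * 0.5 ** n for i in range(n + 1)]
    # sum the probability of every outcome no more likely than k
    return sum(p for p in pmf if p <= pmf[k] * (1 + 1e-9))
```

With roughly half of the 464 valid views on each side, the p-value is near 1, i.e. no detectable quality difference; a clearly lopsided split (say 200 vs. 264) would have come out significant instead.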
Conclusions
In this blog post we presented the results of applying Beamr’s Content Adaptive BitRate (CABR) encoding to a random selection of user-generated clips taken from the YouTube UGC Dataset, across a range of quality (CSQ) values. The CABR encodes had 25% lower bitrate on average than regular encodes, and at high quality values, 32% lower bitrate on average. The Rate-Distortion graph is unaffected by applying CABR technology, and the subjective quality of the CABR encodes is the same as the subjective quality of the regular encodes. By shaving off a quarter of the video bitrate, significant storage and delivery cost savings can be achieved, and the strain on today’s bandwidth-constrained networks can be relieved, for the benefit of all netizens.
Appendix
Below are links to all the source clips used in the Beamr 4x CABR UGC test.
Post COVID-19, the work from home trend will continue, and this will prolong the pressure that video traffic puts on the Internet. Even with the EU Commissioner’s call for video services to reduce their traffic by 25%, as Internet traffic patterns shift from corporate networks to mobile, fixed wireless, and broadband networks, the need to reduce video bandwidth will continue beyond COVID-19. Consumers will still demand the highest quality, and those streaming services meeting their expectations while delivering video in as small a footprint as possible will dominate the market. Now is the time for the streaming video industry to play an active role in adopting more efficient codecs and content-adaptive bitrate technology so that streaming video services can ensure a great user experience without disrupting the Internet.
https://youtu.be/ltYNDQkiyl8
The Internet is a shared resource to preserve.
For the video streaming industry, Thursday, March 19th, marked the day of reckoning for runaway bitrates and seemingly never-ending network capacity. On March 19th, Thierry Breton, the European Commissioner for the Internal Market tweeted, “let’s #SwitchToStandard definition when HD is not necessary.” The result is that most of the best known US video services, including Facebook and Instagram, agreed to a 25% reduction in bandwidth used for video delivered in Europe, the UK, and Israel, with other countries rumored to follow suit.
We can blame COVID-19 for the strain, a result of closed schools and businesses leading to increased use of video conferencing, streaming video services, and cloud gaming. Verizon reported that total web traffic was up 22% between March 12th and March 19th, while week-over-week usage of streaming video services increased by 12%. However, it was easily predictable that these numbers would trend even higher as quarantine and shelter-in-place orders expanded, as evidenced by Cloudflare reporting Internet traffic 40% higher than pre-COVID-19 levels.
The purpose of this article is to provide a framework for how video streaming services may want to think about the Internet post COVID-19, when video streaming services and video-centric applications will need to treat the Internet as a shared, not an unlimited, resource.
Content-Adaptive Encoding is no longer a nice to have for streaming services.
There are multiple technical and technology options available for reducing video bitrate. The fastest to implement, however, is to drop the resolution of the video. By manipulating the video playlist (called a manifest) that organizes the various resolutions and bitrates that enable the video player to adapt to the speed of the network, a video service can achieve immediate savings by merely serving a lighter-weight version of the video: standard definition (SD) instead of high definition (HD). This approach is what most of the complying services have taken, but it is not a sustainable answer, since dropping resolution negatively impacts the video experience.
A more advanced technique known as Content-Adaptive Encoding works by guiding the encoder to adapt the bitrate allocation to the needs of the video content.
Reducing resolution is not what consumers want, and this will make content-adaptive encoding essential for many video encoding workflows. Because content-adaptive encoding solutions require integration, some services relegated them to the “nice to have” list. But now, with the sweeping changes to video consumption that are driving network saturation, services that must compete on high visual quality are shifting the priority to “must-have.”
Effective tools and methods to be a good citizen of the Internet.
If we are going to be a good citizen of the Internet, we should understand what tools and methods are available to preserve this precious shared resource while delivering a suitable UX and visual quality.
Engineering for video encoding is about tradeoffs. The three primary levers are 1) bitrate, 2) resolution, 3) encoder performance. These levers are interconnected and dependent. For example, it’s not possible to achieve high bitrate efficiency at higher resolutions without affecting encoder performance (increasing CPU cycles).
From a video quality perspective, bitrate and resolution are the levers available to most video encoding engineers, while from an operational point of view, the third lever, encoder performance, is what most impacts bitrate and quality.
The tools that we can use to reduce bandwidth include the use of advanced video codecs such as HEVC. HEVC (H.265) provides up to 50% reduction in bitrate at the same quality level as H.264, the current dominant codec used around the world. The other tool available is advanced technology, such as content-adaptive encoding, implemented inside the encoder.
Beamr’s Content-Adaptive Bitrate (CABR) rate-control is an example of advanced technology that brings an additional 20-40% reduction in bitrate. Using HEVC and CABR, a 4K HDR video file can be as small as 10Mbps, a savings of as much as 6Mbps compared with HEVC alone. With the promise of a 50% bitrate reduction using HEVC, and over 2 billion devices supporting HEVC decoding in hardware, it’s the obvious thing to do for a video service concerned about the sustainability of the Internet.
If a technical integration of a new codec is not possible, the three most popular methods for reducing bitrate are Per-Category, Per-Title, and Per-Frame Encoding optimization.
Per-Category Encoding optimization.
The Per-Category Encoding approach is least practical for premium movies and TV shows, since the range of encoding complexity within a category can vary significantly. Animated videos, for example, are typically easier to compress than video captured from a camera sensor, but animation techniques are highly diverse, from hand-drawn to 2D to 3D, and that makes it challenging to create an encoding ladder that works equally well across all animated content.
Per-Category Encoding is the easiest of all the methods to implement, but it also produces the lowest real bitrate reduction because of the variability of scenes. For example, a sports broadcast may include talking-head in-studio shots along with fast-action gameplay and slow-motion recaps, each requiring different bitrate values to preserve the quality level.
Per-Title Encoding optimization.
Per-Title Encoding received a big boost when Netflix published a blog post explaining their encoding schema that creates a custom encoding ladder for each video file. The system performs a series of test encodes at different CRF levels and resolutions that are analyzed using the Video Multimethod Assessment Fusion (VMAF) quality metric. Netflix uses the scores to identify the best quality resolution at each applicable data rate.
Per-Title Encoding, or some variation of it, has since been adopted by many video services and can now be found in many video encoding workflows. It’s a great way to rethink fixed ABR recipes, which are the primary source of wasted bandwidth or poor video quality. However, Per-Title Encoding only works well with a smaller library, as it requires extensive computing resources to run the hundreds of fractional encodes needed for each title.
Per-Title Encoding helps to reduce bitrate but is limited in its ability since the rudimentary VBR rate-control bounds the encoder QP setting with no additional intelligence.
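Conceptually, the per-title selection step boils down to scoring trial encodes with VMAF and keeping the best-scoring resolution that fits each ladder rung’s bitrate budget. Here is a simplified sketch with illustrative data (not Netflix’s actual system):

```python
def build_ladder(candidates, rung_bitrates):
    """candidates: (resolution, bitrate_kbps, vmaf) tuples from trial
    encodes at several CRF values and resolutions. For each ladder rung,
    keep the candidate that fits the bitrate budget with the best VMAF."""
    ladder = []
    for target in rung_bitrates:
        feasible = [c for c in candidates if c[1] <= target]
        if feasible:
            # the winning resolution can change from rung to rung
            ladder.append(max(feasible, key=lambda c: c[2]))
    return ladder

# illustrative trial-encode results: (resolution, kbps, VMAF)
trials = [("1920x1080", 4300, 95), ("1920x1080", 2800, 91),
          ("1280x720", 2600, 93), ("1280x720", 1500, 88),
          ("640x480", 1200, 84)]
```

At a 3000 kbps rung, for instance, the 720p trial (VMAF 93) beats the bit-starved 1080p one (VMAF 91), which is exactly the kind of crossover a fixed ladder misses.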
Per-Frame Encoding optimization.
The weakness of a category or title based optimization method is that this approach cannot adapt to the specific needs of the video at the frame level. Only by steering the encoder decisions frame by frame is it possible to achieve the ultimate result of producing high quality with the least number of bits required.
Beamr’s CABR technology is the primary feature of the Beamr 4x and Beamr 5x encoding engines. CABR operates at the frame level to deliver the smallest possible size for each video frame while ensuring the highest overall quality of each frame within the video sequence. This approach avoids the transient quality issues found in other optimization techniques. The Beamr Quality Measure Analyzer has a higher correlation with subjective results than existing quality measures such as PSNR and SSIM. CABR is protected by the majority of Beamr’s 48 granted patents.
To learn more about Beamr’s Content-Adaptive Bitrate technology, you can hear Tamar Shoham, Head of Algorithms at Beamr, explain CABR here.
We must all play our part in preserving the integrity of the Internet.
Just as environmental sustainability is an essential initiative for companies who want to be good citizens of the world, in the COVID-19 world that we are living in, video sustainability is now an equally vital initiative. And this is likely to be unchanged in the future as the work from home and virtual meeting trends continue post COVID-19. Now is the time for the streaming video industry to play an active role in adopting more efficient codecs and content-adaptive bitrate technology so that we can ensure a great user experience without disrupting the Internet.
TL;DR: Beamr CABR operating with the Intel Media SDK hardware encoder powered by Intel GPUs is the perfect video encoding engine for cloud gaming services like Google Stadia. The Intel GPU hardware encoder reaches real-time performance with a power envelope that is 90% less than a CPU based software solution. When combined with Beamr CABR (Content-Adaptive Bitrate) technology, the required bandwidth for cloud gaming is reduced by as much as 49% while delivering higher quality 65% of the time. Using the Intel hardware encoder combined with Beamr CABR enables players to enjoy a gaming experience that is competitive with a console and able to be streamed by cloud gaming platforms. Get more information about how CABR works.
The era of cloud gaming.
With the launch of Google Stadia, we have entered a new era in the games industry called cloud gaming. Just as streaming video services opened media and entertainment content to a broader audience by freeing it from the fixed frameworks of terrestrial (over-the-air), cable, and satellite distribution, so too will cloud gaming open gameplay to a larger audience. Besides extending gameplay to virtually anywhere the user has a network-connected device, the ability for a player to access an extensive library of games without needing to use a specific piece of hardware will push 25.9 million players to cloud gaming platforms by 2023, according to the media research group Kagan.
In addition to opening up gameplay to an “anywhere/anytime” experience, a major user experience benefit of cloud gaming is that players will not necessarily need to purchase a game; in many cases they will be free to access a vast library of their choosing instantaneously. Cloud gaming services promise the quality of a console or PC experience, but without the need to own expensive hardware or do the configuration and software installation work that comes with it.
The one constraint that could cause cloud gaming to never catch up with the console experience.
With the wholesale transition of video entertainment content from traditional broadcast and physical media to streaming distribution, it is not hard to project the same pattern will occur for games. Except now, unlike the early days of video streaming where a 3Mbps home Internet connection was “high speed,” and the number of devices able to decode and reliably play back H.264 video was limited, even the lowest cost smartphone can stream video with acceptable quality.
Yet, there is a fundamental constraint that must be overcome for cloud gaming to reach its full market potential, and that is the bandwidth required to deliver a competitive video experience at 1080p60 or 4kp60 resolution. To better understand the bandwidth squeeze that is unique to cloud gaming, let’s examine the data and signal flow.
In FIGURE 1 we see the cloud gaming architecture moves compute-intensive operations, like the graphics rendering engine, to the cloud.
FIGURE 1
Shifting the compute-intensive functions to the cloud eliminates device capability as a bottleneck. However, because the video rendering and encoding functions are not local to the user, the video stream needs to be delivered over the network with latency in the tens of milliseconds, and at a framerate that is double the entertainment video frame rate of 24, 25, or 30 frames per second. Additionally, video game resolutions need to be HD, with 4K preferable, and HDR is an increasingly important capability for many AAA game titles.
None of these requirements is impossible to meet, but the need for fast encoding forces the encoder to operate in a mode that makes it difficult to produce high quality at a small stream size. Because of the added time needed for the encoder to create B-frames, and without the benefit of a look-ahead buffer, producing high quality at low bitrate is not possible. This is why cloud gaming services require a significantly higher bitrate than traditional video-on-demand streaming services.
Beamr has been innovating in the area of performance, allowing us to encode H.264 and HEVC in software with breathtaking speed, even when running our most advanced Content-Adaptive Bitrate (CABR) rate-control. For video applications where a single encoder can serve hundreds of thousands or even millions of users, the compute requirement to do this in software, given the tremendous benefits of lower bitrate and higher quality, makes it easy to justify. But, in an application like cloud gaming, where the video encoder is matched 1:1 to every user, the computing cost to do this in software makes it uneconomical. The answer is to use a hardware encoder controlled by software, and running a content-adaptive optimization process which can deliver the additional bitrate savings needed.
FIGURE 2 illustrates the required Google Stadia bitrates.
FIGURE 2
The answer is to leverage hardware and software.
The Intel Media SDK and GPU engines occupy a well-established position in the market, with many video services relying on the included HEVC hardware encoder for real-time encoding. However, using the VBR rate-control alone, there is a limit to the quality available when bitrate efficiency is essential. The advantage of Beamr’s next-generation rate-control technology, CABR (Content-Adaptive Bitrate), combined with Intel GPUs, is the secret to delivering bitrate efficiency and quality, in real-time, with 90% less power than software alone.
In verified testing, Beamr has shown that the Intel Media SDK hardware encoder controlled by CABR will produce the same perceptual quality as VBR encodes, with a confidence level greater than 95%. Using CABR gives a meaningful impact on user experience. 65% of the time, the player will perceive better quality at the same bandwidth, even while the gaming platform experiences up to a 49% reduction in the bandwidth required to provide the same quality level.
Watch Beamr Founder Sharon Carmel present Beamr CABR integrated with Intel Gen 11 hardware encoder at Intel Experience Day October 29, 2019 in Moscow.
Proof of performance.
As an image science company, Beamr is committed to proof of performance for all claims. For this reason, the industry recognizes that all technology, products, and solutions which carry the Beamr name represent the pinnacle of quality. It was therefore insufficient to integrate CABR with the Intel Media SDK without being able to prove that the original quality of the stream is always preserved and that the user experience is improved. Testing comprised corresponding 10-second segments extracted from clips created with the Intel hardware encoder using VBR, and clips encoded using the Intel hardware encoder with the integrated Beamr CABR rate-control.
The only way to test perceptual quality is with subjective techniques. We used a process similar to forced-choice double stimulus (FCDS), closely approximating the ITU BT.500 method. Using the Beamr Auto-VISTA framework, we recruited anonymous viewers from Amazon Mechanical Turk; each viewer was shown corresponding segment pairs and asked to select which video had lower quality. The VBR and CABR encoded files were placed at random on the left and right sides. Validation pairs with visible artifacts inserted were used to verify each viewer’s capabilities, and only test results from viewers who correctly answered all four validation pairs were incorporated into the analysis. The viewers had up to five attempts to view the pairs before making a decision. Each viewer watched 20 segment pairs consisting of sixteen actual CABR and VBR encodes, and four validation pairs.
Games used for testing were: CSGO, Fallout, and GTA5. To reflect realistic bitrates, we only tested the middle four bitrates out of the six bitrates provided, because the bitrate of the top layer was very high and the quality of the bottom layer was very low. The four bitrates tested were spaced one JND (just noticeable difference) apart. Each target test pair was viewed 13 to 21 times by valid users, with a total of 800 target pair viewings, or about 17 viewings per pair on average. The total number of valid test sessions was 50, completed by more than 40 unique viewers.
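The validation-and-tally procedure described above can be sketched as follows; the session field names are illustrative, not the actual Auto-VISTA data format:

```python
def tally_valid_votes(sessions):
    """Drop sessions that failed any validation pair, then count, per
    target pair, how many valid viewers picked the CABR encode as the
    lower-quality one."""
    votes = {}
    for s in sessions:
        if not all(s["validation_correct"]):
            continue  # viewer missed a control pair; discard the session
        for pair_id, chose_cabr_as_worse in s["target_answers"]:
            worse, total = votes.get(pair_id, (0, 0))
            votes[pair_id] = (worse + chose_cabr_as_worse, total + 1)
    return votes
```

A per-pair split hovering around 50% of the valid views, as in the results below, is what “perceptually identical” looks like in this kind of forced-choice test.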
Examining the data, you will notice that the per-pair statistical distribution is quite symmetrical above and below 50%. Given the sample size, this is no surprise; human perception varies. The overall results comprise 800 views of 48 pairs, which makes the statistical certainty higher, indicating that CABR is not compromising perceptual quality.
FIGURE 4 shows CABR encodes had the same perceptual quality as VBR and with a confidence level of more than 95%.
FIGURE 4
Better quality, lower bitrate.
Beamr CABR encoded streams offer higher quality when compared subjectively to a VBR equivalent encode, while offering a bitrate savings of up to 49%. Benefits of CABR for cloud gaming or any live streaming service, are quantified by better quality, greater bandwidth savings, and a reduction in storage cost. For the files that we tested, the aggregated metrics were as follows:
65% of the time, users will experience better quality for a given bandwidth.
40% bandwidth savings on average across all three titles (GTA5 had a savings of 49%).
30% overall storage savings.
FIGURES 5, 6, and 7 illustrate, for the three video samples used, that for a given user bandwidth, CABR provides higher quality. To read the charts: where VBR is blue, CABR is black (higher quality), and where VBR is turquoise, CABR is blue.
FIGURE 5
FIGURE 6
FIGURE 7
Conclusion.
Beamr CABR controlling the Intel Media SDK hardware encoder is the perfect video encoding engine for cloud gaming services like Google Stadia. The Beamr CABR rate-control and optimization process works with all Intel codecs, including AVC, HEVC, VP9, and AV1. All bitstreams produced by the Intel + Beamr CABR solution are fully standard-compliant and work with every player in the field today. Beamr CABR is proven and protected by 46 international patents, meaning there is no other solution that can reduce bitrate by as much as 49% while working in real-time using a closed-loop, perceptually aligned quality measure to guarantee the original quality.
The single most important technical hurdle for anyone building or operating a cloud gaming service or platform is the bandwidth consumption required to deliver a player experience on par with the console. Now, with Intel + Beamr CABR, the ideal solution is here; one that can reach the performance and density needed for cloud gaming at scale, so that more players can enjoy a premium gaming experience. Streaming video upended the media and entertainment business, with the rise of Netflix, Hulu, Amazon Prime Video, Disney+, Apple TV Plus, and dozens of other tier-one streaming services. In the same way, cloud gaming will create new service platforms, gaming experiences, and business models.
To experience the power of Beamr CABR controlling the Intel hardware encoder, send an email to info@beamr.com.
It has been two years since we published a comparison of the two leading HEVC software encoder SDKs: Beamr 5 and x265. In this article you will learn how Beamr’s AVC and HEVC software codec SDKs have widened the computing performance gap further over x264 and x265 for live broadcast quality streaming.
Why Performance Matters
With the performance of our AVC (Beamr 4) and HEVC (Beamr 5) software encoders improving several orders of magnitude over the 2017 results, it felt like the right time to refresh our benchmarks, this time with real-time operational data.
It’s no secret that x264 and x265 have benefited greatly, as open-source projects, from having thousands of developers working on the code. This is what makes x264 and x265 a very high bar to beat. Yet even with so many bright and talented engineers donating tens of thousands of development hours to the code base, the architectural constraints of how these encoders were built limit the performance on multicore processors as you will see in the data below.
Creative solutions have been developed which enable live encoding workflows to be built using open-source encoders. But these workarounds come with inherent flaws: they are overly compute-intensive and encumbered with quality issues, as a result of not being able to encode a full ABR stack, or even a single 4K profile, on a single machine.
The reason this matters is that the resolutions video services deliver continue to increase. And as a result of exploding consumer demand for video, video data is consuming network bandwidth to the point that Cisco projects that by 2022, 82% of Internet traffic will be video.
Cisco says in their Visual Networking Index that by 2022, SD resolution video will comprise just 23.4% of Internet video traffic, compared to the 60.2% it consumed in 2017. What used to represent the middle-quality tier, 480p (SD), has now become the lowest rung of the ABR ladder for many video distributors.
1080p (HD) will make up 56.2% of Internet video traffic by 2022, an increase from 36.1% in 2017. And if you thought the resolution expansion was going to end with HD, Cisco projects that in 2022, 4K (UHD) will comprise 20.3% of all Internet-delivered video.
Live video services are projected to expand 15x between 2017 and 2022, meaning within the next three years, 17.1% of all Internet video traffic will be comprised of live streams.
These trends demonstrate the industry’s need to prepare for this shift to higher resolution video and real-time delivery with software encoding solutions that can meet the requirement for live broadcast quality 4K.
Blazing Software Performance on Multicore Processors
The Beamr 5 software encoder utilizes an advanced thread management architecture, which is a key reason it can achieve such a dramatic speed advantage over x265 at the same quality level.
x265 works by creating software threads and adding them to a thread pool where each task must wait its turn. In contrast, Beamr 5 tracks all the serial dependencies involved with the video coding tasks it must perform, and it creates small micro-tasks which are efficiently distributed across all of the CPU cores in the system. This allows the Beamr codec to utilize each available core at almost 100% capacity.
All tasks added to the Beamr codec thread pool may be executed immediately so that no hardware thread is wasted on tasks where the data is not yet available. Interestingly, under certain conditions, x265 can appear to have higher CPU utilization. But, this utilization includes software threads which are not doing any useful work. This means they are “active” but not processing data that is required for the encoding process.
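To make the contrast concrete, here is a minimal sketch of dependency-driven micro-task scheduling in Python. The task names, dependency graph, and scheduler are purely illustrative assumptions for this post, not Beamr’s actual internals; the point is only that a task is submitted to the pool when, and only when, its inputs exist, so no worker sits “active” on unavailable data.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Hypothetical micro-task graph for one frame; names and dependencies
# are illustrative only.
TASKS = {
    "motion_estimation": [],
    "mode_decision": ["motion_estimation"],
    "transform_quant": ["mode_decision"],
    "entropy_coding": ["transform_quant"],
    "deblocking": ["transform_quant"],
}

def run_graph(tasks, workers=4):
    """Submit a task only once all of its dependencies have completed,
    so no worker thread idles while waiting for unavailable data."""
    done, order, in_flight = set(), [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(done) < len(tasks):
            # submit every task whose dependencies are now satisfied
            for name, deps in tasks.items():
                if (name not in done and name not in in_flight
                        and all(d in done for d in deps)):
                    # the lambda stands in for the real encoding work
                    in_flight[name] = pool.submit(lambda n=name: n)
            finished, _ = wait(in_flight.values(), return_when=FIRST_COMPLETED)
            for name in [n for n, f in in_flight.items() if f in finished]:
                done.add(name)
                order.append(in_flight.pop(name).result())
    return order
```

Running `run_graph(TASKS)` always completes `motion_estimation` first, and `entropy_coding`/`deblocking` (which share a dependency) can run concurrently once `transform_quant` finishes.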
Adding to the Beamr encoders’ thread efficiency, we have implemented patented algorithms for more effective and efficient video encoding, including a fast motion estimation process and a heuristic early-termination algorithm which enables the encoder to reach a targeted quality using fewer compute resources (cycles). Furthermore, Beamr encoders utilize the latest AVX-512 SIMD instruction set for squeezing even more performance out of advanced CPUs.
The end result of the numerous optimizations found in the Beamr 4 (AVC) and Beamr 5 (HEVC) software encoders is that they are able to operate nearly twice as fast as x264 and x265 at the same quality, and with similar settings and tools utilization.
Video streaming services can benefit from this performance advantage in many ways, such as higher density (more channels per server), which reduces operational costs. To illustrate what this performance advantage can do for you, consider that at the top end, Beamr 5 is able to encode 4K, 10-bit video at 60 FPS in real time using just 9 Intel Xeon Scalable cores, while x265 cannot achieve this level of performance with any number of computing cores (at least on a single machine). And, as a result of being twice as efficient, Beamr 4 and Beamr 5 can deliver higher quality within the same computing envelope as x264 and x265.
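A back-of-the-envelope density estimate from those figures: if one live 4Kp60 10-bit channel needs roughly 9 Xeon Scalable cores, a 48-core machine can host about five such channels. The core counts come from the text above; real-world density depends on content and encoder settings.

```python
# Density estimate using the figures quoted in the text (illustrative only).
CORES_PER_4K_CHANNEL = 9    # ~9 cores for one live 4Kp60 10-bit channel
CORES_PER_MACHINE = 48      # c5.24xlarge physical core count

channels = CORES_PER_MACHINE // CORES_PER_4K_CHANNEL
print(channels)  # → 5 live 4K channels per machine
```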
The Test Results
For our test to be as real-world as possible, we devised two methodologies. In the first, we measured the compute performance of an HEVC ABR stack operating both Beamr 5 and x265 at live speed. And for the second test, our team measured the number of simultaneous live streams at 1080p, comparing Beamr 4 with x264, and Beamr 5 with x265; and for 4K comparing Beamr 5 with x265. All tests were run on a single machine.
Live HEVC ABR Stack: Number of ABR Profiles (Channels)
This test was designed to find the maximum number of full ABR channels which can be encoded live by Beamr 5 and x265 on an AWS EC2 c5.24xlarge instance.
Each AVC channel comprises 4 layers of 8-bit 60 FPS video starting from 1080p, and the HEVC channel comprises either 4 layers of 10-bit 60 FPS video (starting from 1080p), or 5 layers of 10-bit 60 FPS video (starting from 4K).
Live HEVC ABR Stack Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable Cascade Lake @ 3.6 GHz
48 cores, 96 threads
Presets:
Beamr 5: INSANELY_FAST
x265: ultrafast
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all layers): 60 FPS
Encoded Bit Depth (all layers): 10-bit
Encoded Resolutions and Bitrates:
4Kp60@18000 Kbps (only in 4K ABR stack)
1080p60@3750 Kbps
720p60@2500 Kbps
576p60@1250 Kbps
360p60@625 Kbps
Live HEVC ABR Stack Test – RESULTS
NOTES:
(1) When encoding 2 full ABR stacks with Beamr 5, 25% of the CPU is unused and available for other tasks.
(2) x265 cannot encode even a single 4K ABR stack channel at 60 FPS. The maximum FPS for the 4K layer of a single 4K ABR stack channel using x265 is 35 FPS.
Live AVC & HEVC Single-Resolution: Number of Channels (1080p & 4K)
In this test, we are trying to discover the maximum number of single-resolution 4K and HD channels that can be encoded live by Beamr 4 and Beamr 5 as compared with x264 and x265, on a c5.24xlarge instance. As with the Live ABR Channels test, the quality between the two encoders as measured by PSNR, SSIM and VMAF was always found to be equal, and in some cases better with Beamr 4 and Beamr 5 (see the “Quality Results” section below).
Live AVC Beamr 4 vs. x264 Channels Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable Cascade Lake @ 3.6 GHz
48 cores, 96 threads
Speeds / Presets:
Beamr 4: speed 3
x264: preset medium
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all channels): 60 FPS
Encoded Bit Depth (all channels): 8-bit
Channel Resolutions and Bitrates:
1080p60@5000 Kbps
Live AVC Beamr 4 vs. x264 Channels Test – RESULTS
Live HEVC Beamr 5 vs. x265 Channels Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable Cascade Lake @ 3.6 GHz
48 cores, 96 threads
Speeds / Presets:
Beamr 5: INSANELY_FAST
x265: ultrafast
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all channels): 60 FPS
Encoded Bit Depth (all channels): 10-bit
Channel Resolutions and Bitrates:
4K@18000 Kbps
1080p60@3750 Kbps
Live HEVC Beamr 5 vs. x265 Channels Test – RESULTS
NOTES:
(1) x265 was unable to reach 60 FPS for a single 4K channel, achieving just 35 FPS at comparable quality.
Quality Comparisons (PSNR, SSIM, VMAF)
Beamr 5 vs. x265
NOTES:
As previously referenced, x265 was unable to reach 4Kp60 and thus PSNR, SSIM, and VMAF scores could not be calculated, hence the ‘N/A’ designation in the 3840×2160 cells.
Video engineers are universally focused on the three pillars of video encoding: computing efficiency (performance), bitrate efficiency, and quality. Even as new tool sets have advanced each of these pillars, it’s well known that tradeoffs among them remain unavoidable.
On one hand, bitrate efficiency requires tools that sap performance; on the other, hitting a performance (speed) target means that tools which could improve quality must be left out to avoid harming the encoding pipeline. As a result, many video encoding practitioners have adapted to the reality of these tradeoffs and simply accept them for what they are. Now, there is a solution…
The impact of adopting Beamr 4 for AVC and Beamr 5 for HEVC transcends a TCO calculation. With Beamr’s high-performance software encoders, services can achieve bitrate efficiency and performance, all without sacrificing video quality.
The use of Beamr 4 and Beamr 5 also opens the door to an improved UX through higher resolutions or frame rates, which means it is now possible for everyone to stream higher quality video. As the competitive landscape for video delivery services continues to evolve, never has the need been greater for an AVC and HEVC codec implementation that can deliver the best of all three pillars: performance, bitrate efficiency, and quality. With the performance data presented above, it should be clear that Beamr 4 and Beamr 5 continue to be the codec implementations to beat.
For sports fans who grew up before the 90s, you likely have fond memories of collecting baseball memorabilia or going to your first game. Hearing eSports fans rattle off statistics and kill ratios from their favorite Twitch stream, or boasting about their latest Fortnite skin, may bring back memories of poring over your favorite team’s stats in the Sunday newspaper or trading cards with your friends. Today’s eSports fans may engage a little differently than the traditional sports fans of yore, but they may be the most engaged fans in history – and they’re making their mark.
With over 2.2M creators streaming every month on Amazon’s popular video game streaming service, Twitch, 517M people watching gaming on YouTube, and another 185M consuming video content on Twitch, eSports viewership has surpassed HBO, Netflix, ESPN & Hulu combined. The massive online gaming viewership is changing the sporting landscape and technology requirements for fans, content providers, and ISPs alike.
To meet online gamers’ demand for faster and higher quality gaming experiences, Cox Cable recently launched a trial of their Elite Gamer package. The Elite Gamer package is a premium offer that they claim will result in “34% less lag, 55% fewer ping spikes, and 45% less jitter” by speeding up the connection between the player and the desired gaming server.
We believe this marks one of the first of what will become standard practices for ISPs and content providers. When you put the massive amount of content delivered and consumed via Twitch & YouTube into perspective, it’s no wonder that vendors are starting to consider how they will address the bandwidth and technology requirements needed to sustain the eGaming industry. Even a casual gamer requires a download speed of at least 3 Mbps, an upload speed of at least 1 Mbps, and a ping rate under 150 ms – and those figures multiply with each concurrent player in the household.
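Those minimums translate into a quick per-household estimate. The numbers below use only the 3 Mbps down / 1 Mbps up figures quoted above; actual requirements vary by game and platform.

```python
# Per-player minimums quoted in the text (casual gaming); illustrative only.
DOWN_MBPS, UP_MBPS = 3, 1

def household_minimum(players):
    """Return (download, upload) Mbps needed for N concurrent players."""
    return players * DOWN_MBPS, players * UP_MBPS

print(household_minimum(3))  # → (9, 3): three concurrent players
```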
At Beamr, we live and breathe optimization. For us, the quality and bandwidth challenges introduced by the gaming industry are an opportunity to see how far we can push the limits of balancing video compression with the highest video quality possible.
If you are passionate about gaming and are curious about what it takes to deliver a high-quality cloud gaming experience, you will enjoy this episode from our podcast, The Video Insiders, where we interviewed Yuval Noimark from Electronic Arts. Listen to Episode 15 here.
Game of Thrones entered its eighth and final season on April 14th, 2019. Though Game of Thrones has been a cultural phenomenon since the beginning of its airing, the attention and eyeballs on these final episodes have been higher than ever. Right now, every aspect of season eight is under a microscope and up for discussion, from the battle of Winterfell to theories on the Azor Ahai prophecy, super fans are taking to the internet and social media to debate and swap theories. Yet, even if you’ve installed the Chrome extension GameOfSpoilers to block Game of Thrones news from popular social networks, you probably did not miss all the fans who flocked to social media to report their dissatisfaction with the poor quality of Season 8 Episode 3, “The Longest Night.”
If nothing else that episode of #GameofThrones is proof positive that @Xfinity needs to start broadcasting in 4K pronto because the night scenes in that kind of looked like ass in their HBO broadcast. If I’m going to pay this much for cable I want better picture quality.
Though not all viewers experienced degraded visual quality for “The Longest Night”, a sufficiently high number did report a poor viewer experience, which triggered TechCrunch to write an article titled, “Why did last night’s ‘Game of Thrones’ look so bad?”
And TechCrunch wasn’t alone: The Verge also wrote an entire piece on how to set up your TV for a rewatch of “The Longest Night” – something that seems hardly believable. After all, how is it possible that fans could need to rewatch an episode, not because the plot was so twisted or complicated that they needed a second pass at deciphering it, but because they couldn’t see what was happening on the screen? And indeed, Game of Thrones super fans were not shy about taking to Twitter with their quality assessments.
Why does this look so bad?
@HBO @GameOfThrones the picture quality tonight is absolutely terrible. All that time spent filming this beautiful episode and it looks like it’s running off a 1989 VHS tape…
Before you throw a Valyrian steel dagger at your TV, let’s take a close look at what happened to create this poor video quality by diving into the operational structure of the underlying video codecs that are used by all commercial streaming services.
The video compression schemes used in video streaming, including AVC, which is also used by most PayTV cable and satellite distributors, utilize hybrid block-based video codecs. These codecs use block-based encoding methods, meaning each video frame or picture is partitioned into blocks during the compression process, and they also apply motion estimation between frames. Though the compression these techniques achieve is impressive, hybrid block-based schemes are inherently prone to creating blockiness and banding artifacts, which can be particularly evident in dark scenes.
Blockiness is a video artifact where areas of a video image appear to be comprised of small squares, rather than proper detail and smooth edges as the viewer would expect to see. The blockiness artifact happens when not enough detail is preserved in each of the coding blocks, resulting in inconsistencies between adjacent blocks and making each block appear separate from its neighbors. The video quality will suffer from blockiness when too much detail is lost within each block.
There are two main causes of blockiness. The first is when there is a mismatch between the content complexity and the target bitrate. It can be present either in highly complex content which is encoded at typical bitrates, or in standard content which is compressed to overly aggressive bitrates. Content providers can avoid this by using content adaptive solutions which match the encoder bitrate to the content complexity. The second cause of blockiness is from poor quality decisions made by the encoder, such as discarding information which is crucial for visual quality.
As noted by TechCrunch, the images in the Game of Thrones episode “The Longest Night” are very dark and have a limited range of colors and brightness levels – basically between grey and dark grey. Encoding this limited range of grey shades, which filled up most of the screen, resulted in “banding” artifacts: visible transitions where the video is represented by just a few shades of grey, which look like “stripes” instead of smooth gradients. Video suffers from banding when the color or brightness palette in use has too few values to accurately describe the shades present in part or all of the video frame.
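A toy illustration of how banding arises: quantize a smooth 8-bit grey ramp too coarsely and it collapses into a handful of visible “stripes”. A real encoder quantizes transform coefficients rather than pixels, and the step size here is an arbitrary stand-in, but the effect on a low-contrast gradient is analogous.

```python
# Smooth 0..255 gradient vs. the same gradient after coarse quantization.
WIDTH, STEP = 256, 32   # STEP models an overly aggressive quantizer

ramp = list(range(WIDTH))                    # smooth grey ramp, 256 shades
banded = [(v // STEP) * STEP for v in ramp]  # coarse quantization

# 256 distinct shades collapse into 8 flat bands ("stripes")
print(len(set(ramp)), len(set(banded)))  # → 256 8
```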
The prevalent assumption even among some video engineers is that increasing bitrate is the cure-all to video quality problems. But as we’ll see in this case, it’s likely that even if the bitrate had been doubled, the systemic artifacts would still be present. Thus, the solution is not likely external to the video encoding process, but rather can only be addressed at the codec implementation level.
That is, the video encoding engine must be improved to prevent situations like this in the future. That, or HBO and other premium content owners could instruct their filmmakers to avoid dark scenes – we’ll stick with Option #1!
In this case, the video quality issues were not caused by the video encoding bitrate being too low. In fact, the bitrate used was more than sufficient to represent the limited range of colors and brightness. The issue was in the decisions made by the specific video encoder used by HBO. These are decisions regarding how to allocate the available bitrate, or how the bits should be used for different elements of the compressed video stream.
Without getting too deep into video compression technology, it is sufficient to say that a compressed video stream consists basically of two types of data.
The first type is prediction data, which enables the creation of a prediction block from previously decoded pixels (in either the same frame or a reference frame). This prediction block acts as a rough estimate of the source block and is complemented by the residual, or error block, encoded in the stream – essentially a block that fills in the difference between the predicted block and the actual source video block.
The second type is the residual data itself, and the level of detail preserved in it is governed by the rate-distortion algorithm, which also optimizes the selection of the prediction modes that the prediction data represents. These decisions are made in an attempt to minimize distortion, or maximize quality, for a specific bit allocation derived from the target bitrate.
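The classic rate-distortion decision can be written as picking the mode that minimizes D + λ·R, where D is distortion, R is the bits needed to signal the mode, and λ (lambda) trades quality against bitrate. The sketch below uses hypothetical per-mode distortion and rate figures; it is a textbook illustration, not any particular encoder’s logic.

```python
# Minimal rate-distortion mode decision: pick the mode minimizing D + lambda*R.
def best_mode(candidates, lam):
    """candidates: {mode_name: (distortion, rate_bits)}"""
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

modes = {
    "skip":  (900.0, 4),    # cheap to signal, high distortion
    "inter": (250.0, 60),   # good prediction, moderate rate
    "intra": (180.0, 140),  # best prediction, expensive to signal
}
print(best_mode(modes, lam=0.1))   # → intra (low lambda leans toward quality)
print(best_mode(modes, lam=20.0))  # → skip (high lambda leans toward rate)
```

This is also why a badly tuned λ hurts dark scenes: if distortion in near-black blocks is underestimated, the cheap “skip”-style choices win even when they discard real detail.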
When a scene is very dark and consists of small variations of pixel colors, the numerical estimates of distortion may be skewed. Components of the video encoder including motion estimation and the rate-distortion algorithm should adapt to optimize the allocations and decisions for this particular use case.
For example, the motion estimation might classify the differences in pixel values as noise instead of actual motion, thus providing inferior prediction information. In another example, if the distortion measures are not tuned correctly, the residual may be considered noise rather than true pixel information and may be discarded or aggressively quantized.
Another common encoder technique that is affected, and often “fooled”, by very dark scenes is early termination. Many encoders use this technique to “guess” the best encoding decision instead of exhaustively searching all possibilities and computing their costs. It improves encoder performance, but in dark scenes with small variations it can lead the encoder to the wrong decision.
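The failure mode can be sketched in a few lines: stop evaluating candidates once one falls below a cost threshold. In a flat dark scene the cheap candidate clears the threshold first and the search never reaches the genuinely better mode. The candidate names and costs are illustrative assumptions.

```python
# Early termination: stop once a candidate's cost is "good enough".
def decide(candidates, threshold):
    best, best_cost, evaluated = None, float("inf"), 0
    for mode, cost in candidates:
        evaluated += 1
        if cost < best_cost:
            best, best_cost = mode, cost
        if best_cost < threshold:   # early termination shortcut
            break
    return best, evaluated

candidates = [("skip", 120.0), ("inter", 90.0), ("intra", 40.0)]
print(decide(candidates, threshold=150.0))  # → ('skip', 1): intra never evaluated
print(decide(candidates, threshold=10.0))   # → ('intra', 3): full search
```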
Some encoding engineers use a technique called “capped CRF” for encoding video at a constant quality instead of a pre-defined bitrate. This is a simple form of “content-adaptive” or “content-aware” encoding, which produces different bitrates for each video clip or scene based on its content. In some implementations, when this technique is applied to dark scenes, it too can be “fooled” by the limited range of color and brightness values and may perform very aggressive quantization, removing too much information from the residual blocks and producing these blockiness and banding artifacts.
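In its simplest form, capped CRF just clamps the constant-quality pass to a bitrate ceiling. The sketch below assumes hypothetical per-scene bitrates from a CRF pass; it shows why the cap never protects a dark scene – such scenes land far below the cap, where the over-aggressive quantization happens.

```python
# Simplest capped-CRF behavior: take the CRF pass bitrate, clamp it to a cap.
def capped_crf_bitrate(crf_pass_kbps, cap_kbps):
    return min(crf_pass_kbps, cap_kbps)

print(capped_crf_bitrate(6200, 5000))  # busy scene: clamped to 5000 kbps
print(capped_crf_bitrate(900, 5000))   # dark scene: 900 kbps, cap never engages
```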
In summary, we can conclude that dark scenes can lead to various encoding issues if the encoder is not “well prepared” for this type of content, and it seems that this is what happened with this Game of Thrones episode.
In order to ensure good quality video across different content types, the encoder must be able to correctly adapt to each and every frame being encoded. In Beamr encoders we tackle this with a combination of tools and algorithms to provide the best video quality possible.
Beamr encoders use unique, patented and patent-pending approaches to calculate psycho-visual distortions used by the rate control module when deciding on prediction modes and bit allocations for different components of the compressed video data. This means the actual visual impact of different decisions is taken into account, resulting in improved visual quality across a variety of content types.
Beamr encoders offer a wide range of encoding speeds for different use cases, from lightning-fast modes that enable a full 4K ABR stack to be generated on a single server for live or VOD, down to slower, maximum-quality modes. When airing premium content of the caliber of Game of Thrones, one should opt for maximum quality by using the slower encoder speeds.
At these speeds, the encoder is wary of invoking early-termination methods and thus does not overlook the data that may be hiding in the small deviations of the dark pixel values. We invest huge effort in discovering the optimal combinations of all the internal controls of the algorithm, such as the optimal lambda values for the different rate-distortion cost calculations and the optimal values for the deblocking filter (and SAO in the case of Beamr 5 for HEVC), among many other details – none of which are overlooked.
Rather than use a CRF-based approach for constant quality encoding, Beamr employs a sophisticated content-adaptive encoding technique called CABR. The content-adaptive bitrate mode operates in a closed loop and examines the quality of each frame using a patented perceptual quality measure. This measure is specifically tuned to adapt to the “darkness” of the frame and of each region within it, which makes it highly effective even when processing very dark scenes such as “The Longest Night”, or fade-in and fade-out sequences.
Looking to the Future
For content providers, viewer expectations and demands for quality will continue to rise each year. A decade ago, you could slide by without delivering a consistent experience across devices. Today, not only is video degradation noticed by your viewers, it can have a massive impact on your audience and churn if you’re not delivering an experience in line with their quality expectations.
To see what high quality at the lowest bitrate should look like, try Beamr Transcoder for free or contact our team by sending an email to sales@beamr.com to learn about our comparison tool Beamr View.
Today, video streaming services must offer solutions that can evolve as the demand for content availability across a wide range of devices increases. Though new innovations in display and capture technology are making headlines, the core pillars that differentiate every service are still video quality and user experience.
Beamr’s HEVC & H.264 codecs have been engineered to reduce the bitrate of video files and streams while maintaining the perceptual quality of the content, ultimately reducing the bandwidth required to stream video to every viewer’s device while offering the best visual quality possible.
Leveraging our 44 granted patents, the content-adaptive bitrate (CABR) technology enables a 20 to 40 percent (sometimes higher) reduction in bitrate without any degradation to the video.
We know that this sounds too good to be true, which is why we are providing a real example, including download links to the original files, and special free access to Beamr View so you can test yourself.
In this example, we are comparing an HEVC VBR encode with an HEVC CABR version.
We took the Test File and created a reference encode using Beamr 5x VBR rate control, then compared it with a version of the same file encoded with CABR. This dropped the bitrate from 3.09 Mbps to 1.44 Mbps.
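For the record, that drop works out to a little over 53% savings – comfortably above the 20 to 40 percent range quoted earlier:

```python
# Savings implied by the figures above: 3.09 Mbps (VBR) down to 1.44 Mbps (CABR).
vbr, cabr = 3.09, 1.44
saving = (vbr - cabr) / vbr * 100
print(f"{saving:.1f}% bitrate reduction")  # → 53.4% bitrate reduction
```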
Would you like to test the results yourself?
To see the results, follow one of these testing methods:
1. Download the pre-encoded files and then use Beamr View to compare the visual quality of the HEVC VBR file against the file encoded with HEVC CABR.
2. Download the Test File and run your own encodes of the Test File using Beamr Transcoder. Compare the visual quality of your results by comparing the HEVC CABR with the HEVC VBR encoded file using Beamr View.
When it comes to assessing and comparing video quality, do you know what to look for? Our team of image scientists has put together the following tips for you to use during your tests.
Quality is in the eye of the beholder
When a user is comparing visual quality, the best measurement tool is the human eye. Visual quality is a subjective measure, meaning that image scientists and video engineers must rely on physically looking at an encoded file to determine whether its visual quality is better or worse than the comparison’s. Relying on quality metrics such as PSNR and SSIM alone isn’t enough, because even if a video has the highest possible PSNR or SSIM score, it may not have the highest visual quality at a given bitrate.
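Metric scores are still useful as a sanity check alongside visual inspection. Here is a minimal pure-Python PSNR calculation; the sample pixel values are illustrative. Note that it measures only pixel-wise error, which is exactly why it can disagree with what your eye sees.

```python
import math

def psnr(ref, test, max_val=255):
    """PSNR in dB for two equal-length flat lists of 8-bit pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(max_val ** 2 / mse)

ref  = [100, 120, 140, 160]
test = [101, 119, 141, 159]          # off by 1 everywhere → MSE of 1
print(round(psnr(ref, test), 2))     # → 48.13
```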
Speed, bitrate, and rate control
There are multiple methods to encode video, and blocks can be encoded in various ways trading off speed, bitrate, and quality. To validate your test, you must configure both encoders to operate at similar speeds so you can assess whether the bitrate-to-quality tradeoff is favorable. To take it a step further, using rate control enables you to maintain bitrate limits throughout a clip or video, replicating the needs of a scalable application.
Comparing moving video instead of still frames
The only way to effectively assess the quality of video is to compare moving video rather than still frames. To accurately compare the visual quality of two decoded streams, artifacts, motion inaccuracy, and other visual degradations must be assessed while the content is moving.