Adding Beamr’s Frame-Level Content-Adaptive Rate Control to the AV1 Encoder

Posted on March 21, 2021by Dror Gill

Introduction

AV1, the open source video codec developed by the Alliance for Open Media, is the most efficient open-source encoder available today. AV1’s compression efficiency has been found to be 30% better than VP9, the previous generation open source codec, meaning that AV1 can reach the same quality as VP9 with 30% less bits. Having an efficient codec is especially important now that video consumes over 80% of Internet bandwidth, and the usage of video for both entertainment and business applications is soaring due to social distancing measures.

Beamr’s Emmy® award-winning CABR technology reduces video bitrates by up to 50% while preserving perceptual quality. The technology creates fully-compliant standard video streams, which don’t require any proprietary decoder or add-on on the playback side. We have applied our CABR technology in the past to H.264, HEVC and VP9 codecs, using both software and hardware encoder implementations.

In this blog post we present the results of applying Beamr’s CABR technology to the AV1 codec, by integrating our CABR library with the libaom open source implementation of AV1. This integration results in a further 25-40% reduction in the bitrate of encoded streams, without any visible reduction in subjective quality. The reduced-bitrate streams are of course fully AV1 compatible, and can be viewed with any standard AV1 player.

CABR In Action

Beamr’s CABR (Content Adaptive BitRate) technology is based on our BQM (Beamr Quality Measure) metric, which was developed over 10 years of intensive research, and features very high correlation with subjective quality as judged by humans. BQM is backed by 37 granted patents, and has recently won the 2021 Technology and Engineering Emmy® award from the National Academy of Television Arts & Sciences.

Beamr’s CABR technology and the BQM quality measure can be integrated with any software or hardware video encoder, to create more bitrate-efficient encodes without sacrificing perceptual quality. In the integrated solution, the video encoder encodes each frame with additional compression levels, also known as QP values. The first QP (for the initial encode) is determined by the encoder’s own rate control mechanism, which can be either VBR, CRF or fixed QP. The other QPs (for the candidate encodes) are provided by the CABR library. The BQM quality measure then compares the quality of the initial encoded frame to the quality of the candidate encoded frames, and selects the encoded frame which has the smallest size in bits, but is still perceptually identical to the initial encoded frame. Finally, the selected frame is written to the output stream. Due to our adaptive method of searching for candidate QPs, in most cases a single candidate encode is sufficient to find a near-optimal frame, so the performance penalty is quite manageable.

*Integrating Beamr’s CABR module with a video encoder*

By applying this process to each and every video frame, the CABR mechanism ensures that each frame fully retains the subjective quality of the initial encode, while bitrate is reduced by up to 50% compared to encoding the videos using the encoders’ regular rate control mechanism.

Beamr’s CABR rate control library is integrated into Beamr 4 and Beamr 5, our software H.264 and HEVC encoder SDKs, and is also available as a standalone library that can be integrated with any software or hardware encoder. Beamr is now implementing BQM in silicon hardware, enabling massive scale content-adaptive encoding of user-generated content, surveillance videos and cloud gaming streams.

CABR Integration with libaom

When we approached the task of integrating our CABR technology with an AV1 encoder, we examined several available open source implementations of AV1, and eventually decided to integrate with libaom, the reference open source implementation of the AV1 encoder, developed by the members of the Alliance of Open Media. libaom was selected due to a good quality-speed tradeoff at the higher quality working points, and a well defined frame encode interface which made the integration more straightforward.

To apply CABR technology to any encoder, the encoder should be able to re-encode the same input frame with different QPs, a process that we call “roll-back”. Fortunately, the libaom AV1 encoder already includes a re-encode loop, designed for the purpose of meeting bitrate constraints. We were able to utilize this mechanism to enable the frame re-encode process needed for CABR.

Another important aspect of CABR integration is that although CABR reduces the actual bitrate relative to the requested “target” bitrate, we need the encoder’s rate control to believe that the target bitrate has actually been reached. Otherwise, it will try to compensate for the bits saved by CABR, by increasing bit allocation in subsequent frames, and this will undermine the process of CABR’s bitrate reduction. Therefore, we have modified the VBR rate-control feedback, reporting the bit-consumption of the initial encode back to the RC module, instead of the actual bit consumption of the selected output frame.

An additional point of integration between an encoder and the CABR library is that CABR uses “complexity” data from the encoder when calculating the BQM metric. The complexity data is based on the per-block QP and bit consumption reported by the encoder. In order to expose this information, we added code that extracts the QP and bit consumption per block, and sends it to the CABR library.

The current integration of CABR with libaom supports 8 bit encoding, in both fixed QP and single pass VBR modes. 10-bit encoding (including HDR) and dual-pass VBR encoding are already supported with CABR in our own H.264 and HEVC encoders, and can be easily added to our libaom integration as well.

Integration Challenges

Every integration has its challenges, and indeed we encountered several of them while integrating CABR with libaom. For example, the re-encode loop in libaom initiates prior to the deblocking and other loop-filters, so the frame it generates is not the final reconstructed frame. To overcome this issue, we moved the in-loop filters and applied them prior to evaluating the candidate frame quality.

Another challenge we encountered was that the CABR complexity data is based on the QP values and bit consumption per 16×16 block, while within the libaom encoder this information is only available for bigger blocks. To resolve this, we had to process the actual data in order to generate the QP and bit consumption at the required resolution.

The concept of non-display frames, which is unique to VP9 and AV1, also posed a challenge to our integration efforts. The reason is that CABR only compares quality for frames that are actually displayed to the end user. So we had to take this into account when computing the BQM quality measure and calculating the bits per frame.

Finally, while the QP range in H.264 and HEVC is between 0 and 51, in AV1 it is between 0 and 255. We have an algorithm in CABR called “QP Search” which finds the best candidate QPs for each frame, and it was tuned for the QP range of 0-51, since it was originally developed for H.264 and HEVC encoders. We addressed this discrepancy by performing a simple mapping of values, but in the future we may perform some additional fine tuning of the QP Search algorithm in order to better utilize the increased dynamic range.

Benchmarking Process

To evaluate the results of Beamr’s CABR integration with the libaom AV1 encoder, we selected 20 clips from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license. The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post.

We encoded the selected video clips with libaomx, our version of libaom integrated with the CABR library. The videos were encoded using libaom cpu-used=9, which is the fastest speed available in libaom, and therefore the most practical in terms of encoding time. We believe that using lower speeds, which provide improved encoding quality, can result in even higher savings.

Each clip was encoded twice: once using the regular VBR rate control without the CABR library, and a second time using the CABR rate control mode. In both cases, we used 3 target bitrates for each resolution: A high, medium and low bitrate, as specified in the table below.

*Target bitrates used in the CABR-AV1 benchmark*

Below is the command line we used to encode the files.

aomencx --cabr=<0 or 1> -w <width> -h <height> --fps=<fps>/1 --disable-kf --end-usage=vbr --target-bitrate=<bitrate in kbps> --cpu-used=9 -p 1 -o <outfile>.ivf <inputFIFO>.yuv

After we completed the encodes in both rate control modes, we compared the bitrate and subjective quality of both encodes. We calculated the % of difference in bitrate between the regular VBR encode and the CABR encode, and visually compared the quality of the clips to determine whether both encodes are perceptually identical to each other when viewed side by side in motion.

Benchmark Results

The table below shows the VBR and CABR bitrates for each file, and the savings obtained, which is calculated as (VBR bitrate – CABR bitrate) / VBR bitrate. As expected, the savings are higher for high bitrate clips, but still significant even for the lowest bitrates we used. Average savings are 26% for the low bitrates, 33% for the medium bitrates, and 40% for the high bitrates.

Note that savings differ significantly across different clips, even when they are encoded at the same resolution and target bitrate. For example, if you look at 1080p clips encoded to the lowest bitrate target (2 Mbps), you will find that some clips have very low savings (less than 3%), while other clips have very high savings (over 60%). This shows the content-adaptive nature of our technology, which is always committed to quality, and reduces the bitrate only in clips and frames where such reduction does not compromise quality.

Also note that the VBR bitrate may differ from the target bitrate. The reason is that the rate control does not always converge to the target bitrate, due to the short length of the clips. But in any case, the savings were calculated between the VBR bitrate and the CABR bitrate.

In addition to calculating the bitrate savings, we also performed subjective quality testing by viewing the videos side by side, using the YUView player software. In these viewings we verified that indeed for all clips, the VBR and CABR encodes are perceptually identical when viewed in motion at 100% zoom. Below are a few screenshots from these side-by-side viewings.

Conclusions

In this blog post we presented the results of integrating Beamr’s Content Adaptive BitRate (CABR) technology with the libaom implementation of the AV1 encoder. Even though AV1 is the most efficient open source encoder available, using CABR technology can reduce AV1 bitrates by a further 25-40% without compromising perceptual quality. The reduced bitrate can provide significant savings in storage and delivery costs, and enable reaching wider audiences with high-quality, high-resolution video content.

Appendix

The VBR and CABR encoded files can be found here.
The source files can be downloaded directly from the YouTube UGC Dataset, using the links below.

720P/Animation_720P-620f.mkv

1080P/Animation_1080P-3dbf.mkv

1080P/Gaming_1080P-6dc6.mkv

720P/HowTo_720P-37d0.mkv

1080P/HowTo_1080P-64f7.mkv

1080P/Lecture_1080P-0c8a.mkv

720P/LiveMusic_720P-66df.mkv

1080P/LiveMusic_1080P-14af.mkv

720P/NewsClip_720P-7745.mkv

720P/NewsClip_720P-6016.mkv

1080P/NewsClip_1080P-5b53.mkv

720P/Sports_720P-5bfd.mkv

720P/Sports_720P-531c.mkv

1080P/Sports_1080P-15d1.mkv

2160P/Sports_2160P-1b70.mkv

1080P/TelevisionClip_1080P-39e3.mkv

1080P/TelevisionClip_1080P-5e68.mkv

1080P/TelevisionClip_1080P-68c6.mkv

720P/VerticalVideo_720P-4ca7.mkv

1080P/VerticalVideo_1080P-3a9b.mkv

Uncategorized

Optimizing Bitrates of User-generated Videos with Beamr CABR

Posted on July 12, 2020by Dror Gill

Introduction

The attention of Internet users, especially the younger generation, is shifting from professionally-produced entertainment content to user-generated videos and live streams on YouTube, Facebook, Instagram and most recently TikTok. On YouTube, creators upload 500 hours of video every minute, and users watch 1 billion hours of video every day. Storing and delivering this vast amount of content creates significant challenges to operators of user-generated content services. Beamr’s CABR (Content Adaptive BitRate) technology reduces video bitrates by up to 50% compared to regular encodes, while preserving perceptual quality and creating fully-compliant standard video streams that don’t require any proprietary decoder on the playback side. CABR technology can be applied to any existing or future block-based video codec, including AVC, HEVC, VP9, AV1, EVC and VVC.

In this blog post we present the results of a UGC encoding test, where we selected a sample database of videos from YouTube’s UGC dataset, and encoded them both with regular encoding and with CABR technology applied. We compare the bitrates, subjective and objective quality of the encoded streams, and demonstrate the benefits of applying CABR-based encoding to user-generated content.

Beamr CABR Technology

At the heart of Beamr’s CABR (Content-Adaptive BitRate) technology is a patented perceptual quality measure, developed during 10 years of intensive research, which features very high correlation with human (subjective) quality assessment. This correlation has been proven in user testing according to the strict requirements of the ITU BT.500 standard for image quality testing. For more information on Beamr’s quality measure, see our quality measure blog post.

When encoding a frame, Beamr’s encoder first applies a regular rate control mechanism to determine the compression level, which results in an initial encoded frame. Then, the Beamr encoder creates additional candidate encoded frames, each one with a different level of compression, and compares each candidate to the initial encoded frame using the Beamr perceptual quality measure. The candidate frame which has the lowest bitrate, but still meets the quality criteria of being perceptually identical to the initial frame, is selected and written to the output stream.

This process repeats for each video frame, thus ensuring that each frame is encoded to the lowest bitrate, while fully retaining the subjective quality of the target encode. Beamr’s CABR technology results in video streams that are up to 50% lower in bitrate than regular encodes, while retaining the same quality as the full bitrate encodes. The amount of CPU cycles required to produce the CABR encodes is only 20% higher than regular encodes, and the resulting streams are identical to regular encodes in every way except their lower bitrate. CABR technology can also be implemented in silicon for high-volume video encoding use cases such as UGC video clips, live surveillance cameras etc.

For more information about Beamr’s CABR technology, see our CABR Deep Dive blog post.

CABR for UGC

Beamr’s CABR technology is especially suited for User-Generated Content (UGC), due to the high diversity and variability of such content. UGC content is captured on different types of devices, ranging from low-end cellular phones to high-end professional cameras and editing software. The content itself varies from “talking head” selfie videos, to instructional videos shot in a home or classroom, to sporting events and even rock band performances with extreme lighting effects.

Encoding UGC content with a fixed bitrate means that such a bitrate might be too low for “difficult” content, resulting in degraded quality, while it may be too high for “easy” content, resulting in wasted bandwidth. Therefore, content-adaptive encoding is required to ensure that the optimal bitrate is applied to each UGC video clip.

Some UGC services use the Constant Rate Factor (CRF) rate control mode of the open-source x264 video encoder for processing UGC content, in order to ensure a constant quality level while varying the actual bitrate according to the content. However, CRF bases its compression level decisions on heuristics of the input stream, and not on a true perceptual quality measure that compares candidate encodes of a frame. Therefore, even CRF encodes waste bits that are unnecessary for a good viewing experience. Beamr’s CABR technology, which is content-adaptive at the frame level, is perfectly suited to remove these remaining redundancies, and create encodes that are smaller than CRF-based encodes but have the same perceptual quality.

Evaluation Methodology

To evaluate the results of Beamr’s CABR algorithm on UGC content, we used samples from the YouTube UGC Dataset. This is a set of user-generated videos uploaded to YouTube, and distributed under the Creative Commons license, which was created to assist in video compression and quality assessment research. The dataset includes around 1500 source video clips (raw video), with a duration of 20 seconds each. The resolution of the clips ranges from 360p to 4K, and they are divided into 15 different categories such as animation, gaming, how-to, music videos, news, sports, etc.

To create the database used for our evaluation, we randomly selected one clip in each resolution from each category, resulting in a total of 67 different clips (note that not all categories in the YouTube UGC set have clips in all resolutions). The list of the selected source clips, including links to download them from the YouTube UGC Dataset website, can be found at the end of this post. As typical user-generated videos, many of the videos suffer from perceptual quality issues in the source, such as blockiness, banding, blurriness, noise, jerky camera movements, etc. which makes them specifically difficult to encode using standard video compression techniques.

We encoded the selected video clips using Beamr 4x, Beamr’s H.264 software encoder library, version 5.4. The videos were encoded using speed 3, which is typically used to encode VoD files in high quality. Two rate control modes were used for encoding: The first is CSQ mode, which is similar to x264 CRF mode – this mode aims to provide a Constant Subjective Quality level, and varies the encoded bitrate based on the content to reach that quality level. The second is CSQ-CABR mode, which creates an initial (reference) encode in CSQ mode, and then applies Beamr’s CABR technology to create a reduced-bitrate encode which has the same perceptual quality as the target CSQ encode. In both cases, we used a range of six CSQ values equally spaced from 16 to 31, representing a wide range of subjective video qualities.

After we completed the encodes in both rate control modes, we compared three attributes of the CSQ encodes to the CSQ-CABR encodes:

File Size – to determine the amount of bitrate savings achievable by the CABR-CSQ rate control mode
BD-Rate – to determine how the two rate control modes compare in terms of the objective quality measures PSNR, SSIM and VMAF, computed between each encode and the source (uncompressed) video
Subjective quality – to determine whether the CSQ encode and the CABR-CSQ encode are perceptually identical to each other when viewed side by side in motion.

Results

The table below shows the bitrate savings of CABR-CSQ vs. CSQ for various values of the CSQ parameter. As expected, the savings are higher for low CSQ values, which correlate with higher subjective quality and higher bitrates. As the CSQ increases, quality decreases, bitrate decreases, and the savings of the CABR-CSQ algorithm are decreased as well.

The overall average savings across all clips and all CSQ values is close to 26%. If we average the savings only for the high CSQ values (16-22), which correspond to high quality levels, the average savings are close to 32%. Obviously, saving one quarter or one third of the storage cost, and moreover the CDN delivery cost, can be very significant for UGC service providers.

Another interesting analysis would be to look at how the savings are distributed across specific UGC genres. Table 2 shows the average savings for each of the 15 content categories available on the YouTube UGC Dataset.

As we can see, simple content such as lyric videos and “how to” videos (where the camera is typically fixed) get relatively higher savings, while more complex content such as gaming (which has a lot of detail) and live music (with many lights, flashes and motion) get lower savings. However, it should be noted that due to the relatively low number of selected clips from each genre (one in each resolution, for a total of 2-5 clips per genre), we cannot draw any firm conclusions from the above table regarding the expected savings for each genre.

Next, we compared the objective quality metrics PSNR, SSIM and VMAF for the CSQ encodes and the CABR-CSQ encodes, by creating a BD-Rate graph for each clip. To create the graph, we computed each metric between the encodes at each CSQ value and the source files, resulting in 6 points for CSQ and 6 points for CABR-CSQ (corresponding to the 6 CSQ values used in both encodes). Below is an example of the VMAF BD-Rate graph comparing CSQ with CABR-CSQ for one of clips in the lyric video category.

*Figure 1: CSQ vs. CSQ-CABR VMAF scores for the 1920×1080 LyricVIdeo file*

As we can see, the BD-Rate curve of the CABR-CSQ graph follows the CSQ curve, but each CSQ point on the original graph is moved down and to the left. If we compare, for example, the CSQ 19 point to the CABR-CSQ 19 point, we find that CSQ 19 has a bitrate of around 8 Mbps and a VMAF score of 95, while the CABR-CSQ 19 point has a bitrate of around 4 Mbps, and a VMAF score of 91. However, when both of these files are played side-by-side, we can see that they are perceptually identical to each other (see screenshot from the Beamr View side by side player below). Therefore, the CABR-CSQ 19 encode can be used as a lower-bitrate proxy for the CSQ 19 encode.

*Figure 2: Side-by-side comparison in Beamr View of CSQ 19 vs. CSQ-CABR 19 encode for the 1920×1080 LyricVIdeo file*

Finally, to verify that the CSQ and CABR-CSQ encodes are indeed perceptually identical, we performed subjective quality testing using the Beamr VISTA application. Beamr VISTA enables visually comparing pairs of video sequences played synchronously side by side, with a user interface for indicating the relative subjective quality of the two video sequences (for more information on Beamr VISTA, listen to episode 34 of The Video Insiders podcast). The set of target comparison pairs comprised 78 pairs of 10 second segments of Beamr4x CSQ encodes vs. corresponding Beamr4x CABR-CSQ encodes. 30 test rounds were performed, resulting in 464 valid target pair views (e.g. by users who correctly recognized mildly distorted control pairs), or on average 6 views per pair. The results show that on average, close to 50% of the users selected CABR-CSQ as having lower quality, while a similar percentage of users selected CSQ as having lower quality, therefore we can conclude that the two encodes are perceptually identical with a statistical significance exceeding 95%.

*Figure 3:* Percentage of users who selected CABR-CSQ as having lower quality per file

Conclusions

In this blog post we presented the results of applying Beamr’s Content Adaptive BitRate (CABR) encoding to a random selection of user-generated clips taken from the YouTube UGC Dataset, across a range of quality (CSQ) values. The CABR encodes had 25% lower bitrate on average than regular encodes, and at high quality values, 32% lower bitrate on average. The Rate-Distortion graph is unaffected by applying CABR technology, and the subjective quality of the CABR encodes is the same as the subjective quality of the regular encodes. By shaving off a quarter of the video bitrate, significant storage and delivery cost savings can be achieved, and the strain on today’s bandwidth-constrained networks can be relieved, for the benefit of all netizens.