Every video encoding professional faces the dilemma of how best to detect artifacts and measure video quality. If you have the luxury of working with high-bitrate files, this is less of an issue, since for many videos throwing enough bits at the problem all but guarantees acceptably high quality. However, for those living in the real world, where 3 Mbps is the average bitrate they must target, compressing at scale requires metrics (algorithms) that measure and analyze the visual artifacts in a file after encoding. This process is becoming more sophisticated as some tools feed a quality measure back into the encoding decision matrix, although quality measures are more commonly used as part of the QC step. This post focuses on quality measures applied as part of the encoding process.
We will discuss two common quality measures, PSNR and SSIM, but as you will see there is a third, the Beamr quality measure, which is the focus of the bulk of this article.
PSNR, the original objective quality measure
PSNR, or peak signal-to-noise ratio, is the ratio between the maximum possible power of a signal and the power of the distortion that corrupts it, usually expressed in decibels. PSNR is one of the original engineering metrics used to measure the quality of image and video codecs. When comparing two files, such as an original and a compressed version, PSNR quantifies the numerical difference between the compressed version and the original. A significant shortcoming is that PSNR may indicate that the reconstruction is of suitably high quality when in some cases it is not. For this reason, a PSNR result should not be relied on by itself as proof of visual quality.
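To make the definition concrete, here is a minimal sketch of the standard PSNR formula, 10 * log10(peak^2 / MSE), in plain Python. The function name and the sample pixel values are illustrative only, and 8-bit samples (peak value 255) are assumed.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error: the average power of the distortion.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion power at all
    return 10.0 * math.log10(peak ** 2 / mse)

# Illustrative 8-bit luma samples (made up for this example).
original = [52, 55, 61, 66, 70, 61, 64, 73]
compressed = [54, 55, 60, 66, 69, 62, 64, 71]
print(f"PSNR: {psnr(original, compressed):.2f} dB")
```

Note that the score depends only on the size of the pixel differences, not on where they fall in the picture or whether a viewer would notice them, which is the root of the shortcoming described above.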
What is SSIM?
SSIM, or the structural similarity index, is a technique for predicting the perceived quality of digital images and videos. The initial version was developed at the University of Texas at Austin, while the full SSIM routine was developed jointly with New York University’s Laboratory for Computational Vision. SSIM is a perception-based model that treats image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena such as luminance masking and contrast masking. The difference from other techniques like PSNR is that those approaches estimate absolute errors, whereas SSIM measures changes in structure.
The basis of SSIM is the assumption that pixels have strong inter-dependencies, and that these dependencies carry important information about the structure of the objects in the scene. Put simply, structural similarity is used to compute the similarity of two images. SSIM is a full-reference metric: image quality is measured against an uncompressed image that serves as the reference. SSIM was developed as a step up from traditional methods such as PSNR (peak signal-to-noise ratio), which has proven to correlate poorly with human vision. Unfortunately, SSIM itself is not perfect and can be easily fooled, as shown by the following graphic, which illustrates that although the original and compressed images are visually very similar, PSNR and SSIM score them as not similar. Meanwhile, the Beamr measure and MOS (mean opinion score) both rate them as closely matched.
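The contrast between absolute error and structure can be illustrated with a simplified sketch. The code below computes the SSIM formula once over a whole signal, whereas the published SSIM evaluates it over local windows and averages the results, so treat this as an approximation for illustration rather than a reference implementation. The function names and sample values are made up; the constants use the commonly quoted defaults K1 = 0.01 and K2 = 0.03.

```python
def mse(x, y):
    """Mean squared error, the quantity PSNR is built on."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def global_ssim(x, y, peak=255.0):
    """SSIM formula evaluated once over the whole signal (no sliding window)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * peak) ** 2  # stabilizes the luminance term
    c2 = (0.03 * peak) ** 2  # stabilizes the contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

ramp = [50, 60, 70, 80, 90, 100, 110, 120]
# Two distortions with the same absolute error (MSE = 100, so identical PSNR):
brighter = [p + 10 for p in ramp]                                       # uniform brightness shift
noisy = [p + (10 if i % 2 == 0 else -10) for i, p in enumerate(ramp)]   # alternating +/-10

print("MSE :", mse(ramp, brighter), mse(ramp, noisy))
print("SSIM:", round(global_ssim(ramp, brighter), 4), round(global_ssim(ramp, noisy), 4))
```

Both distortions have the same MSE, so PSNR cannot tell them apart, while the SSIM formula scores the uniform brightness shift higher because it leaves the structure of the ramp intact.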
Beamr Quality Measure
The Beamr quality measure is a proprietary, low-complexity, reliable, perceptually aligned quality measure. Having such a measure makes it possible to control a video encoder so that the output clip is compressed (near) maximally, while the resolution, format and visual quality (PQ) of the input video are maintained. This is done by controlling the compression level of each frame, or GOP, in the video sequence so that it is compressed as deeply as possible while still resulting in a perceptually identical output.
The Beamr quality measure is also a full-reference measure, i.e. it indicates the quality of a recompressed image or video frame relative to a reference or original image or video frame. This fits the challenge our technology aims to tackle: reducing bitrates to the maximum extent possible without imposing any quality degradation from the original, as perceived by the human visual system. The Beamr quality measure calculation consists of two parts: a pre-process of the input video frames to obtain various score configuration parameters, and the actual score calculation, performed for each candidate recompressed frame. Following is a system diagram of how the Beamr quality measure would interact with an encoder.
Application of the Beamr Quality Measure in an Encoder
The Beamr quality measure, when integrated with an encoder, enables the bitrate of video files to be reduced by up to an additional 50% compared with current state-of-the-art, standards-compliant, block-based encoders, without compromising image quality or changing the artistic intent. If you view a source video and a Beamr-optimized video side by side, they will look exactly the same to the human eye.
A question we get asked frequently is, “How do you perform the ‘magic’ of removing bits with no visual impact?”
Well, believe it or not there is no magic here, just solid technology that has been actively in development since 2009, and is now covered by 26 granted patents and over 30 additional patent applications.
When we first approached the task of reducing video bitrates based on the needs of the content and not a rudimentary bitrate control mechanism, we asked ourselves a simple starting question, “Given that the video file has already been compressed, how many additional bits can the encoder remove before the typical viewer would notice?”
There is a simple manual method of answering this question: take a typical viewer, show them the source video and the processed video side by side, and then start turning down the bitrate knob on the processed video by gradually increasing the compression. At some point, the viewer will say “Stop! Now I can see the videos are no longer the same!”
At that point, turn the compression knob slightly backwards, and there you have it – a video clip that has an acceptably lower bitrate than the source, and just at the point before the average user can notice the visual differences.
Of course I recognize what you are likely thinking, “Yes, this solution clearly works, but it doesn’t scale!” and you are correct. Unfortunately many academic solutions suffer from this problem. They make for good hand built demos in carefully controlled environments with hand picked content, but put them out in the “wild” and they fall down almost immediately. And I won’t even go into the issues of varying perception among viewers of different ages, or across multiple viewing conditions.
Another problem with such a solution is that different parts of the videos, such as different scenes and frames, require different bitrates. So the question is, how do you continually adjust the bitrate throughout the video clip, all the time confirming with your test viewer that the quality is still acceptable? Clearly this is not feasible.
Automation to the rescue.
Today, it seems the entire world is being infected with artificial intelligence, which in many cases is not much more than automation that is smart and able to adapt to its environment. So we too looked for a way to automate this image analysis process. That is, take a source video and discover a way to remove the “non-visible” bits in a fully automatic manner, with no human intervention involved. A suitable solution would enable the bitrate to vary continuously throughout the video clip based on the needs of the content at that moment.
What is CABR?
You’ve heard of VBR, or variable bitrate. Beamr has coined the term CABR, or content-adaptive bitrate, to summarize the process just described, where the encoder is adjusted at the frame level based on quality requirements rather than relying only on a bit budget to decide where bits are applied and how many are needed. But we understood that in order to accomplish the vision of CABR, we would need to be able to simulate the perception of a human viewer.
We needed an algorithm that would answer the question, “Given two videos, can a human viewer tell them apart?” This algorithm is called a Perceptual Quality Measure and it is the very essence of what sets Beamr so far apart from every other encoding solution in the market today.
A quality measure is a mathematical formula, which tries to quantify the differences between two video frames. To implement our video optimization technology, we could have used one of the well-known quality measures, such as PSNR (Peak Signal to Noise Ratio) or SSIM (Structural SIMilarity). But as already discussed, the problem with these existing quality measures is that they are simply not reliable enough as they do not correlate highly enough with human vision.
There are other sophisticated quality measures which correlate highly enough with human viewer opinions to be useful, but since they require extensive CPU power they cannot be utilized in an encoding optimization process, which requires computing the quality measures several times for each input frame.
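To see why the quality measure is evaluated several times per frame, consider the following sketch of a content-adaptive loop. It is not Beamr's implementation, which is proprietary and not described in this article. Everything here is an assumption made for illustration: the toy encoder, the use of PSNR as a stand-in quality score, the 40 dB threshold, the 0 to 51 QP range, and the binary search itself, which also assumes that quality does not improve as compression increases.

```python
import math

def psnr(reference, candidate, peak=255.0):
    """Stand-in quality score in dB; NOT the Beamr quality measure."""
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def toy_encode(frame, qp):
    """Hypothetical encoder: coarser uniform quantization as qp grows."""
    step = qp + 1
    return [round(p / step) * step for p in frame]

def highest_acceptable_qp(frame, encode, quality, threshold, qp_min=0, qp_max=51):
    """Binary search for the largest qp whose score stays >= threshold.

    Returns (qp, evaluations). Falls back to qp_min if no candidate passes.
    """
    best, evaluations = qp_min, 0
    lo, hi = qp_min, qp_max
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate = encode(frame, mid)   # one trial encode of this frame
        evaluations += 1                 # ...and one quality evaluation
        if quality(frame, candidate) >= threshold:
            best, lo = mid, mid + 1      # acceptable: try compressing harder
        else:
            hi = mid - 1                 # visible damage: back off
    return best, evaluations

# Made-up frames: one flat, one with strong detail.
flat_frame = [128] * 16
detailed_frame = [12, 200, 45, 180, 33, 240, 90, 150, 7, 222, 64, 131, 19, 205, 77, 168]

for name, frame in (("flat", flat_frame), ("detailed", detailed_frame)):
    qp, n = highest_acceptable_qp(frame, toy_encode, psnr, threshold=40.0)
    print(f"{name}: qp={qp} after {n} quality evaluations")
```

Even with a search that halves the range at every step, each frame costs up to six trial encodes and six quality evaluations in this toy setup, so a measure that is expensive to compute quickly becomes the bottleneck.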
Beamr Quality Measure
Given these limitations of the existing objective quality measures, we had no choice but to develop our own quality measure, and we developed it with a very focused goal: to identify and quantify the specific artifacts created by block-based compression methods.
All of the current mainstream image and video compression standards, including JPEG, MPEG-1, MPEG-2, H.264 (AVC) and H.265 (HEVC), are built upon block-based principles.
They divide an image into blocks, attempt to predict the block from previously encoded pixels, and then transform the block into the frequency domain, and quantize it.
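The steps just listed can be sketched for a single block as follows. This is a toy illustration under stated assumptions, not the behavior of any particular codec: it uses a 4x4 block, a flat prediction, a floating-point orthonormal DCT-II, and one uniform quantization step for every coefficient, whereas real codecs use integer transforms, several block sizes, and more elaborate prediction and quantization. All names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def code_block(block, prediction, step):
    """Predict, transform, quantize, then reconstruct one square block."""
    d = dct_matrix(len(block))
    residual = [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, prediction)]
    coeffs = matmul(matmul(d, residual), transpose(d))             # forward 2-D DCT
    levels = [[round(c / step) for c in row] for row in coeffs]    # quantization: the lossy step
    dequant = [[lvl * step for lvl in row] for row in levels]
    recon_res = matmul(matmul(transpose(d), dequant), d)           # inverse 2-D DCT
    recon = [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(prediction, recon_res)]
    return levels, recon

def max_abs_error(a, b):
    return max(abs(x - y) for arow, brow in zip(a, b) for x, y in zip(arow, brow))

block = [[52, 55, 61, 66], [63, 59, 55, 70], [62, 59, 68, 73], [63, 58, 71, 72]]
prediction = [[60] * 4 for _ in range(4)]  # e.g. a flat guess from neighboring pixels

for step in (1, 8, 1000):
    levels, recon = code_block(block, prediction, step)
    nonzero = sum(1 for row in levels for lvl in row if lvl != 0)
    print(f"step={step}: nonzero levels={nonzero}, max error={max_abs_error(block, recon):.2f}")
```

As the quantization step grows, fewer coefficients survive and the reconstruction collapses toward the prediction. With neighboring blocks each collapsing toward their own prediction, discontinuities at block edges become visible, which is the kind of compression-specific artifact referred to here.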
All of these steps create specific artifacts, which the Beamr quality measure is trained to detect and measure. So instead of looking for general deformations such as out-of-focus images or missing pixels, which is what the more general quality measures do, we look for exactly what we need: artifacts that were created by the video encoder.
This means that our quality measure is tightly focused and extremely efficient, and as a result, the CPU requirements of our quality measure are much lower than quality measures that try to model the Human Visual System (HVS).
Beamr Quality Measure and the Human Visual System
After years of developing our quality measure, we put it to the test under the strict requirements of ITU-R BT.500, the international recommendation for the subjective assessment of television picture quality. We were happy to find that the correlation of our quality measure with subjective (human) results was extremely high.
When the testing was complete, we felt certain this revolutionary quality measure was ready for the task of accurately comparing two images for similarity, from a human point of view.
But compression artifacts are only part of the secret. When a human looks at an image or video, the eye and the brain are drawn to particular places in the scene, for example, places where there is movement, and in fact we are especially “tuned” to capture details in faces.
Since our attention is focused on these areas, artifacts there are more disturbing than the same artifacts in other areas of the image, such as background regions or out-of-focus areas. The Beamr quality measure takes this into account, ensuring that when we measure quality, proper attention is given to the areas that require it.
Furthermore, the Beamr quality measure takes into account temporal artifacts introduced by the encoder. It is not sufficient to ensure that each individual frame is not degraded; it is also necessary to preserve the quality and feel of the video’s temporal flow.
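One simple way to picture this idea is an error measure in which each pixel's contribution is scaled by an attention weight. The sketch below is purely illustrative: the article does not say how the Beamr measure weights regions, and the weight map, the function name and the sample values are all assumptions.

```python
def weighted_mse(reference, distorted, weights):
    """Mean squared error in which each sample counts in proportion to its weight."""
    if not (len(reference) == len(distorted) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    return sum(w * (r - d) ** 2
               for r, d, w in zip(reference, distorted, weights)) / total_weight

reference = [100] * 8
# Hypothetical attention map: samples 2 and 3 belong to a face, the rest is background.
weights = [1, 1, 4, 4, 1, 1, 1, 1]

artifact_in_face = list(reference)
artifact_in_face[2] += 10            # a 10-level error inside the face region
artifact_in_background = list(reference)
artifact_in_background[6] += 10      # the same error in the background

print("face      :", round(weighted_mse(reference, artifact_in_face, weights), 2))
print("background:", round(weighted_mse(reference, artifact_in_background, weights), 2))
```

An unweighted measure would score the two distorted signals identically. With the weight map, the artifact in the face region is penalized four times as heavily as the same artifact in the background.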
And that’s the magic of Beamr.
With the acquisition last year of Vanguard Video, many industry observers have gone public with the idea that the combination of our highly innovative quality measure, tightly integrated with the world’s best encoder, could lead to a real shake-up of the ecosystem.
As the saying goes, “we can neither confirm nor deny this idea”, but since you’ve taken the time to read this entire article, it means you are not average and are looking for every advantage in bitrate and quality that you can give your encoding operations. At NAB 2017 you will see a product that is quite simply going to reset the state of the art for HEVC and H.264 video encoding. We encourage you to see for yourself what is possible when the world’s most advanced perceptual quality measure becomes the rate-control mechanism for the industry’s best quality software encoder.