This post was originally featured at 

In this post, we attempt to de-mystify the topic of perceptual video quality, which is the foundation of Beamr’s content adaptive encoding and content adaptive optimization solutions. 

National Geographic has a hit TV franchise on its hands. It’s called Brain Games starring Jason Silva, a talent described as “a Timothy Leary of the viral video age” by the Atlantic. Brain Games is accessible, fun and accurate. It’s a dive into brain science that relies on well-produced demonstrations of illusions and puzzles to showcase the power — and limitation — of the human brain. It’s compelling TV that illuminates how we perceive the world.(Intrigued? Watch the first minute of this clip featuring Charlie Rose, Silva, and excerpts from the show: )

At Beamr, we’re passionate about the topic of perceptual quality. In fact, we are so passionate, that we built an entire company based on it. Our technology leverages science’s knowledge about the human vision system to significantly reduce video delivery costs, reduce buffering & speed-up video starts without any change in the quality perceived by viewers. We’re also inspired by the show’s ability to turn complex things into compelling and accessible, without distorting the truth. No easy feat. But let’s see if we can pull it off with a discussion about video quality measurement which is also a dense topic.

Basics of Perceptual Video Quality

Our brains are amazing, especially in the way we process rich visual information. If a picture’s worth 1,000 words. What’s 60 frames per second in 4k HDR worth?

The answer varies based on what part of the ecosystem or business you come from, but we can all agree that it’s really impactful. And data intensive, too. But our eyeballs aren’t perfect and our brains aren’t either – as Brain Games points out. As such, it’s odd that established metrics for video compression quality in the TV business have been built on the idea that human vision is mechanically perfect.

See, video engineers have historically relied heavily on two key measures to evaluate the quality of a video encode: Peak Signal to Noise Ratio, or PSNR, and Structured Similarity, or SSIM. Both metrics are ‘objective’ metrics. That is, we use tools to directly measure the physics of the video signal and construct mathematical algorithms from that data to create metrics. But is it possible to really quantify a beautiful landscape with a number? Let’s see about that.

PSNR and SSIM look at different physics properties of a video, but the underlying mechanics for both metrics are similar. You compress a source video where the properties of the “original” and derivative are then analyzed using specific inputs, and metrics calculated for both. The more similar the two metrics are, the more we can say that the properties of each video are similar, and the closer we can define our manipulation of the video, i.e. our encode, as having a high or acceptable quality.

Objective Quality vs. Subjective Quality

However, it turns out that these objectively calculated metrics do not correlate well to the human visual experience. In other words, in many cases, humans cannot perceive variations that objective metrics can highlight while at the same time, objective metrics can miss artifacts a human easily perceives.

The concept that human visual processing might be less than perfect is intuitive. It’s also widely understood in the encoding community. This fact opens a path to saving money, reducing buffering and speeding-up time-to-first-frame. After all, why would you knowingly send bits that can’t be seen?

But given the complexity of the human brain, can we reliably measure opinions about picture quality to know what bits can be removed and which cannot? This is the holy grail for anyone working in the area of video encoding.

Measuring Perceptual Quality

Actually, a rigorous, scientific and peer-reviewed discipline has developed over the years to accurately measure human opinions about the picture quality on a TV. The math and science behind these methods are memorialized in an important ITU standard on the topic originally published in 2008 and updated in 2012. ITU BT.500 (International Telecommunications Union is the largest standards committee in global telecom.) I’ll provide a quick rundown.

First, a set of clips is selected for testing. A good test has a variety of clips with diverse characteristics: talking heads, sports, news, animation, UGC – the goal is to get a wide range of videos in front of human subjects.

Then, a subject pool of sufficient size is created and screened for 20/20 vision. They are placed in a light-controlled environment with a screen or two, depending on the set-up and testing method.

Instructions for one method is below, as a tangible example.

In this experiment, you will see short video sequences on the screen that is in front of you. Each sequence will be presented twice in rapid succession: within each pair, only the second sequence is processed. At the end of each paired presentation, you should evaluate the impairment of the second sequence with respect to the first one.

You will express your judgment by using the following scale:

5 Imperceptible

4 Perceptible but not annoying

3 Slightly annoying

2 Annoying

1 Very annoying

Observe carefully the entire pair of video sequences before making your judgment.

As you can imagine, testing like this is an expensive proposition indeed. It requires specialized facilities, trained researchers, vast amounts of time, and a budget to recruit subjects.

Thankfully, the rewards were worth the effort for teams like Beamr that have been doing this for years.

It turns out, if you run these types of subjective tests, you’ll find that there are numerous ways to remove 20 – 50% of the bits from a video signal without losing the ‘eyeball’ video quality – even when the objective metrics like PSNR and SSIM produce failing grades.

But most of the methods that have been tried are still stuck in academic institutions or research labs. This is because the complexities of upgrading or integrating the solution into the playback and distribution chain make them unusable. Have you ever had to update 20 million set-top boxes? Well if you have, you know exactly what I’m talking about.

We know the broadcast and large scale OTT industry, which is why when we developed our approach to measuring perceptual quality and applied it to reducing bitrates, we were insistent on staying 100% inside the standard of AVC H.264 and HEVC H.265.

By pioneering the use of perceptual video quality metrics, Beamr is enabling media and entertainment companies of all stripes to reduce the bits they send by up to 50%. This reduces re-buffering events by up to 50%, improves video start time by 20% or more, and reduces storage and delivery costs.

Fortunately, you now understand the basics of perceptual video quality. You also see why most of the video engineering community believes content adaptive sits at the heart of next-generation encoding technologies.

Unfortunately, when we stated above that there were “all kinds of ways” to reduce bits up to 50% without sacrificing ‘eyeball video quality’, we skipped over some very important details. Such as, how we can utilize subjective testing techniques on an entire catalog of videos at scale, and cost efficiently.

Next time: Part 2 and the Opinionated Robot

Looking for better tools to assess subjective video quality?

You definitely want to check out Beamr’s VCT which is the best software player available on the market to judge HEVC, AVC, and YUV sequences in modes that are highly useful for a video engineer or compressionist.

VCT is available for Mac and PC. And best of all, we offer a FREE evaluation to qualified users.

Learn more about VCT: