Translating Opinions into Fact When it Comes to Video Quality

This post was originally featured at https://www.linkedin.com/pulse/translating-opinions-fact-when-comes-video-quality-mark-donnigan 

In this post, we attempt to de-mystify the topic of perceptual video quality, which is the foundation of Beamr’s content adaptive encoding and content adaptive optimization solutions. 

National Geographic has a hit TV franchise on its hands. It’s called Brain Games, starring Jason Silva, a talent described as “a Timothy Leary of the viral video age” by The Atlantic. Brain Games is accessible, fun, and accurate. It’s a dive into brain science that relies on well-produced demonstrations of illusions and puzzles to showcase the power — and limitations — of the human brain. It’s compelling TV that illuminates how we perceive the world. (Intrigued? Watch the first minute of this clip featuring Charlie Rose, Silva, and excerpts from the show: https://youtu.be/8pkQM_BQVSo)

At Beamr, we’re passionate about the topic of perceptual quality. In fact, we are so passionate that we built an entire company based on it. Our technology leverages science’s knowledge of the human visual system to significantly reduce video delivery costs, reduce buffering, and speed up video starts without any change in the quality perceived by viewers. We’re also inspired by the show’s ability to make complex things compelling and accessible without distorting the truth. No easy feat. But let’s see if we can pull it off with a discussion of video quality measurement, which is also a dense topic.

Basics of Perceptual Video Quality

Our brains are amazing, especially in the way we process rich visual information. If a picture’s worth 1,000 words, what’s 60 frames per second in 4K HDR worth?

The answer varies based on what part of the ecosystem or business you come from, but we can all agree that it’s impactful. And data intensive, too. But our eyeballs aren’t perfect, and our brains aren’t either – as Brain Games points out. As such, it’s odd that established metrics for video compression quality in the TV business have been built on the idea that human vision is mechanically perfect.

See, video engineers have historically relied heavily on two key measures to evaluate the quality of a video encode: Peak Signal to Noise Ratio, or PSNR, and Structural Similarity, or SSIM. Both are ‘objective’ metrics. That is, we use tools to directly measure physical properties of the video signal and apply mathematical formulas to that data to produce a score. But is it possible to really quantify a beautiful landscape with a number? Let’s see about that.

PSNR and SSIM look at different physical properties of a video, but the underlying mechanics for both metrics are similar. You compress a source video, then analyze specific properties of the “original” and its compressed derivative and calculate a metric for each. The closer the two values, the more similar the properties of the two videos, and the more confidently we can call our manipulation of the video, i.e. our encode, high or acceptable quality.

Objective Quality vs. Subjective Quality


However, it turns out that these objectively calculated metrics do not correlate well to the human visual experience. In other words, in many cases, humans cannot perceive variations that objective metrics can highlight while at the same time, objective metrics can miss artifacts a human easily perceives.

The concept that human visual processing might be less than perfect is intuitive. It’s also widely understood in the encoding community. This fact opens a path to saving money, reducing buffering and speeding-up time-to-first-frame. After all, why would you knowingly send bits that can’t be seen?

But given the complexity of the human brain, can we reliably measure opinions about picture quality to know what bits can be removed and which cannot? This is the holy grail for anyone working in the area of video encoding.

Measuring Perceptual Quality

Actually, a rigorous, scientific, and peer-reviewed discipline has developed over the years to accurately measure human opinions about the picture quality on a TV. The math and science behind these methods are memorialized in an important ITU standard on the topic, ITU-R BT.500, originally published in 2008 and updated in 2012. (The International Telecommunication Union is the largest standards body in global telecom.) I’ll provide a quick rundown.

First, a set of clips is selected for testing. A good test has a variety of clips with diverse characteristics: talking heads, sports, news, animation, UGC – the goal is to get a wide range of videos in front of human subjects.

Then, a subject pool of sufficient size is created and screened for 20/20 vision. They are placed in a light-controlled environment with a screen or two, depending on the set-up and testing method.

Instructions for one method are below, as a tangible example.

In this experiment, you will see short video sequences on the screen that is in front of you. Each sequence will be presented twice in rapid succession: within each pair, only the second sequence is processed. At the end of each paired presentation, you should evaluate the impairment of the second sequence with respect to the first one.

You will express your judgment by using the following scale:

5 Imperceptible

4 Perceptible but not annoying

3 Slightly annoying

2 Annoying

1 Very annoying

Observe carefully the entire pair of video sequences before making your judgment.
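Scores collected on this five-grade impairment scale are typically averaged into a Mean Opinion Score (MOS) per clip, usually reported with a confidence interval. A minimal Python sketch of that aggregation step (the viewer ratings below are invented for illustration):

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Average 1-5 impairment ratings into a MOS with a 95% confidence interval."""
    mos = statistics.mean(ratings)
    # Standard error of the mean; 1.96 approximates the 95% normal quantile.
    ci = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, ci

# Hypothetical ratings from 10 viewers for one clip pair
ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

A wide confidence interval is itself useful information: it tells you the panel disagreed about the clip, and more subjects (or a cleaner test setup) may be needed.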

As you can imagine, testing like this is an expensive proposition indeed. It requires specialized facilities, trained researchers, vast amounts of time, and a budget to recruit subjects.

Thankfully, the rewards were worth the effort for teams like Beamr that have been doing this for years.

It turns out, if you run these types of subjective tests, you’ll find that there are numerous ways to remove 20–50% of the bits from a video signal without losing the ‘eyeball’ video quality – even when objective metrics like PSNR and SSIM produce failing grades.

But most of the methods that have been tried remain stuck in academic institutions or research labs, because the complexity of upgrading or integrating them into the playback and distribution chain makes them unusable. Have you ever had to update 20 million set-top boxes? If you have, you know exactly what I’m talking about.

We know the broadcast and large-scale OTT industry, which is why, when we developed our approach to measuring perceptual quality and applied it to reducing bitrates, we insisted on staying 100% compliant with the AVC (H.264) and HEVC (H.265) standards.

By pioneering the use of perceptual video quality metrics, Beamr is enabling media and entertainment companies of all stripes to reduce the bits they send by up to 50%. This reduces re-buffering events by up to 50%, improves video start time by 20% or more, and reduces storage and delivery costs.

By now, you understand the basics of perceptual video quality. You also see why most of the video engineering community believes content-adaptive encoding sits at the heart of next-generation encoding technologies.

However, when we stated above that there are numerous ways to reduce bits by up to 50% without sacrificing ‘eyeball’ video quality, we skipped over some very important details, such as how we can apply subjective testing techniques to an entire catalog of videos, at scale and cost efficiently.

Next time: Part 2 and the Opinionated Robot

Looking for better tools to assess subjective video quality?

You definitely want to check out Beamr’s VCT, the best software player available on the market for judging HEVC, AVC, and YUV sequences, in modes that are highly useful for video engineers and compressionists.

VCT is available for Mac and PC. And best of all, we offer a FREE evaluation to qualified users.

Learn more about VCT: http://beamr.com/h264-hevc-video-comparison-player/

 

VCT, the Secret to Confident Subjective Video Quality Testing

We can all agree that analyzing video quality is one of the biggest challenges when evaluating codecs. Companies use a combination of objective and subjective tests to validate encoder efficiency. In this post, I’ll explore why it is difficult to measure video quality with quantitative metrics alone, since they fail to match the subjective quality perception of the human eye.

Furthermore, we’ll look at why it’s important to equip yourself with the best resources when doing subjective testing, and how Beamr’s VCT visual comparison tool can help you with video quality testing.

But first, if you haven’t done so already, be sure to download your free trial of VCT here.

OBJECTIVE TESTING

The most common objective measurement used today is pixel-based Peak Signal to Noise Ratio (PSNR). PSNR is a popular test because it is easy to calculate and nearly everyone working in video is familiar with interpreting its values. But it does have limitations. Typically a higher PSNR value correlates to higher quality, while a lower PSNR value correlates to lower quality. However, since this test measures pixel-based mean squared error over an entire frame, reducing the quality of a frame (or a collection of frames) to a single number does not always parallel true subjective quality.
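To make the calculation concrete, here is a minimal PSNR sketch in Python with numpy. The “frames” are synthetic arrays rather than real video, and the 255 peak assumes 8-bit samples:

```python
import numpy as np

def psnr(reference, degraded, max_val=255.0):
    """Peak Signal to Noise Ratio in dB between two same-sized 8-bit frames."""
    mse = np.mean((reference.astype(np.float64) - degraded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

# Synthetic "frames": a horizontal gradient and a lightly noised copy of it
rng = np.random.default_rng(0)
ref = np.tile(np.arange(256, dtype=np.uint8), (64, 1))
noisy = np.clip(ref.astype(np.int16) + rng.integers(-3, 4, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR = {psnr(ref, noisy):.2f} dB")
```

Note that the single dB number says nothing about where in the frame the error lives, which is exactly the limitation described above.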

PSNR gives equal weight to every pixel in the frame and each frame in a sequence, ignoring many factors that can affect human perception. For example, below are two encoded images of the same frame (1). Image (a) and Image (b) have the same PSNR, which should theoretically correlate to two encoded images of the same quality. However, the difference in perceived quality is easy to see, as viewers would rate Image (a) as significantly higher quality than Image (b).

Example: 

PSNR value example of why it shouldn't be the absolute measurement for assessing video quality

Due to the inability of error-based methods like PSNR to adequately mimic human perception, other methods for analyzing video quality have been developed, including the Structural Similarity Index Metric (SSIM), which measures structural distortion. Unlike PSNR, SSIM addresses image degradation as perceived change in three major aspects of an image: luminance, contrast, and structure. SSIM has gained popularity, but as with PSNR, it has its limitations. Studies have suggested that SSIM’s performance is roughly equal to PSNR’s, and some have cited evidence of a systematic relationship between SSIM and Mean Squared Error (MSE) (2).
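For intuition about those three terms, here is a deliberately simplified SSIM sketch in Python with numpy that compares luminance, contrast, and structure over the whole frame at once. The standard SSIM instead averages the same formula over small local windows, so treat this as illustrative only:

```python
import numpy as np

def global_ssim(x, y, max_val=255.0):
    """Simplified single-window SSIM; real SSIM averages over local windows."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # stabilizing constants
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()             # luminance terms
    vx, vy = x.var(), y.var()               # contrast terms
    cov = ((x - mx) * (y - my)).mean()      # structure (cross-covariance)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# Identical frames score exactly 1.0; any distortion lowers the score
frame = np.tile(np.arange(64, dtype=np.uint8), (8, 1))
print(f"SSIM(frame, frame) = {global_ssim(frame, frame):.3f}")
```

Even this structural formulation is still computed purely from pixel statistics, which is why SSIM inherits some of PSNR’s blind spots.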

While SSIM and other quantitative measures including multi-scale structural similarity (MS-SSIM) and the Sarnoff Picture Quality Rating (PQR) have made significant gains, none can truly deliver the same assurance as subjective evaluation using the human eye. It is also important to note that the two most widely used objective quality metrics mentioned above, PSNR and SSIM, were designed to evaluate static image quality. This means that both algorithms provide no meaningful information regarding motion artifacts, thereby limiting their effectiveness with regard to video.

SUBJECTIVE TESTING

While objective methods attempt to model human perception, there are no substitutes for subjective “golden-eye” tests. But we are all familiar with the drawbacks of subjective analysis, including the variance of individual quality perception and the difficulty of executing proper subjective tests in 100% controlled viewing environments so that a large number of testers can participate. Evaluating video using subjective visual tests can reveal key differences that may not be caught by objective measures alone, which is why it is important to use a combination of both objective and subjective testing methodologies.

One of the logistic difficulties of performing subjective quality comparisons is coordinating simultaneous playback of two streams. Recognizing some of the drawbacks of current subjective evaluation methods, in particular single-stream playback or awkward dual-stream review workarounds, Beamr spent years in research and development to build a tool that offers simultaneous playback of two videos with various comparison modes, to significantly improve the golden-eye test execution necessary to properly evaluate encoder efficiency.

Powered by our professional HEVC and H.264 codec SDK decoders, the Beamr video comparison tool VCT allows encoding engineers and compressionists to play back two frame-synchronized independent HEVC, H.264, or YUV sequences simultaneously, and to compare the quality of these streams in four modes:

  1. Split screen
  2. Side-by-side
  3. Overlay
  4. Butterfly (the newest mode)

MPEG2-TS and MP4 files containing either HEVC or H.264 elementary streams are also supported. Additionally, VCT displays valuable clip information such as bit-rate, screen resolution, frame rate, number of frames, and other important video information.

Developed in 2012, VCT was the industry’s first internal software player offered as a tool to help Beamr customers conduct subjective testing while evaluating our encoder’s efficiency. Today, VCT has been tested by many content and equipment companies from around the world in multiple markets including broadcast, mobile, and internet streaming, making it the de facto standard for subjective golden-eye video quality testing and evaluation.

VCT BENEFITS AND TIPS

Your FREE trial of VCT will come with an extensive user guide that contains everything you need to get started. But we know you are eager to begin your testing, so following are a few quick tips we trust you will find useful. Take advantage of this “golden” opportunity and get started today!

Note: use Command (⌘) instead of Ctrl for the OS X version of VCT.

  1. Split Screen Comparison Mode:
    • Benefits:
      • Great for viewing two clips when only one screen is available.
      • The moving slider bar allows you to clearly see quality differences between two streams in your desired region of interest. For example, you can move the slider bar back and forth across a face to see quality differences between two discrete files.
    • Pro Tips:
      • Use the keyboard shortcut Ctrl + \ to re-center the slider bar after it is moved.
      • Shortcut key Ctrl + Tab allows you to change which video appears on the left or right of the slider bar.

VCT split screen comparison mode for subjective video quality assessment

 

  2. Side-by-side Comparison Mode:
    • Benefits:
      • Great for tradeshows. Solves the lack of synchronization in side-by-side comparison tests that use two independent players.
      • Single control for both streams.
    • Pro Tip:
      • Shortcut key Ctrl + Tab allows you to change which video appears on which screen without moving the windows.

VCT side-by-side comparison mode for subjective video quality assessment

 

  3. Overlay Comparison Mode:
    • Benefits:
      • Great for viewing the full frame of one stream on a single window.
    • Tips:
      • Shortcut key Ctrl + Tab allows you to cycle between the two videos. Toggling quickly is a great way to spot quality differences between the two streams that you might not otherwise notice.

Overlay Mode

 

  4. Butterfly Comparison Mode:
    • Benefits:
      • Very useful for determining the accuracy of the encoding process. The butterfly mode displays mirrored images of two sequences to help you assess whether an artifact occurs in the source when comparing an encoded sequence to the original.
    • Tips:
      • Use shortcut key Ctrl + \ to reset the frame to the leftmost view, and shortcut Ctrl + Alt + \ to switch to the rightmost view in butterfly mode.
      • Use shortcut keys Ctrl + [ and Ctrl + ] to move the image left/right in butterfly mode.

VCT butterfly comparison mode for subjective video quality assessment

  5. Other Useful Tips:
    • Ctrl + m allows you to toggle through the 4 comparison modes.
    • Shift + Left Click opens the magnifier tool that allows you to zoom into hard to see areas of the video.
    • Easily scale frames of different resolutions to the same resolution by clicking “scale to same look” on the main menu.
    • NEW automatic download feature on the splash screen notifies you of the latest version updates to ensure you’re always up to date.
    • For more great features, be sure to check out the VCT user guide: beamr.com/vct/userguide.com.

 

Reference:

(1) P. M. Arun Kumar and S. Chandramathi, “Video Quality Assessment Methods: A Bird’s-Eye View.”

(2) Richard Dosselmann and Xue Dong Yang, “A Formal Assessment of the Structural Similarity Index.”