Connecting Virtual Reality with the History of Encoding Technology

Two fun and surprising facts about the brain connect virtual reality with the history of encoding technology.

Bloomberg featured a Charlie Rose interview with Jeremy Bailenson, the founding director of Stanford University’s Virtual Human Interaction Lab. Not surprisingly, the lab houses some of the sharpest minds and most insightful datasets in its field: virtual reality.

It’s a 20-minute video that touches on some fascinating elements of VR – very few of which are about commercial television or sports entertainment experiences.

In fact, it is as much an interview about the brain, human interaction, and the physical body as it is about media and entertainment.

As Jeremy says: “The medium [of VR] puts you inside of the media. It feels like you are actually doing something.”

Then, he states our first stunning fact about the brain, which illustrates why VR will be so impactful on modern civilization:

We can’t tell the difference!

Professor Bailenson: “The brain is going to treat that as if it is a real experience. Humans have been around for a very long time [evolving in the real world.] The brain hasn’t yet evolved to really understand the difference between a compelling virtual reality experience and a real one.”

The full video is here.

So there you have it. Our brains are nothing short of miraculous, but they’ve evolved some peculiar wiring to say the least. To put it bluntly, while humans are exceptionally clever in many ways, we’re not so much in others.

Which is the perfect segue into my second surprising factoid about the brain, and it’s taken 25 years for commercial video markets to exploit this fact!

To be fair, that’s not an exact statement, but here’s the timeline for reference.

According to Wikipedia, Cinepak was one of the very first commercial implementations of video compression technology. It made it possible to watch video from a CD-ROM. (Just typing the words taps into nostalgia.) Cinepak was released in 1991 and became part of Apple’s QuickTime toolset a year later.

It was 16 years later, in 2007, that the Video Quality Experts Group decided to create and benchmark a new metric that – while not perfect – marked a milestone for the video coding community. For the first time, there was a recognition that maximum compression required taking human vision biology into account when designing algorithms to shrink video files. Their perceptual metric was known as Perceptual Evaluation of Video Quality, and despite its impracticality for implementation, it became part of the International Telecommunication Union standards.

Then in 2009, Beamr was formed to solve the very real need to reduce file sizes while retaining quality. This need became evident after an encounter with a consumer technology company that told us the massive cost of storing digital media was preventing it from offering services to extend the capacity of its devices. So we set out to solve the technical challenge of removing redundant bits without compromising quality, and to do it in a fully automatic manner. The result? 50 patents have now been granted or are pending, and commercial implementations of our solution have been running on some of the largest new media and video distribution platforms for more than three years.

But beyond this, there is another, more subjective data point from Beamr’s experience over the last few quarters: many of the conversations and evaluations we are entering into about next-generation encoding are not limited to advanced codecs, but extend to subjective quality metrics – leveraging our knowledge of the human vision system to remove bits from a compressed video file with no noticeable difference to the viewer.

As VR, 360-degree video, UHD, HDR and other exciting new consumer entertainment technologies begin to take hold in the market, there has never been a greater need to advance the state of the art in maximizing quality at a given bitrate. Beamr was the first company to step up to this challenge, and with our demonstrable quality, it’s not a stretch to suggest that we have the lead.

More information on Beamr’s software encoding and optimization solutions can be found at beamr.com.

Translating Opinions into Fact When it Comes to Video Quality

This post was originally featured at https://www.linkedin.com/pulse/translating-opinions-fact-when-comes-video-quality-mark-donnigan 

In this post, we attempt to de-mystify the topic of perceptual video quality, which is the foundation of Beamr’s content adaptive encoding and content adaptive optimization solutions. 

National Geographic has a hit TV franchise on its hands. It’s called Brain Games, starring Jason Silva, a talent described as “a Timothy Leary of the viral video age” by the Atlantic. Brain Games is accessible, fun and accurate. It’s a dive into brain science that relies on well-produced demonstrations of illusions and puzzles to showcase the power — and limitations — of the human brain. It’s compelling TV that illuminates how we perceive the world. (Intrigued? Watch the first minute of this clip featuring Charlie Rose, Silva, and excerpts from the show: https://youtu.be/8pkQM_BQVSo )

At Beamr, we’re passionate about the topic of perceptual quality. In fact, we are so passionate that we built an entire company on it. Our technology leverages science’s knowledge of the human vision system to significantly reduce video delivery costs, reduce buffering, and speed up video starts without any change in the quality perceived by viewers. We’re also inspired by the show’s ability to make complex things compelling and accessible without distorting the truth. No easy feat. But let’s see if we can pull it off with a discussion of video quality measurement, which is also a dense topic.

Basics of Perceptual Video Quality

Our brains are amazing, especially in the way we process rich visual information. If a picture’s worth 1,000 words, what’s 60 frames per second in 4K HDR worth?

The answer varies based on what part of the ecosystem or business you come from, but we can all agree that it’s really impactful. And data intensive, too. But our eyeballs aren’t perfect, and neither are our brains – as Brain Games points out. That makes it odd that the established metrics for video compression quality in the TV business were built on the idea that human vision is mechanically perfect.

See, video engineers have historically relied on two key measures to evaluate the quality of a video encode: Peak Signal to Noise Ratio, or PSNR, and Structural Similarity, or SSIM. Both are ‘objective’ metrics. That is, we use tools to directly measure the physics of the video signal and construct mathematical algorithms that turn that data into a score. But is it really possible to quantify a beautiful landscape with a number? Let’s see about that.

PSNR and SSIM look at different physical properties of a video, but the underlying mechanics for both metrics are similar. You compress a source video, analyze specific properties of the original and the compressed derivative, and calculate a metric for each. The closer the two values, the more similar the videos’ properties are considered to be, and the more confidently we can call the encode high (or acceptable) quality.
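To make the mechanics concrete, here is a minimal sketch of how an objective metric like PSNR is computed on a pair of frames (SSIM follows the same compare-two-signals pattern, but combines luminance, contrast, and structure terms instead of a single error term):

```python
import math

def psnr(original, encoded, max_val=255):
    """Peak Signal-to-Noise Ratio between two same-size frames,
    given here as flat lists of 8-bit pixel values."""
    # mean squared error between the source and the encode
    mse = sum((o - e) ** 2 for o, e in zip(original, encoded)) / len(original)
    if mse == 0:
        return float("inf")  # identical frames: no measurable distortion
    # ratio of peak signal power to error power, in decibels
    return 10 * math.log10(max_val ** 2 / mse)
```

A “good” encode conventionally lands somewhere around 35–45 dB, but as the next section explains, that number often disagrees with what viewers actually see.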

Objective Quality vs. Subjective Quality


However, it turns out that these objectively calculated metrics do not correlate well with the human visual experience. In other words, in many cases humans cannot perceive variations that objective metrics highlight, while at the same time objective metrics can miss artifacts a human easily perceives.

The concept that human visual processing might be less than perfect is intuitive. It’s also widely understood in the encoding community. This fact opens a path to saving money, reducing buffering and speeding-up time-to-first-frame. After all, why would you knowingly send bits that can’t be seen?

But given the complexity of the human brain, can we reliably measure opinions about picture quality to know what bits can be removed and which cannot? This is the holy grail for anyone working in the area of video encoding.

Measuring Perceptual Quality

Actually, a rigorous, scientific, peer-reviewed discipline has developed over the years to accurately measure human opinions about the picture quality on a TV. The math and science behind these methods are memorialized in an important ITU standard, ITU-R BT.500, with revisions published in 2008 and updated in 2012. (The International Telecommunication Union is the largest standards body in global telecom.) I’ll provide a quick rundown.

First, a set of clips is selected for testing. A good test has a variety of clips with diverse characteristics: talking heads, sports, news, animation, UGC – the goal is to get a wide range of videos in front of human subjects.

Then, a subject pool of sufficient size is created and screened for 20/20 vision. They are placed in a light-controlled environment with a screen or two, depending on the set-up and testing method.

Instructions for one method are below, as a tangible example.

In this experiment, you will see short video sequences on the screen that is in front of you. Each sequence will be presented twice in rapid succession: within each pair, only the second sequence is processed. At the end of each paired presentation, you should evaluate the impairment of the second sequence with respect to the first one.

You will express your judgment by using the following scale:

5 Imperceptible

4 Perceptible but not annoying

3 Slightly annoying

2 Annoying

1 Very annoying

Observe carefully the entire pair of video sequences before making your judgment.
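Once the ratings are in, each clip’s result is summarized as a mean opinion score with a confidence interval. The following is a simplified sketch of that arithmetic (the full BT.500 procedure also screens out inconsistent subjects, which is omitted here):

```python
import math

def mean_opinion_score(ratings):
    """Mean Opinion Score (MOS) and 95% confidence-interval
    half-width for one clip, rated on the 1-5 impairment scale."""
    n = len(ratings)
    mos = sum(ratings) / n
    # sample variance across subjects
    variance = sum((r - mos) ** 2 for r in ratings) / (n - 1)
    # normal approximation for the 95% confidence interval
    ci95 = 1.96 * math.sqrt(variance / n)
    return mos, ci95

# e.g. six subjects rating one impaired clip
mos, ci = mean_opinion_score([4, 5, 4, 4, 5, 3])
```

A tight confidence interval is what justifies the large subject pools described above: with too few viewers, the interval is wide and the score means little.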

As you can imagine, testing like this is an expensive proposition indeed. It requires specialized facilities, trained researchers, vast amounts of time, and a budget to recruit subjects.

Thankfully, the rewards were worth the effort for teams like Beamr that have been doing this for years.

It turns out, if you run these types of subjective tests, you’ll find that there are numerous ways to remove 20 – 50% of the bits from a video signal without losing the ‘eyeball’ video quality – even when the objective metrics like PSNR and SSIM produce failing grades.

But most of the methods that have been tried are still stuck in academic institutions or research labs, because the complexity of upgrading or integrating them into the playback and distribution chain makes them unusable in practice. Have you ever had to update 20 million set-top boxes? If you have, you know exactly what I’m talking about.

We know the broadcast and large-scale OTT industry, which is why, when we developed our approach to measuring perceptual quality and applied it to reducing bitrates, we insisted on staying 100% inside the AVC/H.264 and HEVC/H.265 standards.

By pioneering the use of perceptual video quality metrics, Beamr is enabling media and entertainment companies of all stripes to reduce the bits they send by up to 50%. This reduces re-buffering events by up to 50%, improves video start time by 20% or more, and reduces storage and delivery costs.

Fortunately, you now understand the basics of perceptual video quality. You can also see why most of the video engineering community believes content adaptive encoding sits at the heart of next-generation encoding technologies.

Unfortunately, when we stated above that there were numerous ways to reduce bits by up to 50% without sacrificing ‘eyeball’ video quality, we skipped over some very important details, such as how we can apply subjective testing techniques to an entire catalog of videos at scale, and cost efficiently.

Next time: Part 2 and the Opinionated Robot

Looking for better tools to assess subjective video quality?

You definitely want to check out Beamr’s VCT, the best software player available on the market for judging HEVC, AVC, and YUV sequences in modes that are highly useful for a video engineer or compressionist.

VCT is available for Mac and PC. And best of all, we offer a FREE evaluation to qualified users.

Learn more about VCT: http://beamr.com/h264-hevc-video-comparison-player/

 

Will Virtual Reality Determine the Future of Streaming?

As video services take a more aggressive approach to virtual reality (VR), the question of how to scale and deliver this bandwidth intensive content must be addressed to bring it to a mainstream audience.

While we’ve been talking about VR for a long time, you could say it was reinvigorated when Oculus grabbed the attention of Facebook, which invested $2 billion based on Mark Zuckerberg’s vision that VR is a future technology people will actively embrace. Industry forecasters tend to agree, suggesting VR will be front and center in the digital economy within the next decade. According to research by Canalys, vendors will ship 6.3 million VR headsets globally in 2016, and CCS Insight suggests that as many as 96 million headsets will be snapped up by consumers by 2020.

One of VR’s key advantages is the freedom to look anywhere in 360 degrees within a fully panoramic video, in a highly intimate setting. Panoramic video files are large, with resolutions often 4K (4096 pixels wide by 2048 pixels tall, depending on the standard) or bigger.

While VR is considered to be the next big revolution in the consumption of media content, we also see it popping up in professional fields such as education, health, law enforcement, defense, telecom, and media. It can provide a far more immersive live experience than TV by adding presence, the feeling that “you are really there.”

Development of VR projects has already started to take off, and high-quality VR devices are surprisingly affordable. Earlier this summer, Google announced that 360-degree live streaming support was coming to YouTube.

Of course, all these new angles and sharpness of imagery creates new and challenging sets of engineering hurdles which we’ll discuss below.

Resolution and Quality?

Frame rate, resolution, and bandwidth are all stressed by the sheer volume of pixels that VR transmits. Developers and distributors of VR content will need to maximize frame rates and resolution throughout the entire workflow, while keeping up with the wide range of viewers’ devices. Sporting events in particular demand precise detail and high frame rates, as we see with instant replay, slow motion, and 360-degree cameras.

In a recent Vicon industry survey, 28 percent of respondents stated that high-quality content was important to ensuring a good VR experience. Consider simple file size comparisons: we already know that Ultra HD files take up considerably more storage space than SD, and the greater the file size, the greater the chance it will impede delivery. VR file sizes are no small potatoes. When you’re talking about VR video, you’re talking about four to six times the foundational resolution you are transmitting. And if you thought Ultra HD was cumbersome, think about how you’re going to deal with resolutions beyond 4K for an immersive VR HD experience.

In order to catch up with these file sizes, we need to continue developing video codecs that can quickly interpret the frame-by-frame data. HEVC is a great starting point, but frankly, given hardware device limitations, many content distributors are forced to continue using H.264 codecs. For this reason we must harness advanced tools in image processing and compression. One example of such an approach is content adaptive perceptual optimization.

I want my VR now! Reaching End Users

Video content for VR comes in a variety of formats, including combinations of stereoscopic 3D, 360-degree panoramas, and spherical views, and all of them bring obvious challenges: added strain on processors, memory, and network bandwidth. Modern codecs use a variety of algorithms to quickly and efficiently detect similarities between frames, but they are usually tailored to 2D content. A content delivery mechanism must be able to send this content to every user, and should be smart enough to optimize how the video is processed and transmitted.

Minimizing latency: how long can you roll the boulder up the hill?

We’ve seen significant improvements in the graphics processing capabilities of desktops and laptops. However, to take advantage of the immersive environment that VR offers, high-end graphics must be delivered to the viewer as quickly and smoothly as possible. The VR hardware also needs to display large images properly, with the highest fidelity and lowest latency. There is very limited room for things like color correction or adjusting panning from different directions; if you have to stitch or rework artifacts, you will likely lose ground. You need to be smart about it. Typical decoders for tablets or smart TVs are more likely to cause latency, and they only support lower frame rates. How you build the infrastructure will be the key to offering the image quality and life-like resolution consumers expect to see.

Bandwidth, where art thou?

According to Netflix, an Ultra HD streaming experience requires an Internet connection of 25 Mbps or higher. However, according to Akamai, the average Internet speed in the US is only approximately 11 Mbps. Effectively, this prohibits live streaming on any typical mobile VR device, which may need 25 Mbps at a minimum to achieve the required quality and resolution.

Most certainly, improvements in graphics processing and hardware will continue to drive forward the realism of immersive VR content, as the ability to render an image quickly becomes easier and cheaper. Just recently, Netflix jumped on the bandwagon and became the first of many streaming media apps to launch on Oculus’ virtual reality app store. As soon as VR display devices are able to integrate these higher resolution screens, we will see another step change in the quality and realism of virtual environments. But whether the available bandwidth will be sufficient is a very real question.

To understand the applications for VR, you really have to see it to believe it

A heart-warming campaign from Expedia recently offered children at a research hospital in Memphis, Tennessee the opportunity to be taken on a journey of their dreams through immersive, real-time virtual travel – all without getting on a plane: https://www.youtube.com/watch?time_continue=179&v=2wQQh5tbSPw

The National Multiple Sclerosis Society also launched a VR campaign that inventively used the tech to give two people with MS the opportunity to experience their lifelong passions. These are the type of immersive experiences we hope will unlock a better future for mankind. We applaud the massive projects and time spent on developing meaningful VR content and programming such as this.

Frost & Sullivan forecasts $1.5 billion in revenue from pay TV operators delivering VR content by 2020. In my estimation, the adoption of VR is limited only by the quality of the user experience, as consumer expectations will no doubt be high.

For VR to really take off, the industry needs to address these challenges, making VR more accessible and, most importantly, pairing it with unique and meaningful content. But it’s hard to talk about VR without experiencing it. I suggest you try it – you will like it.

Applications for On-the-Fly Modification of Encoder Parameters

As video encoding workflows modernize to include content adaptive techniques, the ability to change encoder parameters “on-the-fly” will be required. With the ability to change encoder resolution, bitrate, and other key elements of the encoding profile, video distributors can achieve a significant advantage by creating recipes appropriate to each piece of content.

For VOD or file-based encoding workflows, the advantage of on-the-fly reconfigurability is enabling content-specific encoding recipes without resetting the encoder and disrupting the workflow. At the same time, on-the-fly functionality is a necessary feature for supporting real-time encoding on a network with variable capacity: the application can react to changing bandwidth, network congestion, or other operational requirements.

Vanguard by Beamr V.264 AVC Encoder SDK and V.265 HEVC Encoder SDK have supported on-the-fly modification of the encoder settings for several years. Let’s take a look at a few of the more common applications where having the feature can be helpful.

On-the-fly control of Bitrate

Adjusting bitrate while the encoder is in operation is an obvious application. All Vanguard by Beamr codec SDKs allow the maximum bitrate to be changed via a simple “C-style” API, enabling bitrate adjustments based on available bandwidth, dynamic channel lineups, or other network conditions.
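Conceptually, on-the-fly reconfiguration works by staging a parameter change and applying it at the next frame (or GOP) boundary, so the stream is never reset. A minimal sketch of that pattern follows; all names here are illustrative, not the actual Vanguard by Beamr API:

```python
class LiveEncoderSettings:
    """Illustrative model of on-the-fly reconfiguration: changes
    are staged, then applied atomically at a frame boundary."""

    def __init__(self, max_bitrate_kbps):
        self.max_bitrate_kbps = max_bitrate_kbps
        self._pending = {}  # staged changes, not yet live

    def set_max_bitrate(self, kbps):
        # staged only; the in-flight frame still uses the old value
        self._pending["max_bitrate_kbps"] = kbps

    def on_frame_boundary(self):
        # apply all staged changes without resetting the encoder
        for name, value in self._pending.items():
            setattr(self, name, value)
        self._pending.clear()

settings = LiveEncoderSettings(max_bitrate_kbps=5000)
settings.set_max_bitrate(3000)   # e.g. congestion detected
settings.on_frame_boundary()     # takes effect, stream uninterrupted
```

Deferring the change to a frame boundary is what keeps the output stream standards-compliant while parameters move underneath it.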

On-the-fly control of Encoder Speed

Encoder speed is an especially useful parameter because it directly trades off video quality against processing time. Changing it triggers a different set of encoding algorithms and internal codec presets. This applies in unicast scenarios, where a service may need to adjust encoder speed for ever-changing network conditions and client device capabilities.

On-the-fly control of Video Resolution

A useful parameter to access on the fly is video resolution. One use case is in telecommunications, where end users may shift their viewing from a mobile device on a slow, congested cellular network to broadband WiFi or a hard-wired desktop computer. With control of video resolution, the encoder output can be changed during operation to accommodate the network speed or match the display resolution, all without interrupting the video program stream.
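The use case above amounts to mapping a measured network condition to a target encode resolution. A toy sketch of such a mapping (the thresholds and rungs here are made up for illustration):

```python
def pick_resolution(bandwidth_kbps):
    """Return a target encode resolution (width, height) for the
    measured bandwidth, walking a ladder from best to worst."""
    ladder = [
        (6000, (1920, 1080)),  # broadband WiFi / wired desktop
        (3000, (1280, 720)),
        (1200, (854, 480)),    # congested cellular
    ]
    for min_kbps, resolution in ladder:
        if bandwidth_kbps >= min_kbps:
            return resolution
    return (640, 360)  # floor for very poor links
```

When the selected rung changes, the new resolution would be handed to the encoder as an on-the-fly parameter update rather than a restart.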

On-the-fly control of HEVC SAO and De-blocking Filter

HEVC presents additional opportunities for on-the-fly control of the encoder, and the Vanguard by Beamr V.265 encoder leads the market with the capability to turn SAO and de-blocking filters on or off to adjust quality and performance in real time.

On-the-fly control of HEVC multithreading

V.265 is recognized for superior multithreading capability. The V.265 codec SDK provides access to add or remove encoding execution threads dynamically. This is an important feature for environments with a variable number of concurrent tasks, such as encoding running alongside a content adaptive optimization process or the ABR packaging step.

Beamr’s implementation of on-the-fly controls in our V.264 Codec SDK and V.265 Codec SDK demonstrate the robust design and scalable performance of the Vanguard by Beamr encoder software.

For more information on the Vanguard by Beamr Codec SDKs, please visit the V.264 and V.265 pages, or visit http://beamr.com for more on the company and our technology.