With numerous advantages, AV1 is now supported on about 60% of devices and all major web browsers. To accelerate its adoption, Beamr has introduced an easy, automated upgrade to the codec at the forefront of today’s video technology.
Four years ago we explored the different video codecs, analyzed their strengths and weaknesses, and looked at current and predicted market share. While it is gratifying to see that many of our predictions were fairly accurate, that is accompanied by some degree of disappointment: while AV1’s strengths are well known in the industry, a significant change in the adoption of new codecs has yet to materialize.
The bottom line of the 2020 post was: “Only time will tell which will have the highest market share in 5 years’ time, but one easy assessment is that with AVC current market share estimated at around 70%, this one is not going to disappear anytime soon. AV1 is definitely gaining momentum, and with the giants backing we expect to see it used a fair bit in online streaming.”
Indeed we are living in a multi-codec reality, where AVC still accounts for, by far, the largest percentage of video content, but adoption of AV1 is starting to increase with large players such as Netflix and YouTube incorporating it into their workflows, and many others using it for specific high value use cases.
Thus, we are faced with a mixture of the still-dominant AVC, HEVC (serving primarily UHD and HDR use cases), AV1, and additional codecs such as VP9 and VVC, which are used in relatively small amounts.
The Untapped Potential of AV1
So while AV1 adoption is increasing, there is still significant untapped potential. One cause of the slower-than-hoped rollout of AV1 is the obstacle facing adoption of any new standard: a critical mass of decoding support in H/W on edge devices.
While for AVC and HEVC the coverage is very extensive, for AV1 that has only recently become the case, with support across an estimated 60% of devices and all major web browsers, complemented by the efficient software decoding offered by dav1d.
Another obstacle AV1 faces involves the practicalities of deployment. There is extensive knowledge, within the industry and available online, of how best to configure AVC encoding, and which presets and encoding parameters work well for which use cases – no equivalent knowledge base exists for AV1. Thus, in order to deploy it, those who intend to use it must first invest in extensive research.
Additionally, AV1 encoding is complex, requiring much higher processing power for software encoding. In a world that is constantly trying to cut costs and use lower-power solutions, this can pose a problem. Even when using software solutions at the fastest settings, AV1 encoding is still significantly slower than AVC encoding at typical speeds. This is a strong motivator to upgrade to AV1 using H/W-accelerated solutions (learn more about Beamr’s solution to the challenge).
The upcoming codec possibilities are also a deterrent for some. With AV2 in the works, VVC finalized and gaining some traction, and various groups working on AI-based encoding solutions, there will always be players waiting for ‘the next big thing’ rather than switching out codecs twice.
In a world where JPEG, a 30+ year old standard, is still used in over 70% of websites and is the most popular format on the web for photographic content, it is no surprise that adoption of new video codecs is taking time.
While a multi-codec reality is probably here to stay, we can at least hope that when we revisit this topic in a blog a few years down the line, the balance between deployed codecs will lean more towards the higher-efficiency codecs, like AV1, yielding the best bitrate-quality options for the video world.
Easy & Safe Codec Modernization with Beamr using Nvidia GPUs
Following a decade in which AVC/H.264 was the clear ruler of the video encoding world, recent years have seen many video coding options battling to conquer the video arena. For some insights on the race between modern coding standards, you can check out our corresponding blog post.
Today we want to share how easy it can be to upgrade your content to a new and improved codec in a fast, fully automatic process which guarantees the visual quality of the content will not be harmed. This makes the switchover to newer encoders a smooth, easy and low-cost process which can help accelerate the adoption of new standards such as HEVC and AV1. When this transformation is done using a combination of Beamr’s technology and the Nvidia NVENC encoder, via Nvidia’s recently released APIs, it becomes a particularly cutting-edge solution, enjoying the benefits of the leading hardware AV1 encoding solution.
The benefit of switching to more modern codecs lies, of course, in the higher compression efficiency they offer. While the extent of improvement depends heavily on the actual content, bitrates and encoders used, HEVC is considered to offer gains of 30%-50% over AVC, meaning that for the same quality you can spend up to 50% fewer bits – for example, a 6 Mbps AVC stream can often be matched in quality by an HEVC encode in the 3-4 Mbps range. For AV1 this increase is generally a bit higher. As more and more on-device support is added for these newer codecs, the advantage of utilizing them to reduce both storage and bandwidth is clear.
Generally speaking, performing such codec modernization involves some non-trivial steps.
First, you need to get access to the modern encoder you want to use, and know enough about it to configure it correctly for your needs. Then you can proceed to encoding using one of the following approaches.
The first approach is to perform bit-rate driven encoding. One possibility is to use conservative bitrates, in which case the potential reduction in size will not be achieved. Another possibility is to set target bitrates that reflect the expected savings, in which case there is a risk of losing quality. For example, in an experimental test of files converted from their AVC source to HEVC, we found that on average a bitrate reduction of 50% could be obtained when using the Beamr CABR codec modernization approach. However, when the same files were all brute-force encoded to HEVC at a 50% reduced bitrate, using the same encoder and configuration, the quality took a hit for some of the files.
This example shows the full AVC source frame on top, with the transcodes to HEVC below it. Note the distortion in the blind HEVC encode, shown on the left, compared to the true-to-source video transformed with CABR on the right.
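To make the bitrate-driven approach concrete, here is a minimal sketch of such a transcode driven from Python through FFmpeg’s open-source libx265 encoder. The filenames and bitrate values are illustrative assumptions only, not the configuration used in the experiment above:

```python
import subprocess

# Bitrate-driven transcode: the target is fixed up front, so the trade-off is
# between a conservative value (savings left on the table) and an aggressive
# one (quality at risk). All values below are illustrative.
subprocess.run([
    "ffmpeg", "-i", "source_avc.mp4",
    "-c:v", "libx265",
    "-b:v", "2500k",                          # e.g. 50% of a 5 Mbps AVC source
    "-maxrate", "2750k", "-bufsize", "5000k",
    "-c:a", "copy",                           # leave the audio track untouched
    "hevc_bitrate_driven.mp4",
], check=True)
```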
The second approach is to perform the transcode using a quality driven encode, for instance using the constant QP (Quantization Parameter) or CRF (Constant Rate Factor) encoding modes with conservative values, which will in all likelihood preserve the quality. However, in this case you are likely to unnecessarily “blow up” some of your files to much higher bitrates. For example, for the UGC content shown below, transcoding to HEVC using a software encoder and CRF set to 21 almost doubled the file size.
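The quality-driven equivalent swaps the bitrate target for a constant quality factor. Again, this is an illustrative sketch with assumed values, using FFmpeg’s libx265:

```python
import subprocess

# Quality-driven transcode: CRF holds quality roughly constant, but the output
# bitrate is unbounded and can "blow up" on hard-to-compress content, as in
# the UGC example above. CRF 21 is a conservative, illustrative value.
subprocess.run([
    "ffmpeg", "-i", "source_avc.mp4",
    "-c:v", "libx265",
    "-crf", "21",
    "-c:a", "copy",
    "hevc_quality_driven.mp4",
], check=True)
```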
Yet another approach is to use a trial and error encode process for each file or even each scene, manually verifying that a good target encoding setup was selected which minimizes the bitrate while preserving the quality. This is of course an expensive and cumbersome process, and entirely unscalable.
By using Beamr CABR this is all done for you under the hood, in a fully automatic process, which makes optimized choices for each and every frame in your video, selecting the lowest bitrate that will still perfectly preserve the source visual quality. When performed using the Nvidia NVENC SDK with interfaces to Beamr’s CABR technology, this transformation is significantly accelerated and becomes even more cost effective.
The codec modernization flow is demonstrated for AVC to HEVC conversion in the above high-level block diagram. As shown here, the CABR controller interacts with NVENC, Nvidia’s hardware video encoder, using the new APIs Nvidia has created for this purpose. At the heart of the CABR controller lies Beamr’s Quality Measure, BQM, a unique, patented, Emmy award winning perceptual video quality measure. BQM has now been adapted and ported to the Nvidia GPU platform, resulting in significant acceleration of the optimization process.
The Beamr optimization technology can be used not only for codec modernization, but also to reduce bitrate of an input video, or of a target encode, while guaranteeing the perceptual quality is preserved, thus creating encodes with the same perceptual quality at lower bitrates or file sizes. In any and every usage of the Beamr CABR solution, size or bitrate are reduced as much as possible while each frame of the optimized encode is guaranteed to be perceptually identical to the reference. The codec modernization use case is particularly exciting as it puts the ability to migrate to more efficient and sophisticated codecs, previously used primarily by video experts, into the hands of any user with video content.
For more information please contact us at info@beamr.com
It has been two years since we published a comparison of the two leading HEVC software encoder SDKs: Beamr 5 and x265. In this article you will learn how Beamr’s AVC and HEVC software codec SDKs have further widened the computing performance gap over x264 and x265 for live broadcast-quality streaming.
Why Performance Matters
With the performance of our AVC (Beamr 4) and HEVC (Beamr 5) software encoders improving several orders of magnitude over the 2017 results, it felt like the right time to refresh our benchmarks, this time with real-time operational data.
It’s no secret that x264 and x265 have benefited greatly, as open-source projects, from having thousands of developers working on the code. This is what makes x264 and x265 a very high bar to beat. Yet even with so many bright and talented engineers donating tens of thousands of development hours to the code base, the architectural constraints of how these encoders were built limit their performance on multicore processors, as you will see in the data below.
Creative solutions have been developed that enable live encoding workflows to be built using open source. But it’s no secret that they come with inherent flaws: they are overly compute-intensive and encumbered with quality issues, as a result of not being able to encode a full ABR stack, or even a single 4K profile, on a single machine.
This matters because the resolutions that video services deliver continue to increase. And as a result of exploding consumer demand for video, data consumption is eating into network bandwidth to the point that Cisco reports that by 2022, 82% of Internet traffic will be video.
Cisco says in its Visual Networking Index that by 2022, SD resolution video will comprise just 23.4% of Internet video traffic, compared to the 60.2% of Internet video traffic that SD video consumed in 2017. What used to represent the middle quality tier, 480p (SD), has now become the lowest rung of the ABR ladder for many video distributors.
1080p (HD) will make up 56.2% of Internet video traffic by 2022, an increase from 36.1% in 2017. And if you thought the resolution expansion was going to end with HD, Cisco claims that by 2022, 4K (UHD) will comprise 20.3% of all Internet-delivered video.
Live video services are projected to expand 15x between 2017 and 2022, meaning within the next three years, 17.1% of all Internet video traffic will be comprised of live streams.
These trends demonstrate the industry’s need to prepare for this shift to higher resolution video and real-time delivery with software encoding solutions that can meet the requirement for live broadcast quality 4K.
Blazing Software Performance on Multicore Processors
The Beamr 5 software encoder utilizes an advanced thread management architecture. This represents a key aspect of how we can achieve such fantastic speed over x265 at the same quality level.
x265 works by creating software threads and adding them to a thread pool where each task must wait its turn. In contrast, Beamr 5 tracks all the serial dependencies involved with the video coding tasks it must perform, and it creates small micro-tasks which are efficiently distributed across all of the CPU cores in the system. This allows the Beamr codec to utilize each available core at almost 100% capacity.
All tasks added to the Beamr codec thread pool may be executed immediately so that no hardware thread is wasted on tasks where the data is not yet available. Interestingly, under certain conditions, x265 can appear to have higher CPU utilization. But, this utilization includes software threads which are not doing any useful work. This means they are “active” but not processing data that is required for the encoding process.
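The difference is essentially between a queue of coarse threads and a dependency-aware scheduler of fine-grained tasks. As a rough illustration of the latter idea only – Beamr’s actual scheduler is proprietary, and every name below is ours – a dependency-aware micro-task runner might look like this:

```python
import concurrent.futures

def run_micro_tasks(tasks, deps, workers=8):
    """Run callables in `tasks` (name -> fn) as soon as their prerequisites
    in `deps` (name -> set of prerequisite names) have completed.

    Illustrative sketch of dependency-aware scheduling, not Beamr's code:
    no worker is ever parked on a task whose input data is not ready.
    """
    done, running, pending = set(), {}, set(tasks)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # Submit every task whose prerequisites have all completed.
            for name in [t for t in pending if deps.get(t, set()) <= done]:
                running[pool.submit(tasks[name])] = name
                pending.discard(name)
            if not running:
                raise RuntimeError("cyclic or unsatisfiable dependencies")
            # Block until at least one in-flight task finishes.
            finished, _ = concurrent.futures.wait(
                running, return_when=concurrent.futures.FIRST_COMPLETED)
            for future in finished:
                done.add(running.pop(future))
    return done
```

In an encoder, the micro-tasks would be per-region stages such as motion estimation, transform and entropy coding, keyed by their serial dependencies, so that every hardware thread is always doing useful work.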
Adding to the Beamr encoders’ thread efficiency, we have implemented patented algorithms for more effective and efficient video encoding, including a fast motion estimation process and a heuristic early-termination algorithm which enables the encoder to reach a targeted quality using fewer compute resources (cycles). Furthermore, Beamr encoders utilize the latest AVX-512 SIMD instruction set to squeeze even more performance out of advanced CPUs.
The end result of the numerous optimizations found in the Beamr 4 (AVC) and Beamr 5 (HEVC) software encoders is that they are able to operate nearly twice as fast as x264 and x265 at the same quality, and with similar settings and tools utilization.
Video streaming services can benefit from this performance advantage in many ways, such as higher density (more channels per server), which reduces operational costs. To illustrate what this performance advantage can do for you, consider that at the top end, Beamr 5 is able to encode 4K, 10-bit video at 60 FPS in real time using just 9 Intel Xeon Scalable cores, while x265 is unable to achieve this level of performance with any number of computing cores (at least on a single machine). And, as a result of being twice as efficient, Beamr 4 and Beamr 5 can deliver higher quality within the same computing envelope as x264 and x265.
The Test Results
For our test to be as real-world as possible, we devised two methodologies. In the first, we measured the compute performance of an HEVC ABR stack operating both Beamr 5 and x265 at live speed. And for the second test, our team measured the number of simultaneous live streams at 1080p, comparing Beamr 4 with x264, and Beamr 5 with x265; and for 4K comparing Beamr 5 with x265. All tests were run on a single machine.
Live HEVC ABR Stack: Number of ABR Profiles (Channels)
This test was designed to find the maximum number of full ABR channels which can be encoded live by Beamr 5 and x265 on an AWS EC2 c5.24xlarge instance.
Each AVC channel comprises 4 layers of 8-bit 60 FPS video starting from 1080p, while each HEVC channel comprises either 4 layers of 10-bit 60 FPS video (starting from 1080p) or 5 layers of 10-bit 60 FPS video (starting from 4K).
Live HEVC ABR Stack Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable (Cascade Lake) @ 3.6 GHz
48 cores, 96 threads
Presets:
Beamr 5: INSANELY_FAST
x265: ultrafast
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all layers): 60 FPS
Encoded Bit Depth (all layers): 10-bit
Encoded Resolutions and Bitrates:
4Kp60@18000 Kbps (only in 4K ABR stack)
1080p60@3750 Kbps
720p60@2500 Kbps
576p60@1250 Kbps
360p60@625 Kbps
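For readers who want to approximate the x265 side of this ladder, here is a reproduction sketch as an FFmpeg loop. The exact test harness we used is not published, so every flag and value below is an assumption (and the 10-bit output requires a 10-bit-capable libx265 build):

```python
import subprocess

# Illustrative 1080p HEVC ABR ladder, mirroring the configuration above.
LADDER = [  # (height, bitrate in Kbps)
    (1080, 3750),
    (720, 2500),
    (576, 1250),
    (360, 625),
]

for height, kbps in LADDER:
    subprocess.run([
        "ffmpeg", "-i", "source_4kp60.y4m",
        "-vf", f"scale=-2:{height}",       # downscale, keep aspect ratio
        "-c:v", "libx265", "-preset", "ultrafast",
        "-pix_fmt", "yuv420p10le",         # 10-bit layers, as in the test
        "-b:v", f"{kbps}k", "-r", "60",
        f"out_{height}p60.mp4",
    ], check=True)
```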
Live HEVC ABR Stack Test – RESULTS
NOTES:
(1) When encoding 2 full ABR stacks with Beamr 5, 25% of the CPU is unused and available for other tasks.
(2) x265 cannot encode even a single 4K ABR stack channel at 60 FPS. The maximum FPS for the 4K layer of a single 4K ABR stack channel using x265 is 35 FPS.
Live AVC & HEVC Single-Resolution: Number of Channels (1080p & 4K)
In this test, we are trying to discover the maximum number of single-resolution 4K and HD channels that can be encoded live by Beamr 4 and Beamr 5 as compared with x264 and x265, on a c5.24xlarge instance. As with the Live ABR Channels test, the quality between the two encoders as measured by PSNR, SSIM and VMAF was always found to be equal, and in some cases better with Beamr 4 and Beamr 5 (see the “Quality Results” section below).
Live AVC Beamr 4 vs. x264 Channels Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable (Cascade Lake) @ 3.6 GHz
48 cores, 96 threads
Speeds / Presets:
Beamr 4: speed 3
x264: preset medium
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all channels): 60 FPS
Encoded Bit Depth (all channels): 8-bit
Channel Resolutions and Bitrates:
1080p60@5000 Kbps
Live AVC Beamr 4 vs. x264 Channels Test – RESULTS
Live HEVC Beamr 5 vs. x265 Channels Test – CONFIGURATION
Platform:
AWS EC2 c5.24xlarge instance
Intel Xeon Scalable (Cascade Lake) @ 3.6 GHz
48 cores, 96 threads
Speeds / Presets:
Beamr 5: INSANELY_FAST
x265: ultrafast
Content: Netflix 10-bit 4Kp60 sample clips (DinnerScene and PierSeaside)
Encoded Frame Rate (all channels): 60 FPS
Encoded Bit Depth (all channels): 10-bit
Channel Resolutions and Bitrates:
4Kp60@18000 Kbps
1080p60@3750 Kbps
Live HEVC Beamr 5 vs. x265 Channels Test – RESULTS
NOTES:
(1) x265 was unable to reach 60 FPS for a single 4K channel, achieving just 35 FPS at comparable quality.
Quality Comparisons (PSNR, SSIM, VMAF)
Beamr 5 vs. x265
NOTES:
As previously referenced, x265 was unable to reach 4Kp60 and thus PSNR, SSIM, and VMAF scores could not be calculated, hence the ‘N/A’ designation in the 3840×2160 cells.
Video engineers are universally focused on the video encoding pillars of computing efficiency (performance), bitrate efficiency, and quality. Even as technology has enabled each of these pillars to advance with new tool sets, it’s well known that tradeoffs between them are still required.
On one hand, bitrate efficiency requires tools that sap performance, and on the other hand, to reach a performance (speed) target, tools which could positively affect quality cannot be used without harming the performance characteristics of the encoding pipeline. As a result, many video encoding practitioners have adapted to the reality of these tradeoffs and simply accept them for what they are. Now, there is a solution…
The impact of adopting Beamr 4 for AVC and Beamr 5 for HEVC transcends a TCO calculation. With Beamr’s high-performance software encoders, services can achieve bitrate efficiency and performance, all without sacrificing video quality.
The use of Beamr 4 and Beamr 5 opens up an improved UX through increased resolution or frame rate, which means it is now possible for everyone to stream higher quality video. As the competitive landscape for video delivery services continues to evolve, never has the need been greater for an AVC and HEVC codec implementation that can deliver the best of all three pillars: performance, bitrate efficiency, and quality. With the performance data presented above, it should be clear that Beamr 4 and Beamr 5 continue to be the codec implementations to beat.
Video engineers dedicated to engineering encoding technologies are highly skilled and hyper-focused on developing the foundation for future online media content. Such a limited pool of experts in this field creates a lot of opportunity for growth and development, but it also means there must be a level of camaraderie and cooperation between different methodologies.
In past episodes, you’ve seen The Video Insiders compare codecs head-to-head and debate over their strengths and weaknesses. Today, they are tackling a deeper debate between encoding experts: the advantages and disadvantages of proprietary technology vs. community-driven open source.
In Episode 05, Tom Vaughan surprises The Video Insiders as he talks through his take on open source vs. proprietary technology.
Want to join the conversation? Reach out to TheVideoInsiders@beamr.com
TRANSCRIPTION (lightly edited to improve readability only)
Mark Donnigan: 00:00 In this episode, we talk with a video pioneer who drove a popular open source codec project before joining a commercial codec company. Trust me, you want to hear what he told us about proprietary technology, open source, IP licensing, and royalties.
Announcer: 00:18 The Video Insiders is the show that makes sense of all that is happening in the world of online video, as seen through the eyes of a second generation codec nerd and a marketing guy who knows what I-frames and macroblocks are. Here are your hosts, Mark Donnigan and Dror Gill.
Mark Donnigan: 00:56 You know, we’re like encoding geeks. I mean, are there even 180 of us in the world?
Dror Gill: 01:01 I don’t know. I think you should count the number of people who come to Ben Waggoner’s compressionist breakfast at NAB, that’s about the whole industry, right?
Mark Donnigan: 01:09 Yeah. That’s the whole industry.
Mark Donnigan: 01:11 Hey, we want to thank, seriously in all seriousness, all the listeners who have been supporting us and we just really appreciate it. We have an amazing guest lined up for today. This is a little personal for me. It was IBC 2017, and I had said something about a product that he was representing, driving, developing at the time. In fact, it was factually true. He didn’t like it so much and we exchanged some words. Here’s the ironic thing, this guy now works for us. Isn’t that amazing, Dror?
Mark Donnigan: 01:52 You know what, and we love each other. The story ended well, talk about a good Hollywood ending.
Mark Donnigan: 01:58 Well, we are talking today with Tom Vaughn. I’m going to let you introduce yourself. Tell the listeners about yourself.
Tom Vaughn: 02:10 Hey Mark, hey Dror. Good to be here.
Tom Vaughn: 02:12 As Mark mentioned, I’m Beamr’s VP of strategy. Joined Beamr in January this year. Before that I was probably Beamr’s primary competitor, the person who started and led the x265 project at MulticoreWare. We were fierce competitors, but we were always friendly and always friends. Got to know the Beamr team when Beamr first brought their image compression science from the photo industry to the video industry, which was three or four years ago. Really enjoyed collaborating with them and brainstorming and working with them, and we’ve always been allies in the fight to make new formats successful and deal with some of the structural issues in the industry.
Dror Gill: 03:02 Let me translate. New formats, that means HEVC. Structural issues, that means patent royalties.
Tom Vaughn: 03:13 Yeah, we had many discussions over the years about how to deal with the challenging macro environment in the codec space. I decided to join the winning team at Beamr this year, and it’s been fantastic.
Mark Donnigan: 03:28 Well, we’re so happy to have you aboard, Tom.
Mark Donnigan: 03:32 I’d like to just really jump in. You have a lot of expertise in the area of open source, and in the industry, there’s a lot of discussion and debate, and some would even say there’s religion, around open source versus proprietary technology, but you’ve been on both sides and I’d really like to jump into the conversation and have you give us a real quick primer as to what is open source.
Tom Vaughn: 04:01 Well, open source is basically what it says: you can get the full source code to that software. Now, there isn’t just one flavor of open source in terms of the software license that you get, there are many different open source licenses. Some have more restrictions and some have fewer restrictions on what you can do. There are some well known open source software programs and platforms, Linux is probably the most well known. In the multimedia space, there’s FFmpeg and Libav. There’s VLC, the multimedia player. In the codec space, x264, x265, VP9, AV1, et cetera.
Dror Gill: 04:50 I think the main attraction of open source, I think, the main feature is that people from all over the world join together, collaborate, each one contributes their own piece, then somehow this is managed together. Every bug that is discovered, anyone can fix it, because the source is open. This creates kind of a community and together a piece of software is created that is much larger and more robust than anything that a single developer could do on his own.
Tom Vaughn: 05:23 Yeah, ideally the fact that the source code is open means that you have many sets of eyes, not only trying the program, but able to go through the source code and see exactly how it was written and therefore more code review can happen. On the collaboration side, you’re looking for volunteers, and if you can find and energize many, many people worldwide to become enthusiastic and devote time or get their companies motivated to allocate developers full- or part-time to a particular open source project, you get that collaboration from many different types of people with different individual use cases and motivations. There are patches submitted from many different people, but someone has to decide, does that patch get committed or are there problems with that? Should it be changed?
Tom Vaughn: 06:17 Design by committee isn’t always optimal, so someone or some small group has to decide what should be included and what should be left out.
Dror Gill: 06:27 It’s interesting to see, actually, the difference between x264 and x265 in this respect, because x264, the open source implementation of AVC, was led by a group of developers, really independent developers, and no single company owned or led the development of that open source project. However, with x265, which is the open source implementation of HEVC, your previous company, MulticoreWare, has taken the lead and devoted, I assume, most of the development resources that have gone into the open source development. Most of the contributions came from that company, but it is still an open source project.
Tom Vaughn: 07:06 That’s right. x264 was started by some students at a French university, and when they were graduating, leaving the university, they convinced the university to enable them to take the code with them, essentially under an open source license. It was very much grassroots open source beginnings and execution where developers may come and go, but it was a community collaboration.
Tom Vaughn: 07:31 I started x265 at MulticoreWare with a couple of other individuals, and the way we started it was finding some commercial companies who expressed a strong interest in such a thing coming to life and who were early backers commercially. It was quite different. Then, because there’s a small team of full-time developers on it working 40 hours plus a week, that team is moving very fast, it’s organized, it’s within a company. There was less of a need for a community. While we did everything we could to attract more external contributors, attracting contributors is always a challenge of open source projects.
Mark Donnigan: 08:14 What I hear you saying, Tom, is it sounds like compared to the x264 project, the x265 project didn’t have as large an independent group of contributors. Is that …?
Tom Vaughn: 08:29 Well, x264 was all independent contributors.
Tom Vaughn: 08:33 And still is, essentially. There are many companies that fund x264 developers explicitly. Chip companies will fund individual developers to optimize popular open source software projects for their instruction set. AVX, AVX2, AVX512, essentially, things like that.
Tom Vaughn: 08:58 HEVC is significantly more complex than AVC, and I think, if I recall correctly, x265 already has three times the number of commits as x264, even though it’s been in existence for only a third as long.
Dror Gill: 09:12 So Tom, what’s interesting to me is everybody’s talking about open source software being almost synonymous with free software. Is open source really free? Is it the same?
Tom Vaughn: 09:23 It can be at times. One part depends on the license and the other part depends on how you’re using the software. For example, if it’s a very open license like Apache, or BSD, or UIUC, that’s an attribution only license, and you’re pretty much free to create modifications, incorporate the software in your own works and distribute the resulting system.
Tom Vaughn: 09:49 Software programs like x264 and x265 are licensed under the GNU GPL V2, that is an open source license that has a copyleft requirement. That means if you incorporate that in a larger work and distribute that larger work, you have to open source not only your modifications, but you have to open source the larger work. Most commercial companies don’t want to incorporate some open source software in their commercial product, and then have to open source the commercial product. The owners of the copyright of the GPL V2 code, x264 LLC or MulticoreWare, also offer a commercial license, meaning you get access to that software, not under the GNU GPL V2, but under a separate, different license, in which case for you, it’s not open source anymore. Your commercial license dictates what you can and can’t do. Generally that commercial license doesn’t include the copyleft requirement, so you can incorporate it in some commercial product and distribute that commercial product without open sourcing your commercial product.
Dror Gill: 10:54 Then you’re actually licensing that software as you would license it from a commercial company.
Tom Vaughn: 10:59 Exactly. In that case it’s not open source at all, it’s a commercial license.
Dror Gill: 11:04 It’s interesting what you said about the GPL, the fact that anything you compile with it, create derivatives of, or incorporate into your software, you need to open source as well. I think this is what triggered Steve Ballmer to say in 2001, he said something like, “Open source is a cancer that spreads throughout your company and eats your IP.” That was very interesting. I think he meant mostly GPL because of that requirement, but the interesting thing is that he said that in 2001, and in 2016 in an interview, he said, “I was wrong and I really love Linux.” Today Microsoft itself open sources a lot of its own development.
Mark Donnigan: 11:48 That’s right. Yeah, that’s right.
Mark Donnigan: 11:50 Well Tom, let’s … This has been an awesome discussion. Let’s bring it to a conclusion. When is proprietary technology the right choice and when is open source maybe the correct choice? Can you give the listeners some guidelines?
Tom Vaughn: 12:08 Sure, people are trying to solve problems. Engineers, companies are trying to build products and services, and they have to compete in their own business environment. Let’s say you’re a video service and you run a video business. The quality of that video and the efficiency that you can deliver that video matters a lot. We know what those advantages of open source are, and all things being equal, people gravitate towards open source a lot because engineers feel comfortable actually seeing the source code, being able to read through it, find bugs themselves if pushed to the limit.
Tom Vaughn: 12:45 At the end of the day, if an open source project can’t produce the winning implementation of something, you shouldn’t necessarily use it just because it’s open source. At the end of the day you have a business to run and what you want is the most performant libraries and platforms to build your business around. If you find that a proprietary implementation in the long run is more cost effective, more efficient, higher performance, and the company that is behind that proprietary implementation is solid and is going to be there for you and provide a contractual commitment to support you, there’s no reason to not choose some proprietary code to incorporate into your product or service.
Tom Vaughn: 13:32 When we’re talking about codecs, there are particular qualities I’m looking for, performance, how fast does it run? How efficiently does it utilize compute resources? How many cores do I need in my server to run this in real time? And compression efficiency, what kind of video quality can I get at a given bit rate under a given set of conditions? I don’t want the second best implementation, I want the best implementation of that standard, because at scale, I can save a lot of money if I have a more efficient implementation of that standard.
Mark Donnigan: 14:01 Those are excellent pointers. It just really comes back to we’re solving problems, right? It’s easy to get sucked into religious debates about some of these things, but at the end of the day we all have an obligation to do what’s right and what’s best for our companies, which includes selecting the best technology, what is going to do the best job at solving the problems.
Mark Donnigan: 14:24 Thank you again for joining us.
Mark Donnigan: 14:33 Well, we want to thank you the listener for, again, joining The Video Insiders. We hope you will subscribe. You can go to thevideoinsiders.com and you can stream from your browser, you can subscribe on iTunes. We’re on Spotify. We are on Google Play. We’re expanding every day.
Announcer: 14:57 Thank you for listening to The Video Insiders podcast, a production of Beamr Limited. To begin using Beamr’s codecs today, go to Beamr.com/free to receive up to 100 hours of no-cost HEVC and H.264 transcoding every month.
At the 2018 Consumer Electronics Show, video hardware manufacturers came out swinging on the innovation front—including 8K TVs and a host of whiz-bang UX improvements—leading to key discussions around the business and economic models for content and delivery.
On the hardware side, TVs dominated CES, with LG and Samsung battling it out over premium living room gear. LG, in addition to debuting a 65-inch rollable OLED screen, made headlines with its announcement of an 88-inch 8K prototype television. It’s backed by the new Alpha 9 intelligent processor, which provides seven times the color reproduction of existing models and can handle up to 120 frames per second for improved gaming and sports viewing.
Not to be outdone, Samsung has debuted its Q9S 8K offering (commercially available in the second half of the year), featuring an 85-inch screen with built-in artificial intelligence that uses a proprietary algorithm to continuously learn from itself to intelligently upscale the resolution of the content it displays — no matter the source of that content.
The Korean giant also took the wraps off of what it is calling “the Wall,” which, true to its name, is an enormous 146-inch display. It’s not 8K, but it’s made up of micro LEDs that it says will let consumers “customize their television sizes and shapes to suit their needs.” It also said that its newest TVs will incorporate its artificial digital assistant Bixby and a universal programming guide with AI that learns your viewing preferences.
It’s clear that manufacturers are committed to upping their game when it comes to offering better consumer experiences. And it’s not just TVs that are leading this bleeding edge of hardware development: CES has seen announcements around 4K VR headsets (HTC), video-enabled drones, cars that can utilize a brain-hardware connection to tee up video-laden interactive apps, and a host of connected home gadgets—all of which will be driving the need for a combination of reliable hardware platforms, content availability and, perhaps above all, a positive economic model for content delivery.
This year CES provided a view into the next generation of video entertainment possibilities that are in active development. But it will all be for naught if content producers and distributors don’t have reliable and scalable delivery networks for compatible video, where costs don’t spiral out of control as the network becomes more content-intensive. For instance, driving down the bitrate requirements for delivering, say, 8K, whether it’s in a pay-TV traditional operator model or on an OTT basis, will be one linchpin for this vision of the future.
We’re committed to making sure we are in the strongest position to bring our extensive codec development resources to bear on this ecosystem. HEVC, for instance, is recognized to be 40 to 50 percent more efficient for delivering video than the legacy format, AVC/H.264. With Beamr’s advanced encoding offerings, content owners can optimize their encoding for reduced buffering, faster start times, and increased bandwidth savings.
We’re also keeping an eye on the progression of the Alliance for Open Media (AOMedia)’s AV1 codec standard, which recently added both Apple and Facebook to its list of supporters. It aims to be up to 30 percent more efficient than HEVC, though it is still very much in development.
We’re excited about the announcements coming out of CES this year, and the real proof that the industry is well on its way to delivering an exponential improvement on the consumer video experience. We also look forward to helping that ecosystem mature and doing our part to make sure that innovation succeeds, for 8K in the living room and very much beyond.
Every video encoding professional faces the dilemma of how best to detect artifacts and measure video quality. If you have the luxury of dealing with high-bitrate files, this becomes less of an “issue,” since for many videos throwing enough bits at the problem means an acceptably high video quality is nearly guaranteed. However, for those living in the real world, where 3 Mbps is the average bitrate they must target, compressing at scale requires metrics (algorithms) to help measure and analyze the visual artifacts in a file after encoding. This process is becoming even more sophisticated as some tools enable a quality measure to feed back into the encoding decision matrix, but more commonly quality measures are used as part of the QC step. For this post, we are going to focus on the application of quality measures as part of the encoding process.
There are two common quality measures, PSNR and SSIM, that we will discuss, but as you will see there is a third – the Beamr quality measure – on which the bulk of this article will focus.
PSNR, the Original Objective Quality Measure
PSNR, or peak signal-to-noise ratio, is the ratio between the maximum possible power of a signal and the power of the distortion (noise) affecting it. PSNR is one of the original engineering metrics used to measure the quality of image and video codecs. When comparing an original with a compressed version of a file, PSNR approximates the magnitude of the difference between the two. A significant shortcoming is that PSNR may indicate that a reconstruction is of suitably high quality when in some cases it is not. For this reason a user must be careful not to hold its results in too high a regard.
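In code, PSNR reduces to a few lines. Here is a minimal NumPy sketch of the standard formula, 10·log10(MAX²/MSE), for two 8-bit frames; the function name and interface are ours, for illustration only:

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio, in dB, between two same-sized 8-bit frames."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)       # mean squared error over every pixel
    if mse == 0:
        return float("inf")        # identical frames: no distortion at all
    return 10.0 * np.log10(peak ** 2 / mse)
```

Note that every pixel contributes equally to the mean squared error, which is part of why PSNR can disagree with human perception.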
What is SSIM?
SSIM, or the structural similarity index, is a technique for predicting the perceived quality of digital images and videos. The initial version was developed at the University of Texas at Austin, while the full SSIM routine was developed jointly with New York University’s Laboratory for Computational Vision. SSIM is a perceptual-model-based algorithm that treats image degradation as a perceived change in structural information, while also incorporating crucial perceptual phenomena such as luminance masking and contrast masking. The difference from techniques like PSNR is that those approaches estimate absolute errors, whereas SSIM attempts to model perceived structural change.
The basis of SSIM is the assumption that pixels have strong inter-dependencies, and that these dependencies carry important information about the structure of the objects in the scene, the GOP, or adjacent frames. Put simply, structural similarity is used for computing the similarity of two images. SSIM is a full-reference metric, where the computation and measurement of image quality is based on an uncompressed image as a reference. SSIM was developed as a step up from traditional methods such as PSNR (peak signal-to-noise ratio), which has proven to be poorly correlated with human vision. Yet, unfortunately, SSIM itself is not perfect and can be fooled, as shown by the following graphic, which illustrates a case where the original and compressed images are visually very close, yet PSNR and SSIM score them as dissimilar. Meanwhile, Beamr’s measure and MOS (mean opinion score) show them as being closely correlated.
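For experimentation, a standard SSIM implementation ships with scikit-image. A minimal sketch for two grayscale frames, assuming they are already available as same-sized 8-bit NumPy arrays (the file names are illustrative):

```python
import numpy as np
from skimage.metrics import structural_similarity

# reference and distorted: same-sized 8-bit grayscale frames (H x W uint8)
reference = np.load("reference_frame.npy")
distorted = np.load("distorted_frame.npy")

score = structural_similarity(reference, distorted, data_range=255)
print(f"SSIM: {score:.4f}")   # 1.0 means the two frames are identical
```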
Beamr Quality Measure
The Beamr quality measure is a proprietary, low-complexity, reliable, perceptually aligned quality measure. Its existence makes it possible to control a video encoder so as to obtain an output clip with (near) maximal compression of the video input, while still maintaining the input video’s resolution, format and perceptual visual quality. This is performed by controlling the compression level of each frame, or GOP, in the video sequence, such that each is compressed as deeply as possible while still resulting in a perceptually identical output.
The Beamr quality measure is also a full-reference measure, i.e. it indicates the quality of a recompressed image or video frame when compared to a reference or original image or video frame. This is in accordance with the challenge our technology aims to tackle: reducing bitrates to the maximum extent possible without imposing any quality degradation relative to the original, as perceived by the human visual system. The Beamr quality measure calculation consists of two parts: a pre-process of the input video frames to obtain various score configuration parameters, and an actual score calculation performed per candidate recompressed frame. Following is a system diagram of how the Beamr quality measure would interact with an encoder.
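Conceptually, the resulting control loop has a simple shape: try deeper compression for each frame, score every candidate against the reference, and keep the smallest frame that still scores as perceptually identical. The sketch below is our illustration of that shape only – the `encoder` object, the threshold, and the reuse of scikit-image’s SSIM as a placeholder metric are all assumptions, not Beamr’s proprietary controller or quality measure:

```python
from skimage.metrics import structural_similarity

def quality_score(reference, candidate):
    # Stand-in metric for this sketch only; any perceptual measure that
    # returns ~1.0 for "identical" fits this slot. Beamr's is proprietary.
    return structural_similarity(reference, candidate, data_range=255)

def compress_frame_to_quality_floor(frame, reference, encoder, floor=0.95):
    """Deepest per-frame compression that stays above a perceptual floor.

    `encoder` is a hypothetical object exposing base_qp, max_qp and encode().
    """
    best = encoder.encode(frame, qp=encoder.base_qp)   # safe starting point
    for qp in range(encoder.base_qp + 1, encoder.max_qp + 1):
        candidate = encoder.encode(frame, qp=qp)       # compress deeper
        if quality_score(reference, candidate.decoded) < floor:
            break            # quality fell below the floor: stop searching
        best = candidate     # still perceptually identical, keep smaller frame
    return best
```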
Application of the Beamr Quality Measure in an Encoder
The Beamr quality measure, when integrated with an encoder, enables the bitrate of video files to be reduced by up to an additional 50% over the current state-of-the-art, standard-compliant, block-based encoders, without compromising image quality or changing the artistic intent. If you view a source video and a Beamr-optimized video side by side, they will look exactly the same to the human eye.
A question we get asked frequently is: “How do you perform the ‘magic’ of removing bits with no visual impact?”
Well, believe it or not there is no magic here, just solid technology that has been actively in development since 2009, and is now covered by 26 granted patents and over 30 additional patent applications.
When we first approached the task of reducing video bitrates based on the needs of the content and not a rudimentary bitrate control mechanism, we asked ourselves a simple starting question, “Given that the video file has already been compressed, how many additional bits can the encoder remove before the typical viewer would notice?”
There is a simple manual method of answering this question, just take a typical viewer, show them the source video and the processed video side by side, and then start turning down the bitrate knob on the processed video, by gradually increasing the compression. And at some point, the user will say “Stop! Now I can see the videos are no longer the same!”
At that point, turn the compression knob slightly backwards, and there you have it – a video clip that has an acceptably lower bitrate than the source, and just at the point before the average user can notice the visual differences.
Of course I recognize what you are likely thinking, “Yes, this solution clearly works, but it doesn’t scale!” and you are correct. Unfortunately many academic solutions suffer from this problem. They make for good hand built demos in carefully controlled environments with hand picked content, but put them out in the “wild” and they fall down almost immediately. And I won’t even go into the issues of varying perception among viewers of different ages, or across multiple viewing conditions.
Another problem with such a solution is that different parts of the videos, such as different scenes and frames, require different bitrates. So the question is, how do you continually adjust the bitrate throughout the video clip, all the time confirming with your test viewer that the quality is still acceptable? Clearly this is not feasible.
Automation to the Rescue
Today, it seems the entire world is being infected with artificial intelligence, which in many cases is not much more than automation that is smart and able to adapt to its environment. So we too looked for a way to automate this image analysis process. That is, take a source video and discover a way to reduce the “non-visible” bits in a fully automatic manner, with no human intervention involved. A suitable solution would enable the bitrate to vary continuously throughout the video clip based on the needs of the content at that moment.
What is CABR?
You’ve heard of VBR, or variable bitrate. Beamr has coined the term CABR, or content-adaptive bitrate, to describe the process just outlined, where the encoder is adjusted at the frame level based on quality requirements, rather than relying only on a bit budget to decide where bits are applied and how many are needed. But we understood that in order to accomplish the vision of CABR, we would need to be able to simulate the perception of a human viewer.
We needed an algorithm that would answer the question, “Given two videos, can a human viewer tell them apart?” This algorithm is called a Perceptual Quality Measure and it is the very essence of what sets Beamr so far apart from every other encoding solution in the market today.
A quality measure is a mathematical formula, which tries to quantify the differences between two video frames. To implement our video optimization technology, we could have used one of the well-known quality measures, such as PSNR (Peak Signal to Noise Ratio) or SSIM (Structural SIMilarity). But as already discussed, the problem with these existing quality measures is that they are simply not reliable enough as they do not correlate highly enough with human vision.
There are other sophisticated quality measures which correlate highly enough with human viewer opinions to be useful, but since they require extensive CPU power they cannot be utilized in an encoding optimization process, which requires computing the quality measures several times for each input frame.
Advantages of the Beamr Quality Measure
Given the constraints of existing objective quality measures, we had no choice but to develop our own, and we developed it with a very focused goal: to identify and quantify the specific artifacts created by block-based compression methods.
All of the current image and video compression standards, including JPEG, MPEG-1, MPEG-2, H.264 (AVC) and H.265 (HEVC), are built upon block-based principles.
They divide an image into blocks, attempt to predict the block from previously encoded pixels, and then transform the block into the frequency domain, and quantize it.
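As a toy illustration of where those artifacts come from – not any particular codec’s actual transform – here is the transform-and-quantize step on a single 8x8 block. The coarse rounding of frequency coefficients is exactly what produces blocking and ringing:

```python
import numpy as np
from scipy.fftpack import dct, idct

def quantize_block(block: np.ndarray, qstep: float = 20.0) -> np.ndarray:
    """Toy 8x8 block transform + quantization; qstep is an illustrative value."""
    coeffs = dct(dct(block.astype(np.float64), axis=0, norm="ortho"),
                 axis=1, norm="ortho")              # forward 2-D DCT
    quantized = np.round(coeffs / qstep) * qstep    # lossy: coefficients snap to a grid
    return idct(idct(quantized, axis=1, norm="ortho"),
                axis=0, norm="ortho")               # reconstruct the block
```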
All of these steps create specific artifacts, which the Beamr quality measure is trained to detect and measure. So instead of looking for general deformations, such as out-of-focus images or missing pixels, which is what general quality measures do, we look specifically for the artifacts that were created by the video encoder.
This means that our quality measure is tightly focused and extremely efficient, and as a result, the CPU requirements of our quality measure are much lower than quality measures that try to model the Human Visual System (HVS).
Beamr Quality Measure and the Human Visual System
After years of developing our quality measure, we put it to the test under the strict requirements of ITU BT.500, an international standard for testing image quality. We were happy to find that the correlation of our quality measure with subjective (human) results was extremely high.
When the testing was complete, we felt certain this revolutionary quality measure was ready for the task of accurately comparing two images for similarity, from a human point of view.
But compression artifacts are only part of the secret. When a human looks at an image or video, the eye and the brain are drawn to particular places in the scene, for example, places where there is movement, and in fact we are especially “tuned” to capture details in faces.
Since our attention is focused on these areas, artifacts there are more disturbing than the same artifacts in other areas of the image, such as background regions or out-of-focus areas. The Beamr quality measure takes this into account and ensures that when we measure quality, proper attention is given to the areas that require it.
Furthermore, the Beamr quality measure takes into account temporal artifacts introduced by the encoder, because it is not sufficient to ensure that each frame is not degraded; it is also necessary to preserve the quality and feel of the video’s temporal flow.
The Magic of Beamr
With last year’s acquisition of Vanguard Video, many industry observers have gone public with the idea that the combination of our highly innovative quality measure, tightly integrated with the world’s best encoder, could lead to a real shake-up of the ecosystem.
We encourage you to see for yourself what is possible when the world’s most advanced perceptual quality measure becomes the rate-control mechanism for the industry’s best quality software encoder. Check out Beamr Optimizer.
National Geographic has a hit TV franchise on its hands. It’s called Brain Games, starring Jason Silva, a talent described as “a Timothy Leary of the viral video age” by the Atlantic. Brain Games is accessible, fun and accurate. It’s a dive into brain science that relies on well-produced demonstrations of illusions and puzzles to showcase the power, and limitations, of the human brain. It’s compelling TV that illuminates how we perceive the world. (Intrigued? Watch the first minute of this clip featuring Charlie Rose, Silva, and excerpts from the show: https://youtu.be/8pkQM_BQVSo )

At Beamr, we’re passionate about the topic of perceptual quality. In fact, we are so passionate that we built an entire company based on it. Our technology leverages science’s knowledge about the human visual system to significantly reduce video delivery costs, reduce buffering and speed up video starts without any change in the quality perceived by viewers. We’re also inspired by the show’s ability to turn complex things into compelling and accessible ones without distorting the truth. No easy feat. But let’s see if we can pull it off with a discussion about video quality measurement, which is also a dense topic.

Basics of Perceptual Video Quality

Our brains are amazing, especially in the way we process rich visual information. If a picture’s worth 1,000 words, what’s 60 frames per second in 4K HDR worth? The answer varies based on what part of the ecosystem or business you come from, but we can all agree that it’s really impactful. And data intensive, too.

But our eyeballs aren’t perfect, and our brains aren’t either, as Brain Games points out. As such, it’s odd that established metrics for video compression quality in the TV business have been built on the idea that human vision is mechanically perfect. See, video engineers have historically relied heavily on two key measures to evaluate the quality of a video encode: Peak Signal to Noise Ratio, or PSNR, and Structural Similarity, or SSIM. Both are ‘objective’ metrics. That is, we use tools to directly measure the physics of the video signal and construct mathematical algorithms from that data to create metrics. But is it possible to really quantify a beautiful landscape with a number? Let’s see about that.

PSNR and SSIM look at different physical properties of a video, but the underlying mechanics for both metrics are similar. You compress a source video, the properties of the original and the derivative are analyzed using specific inputs, and metrics are calculated for both. The more similar the two metrics are, the more we can say that the properties of each video are similar, and the closer we can define our manipulation of the video, i.e. our encode, as having a high or acceptable quality.
Objective Quality vs. Subjective Quality

However, it turns out that these objectively calculated metrics do not correlate well with the human visual experience. In other words, in many cases humans cannot perceive variations that objective metrics highlight, while at the same time objective metrics can miss artifacts a human easily perceives.

The concept that human visual processing might be less than perfect is intuitive. It’s also widely understood in the encoding community. This fact opens a path to saving money, reducing buffering and speeding up time-to-first-frame. After all, why would you knowingly send bits that can’t be seen? But given the complexity of the human brain, can we reliably measure opinions about picture quality to know which bits can be removed and which cannot? This is the holy grail for anyone working in the area of video encoding.

Measuring Perceptual Quality

Actually, a rigorous, scientific and peer-reviewed discipline has developed over the years to accurately measure human opinions about the picture quality on a TV. The math and science behind these methods are memorialized in an important ITU recommendation on the topic, ITU-R BT.500, which has been revised repeatedly over the years. (The International Telecommunication Union is the largest standards body in global telecom.) I’ll provide a quick rundown.

First, a set of clips is selected for testing. A good test has a variety of clips with diverse characteristics: talking heads, sports, news, animation, UGC – the goal is to get a wide range of videos in front of human subjects. Then, a subject pool of sufficient size is created and screened for 20/20 vision. Subjects are placed in a light-controlled environment with a screen or two, depending on the set-up and testing method. The instructions for one method are below, as a tangible example.

“In this experiment, you will see short video sequences on the screen that is in front of you. Each sequence will be presented twice in rapid succession: within each pair, only the second sequence is processed. At the end of each paired presentation, you should evaluate the impairment of the second sequence with respect to the first one. You will express your judgment by using the following scale:

5 Imperceptible
4 Perceptible but not annoying
3 Slightly annoying
2 Annoying
1 Very annoying

Observe carefully the entire pair of video sequences before making your judgment.”

As you can imagine, testing like this is an expensive proposition indeed. It requires specialized facilities, trained researchers, vast amounts of time, and a budget to recruit subjects. Thankfully, the rewards were worth the effort for teams like Beamr that have been doing this for years. It turns out that if you run these types of subjective tests, you’ll find there are numerous ways to remove 20-50% of the bits from a video signal without losing the ‘eyeball’ video quality, even when objective metrics like PSNR and SSIM produce failing grades.

But most of the methods that have been tried are still stuck in academic institutions or research labs, because the complexities of upgrading or integrating the solution into the playback and distribution chain make them unusable. Have you ever had to update 20 million set-top boxes? Well if you have, you know exactly what I’m talking about.
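As an aside, the arithmetic at the end of such a session is straightforward: under BT.500, each clip’s ratings are averaged into a mean opinion score (MOS), usually reported with a confidence interval. A toy sketch with hypothetical panel scores:

```python
import statistics

ratings = [5, 4, 4, 5, 3, 4, 5, 4]    # hypothetical 5-point impairment scores
mos = statistics.mean(ratings)
ci95 = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5  # normal approx.
print(f"MOS = {mos:.2f} +/- {ci95:.2f} (n = {len(ratings)})")
```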
We know the broadcast and large-scale OTT industry, which is why, when we developed our approach to measuring perceptual quality and applied it to reducing bitrates, we insisted on staying 100% inside the AVC/H.264 and HEVC/H.265 standards. By pioneering the use of perceptual video quality metrics, Beamr is enabling media and entertainment companies of all stripes to reduce the bits they send by up to 50%. This reduces re-buffering events by up to 50%, improves video start time by 20% or more, and reduces storage and delivery costs.

Fortunately, you now understand the basics of perceptual video quality. You also see why most of the video engineering community believes content-adaptive encoding sits at the heart of next-generation encoding technologies. Unfortunately, when we stated above that there were numerous ways to reduce bits by up to 50% without sacrificing ‘eyeball’ video quality, we skipped over some very important details, such as how we can utilize subjective testing techniques on an entire catalog of videos at scale, and cost efficiently. Next time: Part 2 and the Opinionated Robot.

Looking for better tools to assess subjective video quality? You definitely want to check out Beamr’s VCT, which is the best software player available on the market for judging HEVC, AVC, and YUV sequences in modes that are highly useful for a video engineer or compressionist. VCT is available for Mac and PC. And best of all, we offer a FREE evaluation to qualified users. Learn more about VCT: http://beamr.com/h264-hevc-video-comparison-player/
We can all agree that analyzing video quality is one of the biggest challenges when evaluating codecs. Companies use a combination of objective and subjective tests to validate encoder efficiency. In this post, I’ll explore why it is difficult to measure video quality with quantitative metrics alone, since they fail to match the subjective quality perception of the human eye.
Furthermore, we’ll look at why it’s important to equip yourself with the best resources when doing subjective testing, and how Beamr’s VCT visual comparison tool can help you with video quality testing.
But first, if you haven’t done so already, be sure to download your free trial of VCT here.
OBJECTIVE TESTING
The most common objective measurement used today is pixel-based Peak Signal to Noise Ratio (PSNR). PSNR is a popular test because it is easy to calculate, and nearly everyone working in video is familiar with interpreting its values. But it does have limitations. Typically a higher PSNR value correlates with higher quality, and a lower PSNR value with lower quality. However, since this test measures pixel-based mean-squared error over an entire frame, summarizing the quality of a frame (or collection of frames) with a single number does not always parallel true subjective quality.
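For reference, here is a minimal sketch of frame-level PSNR for an 8-bit luma plane. The function name is ours, but the formula – 10·log10 of the squared peak value over the mean-squared error – is the standard definition. Note how every pixel contributes equally to the result:

```c
/* Minimal sketch of frame-level PSNR for 8-bit samples:
 * mean-squared error over all pixels, then 10*log10(255^2 / MSE). */
#include <math.h>
#include <stddef.h>
#include <stdint.h>

double frame_psnr(const uint8_t *ref, const uint8_t *test, size_t npixels) {
    double mse = 0.0;
    for (size_t i = 0; i < npixels; i++) {
        double d = (double)ref[i] - (double)test[i];
        mse += d * d;               /* every pixel weighted equally */
    }
    mse /= (double)npixels;
    if (mse == 0.0) return INFINITY; /* identical frames */
    return 10.0 * log10((255.0 * 255.0) / mse);
}
```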
PSNR gives equal weight to every pixel in the frame and to each frame in a sequence, ignoring many factors that affect human perception. For example, consider two encoded images of the same frame.[1] Image (a) and Image (b) have the same PSNR, which should theoretically mean two encoded images of the same quality. Yet the difference in perceived quality is easy to see: viewers would rate Image (a) as markedly higher quality than Image (b).
Example: [images (a) and (b) not reproduced here]
Due to the inability of error-based methods like PSNR to adequately mimic human visual perception, other methods for analyzing video quality have been developed, including the Structural Similarity Index Metric (SSIM), which measures structural distortion. Unlike PSNR, SSIM addresses image degradation as perceived change in three major aspects of an image: luminance, contrast, and structure. SSIM has gained popularity, but as with PSNR, it has its limitations. Studies have suggested that SSIM’s performance is equal to PSNR’s, and some have cited evidence of a systematic relationship between SSIM and Mean Squared Error (MSE).[2]
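For reference, the standard per-window SSIM formula combines those three terms into a single score, with stabilizing constants C1 = (K1·L)² and C2 = (K2·L)² (typically K1 = 0.01, K2 = 0.03, and L = 255 for 8-bit video):

```latex
\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}
```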
While SSIM and other quantitative measures, including multi-scale structural similarity (MS-SSIM) and the Sarnoff Picture Quality Rating (PQR), have made significant gains, none can truly deliver the same assurance as subjective evaluation using the human eye. It is also important to note that the two most widely used objective quality metrics, PSNR and SSIM, were designed to evaluate static image quality. Neither algorithm provides meaningful information about motion artifacts, thereby limiting their effectiveness for video.
SUBJECTIVE TESTING
While objective methods attempt to model human perception, there is no substitute for subjective “golden-eye” tests. But we are all familiar with the drawbacks of subjective analysis, including the variance of individual quality perception and the difficulty of executing proper subjective tests in fully controlled viewing environments with a large number of testers. Evaluating video using subjective visual tests can reveal key differences that objective measures alone may miss, which is why it is important to use a combination of both objective and subjective testing methodologies.
One of the logistical difficulties of performing subjective quality comparisons is coordinating simultaneous playback of two streams. Recognizing the drawbacks of existing subjective evaluation methods, in particular single-stream playback and awkward dual-stream review workarounds, Beamr spent years in research and development building a tool that offers simultaneous playback of two videos with various comparison modes, significantly improving the golden-eye test execution necessary to properly evaluate encoder efficiency.
Powered by our professional HEVC and H.264 codec SDK decoders, the Beamr video comparison tool VCT allows encoding engineers and compressionists to play back two frame-synchronized, independent HEVC, H.264, or YUV sequences simultaneously, and to compare the quality of these streams in four modes:
Split screen
Side-by-side
Overlay
Butterfly (the newest mode)
MPEG2-TS and MP4 files containing either HEVC or H.264 elementary streams are also supported. Additionally, VCT displays valuable clip information such as bit-rate, screen resolution, frame rate, number of frames, and other important video information.
Developed in 2012, VCT was the industry’s first internal software player offered as a tool to help Beamr customers conduct subjective testing while evaluating our encoder’s efficiency. Today, VCT has been tested by content and equipment companies from around the world in markets including broadcast, mobile, and internet streaming, making it the de facto standard for subjective golden-eye video quality testing and evaluation.
VCT BENEFITS AND TIPS
Your FREE trial of VCT comes with an extensive user guide that contains everything you need to get started. But we know you are eager to begin testing, so here are a few quick tips we trust you will find useful. Take advantage of this “golden” opportunity and get started today!
Note: use Command (⌘) instead of Ctrl for the OS X version of VCT.
Split Screen Comparison Mode:
Benefits:
Great for viewing two clips when only one screen is available.
Moving slider bar allows you to clearly see quality difference between two streams in your desired region of interest. For example, you can move the slider bar back and forth across a face to see quality differences between two discrete files.
Pro Tips:
Use the keyboard shortcut Ctrl + \ to re-center the slider bar after it is moved.
Shortcut key Ctrl + Tab allows you to change which video appears on the left or right of the slider bar.
Side-by-side Comparison Mode:
Benefits:
Great for tradeshows. Solves the synchronization problems of side-by-side comparison tests that use two independent players.
Single control for both streams.
Pro Tip:
Shortcut key Ctrl + Tab allows you to change which video appears on which screen without moving the windows.
Overlay Comparison Mode:
Benefits:
Great for viewing the full frame of one stream in a single window.
Tips:
Shortcut key Ctrl + Tab allows you to cycle between the two videos. Cycling quickly is a great way to spot quality differences between the two streams that you might not otherwise notice.
Butterfly Comparison Mode:
Benefits:
Very useful for determining the accuracy of the encoding process. The butterfly mode displays mirrored images of two sequences to help you assess whether an artifact occurs in the source when comparing an encoded sequence to the original.
Tips:
Use shortcut key Ctrl + \ to reset the frame to the leftmost view, and shortcut Ctrl + Alt + \ to switch to the rightmost view in butterfly mode.
Use shortcut keys Ctrl + [ and Ctrl + ] to move the image left or right in butterfly mode.
Other Useful Tips:
Ctrl + m allows you to toggle through the 4 comparison modes.
Shift + Left Click opens the magnifier tool that allows you to zoom into hard to see areas of the video.
Easily scale frames of different resolutions to the same resolution by clicking “scale to same look” on the main menu.
The NEW automatic download feature on the splash screen notifies you of the latest version updates, ensuring you’re always up to date.
As video services take a more aggressive approach to virtual reality (VR), the question of how to scale and deliver this bandwidth intensive content must be addressed to bring it to a mainstream audience.
While we’ve been talking about VR for a long time, you could say it was reinvigorated when Oculus grabbed the attention of Facebook, which acquired it for $2 billion on the strength of Mark Zuckerberg’s vision that VR is a future technology people will actively embrace. Industry forecasters tend to agree, suggesting VR will be front and center in the digital economy within the next decade. According to research by Canalys, vendors will ship 6.3 million VR headsets globally in 2016, and CCS Insight suggests that as many as 96 million headsets will be snapped up by consumers by 2020.
One of VR’s key advantages is the freedom to look anywhere in 360 degrees using fully panoramic video in a highly intimate setting. Panoramic video files are large, with resolutions often 4K (4096 pixels wide by 2048 pixels tall, depending on the standard) or bigger.
While VR is considered the next big revolution in the consumption of media content, we also see it popping up in professional fields such as education, health, law enforcement, defense, telecom, and media. It can provide a far more immersive live experience than TV by adding presence – the feeling that “you are really there.”
Development of VR projects has already started to take off, and high-quality VR devices are surprisingly affordable. Earlier this summer, Google announced that 360-degree live streaming support was coming to YouTube.
Of course, all these new angles and this sharpness of imagery create challenging new engineering hurdles, which we’ll discuss below.
Resolution and Quality?
Frame rate, resolution, and bandwidth are all affected by the sheer volume of pixels that VR transmits. Developers and distributors of VR content will need to maximize frame rates and resolution throughout the entire workflow, while keeping up with the wide range of viewers’ devices; sporting events in particular demand precise detail and high frame rates, as we see with instant replay, slow motion, and 360-degree cameras.
In a recent Vicon industry survey, 28 percent of respondents stated that high-quality content was important to ensuring a good VR experience. Consider simple file size comparisons: we already know that Ultra HD files take up considerably more storage space than SD, and the greater the file size, the greater the chance it will impede delivery. VR file sizes are no small potatoes. With VR video, you are transmitting four to six times the foundational resolution. For example, to preserve roughly 1080p sharpness within a 90-degree field of view, a full 360-degree panorama needs about four times that width – on the order of 7680 pixels across. And if you thought Ultra HD was cumbersome, think about dealing with resolutions beyond 4K for an immersive VR HD experience.
To catch up with these file sizes, we need to continue developing video codecs that can quickly interpret the frame-by-frame data. HEVC is a great starting point, but frankly, given hardware device limitations, many content distributors are forced to continue using the H.264 codec. For this reason we must harness advanced tools in image processing and compression. One such approach is content adaptive perceptual optimization.
I want my VR now! Reaching End Users
Because video content comes in a variety of file formats, including combinations of stereoscopic 3D, 360-degree panoramas, and spherical views, it brings obvious challenges: added strain on processors, memory, and network bandwidth. Modern codecs use a variety of algorithms to detect redundancies quickly and efficiently, but they are usually tailored to 2D content. A content delivery mechanism must be able to send this content to every user, and it should be smart enough to optimize the processing and transmission of the video.
Minimizing latency, how long can you roll the boulder up the hill?
We’ve seen significant improvements in the graphics processing capabilities of desktops and laptops. However, to take advantage of the immersive environment that VR offers, it’s important that high-end graphics are delivered to the viewer as quickly and smoothly as possible. The VR hardware also needs to display large images properly, with the highest fidelity and lowest latency. There is very limited room for things like color correction or adjusting panning from different directions; if you have to stitch or rework artifacts, you will likely lose ground. You need to be smart about it. Typical decoders in tablets or smart TVs are more likely to introduce latency and only support lower frame rates. How you build the infrastructure will be the key to offering the image quality and life-like resolution consumers expect to see.
Bandwidth, where art thou?
According to Netflix, an Ultra HD streaming experience requires an Internet connection of 25 Mbps or higher. Yet according to Akamai, the average Internet speed in the US is only around 11 Mbps. Effectively, this prohibits live streaming on any typical mobile VR device, which may need 25 Mbps at minimum to achieve the quality and resolution required.
Most certainly the improvements in graphic processing and hardware will continue to drive forward the realism of the immersive VR content, as the ability to render an image quickly becomes easier and cheaper. Just recently, Netflix jumped on the bandwagon and became the first of many streaming media apps to launch on Oculus’ virtual reality app store. As soon as all the VR display devices are able to integrate with these higher resolution screens, we will see another step change in the quality and realism of virtual environments. But will the available bandwidth be sufficient, is a very real question.
To understand the applications for VR, you really have to see it to believe it
A heart-warming campaign from Expedia recently offered children at a research hospital in Memphis Tennessee the opportunity to be taken on a journey of their dreams through immersive, real-time virtual travel – all without getting on a plane: https://www.youtube.com/watch?time_continue=179&v=2wQQh5tbSPw
The National Multiple Sclerosis Society also launched a VR campaign that inventively used the tech to give two people with MS the opportunity to experience their lifelong passions. These are the type of immersive experiences we hope will unlock a better future for mankind. We applaud the massive projects and time spent on developing meaningful VR content and programming such as this.
Frost & Sullivan estimates that $1.5 billion is the forecasted revenue from Pay TV operators delivering VR content by 2020. The adoption of VR in my estimation is only limited by the quality of the user experience, as consumer expectation will no doubt be high.
For VR to really take off, the industry needs to address some of these challenges making VR more accessible and most importantly with unique and meaningful content. But it’s hard to talk about VR without experiencing it. I suggest you try it – you will like it.
As video encoding workflows modernize to include content adaptive techniques, the ability to change encoder parameters “on-the-fly” will be required. With the ability to change encoder resolution, bitrate, and other key elements of the encoding profile, video distributors can achieve a significant advantage by creating recipes appropriate to each piece of content.
For VOD and file-based encoding workflows, the advantage of on-the-fly reconfigurability is that it enables content-specific encoding recipes without resetting the encoder and disrupting the workflow. At the same time, on-the-fly functionality is a necessary feature for supporting real-time encoding on a network with variable capacity, so that the application can react appropriately to changing bandwidth, network congestion, or other operational requirements.
The Vanguard by Beamr V.264 AVC Encoder SDK and V.265 HEVC Encoder SDK have supported on-the-fly modification of encoder settings for several years. Let’s take a look at a few of the more common applications where this feature can be helpful.
On-the-fly control of Bitrate
Adjusting bitrate while the encoder is in operation is an obvious application. All Vanguard by Beamr codec SDKs allow the maximum bitrate to be changed via a simple “C-style” API, enabling bitrate adjustments based on available bandwidth, dynamic channel lineups, or other network conditions.
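As a rough illustration, here is a minimal C sketch of reacting to a bandwidth estimate with an on-the-fly bitrate change. The vb_* names are hypothetical placeholders, not the actual Vanguard by Beamr API, and the target-selection logic is ours:

```c
/* Sketch of an on-the-fly bitrate change driven by a bandwidth estimate.
 * vb_* names are hypothetical stand-ins for the SDK's "C-style" setter. */
#include <stdint.h>

typedef struct vb_encoder vb_encoder_t;                         /* opaque handle */
extern int vb_set_max_bitrate(vb_encoder_t *enc, uint32_t bps); /* hypothetical */

/* Pick a new target: leave ~25% headroom below the measured bandwidth,
 * and clamp to the ladder's floor and ceiling. */
static uint32_t pick_bitrate(uint32_t measured_bps) {
    uint32_t target = (uint32_t)(measured_bps * 0.75);
    if (target < 400000u)  target = 400000u;    /* 400 kbps floor */
    if (target > 8000000u) target = 8000000u;   /* 8 Mbps ceiling */
    return target;
}

void on_bandwidth_update(vb_encoder_t *enc, uint32_t measured_bps) {
    /* No encoder reset: the change applies to subsequent frames. */
    vb_set_max_bitrate(enc, pick_bitrate(measured_bps));
}
```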
On-the-fly control of Encoder Speed
Encoder speed is an especially useful control because it translates directly into video quality and encoding processing time. Changing it triggers a different set of encoding algorithms and internal codec presets. This scenario applies to unicast transmissions, where a service may need to adjust the encoder speed for ever-changing network conditions and client device capabilities.
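One way such a control might be driven, sketched below with a hypothetical vb_set_speed() setter (again, not the actual SDK name), is to step the preset whenever the output queue backs up, assuming higher values mean faster presets with lower quality:

```c
/* Sketch: stepping the encoder speed preset based on output-queue depth.
 * vb_set_speed() is a hypothetical stand-in for the SDK's speed control. */
typedef struct vb_encoder vb_encoder_t;
extern int vb_set_speed(vb_encoder_t *enc, int speed);  /* hypothetical */

void balance_speed(vb_encoder_t *enc, double queue_fullness, int *speed) {
    if (queue_fullness > 0.8 && *speed < 10)
        vb_set_speed(enc, ++*speed);   /* falling behind: go faster */
    else if (queue_fullness < 0.2 && *speed > 0)
        vb_set_speed(enc, --*speed);   /* headroom: spend it on quality */
}
```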
On-the-fly control of Video Resolution
Another useful parameter to access on the fly is video resolution. One use case is in telecommunications, where the end user may shift from a mobile device on a slow, congested cellular network to broadband WiFi or a hard-wired desktop computer. With control of video resolution, the encoder output can be changed during operation to accommodate the network speed or to match the display resolution, all without interrupting the video program stream.
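A sketch of that use case, with a hypothetical vb_set_resolution() setter and ladder values of our own choosing:

```c
/* Sketch: switching output resolution mid-stream as the viewer moves
 * between networks. vb_set_resolution() is hypothetical; in practice a
 * resolution change would take effect at the next key frame. */
typedef struct vb_encoder vb_encoder_t;
extern int vb_set_resolution(vb_encoder_t *enc, int width, int height);

typedef enum { NET_CELLULAR, NET_WIFI, NET_WIRED } network_t;

void on_network_change(vb_encoder_t *enc, network_t net) {
    switch (net) {
    case NET_CELLULAR: vb_set_resolution(enc,  640,  360); break;
    case NET_WIFI:     vb_set_resolution(enc, 1280,  720); break;
    case NET_WIRED:    vb_set_resolution(enc, 1920, 1080); break;
    }
}
```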
On-the-fly control of HEVC SAO and De-blocking Filter
HEVC presents additional opportunities for “on the fly” control of the encoder, and the Vanguard by Beamr V.265 encoder leads the market with the capability to turn SAO and de-blocking filters on or off to adjust quality and performance in real time.
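Sketched below is one policy such toggles could serve, trading loop-filter quality for speed under CPU pressure; the vb_* setters are hypothetical names, not the actual V.265 API:

```c
/* Sketch: trading HEVC loop-filter quality for speed at runtime.
 * Both setters are hypothetical names for the controls described above. */
typedef struct vb_encoder vb_encoder_t;
extern int vb_set_sao(vb_encoder_t *enc, int enabled);        /* hypothetical */
extern int vb_set_deblocking(vb_encoder_t *enc, int enabled); /* hypothetical */

/* When the CPU saturates, drop SAO first, then de-blocking;
 * both come back automatically as the load eases. */
void tune_filters(vb_encoder_t *enc, double cpu_load) {
    vb_set_sao(enc, cpu_load < 0.85);
    vb_set_deblocking(enc, cpu_load < 0.95);
}
```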
On-the-fly control of HEVC multithreading
V.265 is recognized for having superior multithreading capability. The V.265 codec SDK provides access to add or remove encoding execution threads dynamically. This is an important feature for environments with a variable number of tasks running concurrently, such as encoding that operates alongside a content adaptive optimization process or the ABR packaging step.
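For illustration, a minimal sketch of resizing the encoder’s thread pool as co-located jobs come and go; vb_set_thread_count() is a hypothetical name for the dynamic thread control:

```c
/* Sketch: resizing the encoder's thread pool around co-located work.
 * vb_set_thread_count() is a hypothetical name for the V.265 control. */
typedef struct vb_encoder vb_encoder_t;
extern int vb_set_thread_count(vb_encoder_t *enc, int nthreads); /* hypothetical */

void share_cores(vb_encoder_t *enc, int total_cores, int cores_reserved) {
    int n = total_cores - cores_reserved; /* leave room for packaging, etc. */
    if (n < 1) n = 1;                     /* always keep one encode thread */
    vb_set_thread_count(enc, n);
}
```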
Beamr’s implementation of on-the-fly controls in our V.264 Codec SDK and V.265 Codec SDK demonstrate the robust design and scalable performance of the Vanguard by Beamr encoder software.
For more information on the Vanguard by Beamr codec SDKs, please visit the V.264 and V.265 pages, or visit http://beamr.com for more on the company and our technology.