
AOMedia Delivers on SVT-AV1's Promise


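In August 2020, the Alliance for Open Media (AOMedia) formed a software working group “to use the Scalable Video Technology for AV1 (SVT-AV1) encoder developed by Intel ... to create AV1 encoder implementations that deliver excellent video compression across applications in ways that remove computational complexity trade-offs for an ever-growing video delivery marketplace.” Tests published at the time showed that SVT-AV1 had a long way to go to stand out among other AV1 codecs.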

For example, in a comparison published a month later, I found SVT-AV1 to be last among the four AV1 codecs I tested (AOMedia’s libaom, Visionular’s Aurora1, aomenc, and SVT-AV1), although only about 3% less efficient than FFmpeg/libaom-AV1. At the time, SVT-AV1 had several critical deficits, including a two-pass rate control that was incomplete. In a 2020 report, Moscow State University (MSU) found that SVT-AV1 trailed libaom by 4 percentage points and Aurora by 25 percentage points.

With the recent launch of version 1.0, SVT-AV1 appears to have caught up with libaom in quality, with very definite performance advantages. Its two-pass rate control is tested and proven. If you’re creating an AV1 encoding workflow today that emphasises encoding speed and quality, SVT-AV1 should definitely be on your short list.

About Scalable Video Technology

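Let’s start with a brief introduction to what Scalable Video Technology (SVT) is and how it works. According to a recent Intel white paper, “The SVT architecture is designed to maximize the performance of an SVT encoder on Intel Xeon Scalable processors. It is based on three-dimensional parallelism.” Most important of the three is segment-based parallelism, which “splits each picture into segments and processes multiple segments of a picture in parallel to achieve better utilization of computational resources with no loss in video quality.”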

This technique is counter to the view that encoding each frame in its entirety delivers the best quality. For example, Avidemux’s encoding guide says, “H.264 allows the encoder to segment each frame into several parts. These parts are called ‘slices.’ The advantage of using multiple slices (per frame) is that the slices can be processed independently and in parallel. This allows easy multi-threading implementations in H.264 encoders and decoders. Unfortunately using multiple slices hurts compression efficiency! The more slices are used the worse!”

So, a big part of what SVT attempts to do is split the image to gain processing efficiency while retaining quality. Early efforts were not encouraging. As shown in Figure 1 from the aforementioned MSU report, SVT-HEVC was 51 percentage points behind x265, and SVT-VP9 was an astonishing 129 percentage points behind VP9, which made the 4% delta between SVT-AV1 and libaom seem like a breakthrough.


Figure 1. According to Moscow State University in 2020, early SVT performance wasn’t encouraging.

Testing Parameters

Now that you’re familiar with SVT-AV1, let’s explore the encoding parameters that I used for my testing and the quality comparisons. For the record, I tested version 1.0.0 of SVT-AV1 as provided by a member of the Intel SVT-AV1 development team. I tested FFmpeg version 2022-06-09-git5d5a014199, downloaded from www.gyan.dev. I performed all encoding tests on a 40-core HP Z840 workstation running Windows 7, equipped with two Intel Xeon E5-2687W v3 CPUs running at 3.10 GHz with 32GB of RAM.

Choosing a Preset

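Codec developers create presets to configure groups of encoding parameters that control the encoding time/encoding quality trade-off. This allows codec users to choose the cost and quality level appropriate for their particular application. Whenever you start working with a new codec or encoder, you should benchmark the codec with your own source footage to explore these trade-offs and make the best decision for you. To do this, you should select several representative test clips, encode them using all of the presets and otherwise consistent settings, time the encode, and measure the quality. With FFmpeg, you control the AV1 preset using the -cpu-used switch, with settings ranging from 0 to 8 and a default setting of 1.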
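The benchmarking loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the scripts used for this article: it assumes FFmpeg with libaom-AV1 is on the path, uses single-pass encoding for brevity, and the output naming scheme and helper names are hypothetical. Quality measurement is left out and would follow as a separate step.

```python
import subprocess
import time


def libaom_command(source, preset, bitrate="1500K", output=None):
    """Build a single-pass FFmpeg/libaom-AV1 command for one -cpu-used preset."""
    output = output or f"preset_{preset}.mkv"  # hypothetical naming scheme
    return ["ffmpeg", "-y", "-i", source, "-c:v", "libaom-av1",
            "-b:v", bitrate, "-cpu-used", str(preset), output]


def time_presets(source, presets=range(0, 9), run=subprocess.run):
    """Encode the clip once per preset and return {preset: seconds}."""
    timings = {}
    for preset in presets:
        start = time.perf_counter()
        run(libaom_command(source, preset), check=True)
        timings[preset] = time.perf_counter() - start
    return timings
```

Calling time_presets("Football_10.mp4") would encode the clip nine times, so expect the slow presets to dominate the total run time. The run parameter exists only so the loop can be exercised without launching FFmpeg.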

Table 1 shows the average results for two 10-second test clips when encoding with FFmpeg and libaom-AV1. With preset 0, the highest-quality preset, encoding a 10-second test clip took an average of 3:24:33 (hr:min:sec), which is why testing with longer clips is challenging. With the fastest/lowest-quality preset, it took 1:06 (min:sec). This tells you that on this test bed, FFmpeg/libaom-AV1 isn’t capable of encoding a live stream; in fact, even the fastest preset takes close to seven times longer than real time.


Table 1. Encoding time and quality with FFmpeg and libaom-AV1

For a measure of overall quality, I used Video Multimethod Assessment Fusion (VMAF) computed via the harmonic mean method. To assess transient quality, I used low-frame VMAF, which is the lowest VMAF score for any frame in the test file.
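To see why the two metrics tell different stories, here is a small sketch using hypothetical per-frame scores. It uses the textbook harmonic mean from Python's standard library; VMAF tools may pool scores with a slightly different formula, so treat this as an illustration of the idea rather than a reproduction of their output.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-frame VMAF scores; one frame dips sharply.
frame_scores = [95.0, 94.0, 96.0, 60.0, 95.0]

overall = harmonic_mean(frame_scores)  # pooled score for the whole clip
low_frame = min(frame_scores)          # worst single frame in the clip

print(f"arithmetic mean: {mean(frame_scores):.2f}")
print(f"harmonic mean:   {overall:.2f}")
print(f"low-frame VMAF:  {low_frame:.2f}")
```

The harmonic mean lands below the arithmetic mean because it weights low scores more heavily, while the low-frame figure isolates the single worst frame that an average would hide.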

In the Delta row on the bottom of Table 1, the time delta divides the slowest score by the fastest and shows that the slowest took 185.95 times longer than the fastest. You can also see that the overall VMAF difference between the fastest and slowest preset is 3.77. For perspective, Netflix has stated that a difference of 6 points constitutes a just noticeable difference (JND) that viewers will perceive, although other researchers have found that 3 VMAF points constitutes a JND. Either way, it’s not a significant difference between the highest- and lowest-quality preset, particularly, as you will see, compared to SVT-AV1.
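The Delta row arithmetic is easy to reproduce from the figures quoted above. The sketch below converts the two encoding times to seconds, divides them, and expresses the VMAF gap in JND units under both thresholds mentioned in the text; the helper name is my own.

```python
def to_seconds(stamp):
    """Convert 'h:mm:ss' or 'm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


slowest = to_seconds("3:24:33")  # preset 0
fastest = to_seconds("1:06")     # fastest preset
time_delta = slowest / fastest

vmaf_delta = 3.77
jnd_at_6_points = vmaf_delta / 6  # Netflix's threshold
jnd_at_3_points = vmaf_delta / 3  # threshold cited by other researchers

print(f"time delta: {time_delta:.2f}x")
print(f"JND (6-point): {jnd_at_6_points:.2f}, JND (3-point): {jnd_at_3_points:.2f}")
```

The time delta matches the 185.95 shown in the table, and the quality gap works out to less than one JND by Netflix's measure and a little over one by the stricter measure.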

To visualise the encoding time/quality trade-off, I plotted three factors (time, VMAF, and low-frame VMAF) for each preset on a scale from 0 (fastest preset/lowest quality) to 100 (slowest preset/highest quality). You can see this in Figure 2.


Figure 2. Plotting encoding time versus quality for libaom-AV1
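Putting time and quality on a common 0 to 100 scale is a simple min-max rescale. The sketch below shows one way to do it, assuming that is how the chart was normalised; the three encoding times are the preset 8, preset 4, and preset 0 figures quoted in this article, converted to seconds.

```python
def scale_0_100(values):
    """Rescale values so the smallest maps to 0 and the largest to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread to plot
    return [100 * (v - lo) / (hi - lo) for v in values]


# Encoding times in seconds for presets 8, 4, and 0 (1:06, 4:24, 3:24:33).
encode_seconds = [66, 264, 12273]
print(scale_0_100(encode_seconds))
```

The same function can be applied to the VMAF and low-frame VMAF columns, which lets all three series share one axis even though their native units differ.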

Every application is different, and every producer dances to their own particular tune. With my fictional VOD content producer hat on, I see preset 4 as the logical starting point, with a substantial jump in both VMAF and low-frame VMAF. Do I increase encoding costs by roughly 50% to achieve a 0.4 VMAF improvement with preset 3? Probably not. Unless I’m shipping extremely high stream volumes, I don’t consider presets 2, 1, or 0.

As an aside, although we are only looking at encoding time and quality in this analysis, a third factor, bandwidth, is also in play. That is, all publishers should start their analysis with a target quality level that can be achieved through the choice of preset and bitrate. With preset 2, the bitrate necessary to achieve that target quality level will be less than for preset 3, so bandwidth savings will increasingly offset the encoding time costs as viewing volume increases.

At relatively low volumes, choosing a faster preset and saving on encoding time is probably the best strategy. If your streams will be viewed hundreds of thousands of times or more, it might make sense to pay more for encoding and save bandwidth. (I explore these issues in an article titled Choosing an x265 Preset—An ROI Analysis.) For most producers, I would assume that preset 4 or preset 3 is the most relevant choice for FFmpeg/libaom-AV1.

Choosing a Preset: SVT-AV1

Now let’s look at SVT-AV1. Table 2 shows the same datapoints for SVT-AV1 presets 0–12, with an actual range of -2–13 and a default of 10. The results reveal several obvious points.


Table 2. Encoding time and quality with SVT-AV1

First, the ranges of encoding time, VMAF, and low-frame VMAF are much, much greater. In particular, three presets (10, 11, and 12) are capable of real-time encoding, with preset 9 very close, although the quality disparity is significant, extending to 2 JND by Netflix’s numbers and close to 3 JND for low frame.

Figure 3 charts the encoding time/quality trade-off. From a VOD perspective, it appears that preset 6 is the starting point, with most producers choosing somewhere between 2 and 4. As previously detailed, as the anticipated view counts for your videos increase, you should gravitate toward a higher-quality preset.


Figure 3. Plotting encoding time versus quality for SVT-AV1

In terms of the bigger picture, the range of performance and quality makes SVT-AV1 much more usable than libaom-AV1, enabling even live AV1 applications. I don’t know what configuration options are available within libaom-AV1, but it would be helpful if its developers explored how to widen the spread of encoding time and quality to make this codec as flexible as SVT-AV1.

Choosing the Thread Count

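Now that we’ve selected a preset, let’s cover threads. This analysis will help you understand which thread count to include in your command string and help you choose the optimal cloud instance or encoding strategy on a multicore computer.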

FFmpeg/libaom-AV1

With FFmpeg/libaom-AV1, you control the number of CPU threads applied to the encode with the -threads switch. Table 3 shows the analysis I perform when trying to identify the optimal setting for any configuration option. The baseline column shows the result when no setting is in the command string, which invokes the default setting. Each subsequent column shows the result of the otherwise identical command string configured to use one, two, four, eight, 16, and 32 threads on the 40-core HP workstation. The Delta column shows the difference between the highest and lowest scores.


Table 3. Finding the optimal thread setting for FFmpeg/libaom-AV1

You can see the results in encoding speed, bitrate, and three quality variables—harmonic mean VMAF, low-frame VMAF, and standard deviation—the last being a measure of quality variability in the stream. The green background identifies the best score, the yellow background the worst.

In terms of performance, not surprisingly, we see that one thread is the slowest option by far. We also see that while 16 threads is the fastest setting, the performance difference between 16 and eight/32 is negligible. From this, I’d guess that the maximum number of threads libaom-AV1 can utilise is eight.

Surprisingly, the single-threaded encode was the lowest quality in all three measures, although the Delta column shows that the differences are irrelevant. The quality results for almost all other alternatives are identical, so production efficiency should be the focus. Clearly, any setting over eight threads makes no sense, and if you’re provisioning cloud instances, eight should be the maximum as well. But is eight the optimal thread count? Table 4 tells the tale.


Table 4. Finding the optimal thread count for an encoding workstation for FFmpeg/libaom-AV1

Using the average encoding times shown in Table 3, Table 4 computes the number of hours it would take to encode an hour of AV1 video using each thread count. Then, it adds the hourly cost of Amazon Web Services (AWS) compute instances from go2sm.com/awspricing and computes the cost per hour for the four thread counts shown.

Interestingly, you achieve the cheapest cost per hour using a single-threaded machine. Why would this be? Because as shown in Figure 4, the encoding cost increases linearly, while the additional threads deliver increasingly lower speed increases. Going from one thread to two doubles the cost but only increases encoding speed by 1.8x. Going from one thread to eight increases costs by 8x but only increases throughput by 2.99x.
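The reasoning reduces to one division: if instance price scales linearly with thread count, the cost per encoded hour relative to a single-threaded machine is the thread count divided by the speedup. Here is a minimal sketch using the speedups quoted above; the linear pricing is an assumption taken from the text, not an actual AWS price list.

```python
def relative_cost_per_hour(threads, speedup):
    """Cost per encoded hour versus one thread, assuming price scales with threads."""
    return threads / speedup


# (threads, speedup over a single thread) as quoted in the article
for threads, speedup in [(1, 1.0), (2, 1.8), (8, 2.99)]:
    cost = relative_cost_per_hour(threads, speedup)
    print(f"{threads} thread(s): {cost:.2f}x the single-thread cost")
```

Any ratio above 1.0 means the extra threads cost more than the speed they buy, which is the case for both the two-thread and eight-thread figures.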


Figure 4. Plotting the increase in encoding speed versus instance cost

Of course, this analysis assumes that the effort involved in provisioning and managing more encoding stations doesn’t outweigh the cost savings. Either way, provisioning encoding stations with more than eight cores likely doesn’t make economic sense, and lower thread counts might be more cost-efficient.

Working Efficiently on Multicore Encoding Stations

The same logic should apply to spreading production encodes over a multiple-core workstation. On a 16-core workstation, for example, you might achieve faster throughput with four encodes using four threads each as opposed to two encodes using eight threads. Of course, running multiple encodes adds some overhead that slows overall operation. For example, on my 40-core workstation, a single encode of the 10-second Football test clip took 4:23 (min:sec). When I encoded eight files simultaneously, the average time increased to 5:49, about 32% higher. Still, if you have the ability to deploy multiple instances on a single workstation, some experiments with different thread values will provide useful direction.
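To put the Football numbers in throughput terms, the sketch below compares one encode running alone with eight running at once. It assumes all eight simultaneous encodes finish in roughly the average time reported, which is a simplification.

```python
def to_seconds(stamp):
    """Convert 'm:ss' into a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


single = to_seconds("4:23")      # one encode running alone
concurrent = to_seconds("5:49")  # average time with eight encodes at once
jobs = 8

slowdown = concurrent / single - 1           # penalty on each individual encode
seconds_per_file = concurrent / jobs         # effective time per finished file
throughput_gain = single / seconds_per_file  # versus encoding one file at a time

print(f"per-encode slowdown: {slowdown:.1%}")
print(f"throughput gain: {throughput_gain:.2f}x")
```

Even with each encode running slower, the workstation finishes about six times as many files in the same period, which is why experimenting with parallel instances is worthwhile.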

Choosing the Optimal Thread Count With SVT-AV1

Given the previously shared explanation of SVT, you’d expect better performance at higher thread counts, and SVT delivers. Still, as you’ll see, the same analysis does less to sell multiple-core Xeon processors than you might think.

Table 5 shows the encoding speed/quality trade-off associated with SVT-AV1’s --lp switch, which controls the number of logical processors assigned to any encoding task. Baseline is fastest because it appears to assign all logical processors to the task, although baseline is only slightly faster than 32 threads.


Table 5. Finding the optimal thread setting for SVT-AV1

From a quality perspective, a single thread delivers the best quality here, but the delta is irrelevant. This makes encoding throughput and cost the most important factors when choosing the thread count (and --lp value). In this regard, the surprisingly diminishing speed returns from the additional threads dictate the results shown in Table 6.


Table 6. Finding the optimal thread count for an encoding workstation for SVT-AV1

As you can see in Table 6, the jump from one thread to eight threads delivers slightly more in throughput than AWS charges for CPUs, making eight threads the cheapest encoding option by a hair. From there, however, the lessened speed increase means an ever-escalating cost per hour for higher thread counts. These findings suggest that encoding configurations exceeding eight threads might not be cost-effective.

These results come with all of the usual caveats; your findings may certainly vary. I performed these tests on 1080p 8-bit content, and the results for 4K and 8K HDR footage might be completely different. I’m also predicting cloud throughput from results posted by an older desktop machine; results on newer versions may be different. Intel versus AMD is another potential differentiator.

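The high-level point is that with both libaom-AV1 and SVT-AV1, you shouldn’t assume that more cores deliver the most cost-effective throughput. If you’re preparing to scale up AV1 encoding and need to decide which workstations to buy or which cloud instances to provision, a day or two of testing with your sample footage and target output should provide very clear direction.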

This takes us to our quality bake-off.

Comparative Quality

Here’s the encoding string that I used for FFmpeg/libaom-AV1, with options in green as the defaults. This means that you’d get the same result if you removed them. I like to leave them in because it simplifies comparing the string to those used in other comparisons.

ffmpeg -y -i Football_10.mp4 -c:v libaom-av1 -b:v 1500K -g 60 -keyint_min 60 -cpu-used 8 -auto-alt-ref 1 -threads 8 -tile-columns 1 -tile-rows 0 -row-mt 1 -lag-in-frames 25 -pass 1 -f matroska NUL & \

ffmpeg -y -i Football_10.mp4 -c:v libaom-av1 -b:v 1500K -maxrate 3000K -g 60 -keyint_min 60 -cpu-used 4 -auto-alt-ref 1 -threads 8 -tile-columns 1 -tile-rows 0 -row-mt 1 -lag-in-frames 25 -pass 2 Football_1.mkv

Note that I tested with -cpu-used 8 in the first pass and -cpu-used 4 in the second. That’s because the preset used in the first pass doesn’t impact overall quality. I tested with threads set to 8 for maximum single-encoding-instance throughput on my workstation.

Here’s the command string used for SVT-AV1. For these tests, I wanted to get as close to the same encoding time for both codecs as possible.

SvtAv1EncApp -i input.y4m --rc 1 --tbr 1500 --mbr 3000 --keyint 2s --preset 3 --passes 3 --lp 8 --tile-columns 0 --tile-rows 0 --enable-tf 1 -b output.ivf

With -cpu-used 4 in the second pass, FFmpeg delivered the files in 4:24 (min:sec; see Table 1). I used preset 3 for SVT-AV1, as it delivered the files in a slightly faster 3:48 (see Table 2). Note that I used three-pass encoding to encode all SVT-AV1 output produced for this article, although the first and second passes are very, very fast. I also used --lp 8 for throughput and to match the libaom setting.

Overall, I tested 17 files ranging in duration from 1 to 4 minutes, with four encodes each, producing the output presented in the rate-distortion graph and used to compute the BD-Rate results. I’m told that adding the results to present a composite graph is mathematically incorrect, but I find it useful as a general gauge of the overall result. So please don’t show Figure 5 to your mathematically inclined colleagues.


Figure 5. Average results for 17 test files

As you can see in Figure 5, SVT-AV1 wins at lower bitrates, while libaom prevails at higher bitrates. Overall, according to the BD-Rate composite computation, SVT-AV1 produced the same quality as libaom-AV1, with a bitrate savings of 1.36%.
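For readers unfamiliar with BD-Rate, the idea is to average the gap between two rate-distortion curves in the log-bitrate domain over the quality range they share. The article doesn't show how its figure was computed, so the sketch below is only an illustration: the usual method fits a cubic polynomial to each curve, whereas this version swaps in piecewise-linear interpolation to stay dependency-free, and the data points are hypothetical. It also assumes quality rises strictly with bitrate on each curve.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the curve's quality range")


def _avg_log_rate(points, lo, hi):
    """Average log10(bitrate) over the quality interval [lo, hi]."""
    pts = sorted(points, key=lambda p: p[1])  # order by quality
    quality = [q for _, q in pts]
    log_rate = [math.log10(r) for r, _ in pts]
    knots = sorted({lo, hi, *(q for q in quality if lo < q < hi)})
    area = 0.0
    for a, b in zip(knots, knots[1:]):  # trapezoid rule between knots
        area += (b - a) * (_interp(a, quality, log_rate)
                           + _interp(b, quality, log_rate)) / 2
    return area / (hi - lo)


def bd_rate(anchor, test):
    """Percent bitrate change of test versus anchor at equal quality."""
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    diff = _avg_log_rate(test, lo, hi) - _avg_log_rate(anchor, lo, hi)
    return (10 ** diff - 1) * 100


# Hypothetical (bitrate in kbps, VMAF) points, four encodes per codec.
anchor_points = [(1000, 80.0), (2000, 90.0), (3000, 94.0), (4000, 96.0)]
test_points = [(rate * 0.9, vmaf) for rate, vmaf in anchor_points]
print(f"BD-Rate: {bd_rate(anchor_points, test_points):.2f}%")
```

A negative result means the test codec needs less bitrate for the same quality. In this made-up example the test curve hits every quality level at 10% lower bitrate, so the sketch should report roughly -10%.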

Feeling a bit let down because you read all the way to the end, only to find that SVT-AV1 delivered just a minuscule bandwidth savings? Well, when I last reviewed SVT-AV1, the codec needed 4% more bandwidth to match libaom-AV1’s quality and was actually slower in testing.

Now, SVT-AV1 slightly exceeds libaom-AV1 quality while enabling software-based live AV1 encoding. Not bad for version 1.0. While this may not trigger a mass exodus from libaom-AV1 to SVT-AV1, it does enable a completely different set of potential AV1 applications, which can only accelerate AV1 adoption.

Looking Ahead

During my tests, I had to convert the source MP4 files to Y4M format to encode with the SVT-AV1 standalone encoder. Obviously, operating within FFmpeg would eliminate this step and simplify integrating SVT-AV1 encoding into existing FFmpeg-based workflows.

While you can access SVT-AV1 within some FFmpeg builds, it’s single pass only, not the two- or three-pass encoding that delivers better bitrate control and quality. Apparently, adding three-pass capability to FFmpeg will take substantial work and may not happen until at least the end of 2022. So, most developers will use the SVT-AV1 app that I used.

Another open question is the continued vitality of the libaom-AV1 codec in FFmpeg, given that AOMedia has focused its software working group on SVT-AV1. I sent a question to a contact at AOMedia asking whether there were plans to continue updating libaom-AV1 and its standalone encoder (aomenc), but hadn’t heard back by press time. Check the website for any updates.

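Looking at prominent AV1 publishers, YouTube has been producing AV1 with FFmpeg/libaom-AV1 for years. Because switching to SVT-AV1 in YouTube’s FFmpeg-based encoding farm would require substantial resources for modest VOD production gains, it seems likely that AOMedia will continue to support libaom-AV1 (and its largest user), at least until full SVT-AV1 usage, including three-pass encoding, is available within FFmpeg, and probably a whole lot longer.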
