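In the earlier article "A Survey of Hardware Video Codec Solutions" (《視頻編解碼硬件方案漫談》), we covered the general approaches to hardware video encoding and decoding. This article goes a step further and shows how ffmpeg can use GPU hardware to accelerate audio and video encoding and decoding.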

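1. Overview

ffmpeg wraps and integrates the GPU vendors' SDKs to provide hardware encoding and decoding for a subset of codecs:

	NVIDIA	AMD	INTEL
Encoder	xxx_nvenc	xxx_amf	xxx_qsv
Decoder	xxx_cuvid	not yet implemented	xxx_qsv

Table: hardware encoders and decoders in ffmpeg

Here xxx denotes the codec type, such as h264, hevc (H.265), mpeg2, vp8, or vp9. In addition, ffmpeg's software decoders can themselves be hardware-accelerated. The h264 decoder, for example, can use cuda, qsv, dxva2, d3d11va, opencl, and other acceleration back ends.

	cuda	qsv	dxva2/d3d11va	opencl
Use case	NVIDIA GPUs only, but cross-OS	Intel GPUs only, but cross-OS	Windows only, but cross-vendor	Only hardware platforms that support OpenCL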

2. Command-line usage

In ffmpeg, pass -vcodec xxx to select a hardware codec; if no hardware codec is specified, a software codec is used.

For example:

ffplay -x 800 -y 600 -vcodec h264_qsv h264.mp4

ffplay -x 800 -y 600 -vcodec hevc_qsv 4k_hevc.mp4

ffmpeg.exe -i test.ts -vcodec hevc_amf -s 1280x720 output.ts

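3. Usage in code

1) Using a specific codec

Every codec is described by an AVCodec. Its ID identifies a class of encoders or decoders; AV_CODEC_ID_H264, for example, stands for the H.264 codecs. Its name identifies one specific encoder or decoder. We normally call avcodec_find_decoder(ID) and avcodec_find_encoder(ID) to look up a decoder or an encoder, and by default these return a software codec. To use a hardware codec instead, call avcodec_find_encoder_by_name(name) or avcodec_find_decoder_by_name(name) to select it by name. The rest of the code flow is the same as with a software codec.

For example: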

//codec = avcodec_find_decoder(AV_CODEC_ID_H264);
codec = avcodec_find_decoder_by_name("h264_cuvid");
if (!codec) {
	fprintf(stderr, "Codec not found\n");
	exit(1);
}
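
The naming convention from the vendor table in the overview can be turned into a small lookup. The sketch below is only illustrative: HwCodecName and Vendor are hypothetical names, not FFmpeg API. Whether a given name actually exists depends on how your ffmpeg was built, so the result of avcodec_find_decoder_by_name() should still be checked for NULL.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of FFmpeg): builds the name to pass to
// avcodec_find_decoder_by_name() / avcodec_find_encoder_by_name(),
// following the vendor naming table. Returns "" when the table lists
// no wrapper for that combination.
enum class Vendor { Nvidia, Amd, Intel };

std::string HwCodecName(const std::string& codec, Vendor vendor, bool encoder) {
  assert(!codec.empty());
  switch (vendor) {
    case Vendor::Nvidia:
      return codec + (encoder ? "_nvenc" : "_cuvid");
    case Vendor::Amd:
      // The table lists only the AMF encoder for AMD, no decoder wrapper.
      return encoder ? codec + "_amf" : std::string();
    case Vendor::Intel:
      return codec + "_qsv";
  }
  return std::string();
}
```

An empty result signals that the code should fall back to avcodec_find_decoder(ID) or to the hardware acceleration path.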

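2) Using hardware acceleration

The advantage of a vendor-specific codec is that it is cross-OS: the same code runs on Windows and Linux. The drawback is that it is not cross-hardware, because each GPU vendor needs a different codec. Hardware acceleration layered on a software decoder, by contrast, is cross-vendor. Windows d3d11va acceleration, for example, works whether the underlying GPU is from AMD, Intel, or NVIDIA; Windows hides the hardware details and we only need to call the Windows API. Below is a demo based on hardware acceleration: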

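#include <stdio.h>
/* FFmpeg headers; wrap these in extern "C" { ... } when compiling as C++. */
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/hwcontext.h>
#include <libavutil/imgutils.h>
#include <libavutil/mem.h>

static AVBufferRef* hw_device_ctx = NULL;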
static enum AVPixelFormat hw_pix_fmt;
static FILE* output_file = NULL;

// Initialize hardware acceleration
static int hw_decoder_init(AVCodecContext* ctx, const enum AVHWDeviceType type)
{
	int err = 0;
	// Create a hardware device context
	if ((err = av_hwdevice_ctx_create(&hw_device_ctx, type,
		NULL, NULL, 0)) < 0) {
		fprintf(stderr, "Failed to create specified HW device.\n");
		return err;
	}
	ctx->hw_device_ctx = av_buffer_ref(hw_device_ctx);

	return err;
}

// Select the pixel format of the frames decoded on the GPU
static enum AVPixelFormat get_hw_format(AVCodecContext* ctx,
	const enum AVPixelFormat* pix_fmts)
{
	const enum AVPixelFormat* p;

	for (p = pix_fmts; *p != -1; p++) {
		if (*p == hw_pix_fmt)
			return *p;
	}

	fprintf(stderr, "Failed to get HW surface format.\n");
	return AV_PIX_FMT_NONE;
}

// Convert the decoded frame, copy it from GPU to CPU memory, and dump the YUV data to a file
static int decode_write(AVCodecContext* avctx, AVPacket* packet)
{
	AVFrame* frame = NULL, * sw_frame = NULL;
	AVFrame* tmp_frame = NULL;
	uint8_t* buffer = NULL;
	int size;
	int ret = 0;

	ret = avcodec_send_packet(avctx, packet);
	if (ret < 0) {
		fprintf(stderr, "Error during decoding\n");
		return ret;
	}

	while (1) {
		if (!(frame = av_frame_alloc()) || !(sw_frame = av_frame_alloc())) {
			fprintf(stderr, "Can not alloc frame\n");
			ret = AVERROR(ENOMEM);
			goto fail;
		}

		ret = avcodec_receive_frame(avctx, frame);
		if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
			av_frame_free(&frame);
			av_frame_free(&sw_frame);
			return 0;
		}
		else if (ret < 0) {
			fprintf(stderr, "Error while decoding\n");
			goto fail;
		}

		if (frame->format == hw_pix_fmt) {
		/* Convert the decoded frame from its GPU memory format to a CPU memory format, copying it from GPU to CPU memory */
			if ((ret = av_hwframe_transfer_data(sw_frame, frame, 0)) < 0) {
				fprintf(stderr, "Error transferring the data to system memory\n");
				goto fail;
			}
			tmp_frame = sw_frame;
		}
		else
			tmp_frame = frame;
		// Compute the memory needed for one YUV image
		size = av_image_get_buffer_size((AVPixelFormat)tmp_frame->format, tmp_frame->width,
			tmp_frame->height, 1);
		// Allocate the buffer
		buffer = (uint8_t *)av_malloc(size);
		if (!buffer) {
			fprintf(stderr, "Can not alloc buffer\n");
			ret = AVERROR(ENOMEM);
			goto fail;
		}
		// Copy the image data into the buffer (row by row)
		ret = av_image_copy_to_buffer(buffer, size,
			(const uint8_t* const*)tmp_frame->data,
			(const int*)tmp_frame->linesize, (AVPixelFormat)tmp_frame->format,
			tmp_frame->width, tmp_frame->height, 1);
		if (ret < 0) {
			fprintf(stderr, "Can not copy image to buffer\n");
			goto fail;
		}
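		// Dump the buffer to the file. fwrite returns the number of items
		// written (never negative), so compare it against the expected size.
		if (fwrite(buffer, 1, size, output_file) != (size_t)size) {
			fprintf(stderr, "Failed to dump raw data.\n");
			ret = AVERROR(EIO);
			goto fail;
		}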

	fail:
		av_frame_free(&frame);
		av_frame_free(&sw_frame);
		av_freep(&buffer);
		if (ret < 0)
			return ret;
	}
}
int main(int argc,char * argv[])
{
	AVFormatContext* input_ctx = NULL;
	int video_stream, ret;
	AVStream* video = NULL;
	AVCodecContext* decoder_ctx = NULL;
	AVCodec* decoder = NULL;
	AVPacket packet;
	enum AVHWDeviceType type;
	int i;
   
	if (argc < 4) {
		fprintf(stderr, "Usage: %s <device type> <input file> <output file>\n", argv[0]);
		return -1;
	}
	// Device types include cuda, dxva2, qsv, d3d11va and opencl; on Windows, d3d11va or dxva2 is the usual choice
	type = av_hwdevice_find_type_by_name(argv[1]); // look up the device type by name
	if (type == AV_HWDEVICE_TYPE_NONE) {
		fprintf(stderr, "Device type %s is not supported.\n", argv[1]);
		fprintf(stderr, "Available device types:");
		while ((type = av_hwdevice_iterate_types(type)) != AV_HWDEVICE_TYPE_NONE)
			fprintf(stderr, " %s", av_hwdevice_get_type_name(type));
		fprintf(stderr, "\n");
		return -1;
	}

	/* open the input file */
	if (avformat_open_input(&input_ctx, argv[2], NULL, NULL) != 0) {
		fprintf(stderr, "Cannot open input file '%s'\n", argv[2]);
		return -1;
	}

	if (avformat_find_stream_info(input_ctx, NULL) < 0) {
		fprintf(stderr, "Cannot find input stream information.\n");
		return -1;
	}

	/* find the video stream information */
	ret = av_find_best_stream(input_ctx, AVMEDIA_TYPE_VIDEO, -1, -1, &decoder, 0);
	if (ret < 0) {
		fprintf(stderr, "Cannot find a video stream in the input file\n");
		return -1;
	}
	video_stream = ret;

	// Find the decoded pixel format that corresponds to the hardware type
	for (i = 0;; i++) {
		const AVCodecHWConfig* config = avcodec_get_hw_config(decoder, i);
		if (!config) {
			fprintf(stderr, "Decoder %s does not support device type %s.\n",
				decoder->name, av_hwdevice_get_type_name(type));
			return -1;
		}
		if (config->methods & AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX &&
			config->device_type == type) {
			hw_pix_fmt = config->pix_fmt;
			break;
		}
	}

	if (!(decoder_ctx = avcodec_alloc_context3(decoder)))
		return AVERROR(ENOMEM);

	video = input_ctx->streams[video_stream];
	if (avcodec_parameters_to_context(decoder_ctx, video->codecpar) < 0)
		return -1;
   
	decoder_ctx->get_format = get_hw_format;

	// Initialize hardware acceleration
	if (hw_decoder_init(decoder_ctx, type) < 0)
		return -1;

	if ((ret = avcodec_open2(decoder_ctx, decoder, NULL)) < 0) {
		fprintf(stderr, "Failed to open codec for stream #%u\n", video_stream);
		return -1;
	}

	/* open the file to dump raw data */
	output_file = fopen(argv[3], "w+b");
	if (!output_file) {
		fprintf(stderr, "Cannot open output file '%s'\n", argv[3]);
		return -1;
	}

	/* actual decoding and dump the raw data */
	while (ret >= 0) {
		if ((ret = av_read_frame(input_ctx, &packet)) < 0)
			break;

		if (video_stream == packet.stream_index)
			ret = decode_write(decoder_ctx, &packet); // decode and dump to the file

		av_packet_unref(&packet);
	}

	/* flush the decoder */
	packet.data = NULL;
	packet.size = 0;
	ret = decode_write(decoder_ctx, &packet);
	av_packet_unref(&packet);

	if (output_file)
		fclose(output_file);
	avcodec_free_context(&decoder_ctx);
	avformat_close_input(&input_ctx);
	av_buffer_unref(&hw_device_ctx);

	return 0;
}

After compiling, you get hw_decoder.exe. Decode a stream into a YUV file as follows:

hw_decoder.exe dxva2 D:\videos\hevcdemo.ts test.yuv

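(Figure: hardware-accelerated decoding)

The GPU's video decoder shows activity while CPU usage stays very low, so hardware acceleration is working.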
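
A quick way to sanity-check the dump is to compare the file size with the expected frame size. The sketch below assumes the transferred frames are 8-bit 4:2:0 (NV12 or YUV420P), which is common for 8-bit content but not guaranteed, and that rows are packed without padding (the alignment of 1 passed to av_image_get_buffer_size in the demo). Yuv420FrameBytes and FramesInDump are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one 8-bit 4:2:0 frame (NV12 or YUV420P) with no row padding:
// a full-resolution luma plane plus chroma subsampled 2x2 (two samples
// per chroma position, whether stored planar or interleaved).
std::size_t Yuv420FrameBytes(int width, int height) {
  assert(width > 0 && height > 0);
  const std::size_t w = static_cast<std::size_t>(width);
  const std::size_t h = static_cast<std::size_t>(height);
  const std::size_t chroma_w = (w + 1) / 2;  // round up for odd sizes
  const std::size_t chroma_h = (h + 1) / 2;
  return w * h + 2 * chroma_w * chroma_h;
}

// Number of whole frames contained in a raw dump of the given size.
std::size_t FramesInDump(std::size_t file_bytes, int width, int height) {
  return file_bytes / Yuv420FrameBytes(width, height);
}
```

If the file size is not a whole multiple of the frame size, the frames are probably in a different pixel format (for example a 10-bit one) than assumed here.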


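The current state of HEVC

Background

What is HEVC? Put simply, it is a modern video coding format with higher compression efficiency than H264. It supports 8K, HDR, wide color gamut, bit depths up to 16 bits, and chroma sampling up to YUV444. In short, it is a more efficient, modern format meant to replace H264, and it is already widely supported by all kinds of hardware.

However, for reasons of licensing and competing technical camps, browsers have never supported this format well, least of all Chrome, which has the largest market share. In early January I came across a news item in which Bilibili users complained about HEVC decoding performance and heat (thanks to Bilibili for its exploration of HEVC WASM decoding). This problem has troubled the industry for a long time, and the many Web projects that depend on HEVC have been forced to produce all sorts of workarounds, none of them ideal. So I thought: why not help Chromium implement HEVC hardware decoding?

This article outlines the current state of Web decoding, describes the problems the author ran into and the implementation principles behind implementing and refining hardware decoding for the Chromium browser, and closes with test results and prebuilt binaries for reference. Hopefully it puts an end to the front end's long-standing HEVC pain.

You can also download Chrome Canary (https://www.google.com/chrome/canary/) ahead of time to try HEVC hardware decoding (on ChromeOS, Android, Mac, and Windows you need to add the launch flag --enable-features=PlatformHEVCDecoderSupport; the Linux build is not supported yet).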

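Mainstream devices have long supported it and use it widely

In 2015, Apple's iPhone 6s first implemented HEVC hardware decoding in its A9 chip. In the same year, Intel added HEVC hardware decoding to the HD 500 series integrated graphics of its sixth-generation Skylake, and NVIDIA to its GTX 900 series discrete GPUs.

With iOS 11 and macOS 10.13, released in 2017, Apple went on to add HEVC encoding and decoding support to its VideoToolbox codec framework, and Microsoft released the HEVC Video Extension as the counterpart for HEVC decoding on Windows PCs.

Since then HEVC has become the default video format on Apple and Android devices and the format of choice for the vast majority of DSLRs, drones, and cameras.

As of this year, 2022, the iPhone has reached the 13 and chip processes have advanced to 5 nm, yet most of the browsers we use still cannot play HEVC video.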

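Why hardware decoding is necessary

Less heat

Hardware decoding means using the dedicated decoding block inside the GPU to do the decoding work. Thanks to the GPU's many low-frequency, specialized cores, it generates noticeably less heat and draws less power than the CPU when decoding video.

Better performance

Freeing the CPU from heavy decoding work greatly reduces system stutter.

The GPU is also inherently suited to graphics decoding, and its decoding performance far exceeds that of the CPU. The higher the video resolution, the more it matters that the GPU can output frames without dropping any; hence "never expect CPU software decoding alone to play 8K 60 fps HEVC video smoothly".

Summary

HEVC is currently the most mainstream coding format for desktop and mobile players. Given its high coding complexity and the extra resources that decoding consumes, implementing hardware decoding for it is essential.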

HEVC decoding approaches

The current state of browser decoding

First, let's look at the current state of decoding on the Web:

Windows

	H264	H265	VP8	VP9	MPEG4	AV1
Chrome 102	HW+SW	No	SW	HW+SW	No	HW+SW
Firefox 101	HW+SW	No	SW	SW	No	HW+SW
Edge 102	HW+SW	HW+SW (extension required)	SW	HW+SW	No	HW (extension required)

macOS

	H264	H265	VP8	VP9	MPEG4	AV1
Chrome 102	HW+SW	No	SW	HW+SW	No	SW
Firefox 101	HW+SW	No	SW	HW+SW	No	SW
Edge 102	HW+SW	No	SW	HW+SW	No	No
Safari 15.3	HW+SW	HW+SW (partial)	No	No	HW+SW	No

(HW = hardware decoding, SW = software decoding, No = not supported.)

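The Web HEVC decoding approaches commonly used in the industry today fall roughly into two kinds: "switch browsers" or "WASM software decoding". Each has its own strengths and use cases.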

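Browser - Edge (hardware decoding, Windows only)

The Chromium-based Edge additionally supports hardware decoding of HEVC video on Windows, but the following conditions must all be met:

1. The operating system must be Windows 10 1709 (16299.0) or later.
2. The paid HEVC Video Extension, or the free HEVC Video Extension from the device manufacturer, must be installed at version 1.0.50361.0 or later (because of a bug that persisted for more than a year and a half, older versions suffer from judder; issue: https://techcommunity.microsoft.com/t5/discussions/hevc-video-decoding-broken-with-b-frames/td-p/2077247/page/4).
3. Edge must be version 99 or later.

After installing the extension, open the edge://gpu page to see which profiles Edge supports for HEVC hardware decoding:

If the text shown in the screenshot appears, hardware decoding has been enabled successfully.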

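Specifications:

Maximum supported resolution: 8192px * 8192px.
Supports the HEVC Main / Main10 / Main Still Picture profiles.

Advantages:

With GPU support, performance is the best available.
Directly supported by native APIs such as HTMLVideoElement and MSE.

Disadvantages:

No support for Windows 8 or older versions of Windows 10.
The extension has to be installed manually.
HDR support is not good enough.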

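Browser - Safari (hardware decoding, macOS only)

Because Apple is a major promoter of the HEVC standard, Safari 11 added HEVC hardware decoding as early as 2017. It works out of the box with no plug-in to install.

Specifications:

Maximum supported resolution: 8192px * 8192px.
Supports the HEVC Main / Main10 profiles; M1 and later machines support part of the HEVC Rext profile.

Advantages:

With GPU support, performance is the best available.
Directly supported by native APIs such as HTMLVideoElement and MSE.
Works out of the box, with no plug-in to install.
The best HDR support (for example Dolby Vision Profile 5 and Dolby Atmos).

Disadvantages:

A weaker ecosystem: it lacks many of the "usable, good" extensions available for Chromium-based browsers.
Safari is often called "the next IE"; its browser API compatibility and implementation still lag behind Chromium.
Some HEVC videos inexplicably fail to play, even when the video itself is fine.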

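Front-end decoding - WASM (software decoding, any platform)

Most solutions of this kind are built by compiling FFMPEG to WASM, and they work in every browser that supports WASM.

Specifications:

Supports every resolution and profile that FFMPEG supports.

Advantages:

Browser-agnostic; it is a pure front-end implementation.

Disadvantages:

Depends on the stability of WASM in the browser version in use.
No hardware decoding. Because of software decoding plus WASM overhead, performance has a ceiling: video above 4K stutters and drops frames even on a top-end CPU such as the 5950X.
It does not use the native HTML Video Element, MSE, or EME APIs; playback has to be initialized by hand in JS, which adds integration cost.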

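Browser - this article's approach (hardware / software decoding, Windows / macOS / Linux)

This article attempts to implement hardware decoding directly in Chromium. Although Safari and Edge already implement HEVC hardware decoding, both are closed source and cannot be integrated by open-source frameworks. Chromium is open source, which ensures that anyone can build a Chromium / Electron / CEF with hardware decoding on Windows / macOS / Linux. The section on implementation principles is long, so if you are interested you can simply download a prebuilt version (https://github.com/StaZhu/enable-chromium-hevc-hardware-decoding/releases) to test (the feature will ship in a stable Chrome release in the future; the prebuilt versions let you try it early, and you can also download Chrome Canary), or skip ahead to the evaluation section for a comparison with Edge / Safari.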

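How HEVC hardware decoding is implemented

Because of the bottlenecks described above, the saying "leave professional work to professionals" applies to video decoding as well, and GPU hardware decoding is well worth having. GPU decoding exists precisely so that decoding can make full use of the dedicated silicon inside the GPU and take the load off the CPU, which is why hardware decoding support for more formats has become a major selling point for GPU vendors.

First we need to do some research into which hardware decoding frameworks exist and which systems or GPUs they support.

The table below comes from the FFMPEG project's summary of hardware decoding support across the different decoding frameworks (source: https://trac.ffmpeg.org/wiki/HWAccelIntro)

(Support matrix of the hardware decoding frameworks; table content from the FFmpeg website)

As you can see, there is a wide variety of hardware decoding frameworks. GPU vendors and devices each have their own dedicated decoding frameworks, and operating systems define generic ones. Because there are so many GPU vendors, most players implement hardware decoding on top of a generic framework; a few players with people to spare additionally implement a vendor-specific framework for better performance (a vendor's own framework usually outperforms the generic one, though not always).

On Windows, the generic decoding frameworks are Media Foundation, Direct3D 11, DXVA2, and OpenCL. On macOS there is only one, Apple's own VideoToolbox. On Linux, the generic decoding frameworks are VAAPI and OpenCL.

Clearly, for Chrome, building hardware decoding on the generic frameworks gives better compatibility and stability, fits the goal of maximum benefit at minimum cost, and improves maintainability.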
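
The per-OS list of generic frameworks amounts to a small lookup table. As an illustrative sketch only (the Os enum and GenericDecodeFrameworks are hypothetical names, and the lists contain just the frameworks named in the text):

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Os { Windows, MacOS, Linux };

// Generic (vendor-neutral) decode frameworks per OS, as listed in the text.
// The order carries no priority; it simply follows the article.
std::vector<std::string> GenericDecodeFrameworks(Os os) {
  switch (os) {
    case Os::Windows:
      return {"Media Foundation", "Direct3D 11", "DXVA2", "OpenCL"};
    case Os::MacOS:
      return {"VideoToolbox"};
    case Os::Linux:
      return {"VAAPI", "OpenCL"};
  }
  return {};
}
```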

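Understanding the Chromium decoding flow

According to the overview of the Chromium Media module, the browser abstracts audio/video playback into three types. The two we see most often are the Video Element tag and the MSE API; the third is the EME API, which supports playback of encrypted video. At the lower layers, the three reuse one another in several ways.

(The Chromium decoding flow; image from the Chromium code repository)

Down at the lowest-level decoding module, the overall logic can be summarized roughly as follows:

The browser looks through a list of Decoders in order. Hardware Decoders generally have the highest priority, and software Decoders are tried after them.
If one of the Decoders matches, the subsequent decoding logic runs on it.
If no Decoder matches, decoding fails and stops.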
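
These three steps amount to a first-match search over a priority-ordered list. The following is a simplified sketch, not Chromium's actual code: DecoderEntry, SelectDecoder, and ExampleDecoders are hypothetical names, and the codec strings are placeholders.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoder entry; not Chromium's real interface.
struct DecoderEntry {
  std::string name;
  bool is_hardware;
  std::function<bool(const std::string& codec)> supports;
};

// Walks the list in priority order and returns the first decoder that
// accepts the codec, or nullptr when none does (decoding fails).
const DecoderEntry* SelectDecoder(const std::vector<DecoderEntry>& decoders,
                                  const std::string& codec) {
  for (const DecoderEntry& d : decoders) {
    if (d.supports(codec)) return &d;
  }
  return nullptr;
}

// Example list: a hardware decoder first, then a software fallback.
std::vector<DecoderEntry> ExampleDecoders() {
  return {
      {"HardwareDecoder", true,
       [](const std::string& c) { return c == "h264" || c == "vp9"; }},
      {"SoftwareDecoder", false,
       [](const std::string& c) {
         return c == "h264" || c == "vp8" || c == "vp9";
       }},
  };
}
```

With this list, "h264" resolves to the hardware entry, "vp8" falls through to the software entry, and "hevc" matches nothing, which is the situation the rest of the article sets out to change.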

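So, to implement HEVC hardware decoding, we first need to find the generic hardware Decoder on each platform:

On Windows there are two, depending on the OS and the GPU driver version: D3D11VideoDecoder and VDAVideoDecoder. The former is used by default on systems newer than Windows 8 that support D3D11; the latter serves as a backup when the former is not used (for example on Windows 7).
On macOS it is VDAVideoDecoder.
On Linux it is VAAPIVideoDecoder.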

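Hardware decoding on macOS

With the general background in place, we can start exploring an HEVC hardware decoding implementation. Apple's latest Apple Silicon chips include a dedicated media engine for encoding and decoding H.264, HEVC, and ProRes, and in view of all that effort on Apple's part I picked macOS as the first platform to try.

Trying the FFMPEG approach

Although Chrome does not implement HEVC decoding directly, it does implement FFMpegVideoDecoder. So in essence any video that FFMPEG can play can, in theory, be played by modifying Chromium to add an entry point for the corresponding FFMPEG decoder. This was in fact the most widely circulated approach in the open-source community before this article's hardware decoding work, and @斯杰's article (https://www.infoq.cn/article/s65bFDPWzdfP9CQ6Wbw6) describes it in detail. That version was based on Chromium 79 and the latest Chromium version is now 104, so some implementation details have changed, but the overall logic has not changed much, and software decoding can still be enabled by modifying Chromium 104.

The advantages are many. Because it is CPU software decoding using FFMPEG, the industry's de facto standard decoder, it is not picky about the system, tolerates errors well, and supports any CPU architecture and operating system. Its performance cannot match hardware decoding but is still better than the front-end WASM approach, and it supports MSE and the Video Element natively.

The drawbacks are just as obvious. On an ordinary quad-core laptop, even at a resolution of only 1080P, seeking forward or backward causes noticeable stutter, along with fairly high CPU usage that competes with the renderer process for CPU time. It is also unclear whether this approach is properly licensed; one thing is certain, though: using the decoder provided by the platform is compliant and carries no licensing risk.

Once the resolution reaches 4K or even 8K, CPUs with 8 or more cores also stutter to the point of dropping frames.

(The FFMPEG decoding flow; image from Zhihu user @我是小北挖哈哈)

The FFMPEG decoding flow in the figure above (reference: https://zhuanlan.zhihu.com/p/168240163?Futm_source=wechat_session&utm_medium=social&utm_oi=29396161265664) shows that FFMPEG implements not only software decoding but also complete hardware decoding. Chromium's FFMpegVideoDecoder, however, does not support hardware decoding, so my teammate @豪爽 first tried configuring hw_device_ctx inside FFMpegVideoDecoder to switch on its hardware decoding capability. The steps were as follows: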

Enable the hardware decoding macros:

// third_party/ffmpeg/chromium/config/Chrome/mac/x64/config.h

#define CONFIG_VIDEOTOOLBOX 1
#define CONFIG_HEVC_VIDEOTOOLBOX_HWACCEL 1
#define HAVE_KCMVIDEOCODECTYPE_HEVC 1

Set up the hardware context:

// media/filters/ffmpeg_video_decoder.cc -> FFmpegVideoDecoder::ConfigureDecoder(const VideoDecoderConfig& config, bool low_delay)

if (decode_nalus_)
    codec_context_->flags2 |= AV_CODEC_FLAG2_CHUNKS;

+ if (codec_context_->codec_id == AVCodecID::AV_CODEC_ID_HEVC) {
+     AVBufferRef *hw_device_ctx = NULL;
+     int err;
+     if ((err = av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_VIDEOTOOLBOX, NULL, NULL, 0)) >= 0) {
+         codec_context_->hw_device_ctx = av_buffer_ref(hw_device_ctx);
+     }
+ }

const AVCodec* codec = avcodec_find_decoder(codec_context_->codec_id);

Retrieve the decoded data:

// media/ffmpeg/ffmpeg_common.cc -> AVPixelFormatToVideoPixelFormat(AVPixelFormat pixel_format)

case AV_PIX_FMT_YUV420P:
case AV_PIX_FMT_YUVJ420P:
case AV_PIX_FMT_VIDEOTOOLBOX: // hwaccel
  return PIXEL_FORMAT_I420;

Transfer the hardware-decoded data out, i.e. call the av_hwframe_transfer_data function:

// media/ffmpeg/ffmpeg_decoding_loop.cc

FFmpegDecodingLoop::DecodeStatus FFmpegDecodingLoop::DecodePacket(const AVPacket* packet, FrameReadyCB frame_ready_cb) {
+    AVFrame* tmp_frame = NULL;
+    AVFrame* sw_frame = av_frame_alloc();
    bool sent_packet = false, frames_remaining = true, decoder_error = false;
    while (!sent_packet || frames_remaining) {
        ......
+        if (frame_.get()->format == AV_PIX_FMT_VIDEOTOOLBOX) {
+            int ret = av_hwframe_transfer_data(sw_frame, frame_.get(), 0);
+            tmp_frame = sw_frame;
+        } else {
+            tmp_frame = frame_.get();
+        }
+        const bool frame_processing_success = frame_ready_cb.Run(tmp_frame);
+        av_frame_unref(tmp_frame);
-        const bool frame_processing_success = frame_ready_cb.Run(frame_.get());
        av_frame_unref(frame_.get());
        if (!frame_processing_success)
            return DecodeStatus::kFrameProcessingFailed;
        }

    return decoder_error ? DecodeStatus::kDecodeFrameFailed : DecodeStatus::kOkay;
}

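After many attempts along these lines, Activity Monitor showed the CPU usage of the VTDecoderXPCService process (VideoToolbox's decoding process) rising when the play button of the <video> tag was clicked, which means the VideoToolbox hardware decoding module was being invoked successfully. The video stayed blank, however, which means decoding failed.

While exploring, I read the Chromium Media module documentation and found that FFMpegVideoDecoder cannot call the VT hardware decoding framework from a sandboxed process. To avoid pouring too much effort into the wrong path, we abandoned this approach.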

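Implementing it in the GPU process

Since the approach above does not work, we need a different angle and should look at how the canonical H264 hardware decoding path works. Using Chromium Code Search (https://source.chromium.org/), I found that the H264 hardware decoding implementation for macOS lives in the file vt_video_decode_accelerator_mac.cc.

VideoToolbox overview

As the FFmpeg overview shows, implementing HEVC hardware decoding on macOS necessarily means using VideoToolbox, the media decoding framework provided by Apple.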

VideoToolbox is a low-level framework that provides direct access to hardware encoders and decoders. It provides services for video compression and decompression, and for conversion between raster image formats stored in CoreVideo pixel buffers. These services are provided in the form of session objects (compression, decompression, and pixel transfer), which are vended as Core Foundation (CF) types. Apps that don't need direct access to hardware encoders and decoders should not need to use VideoToolbox directly.

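According to the Apple Developer site (https://developer.apple.com/documentation/videotoolbox), VideoToolbox is the low-level framework Apple provides for direct access to encoding and decoding. For hardware decoding, the overall flow can be understood as: Chromium -> VDAVideoDecoder -> VideoToolbox -> GPU -> VideoToolbox -> VDAVideoDecoder -> Chromium.

Our goal, then, is to submit Image Buffers in exactly the way VideoToolbox requires and wait for VT to hand back the decoded data.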

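Adding the Supported Profiles

As the Chromium decoding flow shows, for a video with a given codec Chromium first tries to find a hardware Decoder; if no hardware Decoder supports it, Chromium moves on to the fallback software Decoders.

On inspection, whether a coding format can be hardware decoded on macOS depends on whether the hardware Decoder's supported profiles (kSupportedProfiles) contain that format. The code is as follows: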

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

// This array lists every profile that might be supported; whether a profile is actually supported is not decided here
constexpr VideoCodecProfile kSupportedProfiles[] = {
    H264PROFILE_BASELINE, H264PROFILE_EXTENDED, H264PROFILE_MAIN,
    H264PROFILE_HIGH,

    // On macOS 11 and later, hardware decoding is attempted for these two profiles
    VP9PROFILE_PROFILE0, VP9PROFILE_PROFILE2,

    // On macOS 11 and later: hardware and software decoding of the most common
    // HEVC Main / Main10 profiles, plus Main Still Picture / Rext
    // (Apple Silicon machines hardware-decode HEVC Rext; Intel machines software-decode it)
    // These are only supported on macOS 11+.
    HEVCPROFILE_MAIN, HEVCPROFILE_MAIN10, HEVCPROFILE_MAIN_STILL_PICTURE,
    HEVCPROFILE_REXT,

    // TODO(sandersd): Hi10p fails during
    // CMVideoFormatDescriptionCreateFromH264ParameterSets with
    // kCMFormatDescriptionError_InvalidParameter.
    //
    // H264PROFILE_HIGH10PROFILE,

    // TODO(sandersd): Find and test media with these profiles before enabling.
    //
    // H264PROFILE_SCALABLEBASELINE,
    // H264PROFILE_SCALABLEHIGH,
    // H264PROFILE_STEREOHIGH,
    // H264PROFILE_MULTIVIEWHIGH,
};

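Session warm-up and bootstrap logic

Hardware decoding requires creating a decoding Session to warm up before the process sandbox is enabled, and then deciding from the OS version and the level of support whether hardware decoding is ultimately enabled: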

// media/gpu/mac/vt_video_decode_accelerator_mac.cc
bool InitializeVideoToolbox() {
  // Runs immediately when called from the GPU main process, so that both sandboxed and non-sandboxed processes can hardware decode
  static const bool succeeded = InitializeVideoToolboxInternal();
  return succeeded;
}

// Warm up by creating a VideoToolbox Decompression Session before the GPU sandbox is enabled, so that both sandboxed and non-sandboxed processes can hardware decode
bool InitializeVideoToolboxInternal() {
  VTDecompressionOutputCallbackRecord callback = {0};
  base::ScopedCFTypeRef<VTDecompressionSessionRef> session;
  gfx::Size configured_size;

  // Create an H264 hardware decoding Session
  const std::vector<uint8_t> sps_h264_normal = {
      0x67, 0x64, 0x00, 0x1e, 0xac, 0xd9, 0x80, 0xd4, 0x3d, 0xa1, 0x00, 0x00,
      0x03, 0x00, 0x01, 0x00, 0x00, 0x03, 0x00, 0x30, 0x8f, 0x16, 0x2d, 0x9a};
  const std::vector<uint8_t> pps_h264_normal = {0x68, 0xe9, 0x7b, 0xcb};
  if (!CreateVideoToolboxSession(
          CreateVideoFormatH264(sps_h264_normal, std::vector<uint8_t>(),
                                pps_h264_normal),
          /*require_hardware=*/true, /*is_hbd=*/false, &callback, &session,
          &configured_size)) {
    // If creating the H264 hardware decoding Session fails, disable the whole hardware decoding module
    DVLOG(1) << "Hardware H264 decoding with VideoToolbox is not supported";
    return false;
  }

  session.reset();

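  // Create an H264 software decoding Session
  // In short: if the device cannot even hardware or software decode H264, hardware decoding is disabled outright and all decoding goes through FFMpegVideoDecoder's software path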
  const std::vector<uint8_t> sps_h264_small = {
      0x67, 0x64, 0x00, 0x0a, 0xac, 0xd9, 0x89, 0x7e, 0x22, 0x10, 0x00,
      0x00, 0x3e, 0x90, 0x00, 0x0e, 0xa6, 0x08, 0xf1, 0x22, 0x59, 0xa0};
  const std::vector<uint8_t> pps_h264_small = {0x68, 0xe9, 0x79, 0x72, 0xc0};
  if (!CreateVideoToolboxSession(
          CreateVideoFormatH264(sps_h264_small, std::vector<uint8_t>(),
                                pps_h264_small),
          /*require_hardware=*/false, /*is_hbd=*/false, &callback, &session,
          &configured_size)) {
    DVLOG(1) << "Software H264 decoding with VideoToolbox is not supported";
    // If creating the H264 software Decompression Session fails, disable the whole hardware decoding module
    return false;
  }

  session.reset();

  if (__builtin_available(macOS 11.0, *)) {
    VTRegisterSupplementalVideoDecoderIfAvailable(kCMVideoCodecType_VP9);

    // On macOS Big Sur or later, try to create a VP9 hardware decoding Session
    if (!CreateVideoToolboxSession(
            CreateVideoFormatVP9(VideoColorSpace::REC709(), VP9PROFILE_PROFILE0,
                                 absl::nullopt, gfx::Size(720, 480)),
            /*require_hardware=*/true, /*is_hbd=*/false, &callback, &session,
            &configured_size)) {
      DVLOG(1) << "Hardware VP9 decoding with VideoToolbox is not supported";
      // If creating the session fails, VP9 hardware decoding is unsupported; skip it but keep H264 hardware decoding available
    }
  }

// As Chromium requires, all HEVC hardware decoding logic is guarded by the ENABLE_HEVC_PARSER_AND_HW_DECODER build flag; the code is only compiled in when the flag is on
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
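  // Even when the HEVC hardware decoding flag is enabled at build time,
  // HEVC hardware decoding is only turned on when `--enable-features=PlatformHEVCDecoderSupport` is passed at launch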
  if (base::FeatureList::IsEnabled(media::kPlatformHEVCDecoderSupport)) {
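    // Big Sur is required as a minimum because on Catalina and earlier,
    // creating a decoding Session through the
    // CMVideoFormatDescriptionCreateFromHEVCParameterSets API fails.
    // Note: this is a macOS issue. Apple states that the API is usable on 10.13 and later, but in testing it was not.
    // CMVideoFormatDescriptionCreate, which VLC and FFmpeg use, does create the session,
    // but it does not fit the style and structure of the hardware decoding module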
    if (__builtin_available(macOS 11.0, *)) {
      session.reset();

      // Create an HEVC hardware decoding Session
      // vps/sps/pps extracted from bear-1280x720-hevc.mp4
      const std::vector<uint8_t> vps_hevc_normal = {
          0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x01, 0x60,
          0x00, 0x00, 0x03, 0x00, 0x90, 0x00, 0x00, 0x03,
          0x00, 0x00, 0x03, 0x00, 0x5d, 0x95, 0x98, 0x09};

      const std::vector<uint8_t> sps_hevc_normal = {
          0x42, 0x01, 0x01, 0x01, 0x60, 0x00, 0x00, 0x03, 0x00, 0x90, 0x00,
          0x00, 0x03, 0x00, 0x00, 0x03, 0x00, 0x5d, 0xa0, 0x02, 0x80, 0x80,
          0x2d, 0x16, 0x59, 0x59, 0xa4, 0x93, 0x2b, 0xc0, 0x5a, 0x70, 0x80,
          0x00, 0x01, 0xf4, 0x80, 0x00, 0x3a, 0x98, 0x04};

      const std::vector<uint8_t> pps_hevc_normal = {0x44, 0x01, 0xc1, 0x72,
                                                    0xb4, 0x62, 0x40};

      if (!CreateVideoToolboxSession(
              CreateVideoFormatHEVC(vps_hevc_normal, sps_hevc_normal,
                                    pps_hevc_normal),
              /*require_hardware=*/true, /*is_hbd=*/false, &callback, &session,
              &configured_size)) {
        DVLOG(1) << "Hardware HEVC decoding with VideoToolbox is not supported";
        // As with VP9, a failed HEVC hardware warm-up does not disable H264 hardware decoding
      }

      session.reset();

      // Create an HEVC software decoding Session
      // vps/sps/pps extracted from bear-320x240-v_frag-hevc.mp4
      const std::vector<uint8_t> vps_hevc_small = {
          0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x01, 0x60,
          0x00, 0x00, 0x03, 0x00, 0x90, 0x00, 0x00, 0x03,
          0x00, 0x00, 0x03, 0x00, 0x3c, 0x95, 0x98, 0x09};

      const std::vector<uint8_t> sps_hevc_small = {
          0x42, 0x01, 0x01, 0x01, 0x60, 0x00, 0x00, 0x03, 0x00, 0x90,
          0x00, 0x00, 0x03, 0x00, 0x00, 0x03, 0x00, 0x3c, 0xa0, 0x0a,
          0x08, 0x0f, 0x16, 0x59, 0x59, 0xa4, 0x93, 0x2b, 0xc0, 0x40,
          0x40, 0x00, 0x00, 0xfa, 0x40, 0x00, 0x1d, 0x4c, 0x02};

      const std::vector<uint8_t> pps_hevc_small = {0x44, 0x01, 0xc1, 0x72,
                                                   0xb4, 0x62, 0x40};

      if (!CreateVideoToolboxSession(
              CreateVideoFormatHEVC(vps_hevc_small, sps_hevc_small,
                                    pps_hevc_small),
              /*require_hardware=*/false, /*is_hbd=*/false, &callback, &session,
              &configured_size)) {
        DVLOG(1) << "Software HEVC decoding with VideoToolbox is not supported";

        // As with VP9, a failed HEVC software warm-up does not disable H264 hardware decoding
      }
    }
  }
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  return true;
}

// The actual final decision logic
VideoDecodeAccelerator::SupportedProfiles
VTVideoDecodeAccelerator::GetSupportedProfiles(
    const gpu::GpuDriverBugWorkarounds& workarounds) {
  SupportedProfiles profiles;
  // If H264 hardware or software decoding is unsupported, disable the hardware decoding module
  if (!InitializeVideoToolbox())
    return profiles;

  for (const auto& supported_profile : kSupportedProfiles) {
    // Currently only two VP9 profiles are supported: PROFILE0 and PROFILE2
    if (supported_profile == VP9PROFILE_PROFILE0 ||
        supported_profile == VP9PROFILE_PROFILE2) {
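      // Every GPU decoding path first consults the GPU workarounds.
      // For example, to disable hardware decoding of a codec on a specific GPU model or vendor,
      // a disabling configuration can be delivered through a GPU workaround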
      if (workarounds.disable_accelerated_vp9_decode)
        continue;
      if (!base::mac::IsAtLeastOS11())
        // The OS version does not support VP9 hardware decoding; skip
        continue;
      if (__builtin_available(macOS 10.13, *)) {
        if ((supported_profile == VP9PROFILE_PROFILE0 ||
             supported_profile == VP9PROFILE_PROFILE2) &&
            !VTIsHardwareDecodeSupported(kCMVideoCodecType_VP9)) {
          // The profile is unsupported, or the OS does not support VP9 hardware decoding; skip
          continue;
        }

        // The GPU workaround, OS version, profile, and OS-level VP9 hardware decoding checks all passed: VP9 hardware decoding is supported and this decoder takes over
      } else {
        // The OS version does not support VP9 hardware decoding; skip
        continue;
      }
    }

    // Currently four HEVC profiles are supported: Main, Main10, MSP, and Rext
    if (supported_profile == HEVCPROFILE_MAIN ||
        supported_profile == HEVCPROFILE_MAIN10 ||
        supported_profile == HEVCPROFILE_MAIN_STILL_PICTURE ||
        supported_profile == HEVCPROFILE_REXT) {
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
      if (!workarounds.disable_accelerated_hevc_decode &&
          base::FeatureList::IsEnabled(kPlatformHEVCDecoderSupport)) {
        if (__builtin_available(macOS 11.0, *)) {
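          // The GPU workaround, OS version, profile, build flag, and launch flag checks all passed: HEVC hardware decoding is supported (software decoding also goes through VideoToolbox, for reasons explained later) and this decoder takes over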
          SupportedProfile profile;
          profile.profile = supported_profile;
          profile.min_resolution.SetSize(16, 16);
          // HEVC supports up to 8K
          profile.max_resolution.SetSize(8192, 8192);
          profiles.push_back(profile);
        }
      }
#endif  //  BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
      continue;
    }

    // H264 and VP9 support up to 4K
    SupportedProfile profile;
    profile.profile = supported_profile;
    profile.min_resolution.SetSize(16, 16);
    profile.max_resolution.SetSize(4096, 4096);
    profiles.push_back(profile);
  }
  return profiles;
}

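As shown above, after the GPU workaround, OS version, profile, build flag, and launch flag checks, if validation passes, VideoToolbox takes over the HEVC decoding logic and the VTDecoderXPCService process does the actual decoding.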
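
The chain of checks can be condensed into a single predicate. This is a simplified sketch rather than Chromium's code: HevcGateInputs and HevcDecodeOffered are hypothetical names, and the OS version is reduced to a major version number.

```cpp
#include <cassert>

// Hypothetical inputs mirroring the checks described above.
struct HevcGateInputs {
  bool built_with_hevc_flag;     // ENABLE_HEVC_PARSER_AND_HW_DECODER at build time
  bool feature_enabled;          // --enable-features=PlatformHEVCDecoderSupport
  bool gpu_workaround_disables;  // disable_accelerated_hevc_decode
  int macos_major_version;       // 11 = Big Sur
};

// Returns true only when every gate passes; any single failing gate means
// the HEVC profiles are left out of the supported-profile list.
bool HevcDecodeOffered(const HevcGateInputs& in) {
  assert(in.macos_major_version > 0);
  if (!in.built_with_hevc_flag) return false;
  if (in.gpu_workaround_disables) return false;
  if (!in.feature_enabled) return false;
  return in.macos_major_version >= 11;
}
```

Because the gates are independent, the order of the checks does not change the result; it only decides which failure is noticed first.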

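Understanding HEVC NALU types

The NALU (network abstraction layer unit) is a core concept of the H.264 / AVC and HEVC video coding standards. In plain terms, H264 / HEVC define different types for different units of the video stream (reference); look it up if you are interested, as we will not go into detail here. H264 has 32 types, 8 of which are reserved. HEVC extends this to 64 types, 16 of which are reserved.

(Structure of an H264 NAL unit; image from Apple)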

// media/video/h265_nalu_parser.h

enum Type {
    TRAIL_N = 0,  // coded slice segment of a non TSA(Temporal Sub-layer Access)
                  // trailing picture
    TRAIL_R = 1,  // coded slice segment of a non TSA(Temporal Sub-layer Access)
                  // trailing picture
    TSA_N = 2,    // coded slice segment of a TSA(Temporal Sub-layer Access)
                  // trailing picture
    TSA_R = 3,    // coded slice segment of a TSA(Temporal Sub-layer Access)
                  // trailing picture
    STSA_N = 4,   // coded slice segment of a STSA(Step-wise Temporal Sub-layer
                  // Access) trailing picture
    STSA_R = 5,   // coded slice segment of a STSA(Step-wise Temporal Sub-layer
                  // Access) trailing picture
    RADL_N = 6,   // coded slice segment of a RADL(Random Access Decodable
                  // Leading) leading picture
    RADL_R = 7,   // coded slice segment of a RADL(Random Access Decodable
                  // Leading) leading picture
    RASL_N = 8,   // coded slice segment of a RASL(Random Access Skipped
                  // Leading) leading picture
    RASL_R = 9,   // coded slice segment of a RASL(Random Access Skipped Leading)
                  // leading picture
    RSV_VCL_N10 = 10,     // reserved non-IRAP SLNR VCL
    RSV_VCL_R11 = 11,     // reserved non-IRAP sub-layer reference VCL
    RSV_VCL_N12 = 12,     // reserved non-IRAP SLNR VCL
    RSV_VCL_R13 = 13,     // reserved non-IRAP sub-layer reference VCL
    RSV_VCL_N14 = 14,     // reserved non-IRAP SLNR VCL
    RSV_VCL_R15 = 15,     // reserved non-IRAP sub-layer reference VCL
    BLA_W_LP = 16,        // coded slice segment of a BLA IRAP picture
    BLA_W_RADL = 17,      // coded slice segment of a BLA IRAP picture
    BLA_N_LP = 18,        // coded slice segment of a BLA IRAP picture
    IDR_W_RADL = 19,      // coded slice segment of an IDR IRAP picture
    IDR_N_LP = 20,        // coded slice segment of an IDR IRAP picture
    CRA_NUT = 21,         // coded slice segment of a CRA IRAP picture
    RSV_IRAP_VCL22 = 22,  // reserved IRAP(intra random access point) VCL
    RSV_IRAP_VCL23 = 23,  // reserved IRAP(intra random access point) VCL
    RSV_VCL24 = 24,       // reserved non-IRAP VCL
    RSV_VCL25 = 25,       // reserved non-IRAP VCL
    RSV_VCL26 = 26,       // reserved non-IRAP VCL
    RSV_VCL27 = 27,       // reserved non-IRAP VCL
    RSV_VCL28 = 28,       // reserved non-IRAP VCL
    RSV_VCL29 = 29,       // reserved non-IRAP VCL
    RSV_VCL30 = 30,       // reserved non-IRAP VCL
    RSV_VCL31 = 31,       // reserved non-IRAP VCL
    VPS_NUT = 32,         // vps(video parameter sets)
    SPS_NUT = 33,         // sps(sequence parameter sets)
    PPS_NUT = 34,         // pps(picture parameter sets)
    AUD_NUT = 35,         // access unit delimiter
    EOS_NUT = 36,         // end of sequence
    EOB_NUT = 37,         // end of bitstream
    FD_NUT = 38,          // filler data
    PREFIX_SEI_NUT = 39,  // sei
    SUFFIX_SEI_NUT = 40,  // sei
    RSV_NVCL41 = 41,      // reserve
    RSV_NVCL42 = 42,      // reserve
    RSV_NVCL43 = 43,      // reserve
    RSV_NVCL44 = 44,      // reserve
    RSV_NVCL45 = 45,      // reserve
    RSV_NVCL46 = 46,      // reserve
    RSV_NVCL47 = 47,      // reserve
    UNSPEC48 = 48,        // unspecified
    UNSPEC49 = 49,        // unspecified
    UNSPEC50 = 50,        // unspecified
    UNSPEC51 = 51,        // unspecified
    UNSPEC52 = 52,        // unspecified
    UNSPEC53 = 53,        // unspecified
    UNSPEC54 = 54,        // unspecified
    UNSPEC55 = 55,        // unspecified
    UNSPEC56 = 56,        // unspecified
    UNSPEC57 = 57,        // unspecified
    UNSPEC58 = 58,        // unspecified
    UNSPEC59 = 59,        // unspecified
    UNSPEC60 = 60,        // unspecified
    UNSPEC61 = 61,        // unspecified
    UNSPEC62 = 62,        // unspecified
    UNSPEC63 = 63,        // unspecified
  };
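
To make the type numbers concrete, the sketch below extracts the header fields from raw bytes. It assumes the standard header layouts: in HEVC a two-byte header holding forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits), and nuh_temporal_id_plus1 (3 bits); in H264 a one-byte header whose low 5 bits are the type. HevcNalHeader, ParseHevcNalHeader, and H264NalUnitType are hypothetical names, not Chromium's.

```cpp
#include <cassert>
#include <cstdint>

// Fields of the two-byte HEVC NAL unit header.
struct HevcNalHeader {
  bool forbidden_zero_bit;
  int nal_unit_type;          // 6 bits, 0..63
  int nuh_layer_id;           // 6 bits
  int nuh_temporal_id_plus1;  // 3 bits
};

HevcNalHeader ParseHevcNalHeader(std::uint8_t b0, std::uint8_t b1) {
  HevcNalHeader h;
  h.forbidden_zero_bit = (b0 & 0x80) != 0;
  h.nal_unit_type = (b0 >> 1) & 0x3F;
  // The layer id straddles the byte boundary: 1 bit from b0, 5 from b1.
  h.nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3);
  h.nuh_temporal_id_plus1 = b1 & 0x07;
  return h;
}

// H264 uses a one-byte header; nal_unit_type is the low 5 bits (0..31).
int H264NalUnitType(std::uint8_t b0) { return b0 & 0x1F; }
```

For instance, the HEVC parameter sets used in the warm-up code earlier begin with 0x40 0x01, 0x42 0x01, and 0x44 0x01, which decode to types 32 (VPS_NUT), 33 (SPS_NUT), and 34 (PPS_NUT). The 6-bit versus 5-bit type field is also why HEVC has 64 type values where H264 has 32.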

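Parsing SPS / PPS / VPS

To decode HEVC, we first need the video's metadata, which is obtained by parsing the Nalu Headers of the NAL units of type 32 (VPS_NUT), 33 (SPS_NUT), and 34 (PPS_NUT).

As the most basic example, to get the width and height of the video we parse the SPS_NUT Nalu Header and read sps->pic_width_in_luma_samples (and pic_height_in_luma_samples for the height), and so on.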
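
Fields such as pic_width_in_luma_samples are stored as unsigned Exp-Golomb codes, ue(v), which is what macros like READ_UE_OR_RETURN in the VPS parser further down read. A minimal sketch of that decoding follows. BitReader here is a simplified stand-in written for illustration: it does not strip emulation prevention bytes and is not Chromium's bit reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Minimal MSB-first bit reader; a simplified stand-in for illustration.
class BitReader {
 public:
  explicit BitReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

  // Reads one bit; returns false when the data is exhausted.
  bool ReadBit(int* bit) {
    if (pos_ >= data_.size() * 8) return false;
    *bit = (data_[pos_ / 8] >> (7 - pos_ % 8)) & 1;
    ++pos_;
    return true;
  }

  // Reads an unsigned Exp-Golomb code ue(v): n leading zero bits, a one
  // bit, then n more bits. The value is 2^n - 1 plus those n bits.
  bool ReadUE(std::uint32_t* value) {
    int bit = 0;
    int zeros = 0;
    while (true) {
      if (!ReadBit(&bit)) return false;
      if (bit) break;
      if (++zeros > 31) return false;  // would not fit in 32 bits
    }
    std::uint32_t suffix = 0;
    for (int i = 0; i < zeros; ++i) {
      if (!ReadBit(&bit)) return false;
      suffix = (suffix << 1) | static_cast<std::uint32_t>(bit);
    }
    *value = ((std::uint32_t{1} << zeros) - 1) + suffix;
    return true;
  }

 private:
  std::vector<std::uint8_t> data_;
  std::size_t pos_ = 0;
};
```

Short codes stand for small values (the single bit 1 is 0, 010 is 1, 011 is 2, 00100 is 3), so frequently occurring small numbers cost few bits.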

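Media developers commonly use a tool called StreamAnalyzer (link: https://www.elecard.com/zh/products/video-analysis/stream-analyzer) to parse a video's Nalu Headers quickly. What we need to do is much the same as what that software does:

(Stream Analyzer parsing a Nalu Header)

The parsed data of the SPS_NUT Nalu Header is shown in the right-hand area of the screenshot. Thanks to Elecard for developing such a handy tool; it was a great help in implementing VPS parsing.

Looking at the Chromium code base, @Jeffery Kardatzke had already completed the VAAPI HEVC hardware decoding implementation for Linux at the end of 2020, along with most of the H265 Nalu parsing logic. Hardware decoding on Linux does not need the VPS parameters, so he did not implement VPS parsing. According to Apple Developer, however, creating a decoding session with the CMVideoFormatDescriptionCreateFromHEVCParameterSets API requires all three kinds of Nalu data: VPS, SPS, and PPS. A large part of the work of implementing macOS hardware decoding is therefore parsing the VPS NALU Header:

First, with reference to T-REC-H.265-202108-I and the H265RawVPS struct reference defined by FFMPEG, we define the VPS struct to be parsed: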

// media/video/h265_parser.h

// Definition of the H265VPS struct
struct MEDIA_EXPORT H265VPS {
  H265VPS();

  int vps_video_parameter_set_id; // i.e. vps_id; needed later
  bool vps_base_layer_internal_flag;
  bool vps_base_layer_available_flag;
  int vps_max_layers_minus1;
  int vps_max_sub_layers_minus1;
  bool vps_temporal_id_nesting_flag;
  H265ProfileTierLevel profile_tier_level;
  int vps_max_dec_pic_buffering_minus1[kMaxSubLayers]; // needed later
  int vps_max_num_reorder_pics[kMaxSubLayers]; // needed later
  int vps_max_latency_increase_plus1[kMaxSubLayers];
  int vps_max_layer_id;
  int vps_num_layer_sets_minus1;
  bool vps_timing_info_present_flag;

  // The remaining fields are not needed, so parsing them is not implemented yet
};

Next, we implement the parsing logic for H265VPS:

// media/video/h265_parser.cc

// VPS parsing logic
H265Parser::Result H265Parser::ParseVPS(int* vps_id) {
  DVLOG(4) << "Parsing VPS";
  Result res = kOk;

  DCHECK(vps_id);
  *vps_id = -1;

  std::unique_ptr<H265VPS> vps = std::make_unique<H265VPS>();

  // Read 4 bits
  READ_BITS_OR_RETURN(4, &vps->vps_video_parameter_set_id);
  // Check that the value read lies in [0, 15], the full range of a 4-bit field
  IN_RANGE_OR_RETURN(vps->vps_video_parameter_set_id, 0, 15);
  READ_BOOL_OR_RETURN(&vps->vps_base_layer_internal_flag);
  READ_BOOL_OR_RETURN(&vps->vps_base_layer_available_flag);
  READ_BITS_OR_RETURN(6, &vps->vps_max_layers_minus1);
  IN_RANGE_OR_RETURN(vps->vps_max_layers_minus1, 0, 62);
  READ_BITS_OR_RETURN(3, &vps->vps_max_sub_layers_minus1);
  // H.265 restricts vps_max_sub_layers_minus1 to [0, 6]
  IN_RANGE_OR_RETURN(vps->vps_max_sub_layers_minus1, 0, 6);
  READ_BOOL_OR_RETURN(&vps->vps_temporal_id_nesting_flag);
  SKIP_BITS_OR_RETURN(16);  // skip vps_reserved_0xffff_16bits
  res = ParseProfileTierLevel(true, vps->vps_max_sub_layers_minus1,
                              &vps->profile_tier_level);
  if (res != kOk) {
    return res;
  }

  bool vps_sub_layer_ordering_info_present_flag;
  READ_BOOL_OR_RETURN(&vps_sub_layer_ordering_info_present_flag);

  for (int i = vps_sub_layer_ordering_info_present_flag
                   ? 0
                   : vps->vps_max_sub_layers_minus1;
       i <= vps->vps_max_sub_layers_minus1; ++i) {
    READ_UE_OR_RETURN(&vps->vps_max_dec_pic_buffering_minus1[i]);
    IN_RANGE_OR_RETURN(vps->vps_max_dec_pic_buffering_minus1[i], 0, 15);
    READ_UE_OR_RETURN(&vps->vps_max_num_reorder_pics[i]);
    IN_RANGE_OR_RETURN(vps->vps_max_num_reorder_pics[i], 0,
                       vps->vps_max_dec_pic_buffering_minus1[i]);
    if (i > 0) {
      TRUE_OR_RETURN(vps->vps_max_dec_pic_buffering_minus1[i] >=
                     vps->vps_max_dec_pic_buffering_minus1[i - 1]);
      TRUE_OR_RETURN(vps->vps_max_num_reorder_pics[i] >=
                     vps->vps_max_num_reorder_pics[i - 1]);
    }
    READ_UE_OR_RETURN(&vps->vps_max_latency_increase_plus1[i]);
  }
  if (!vps_sub_layer_ordering_info_present_flag) {
    for (int i = 0; i < vps->vps_max_sub_layers_minus1; ++i) {
      vps->vps_max_dec_pic_buffering_minus1[i] =
          vps->vps_max_dec_pic_buffering_minus1[vps->vps_max_sub_layers_minus1];
      vps->vps_max_num_reorder_pics[i] =
          vps->vps_max_num_reorder_pics[vps->vps_max_sub_layers_minus1];
      vps->vps_max_latency_increase_plus1[i] =
          vps->vps_max_latency_increase_plus1[vps->vps_max_sub_layers_minus1];
    }
  }

  READ_BITS_OR_RETURN(6, &vps->vps_max_layer_id);
  IN_RANGE_OR_RETURN(vps->vps_max_layer_id, 0, 62);
  READ_UE_OR_RETURN(&vps->vps_num_layer_sets_minus1);
  IN_RANGE_OR_RETURN(vps->vps_num_layer_sets_minus1, 0, 1023);

  *vps_id = vps->vps_video_parameter_set_id;
  // If a VPS with the same vps_id already exists, simply replace it
  active_vps_[*vps_id] = std::move(vps);

  return res;
}

// VPS lookup logic
const H265VPS* H265Parser::GetVPS(int vps_id) const {
  auto it = active_vps_.find(vps_id);
  if (it == active_vps_.end()) {
    DVLOG(1) << "Requested a nonexistent VPS id " << vps_id;
    return nullptr;
  }

  return it->second.get();
}
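The READ_BITS_OR_RETURN and READ_UE_OR_RETURN macros used above hide the actual bit reading. To make the ue(v) (unsigned Exp-Golomb) decoding concrete, here is a minimal standalone sketch. `BitReader` and `DecodeUE` are hypothetical names for this example only; unlike a real H.265 bit reader, this one does not strip emulation prevention bytes (0x000003), so it assumes it is fed a clean RBSP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal MSB-first bit reader; illustration only.
class BitReader {
 public:
  explicit BitReader(const std::vector<uint8_t>& data) : data_(data) {}

  // Reads |n| bits (0 <= n <= 32). Returns false when the data runs out.
  bool ReadBits(int n, uint32_t* out) {
    assert(n >= 0 && n <= 32);
    uint32_t value = 0;
    for (int i = 0; i < n; ++i) {
      if (pos_ >= data_.size() * 8)
        return false;
      const uint8_t byte = data_[pos_ / 8];
      value = (value << 1) | ((byte >> (7 - pos_ % 8)) & 1u);
      ++pos_;
    }
    *out = value;
    return true;
  }

  // ue(v): count leading zero bits up to the first 1 bit, then read that
  // many suffix bits. The value is 2^leading_zeros - 1 + suffix.
  bool ReadUE(uint32_t* out) {
    int leading_zeros = 0;
    uint32_t bit = 0;
    while (true) {
      if (!ReadBits(1, &bit))
        return false;
      if (bit)
        break;
      if (++leading_zeros > 31)
        return false;  // would not fit in 32 bits
    }
    uint32_t suffix = 0;
    if (!ReadBits(leading_zeros, &suffix))
      return false;
    *out = (1u << leading_zeros) - 1 + suffix;
    return true;
  }

 private:
  std::vector<uint8_t> data_;
  size_t pos_ = 0;
};

// Decodes up to |count| consecutive ue(v) values; stops early on bad data.
std::vector<uint32_t> DecodeUE(const std::vector<uint8_t>& data, int count) {
  BitReader reader(data);
  std::vector<uint32_t> values;
  for (int i = 0; i < count; ++i) {
    uint32_t value = 0;
    if (!reader.ReadUE(&value))
      break;
    values.push_back(value);
  }
  return values;
}
```

The bit string 1 010 011 00100 (bytes 0xA6 0x40) decodes to the values 0, 1, 2, 3.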

Then round things out with a unit test and a fuzzer test:

// media/video/h265_parser_unittest.cc

TEST_F(H265ParserTest, VpsParsing) {
  LoadParserFile("bear.hevc");
  H265NALU target_nalu;
  EXPECT_TRUE(ParseNalusUntilNut(&target_nalu, H265NALU::VPS_NUT));
  int vps_id;
  EXPECT_EQ(H265Parser::kOk, parser_.ParseVPS(&vps_id));
  const H265VPS* vps = parser_.GetVPS(vps_id);
  EXPECT_TRUE(!!vps);
  EXPECT_TRUE(vps->vps_base_layer_internal_flag);
  EXPECT_TRUE(vps->vps_base_layer_available_flag);
  EXPECT_EQ(vps->vps_max_layers_minus1, 0);
  EXPECT_EQ(vps->vps_max_sub_layers_minus1, 0);
  EXPECT_TRUE(vps->vps_temporal_id_nesting_flag);
  EXPECT_EQ(vps->profile_tier_level.general_profile_idc, 1);
  EXPECT_EQ(vps->profile_tier_level.general_level_idc, 60);
  EXPECT_EQ(vps->vps_max_dec_pic_buffering_minus1[0], 4);
  EXPECT_EQ(vps->vps_max_num_reorder_pics[0], 2);
  EXPECT_EQ(vps->vps_max_latency_increase_plus1[0], 0);
  for (int i = 1; i < kMaxSubLayers; ++i) {
    EXPECT_EQ(vps->vps_max_dec_pic_buffering_minus1[i], 0);
    EXPECT_EQ(vps->vps_max_num_reorder_pics[i], 0);
    EXPECT_EQ(vps->vps_max_latency_increase_plus1[i], 0);
  }
  EXPECT_EQ(vps->vps_max_layer_id, 0);
  EXPECT_EQ(vps->vps_num_layer_sets_minus1, 0);
  EXPECT_FALSE(vps->vps_timing_info_present_flag);
}

// media/video/h265_parser_fuzzertest.cc

case media::H265NALU::VPS_NUT:
  int vps_id;
  res = parser.ParseVPS(&vps_id);
  break;

FFMPEG already implements VPS parsing, so most of the logic here simply follows FFMPEG. Once the unit tests pass (build step: autoninja -C out/Release64 media_unittests) and the results also match StreamAnalyzer, the VPS parsing logic is done.

The SPS, PPS, and SliceHeader parsing logic is skipped here because the code is long and fiddly; if you are interested, see h265_parser.cc (https://source.chromium.org/chromium/chromium/src/+/main:media/video/h265_parser.cc).

Computing POC (Picture Order Count)

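H264 / HEVC video frames broadly come in three types: I frames, P frames, and B frames. An I frame (intra-coded frame) is the first frame of a GOP (a group made up of I, P, and B frames) and decodes without referencing any other frame. A P frame (forward-predicted frame) needs earlier I or P frames to decode. A B frame (bidirectionally predicted frame) needs both earlier I / P frames and later P frames.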

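These three frame types are not necessarily written to the bitstream in display order during encoding, so to recover the correct order when decoding we need to compute each picture's order, the POC.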

(Illustration: GOP POC values as parsed by StreamEye)

As the StreamEye result above shows, the POC values in the stream follow the pattern 0 -> 4 -> 2 -> 1 -> 3 -> 8 -> 6 ...

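The order in which frames appear is critical for decoding, so after decoding we re-sort the frames by POC to make sure the images reach the user in their real order: 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8.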

Apple's VideoToolbox does not implement this part for us, so we compute the POC ourselves and reorder afterwards. The implementation:

// media/video/h265_poc.h

// Copyright 2022 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#ifndef MEDIA_VIDEO_H265_POC_H_
#define MEDIA_VIDEO_H265_POC_H_

#include <stdint.h>

#include "third_party/abseil-cpp/absl/types/optional.h"

namespace media {

struct H265SPS;
struct H265PPS;
struct H265SliceHeader;

class MEDIA_EXPORT H265POC {
 public:
  H265POC();

  H265POC(const H265POC&) = delete;
  H265POC& operator=(const H265POC&) = delete;

  ~H265POC();

  // Computes the POC from the SPS, the PPS, and the parsed SliceHeader
  int32_t ComputePicOrderCnt(const H265SPS* sps,
                             const H265PPS* pps,
                             const H265SliceHeader& slice_hdr);
  void Reset();

 private:
  int32_t ref_pic_order_cnt_msb_;
  int32_t ref_pic_order_cnt_lsb_;
  // Whether this is the first picture of the decoding process
  bool first_picture_;
};

}  // namespace media

#endif  // MEDIA_VIDEO_H265_POC_H_

The POC computation logic:

// media/video/h265_poc.cc

// Copyright 2022 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#include <stddef.h>

#include <algorithm>

#include "base/cxx17_backports.h"
#include "base/logging.h"
#include "media/video/h265_parser.h"
#include "media/video/h265_poc.h"

namespace media {

H265POC::H265POC() {
  Reset();
}

H265POC::~H265POC() = default;

void H265POC::Reset() {
  ref_pic_order_cnt_msb_ = 0;
  ref_pic_order_cnt_lsb_ = 0;
  first_picture_ = true;
}

// As the logic below shows, the POC is computed as the HEVC spec prescribes
// (I referred to Jeffery Kardatzke's implementation in H265Decoder here)
int32_t H265POC::ComputePicOrderCnt(const H265SPS* sps,
                                    const H265PPS* pps,
                                    const H265SliceHeader& slice_hdr) {
  int32_t pic_order_cnt = 0;
  int32_t max_pic_order_cnt_lsb =
      1 << (sps->log2_max_pic_order_cnt_lsb_minus4 + 4);
  int32_t pic_order_cnt_msb;
  int32_t no_rasl_output_flag;
  // Calculate POC for current picture.
  if (slice_hdr.irap_pic) {
    // 8.1.3
    no_rasl_output_flag = (slice_hdr.nal_unit_type >= H265NALU::BLA_W_LP &&
                           slice_hdr.nal_unit_type <= H265NALU::IDR_N_LP) ||
                          first_picture_;
  } else {
    no_rasl_output_flag = false;
  }

  if (!slice_hdr.irap_pic || !no_rasl_output_flag) {
    int32_t prev_pic_order_cnt_lsb = ref_pic_order_cnt_lsb_;
    int32_t prev_pic_order_cnt_msb = ref_pic_order_cnt_msb_;

    if ((slice_hdr.slice_pic_order_cnt_lsb < prev_pic_order_cnt_lsb) &&
        ((prev_pic_order_cnt_lsb - slice_hdr.slice_pic_order_cnt_lsb) >=
         (max_pic_order_cnt_lsb / 2))) {
      pic_order_cnt_msb = prev_pic_order_cnt_msb + max_pic_order_cnt_lsb;
    } else if ((slice_hdr.slice_pic_order_cnt_lsb > prev_pic_order_cnt_lsb) &&
               ((slice_hdr.slice_pic_order_cnt_lsb - prev_pic_order_cnt_lsb) >
                (max_pic_order_cnt_lsb / 2))) {
      pic_order_cnt_msb = prev_pic_order_cnt_msb - max_pic_order_cnt_lsb;
    } else {
      pic_order_cnt_msb = prev_pic_order_cnt_msb;
    }
  } else {
    pic_order_cnt_msb = 0;
  }

  // 8.3.1 Decoding process for picture order count.
  if (!pps->temporal_id && (slice_hdr.nal_unit_type < H265NALU::RADL_N ||
                            slice_hdr.nal_unit_type > H265NALU::RSV_VCL_N14)) {
    ref_pic_order_cnt_lsb_ = slice_hdr.slice_pic_order_cnt_lsb;
    ref_pic_order_cnt_msb_ = pic_order_cnt_msb;
  }

  pic_order_cnt = pic_order_cnt_msb + slice_hdr.slice_pic_order_cnt_lsb;
  first_picture_ = false;

  return pic_order_cnt;
}

}  // namespace media
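The core of the listing above is the wrap-around handling: only the low bits of the POC (the LSB) are sent in each slice header, and the high part (the MSB) is inferred from the previous reference picture. The following standalone sketch isolates just that step for the non-IRAP case. `DerivePoc` is a hypothetical function written for this example, with plain integers in place of the Chromium structs.

```cpp
#include <cassert>
#include <cstdint>

// Derives the full POC from the slice's LSB and the previous picture's
// LSB / MSB, following the same three-way comparison as the listing above.
// |max_lsb| is 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4).
int32_t DerivePoc(int32_t lsb,
                  int32_t prev_lsb,
                  int32_t prev_msb,
                  int32_t max_lsb) {
  assert(max_lsb > 0 && lsb >= 0 && lsb < max_lsb);
  int32_t msb;
  if (lsb < prev_lsb && prev_lsb - lsb >= max_lsb / 2) {
    msb = prev_msb + max_lsb;  // the LSB wrapped around forwards
  } else if (lsb > prev_lsb && lsb - prev_lsb > max_lsb / 2) {
    msb = prev_msb - max_lsb;  // the picture belongs to the previous cycle
  } else {
    msb = prev_msb;  // same cycle as the previous picture
  }
  return msb + lsb;
}
```

With max_lsb = 16, a slice with LSB 2 following a picture with LSB 14 and MSB 0 gets POC 18: the LSB dropped by 12, which is at least half the range, so the counter is taken to have wrapped.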

Computing MaxReorderCount

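After computing the POC and decoding, frames must be reordered so they are shown to the user in the right order. A look at how H264 computes its maximum reorder count shows that the logic is quite involved: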

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

// H264 maximum reorder count logic
int32_t ComputeH264ReorderWindow(const H264SPS* sps) {
  // When |pic_order_cnt_type| == 2, decode order always matches presentation
  // order.
  // TODO(sandersd): For |pic_order_cnt_type| == 1, analyze the delta cycle to
  // find the minimum required reorder window.
  if (sps->pic_order_cnt_type == 2)
    return 0;

  int max_dpb_mbs = H264LevelToMaxDpbMbs(sps->GetIndicatedLevel());
  int max_dpb_frames =
      max_dpb_mbs / ((sps->pic_width_in_mbs_minus1 + 1) *
                     (sps->pic_height_in_map_units_minus1 + 1));
  max_dpb_frames = std::clamp(max_dpb_frames, 0, 16);

  // See AVC spec section E.2.1 definition of |max_num_reorder_frames|.
  if (sps->vui_parameters_present_flag && sps->bitstream_restriction_flag) {
    return std::min(sps->max_num_reorder_frames, max_dpb_frames);
  } else if (sps->constraint_set3_flag) {
    if (sps->profile_idc == 44 || sps->profile_idc == 86 ||
        sps->profile_idc == 100 || sps->profile_idc == 110 ||
        sps->profile_idc == 122 || sps->profile_idc == 244) {
      return 0;
    }
  }
  return max_dpb_frames;
}

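Fortunately HEVC needs no such elaborate computation: the encoder has already worked out the maximum reorder count and written it into the stream, so we only need to read it as follows: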

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

// HEVC maximum reorder count logic
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
int32_t ComputeHEVCReorderWindow(const H265VPS* vps) {
  int32_t vps_max_sub_layers_minus1 = vps->vps_max_sub_layers_minus1;
  return vps->vps_max_num_reorder_pics[vps_max_sub_layers_minus1];
}
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)

With the reorder count and the POC computed, we keep reusing the H264 reorder logic and the frames come out correctly ordered.
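To show how the POC and the reorder window work together, here is a simplified sketch of a reorder queue. It is not Chromium's reorder logic: `ReorderByPoc` is a hypothetical function, frames are reduced to their POC values, and the sketch ignores the flush a real decoder performs at IRAP boundaries, where the POC restarts.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Takes POCs in decode order and returns them in output order. A frame is
// released once more than |reorder_window| frames are waiting, and the
// frame with the smallest POC always leaves first.
std::vector<int> ReorderByPoc(const std::vector<int>& decode_order_pocs,
                              size_t reorder_window) {
  std::priority_queue<int, std::vector<int>, std::greater<int>> pending;
  std::vector<int> output;
  for (int poc : decode_order_pocs) {
    pending.push(poc);
    while (pending.size() > reorder_window) {
      output.push_back(pending.top());
      pending.pop();
    }
  }
  // End of stream: drain whatever is still waiting.
  while (!pending.empty()) {
    output.push_back(pending.top());
    pending.pop();
  }
  assert(output.size() == decode_order_pocs.size());
  return output;
}
```

Feeding the decode order 0, 4, 2, 1, 3, 8, 6, 5, 7 (the pattern shown earlier, with the last two values assumed for the sake of the example) through a window of 2 yields 0 through 8 in display order. With a window of 0 the frames pass through untouched.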

Extracting and Caching SPS / PPS / VPS

Now for the decoding logic proper. First we extract the SPS / PPS / VPS, parse them, and cache them:

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

switch (nalu.nal_unit_type) {
    // Omitted
    ...
    // Parse the SPS
    case H265NALU::SPS_NUT: {
        int sps_id = -1;
        result = hevc_parser_.ParseSPS(&sps_id);
        if (result == H265Parser::kUnsupportedStream) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR, "Unsupported SPS");
            NotifyError(PLATFORM_FAILURE, SFT_UNSUPPORTED_STREAM);
            return;
        }
        if (result != H265Parser::kOk) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR, "Could not parse SPS");
            NotifyError(UNREADABLE_INPUT, SFT_INVALID_STREAM);
            return;
        }
        // Cache the SPS NALU data keyed by sps_id
        seen_sps_[sps_id].assign(nalu.data, nalu.data + nalu.size);
        break;
    }
    // Parse the PPS
    case H265NALU::PPS_NUT: {
        int pps_id = -1;
        result = hevc_parser_.ParsePPS(nalu, &pps_id);
        if (result == H265Parser::kUnsupportedStream) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR, "Unsupported PPS");
            NotifyError(PLATFORM_FAILURE, SFT_UNSUPPORTED_STREAM);
            return;
        }
        if (result == H265Parser::kMissingParameterSet) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR,
                            "Missing SPS from what was parsed");
            NotifyError(PLATFORM_FAILURE, SFT_INVALID_STREAM);
            return;
        }
        if (result != H265Parser::kOk) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR, "Could not parse PPS");
            NotifyError(UNREADABLE_INPUT, SFT_INVALID_STREAM);
            return;
        }
        // Cache the PPS NALU data keyed by pps_id
        seen_pps_[pps_id].assign(nalu.data, nalu.data + nalu.size);
        // Also submit the PPS to VT as part of the frame data; this handles
        // frames within one GOP that reference different PPSs
        nalus.push_back(nalu);
        data_size += kNALUHeaderLength + nalu.size;
        break;
    }
    // Parse the VPS
    case H265NALU::VPS_NUT: {
        int vps_id = -1;
        result = hevc_parser_.ParseVPS(&vps_id);
        if (result == H265Parser::kUnsupportedStream) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR, "Unsupported VPS");
            NotifyError(PLATFORM_FAILURE, SFT_UNSUPPORTED_STREAM);
            return;
        }
        if (result != H265Parser::kOk) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR, "Could not parse VPS");
            NotifyError(UNREADABLE_INPUT, SFT_INVALID_STREAM);
            return;
        }
        // Cache the VPS NALU data keyed by vps_id
        seen_vps_[vps_id].assign(nalu.data, nalu.data + nalu.size);
        break;
    }
    // Omitted
    ...
}

Creating the Decode Format and Session

From the parsed VPS, SPS, and PPS we can create the decode format:

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
// Create the decode format (CMFormatDescriptionRef) from the vps, sps, and pps
base::ScopedCFTypeRef<CMFormatDescriptionRef> CreateVideoFormatHEVC(
    const std::vector<uint8_t>& vps,
    const std::vector<uint8_t>& sps,
    const std::vector<uint8_t>& pps) {
  DCHECK(!vps.empty());
  DCHECK(!sps.empty());
  DCHECK(!pps.empty());

  // Build the configuration records.
  std::vector<const uint8_t*> nalu_data_ptrs;
  std::vector<size_t> nalu_data_sizes;
  nalu_data_ptrs.reserve(3);
  nalu_data_sizes.reserve(3);
  nalu_data_ptrs.push_back(&vps.front());
  nalu_data_sizes.push_back(vps.size());
  nalu_data_ptrs.push_back(&sps.front());
  nalu_data_sizes.push_back(sps.size());
  nalu_data_ptrs.push_back(&pps.front());
  nalu_data_sizes.push_back(pps.size());

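  // A key point: within one GOP, frames may reference two or more different
  // PPSs, e.g. the I frame references the PPS with pps_id 0 while a P frame
  // references the PPS with pps_id 1.
  // In my own testing there are two ways to handle this:
  // Option 1: pass both PPSs in when creating the CMFormatDescriptionRef
  // (nalu_data_ptrs then has 4 entries).
  // Option 2 (the one used in this article): still pass a single PPS, but
  // submit the PPS NALU data to VT as well. VT resolves the PPS references by
  // itself and handles this case, see "vt_video_decode_accelerator_mac.cc;l=1380"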
  base::ScopedCFTypeRef<CMFormatDescriptionRef> format;
  if (__builtin_available(macOS 11.0, *)) {
    OSStatus status = CMVideoFormatDescriptionCreateFromHEVCParameterSets(
        kCFAllocatorDefault,
        nalu_data_ptrs.size(),     // parameter_set_count
        &nalu_data_ptrs.front(),   // parameter_set_pointers
        &nalu_data_sizes.front(),  // parameter_set_sizes
        kNALUHeaderLength,         // nal_unit_header_length
        extensions, format.InitializeInto());
    OSSTATUS_LOG_IF(WARNING, status != noErr, status)
        << "CMVideoFormatDescriptionCreateFromHEVCParameterSets()";
  }
  return format;
}
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)

(The VideoToolbox decoding flow; image from Apple)

With the decode format created, we go on to create the decode session:

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

bool VTVideoDecodeAccelerator::ConfigureDecoder() {
  DVLOG(2) << __func__;
  DCHECK(decoder_task_runner_->RunsTasksInCurrentSequence());

  base::ScopedCFTypeRef<CMFormatDescriptionRef> format;
  switch (codec_) {
    case VideoCodec::kH264:
      format = CreateVideoFormatH264(active_sps_, active_spsext_, active_pps_);
      break;
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
    case VideoCodec::kHEVC:
      // Create the CMFormatDescriptionRef
      format = CreateVideoFormatHEVC(active_vps_, active_sps_, active_pps_);
      break;
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
    case VideoCodec::kVP9:
      format = CreateVideoFormatVP9(
          cc_detector_->GetColorSpace(config_.container_color_space),
          config_.profile, config_.hdr_metadata,
          cc_detector_->GetCodedSize(config_.initial_expected_coded_size));
      break;
    default:
      NOTREACHED() << "Unsupported codec.";
  }

  if (!format) {
    NotifyError(PLATFORM_FAILURE, SFT_PLATFORM_ERROR);
    return false;
  }

  if (!FinishDelayedFrames())
    return false;

  format_ = format;
  session_.reset();

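  // Create the decode session from the decode format built above.
  // For VP9, hardware decoding is required.
  // For HEVC we choose not to force hardware decoding and let VT pick the
  // most suitable decoder, for reasons such as:
  // 1. The GPU does not support hardware decoding (we then want VT software
  //    decoding)
  // 2. The profile is unsupported (e.g. M1 can hardware-decode HEVC Rext while
  //    Intel/AMD GPUs cannot; software decoding is wanted there)
  // 3. The GPU is busy and short of resources, so software decoding is wanted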
  const bool require_hardware = config_.profile == VP9PROFILE_PROFILE0 ||
                                config_.profile == VP9PROFILE_PROFILE2;

  // The video may be HDR, so we want the output pix_fmt to be
  // kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
  const bool is_hbd = config_.profile == VP9PROFILE_PROFILE2 ||
                      config_.profile == HEVCPROFILE_MAIN10 ||
                      config_.profile == HEVCPROFILE_REXT;
  // Create the decode session
  if (!CreateVideoToolboxSession(format_, require_hardware, is_hbd, &callback_,
                                 &session_, &configured_size_)) {
    NotifyError(PLATFORM_FAILURE, SFT_PLATFORM_ERROR);
    return false;
  }

  // Report whether hardware decode is being used.
  bool using_hardware = false;
  base::ScopedCFTypeRef<CFBooleanRef> cf_using_hardware;
  if (VTSessionCopyProperty(
          session_,
          // kVTDecompressionPropertyKey_UsingHardwareAcceleratedVideoDecoder
          CFSTR("UsingHardwareAcceleratedVideoDecoder"), kCFAllocatorDefault,
          cf_using_hardware.InitializeInto()) == 0) {
    using_hardware = CFBooleanGetValue(cf_using_hardware);
  }
  UMA_HISTOGRAM_BOOLEAN("Media.VTVDA.HardwareAccelerated", using_hardware);

  if (codec_ == VideoCodec::kVP9 && !vp9_bsf_)
    vp9_bsf_ = std::make_unique<VP9SuperFrameBitstreamFilter>();

  // Record that the configuration change is complete.
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  configured_vps_ = active_vps_;
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  configured_sps_ = active_sps_;
  configured_spsext_ = active_spsext_;
  configured_pps_ = active_pps_;
  return true;
}

The session creation logic:

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

// Create a VTDecompressionSession from the CMFormatDescriptionRef
bool CreateVideoToolboxSession(
    const CMFormatDescriptionRef format,
    bool require_hardware,
    bool is_hbd,
    const VTDecompressionOutputCallbackRecord* callback,
    base::ScopedCFTypeRef<VTDecompressionSessionRef>* session,
    gfx::Size* configured_size) {
  // Prepare VideoToolbox configuration dictionaries.
  base::ScopedCFTypeRef<CFMutableDictionaryRef> decoder_config(
      CFDictionaryCreateMutable(kCFAllocatorDefault,
                                1,  // capacity
                                &kCFTypeDictionaryKeyCallBacks,
                                &kCFTypeDictionaryValueCallBacks));
  if (!decoder_config) {
    DLOG(ERROR) << "Failed to create CFMutableDictionary";
    return false;
  }

  CFDictionarySetValue(
      decoder_config,
      kVTVideoDecoderSpecification_EnableHardwareAcceleratedVideoDecoder,
      kCFBooleanTrue);
  CFDictionarySetValue(
      decoder_config,
      kVTVideoDecoderSpecification_RequireHardwareAcceleratedVideoDecoder,
      require_hardware ? kCFBooleanTrue : kCFBooleanFalse);

  CGRect visible_rect = CMVideoFormatDescriptionGetCleanAperture(format, true);
  CMVideoDimensions visible_dimensions = {
      base::ClampFloor(visible_rect.size.width),
      base::ClampFloor(visible_rect.size.height)};
  base::ScopedCFTypeRef<CFMutableDictionaryRef> image_config(
      BuildImageConfig(visible_dimensions, is_hbd));
  if (!image_config) {
    DLOG(ERROR) << "Failed to create decoder image configuration";
    return false;
  }

  // The final step: create the decode session
  OSStatus status = VTDecompressionSessionCreate(
      kCFAllocatorDefault,
      format,          // the CMFormatDescriptionRef we created
      decoder_config,  // video_decoder_specification
      image_config,    // destination_image_buffer_attributes
      callback,        // output_callback
      session->InitializeInto());
  if (status != noErr) {
    OSSTATUS_DLOG(WARNING, status) << "VTDecompressionSessionCreate()";
    return false;
  }

  *configured_size =
      gfx::Size(visible_rect.size.width, visible_rect.size.height);

  return true;
}

Extracting Frames and Decoding

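This is where actual decoding starts. Before decoding a frame we extract its SliceHeader, fetch the SPS, PPS, and VPS it references from the cache, and compute the POC and the maximum reorder count.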

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

switch (nalu.nal_unit_type) {
    case H265NALU::BLA_W_LP:
    case H265NALU::BLA_W_RADL:
    case H265NALU::BLA_N_LP:
    case H265NALU::IDR_W_RADL:
    case H265NALU::IDR_N_LP:
    case H265NALU::TRAIL_N:
    case H265NALU::TRAIL_R:
    case H265NALU::TSA_N:
    case H265NALU::TSA_R:
    case H265NALU::STSA_N:
    case H265NALU::STSA_R:
    case H265NALU::RADL_N:
    case H265NALU::RADL_R:
    case H265NALU::RASL_N:
    case H265NALU::RASL_R:
    case H265NALU::CRA_NUT: {
        // Parse the SliceHeader of the video frame
        curr_slice_hdr.reset(new H265SliceHeader());
        result = hevc_parser_.ParseSliceHeader(nalu, curr_slice_hdr.get(),
                                                last_slice_hdr.get());

        if (result == H265Parser::kMissingParameterSet) {
            curr_slice_hdr.reset();
            last_slice_hdr.reset();
            WriteToMediaLog(MediaLogMessageLevel::kERROR,
                            "Missing PPS when parsing slice header");
            continue;
        }

        if (result != H265Parser::kOk) {
            curr_slice_hdr.reset();
            last_slice_hdr.reset();
            WriteToMediaLog(MediaLogMessageLevel::kERROR,
                            "Could not parse slice header");
            NotifyError(UNREADABLE_INPUT, SFT_INVALID_STREAM);
            return;
        }

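        // Workaround: for videos shot on some iOS devices, if the first key
        // frame hit while seeking is a CRA frame and the next frame is a RASL
        // frame, VT immediately fails with kVTVideoDecoderBadDataErr. So we
        // check whether more than 5 frames have been output in total and
        // skip these RASL frames otherwise.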
        if (output_count_for_cra_rasl_workaround_ < kMinOutputsBeforeRASL &&
            (nalu.nal_unit_type == H265NALU::RASL_N ||
                nalu.nal_unit_type == H265NALU::RASL_R)) {
            continue;
        }

        // Look up the parsed PPS by the pps_id in the SliceHeader
        const H265PPS* pps =
            hevc_parser_.GetPPS(curr_slice_hdr->slice_pic_parameter_set_id);
        if (!pps) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR,
                            "Missing PPS referenced by slice");
            NotifyError(UNREADABLE_INPUT, SFT_INVALID_STREAM);
            return;
        }

        // Look up the parsed SPS by the sps_id in the PPS
        const H265SPS* sps = hevc_parser_.GetSPS(pps->pps_seq_parameter_set_id);
        if (!sps) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR,
                            "Missing SPS referenced by PPS");
            NotifyError(UNREADABLE_INPUT, SFT_INVALID_STREAM);
            return;
        }

        // Look up the parsed VPS by the vps_id in the SPS
        const H265VPS* vps =
            hevc_parser_.GetVPS(sps->sps_video_parameter_set_id);
        if (!vps) {
            WriteToMediaLog(MediaLogMessageLevel::kERROR,
                            "Missing VPS referenced by SPS");
            NotifyError(UNREADABLE_INPUT, SFT_INVALID_STREAM);
            return;
        }

        // Record the currently active sps/vps/pps
        DCHECK(seen_pps_.count(curr_slice_hdr->slice_pic_parameter_set_id));
        DCHECK(seen_sps_.count(pps->pps_seq_parameter_set_id));
        DCHECK(seen_vps_.count(sps->sps_video_parameter_set_id));
        active_vps_ = seen_vps_[sps->sps_video_parameter_set_id];
        active_sps_ = seen_sps_[pps->pps_seq_parameter_set_id];
        active_pps_ = seen_pps_[curr_slice_hdr->slice_pic_parameter_set_id];

        // Compute the POC
        int32_t pic_order_cnt =
            hevc_poc_.ComputePicOrderCnt(sps, pps, *curr_slice_hdr.get());

        frame->has_slice = true;
        // Whether this is an IDR (strictly speaking, an IRAP picture)
        frame->is_idr = nalu.nal_unit_type >= H265NALU::BLA_W_LP &&
                        nalu.nal_unit_type <= H265NALU::RSV_IRAP_VCL23;
        frame->pic_order_cnt = pic_order_cnt;
        // Compute the maximum reorder count
        frame->reorder_window = ComputeHEVCReorderWindow(vps);

        // Keep the SliceHeader of the previous frame
        last_slice_hdr.swap(curr_slice_hdr);
        curr_slice_hdr.reset();
        [[fallthrough]];
    }
    default:
        nalus.push_back(nalu);
        data_size += kNALUHeaderLength + nalu.size;
        break;
}
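The `data_size += kNALUHeaderLength + nalu.size` bookkeeping above exists because the collected NALUs are later handed over in length-prefixed form: each NALU is preceded by its size instead of a start code. The sketch below illustrates that packing step under the assumption of a 4-byte big-endian length prefix. `PackLengthPrefixed` and `kLengthPrefixSize` are hypothetical names for this example, not the Chromium code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed prefix size for this sketch; plays the role of kNALUHeaderLength.
constexpr size_t kLengthPrefixSize = 4;

// Concatenates NALUs, writing a 4-byte big-endian length before each one.
std::vector<uint8_t> PackLengthPrefixed(
    const std::vector<std::vector<uint8_t>>& nalus) {
  // First pass: total size, mirroring the data_size accumulation above.
  size_t data_size = 0;
  for (const auto& nalu : nalus)
    data_size += kLengthPrefixSize + nalu.size();

  std::vector<uint8_t> out;
  out.reserve(data_size);
  for (const auto& nalu : nalus) {
    const uint32_t length = static_cast<uint32_t>(nalu.size());
    out.push_back(static_cast<uint8_t>((length >> 24) & 0xFF));
    out.push_back(static_cast<uint8_t>((length >> 16) & 0xFF));
    out.push_back(static_cast<uint8_t>((length >> 8) & 0xFF));
    out.push_back(static_cast<uint8_t>(length & 0xFF));
    out.insert(out.end(), nalu.begin(), nalu.end());
  }
  assert(out.size() == data_size);
  return out;
}
```

Packing a 2-byte NALU and a 3-byte NALU this way produces 13 bytes: 4 + 2 for the first and 4 + 3 for the second.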

Detecting Changes in Video Parameters

(SPS / PPS references of H264 video frames; image from Apple)

If any of the VPS, PPS, or SPS referenced by a frame changes, VideoToolbox requires the decode session to be reconfigured:

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

if (frame->is_idr &&
      (configured_vps_ != active_vps_ || configured_sps_ != active_sps_ ||
       configured_pps_ != active_pps_)) {

    // Some validation logic goes here
    ...

    // Recreate the decode format and reconfigure the decode session
    if (!ConfigureDecoder()) {
      return;
    }
}
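The condition above boils down to a byte-wise comparison between the parameter sets the session was configured with and the ones the current frame references, evaluated only at a random access point. A minimal sketch of that check, with the hypothetical names `ParameterSets` and `NeedsReconfigure` standing in for the member variables of the real class:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Raw NALU bytes of the three parameter sets; illustration only.
struct ParameterSets {
  Bytes vps;
  Bytes sps;
  Bytes pps;
};

// Returns true when the session must be rebuilt: the frame is a random
// access point and at least one referenced parameter set differs from the
// one the session was configured with.
bool NeedsReconfigure(bool is_irap,
                      const ParameterSets& configured,
                      const ParameterSets& active) {
  return is_irap &&
         (configured.vps != active.vps || configured.sps != active.sps ||
          configured.pps != active.pps);
}
```

Comparing the raw bytes rather than only the ids matters because, as the parser code earlier shows, a new parameter set with the same id replaces the old one.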

Setting the Target Pixel Format of Output Frames

Since we need to support HDR, we check whether the video uses the Main10 / Rext profile and, if so, switch the output to gfx::BufferFormat::P010:

// media/gpu/mac/vt_video_decode_accelerator_mac.cc

const gfx::BufferFormat buffer_format =
      config_.profile == VP9PROFILE_PROFILE2 ||
              config_.profile == HEVCPROFILE_MAIN10 ||
              config_.profile == HEVCPROFILE_REXT
          ? gfx::BufferFormat::P010
          : gfx::BufferFormat::YUV_420_BIPLANAR;

Summary

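With the steps above, the key hardware decoding flow is essentially complete. The code has been merged into Chromium 104 (main branch); the macOS implementation details and code diffs can be traced through Crbug (https://bugs.chromium.org/p/chromium/issues/detail?id=1300444).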

Hardware Decoding on Windows

With the macOS experience behind me, attempting Windows hardware decoding became somewhat easier, although I still hit a few pitfalls.

Trying the Media Foundation Approach

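As mentioned at the start of the article, on Windows you can hardware-decode HEVC in Edge if the HEVC Video Extensions are installed. My first idea was therefore to do what Edge does and get hardware decoding by routing to the HEVC Video Extensions.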

First, opening any HEVC video in Edge shows that Edge uses VDAVideoDecoder for HEVC hardware decoding:

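A search shows that the Windows VDAVideoDecoder implementation lives entirely in dxva_video_decode_accelerator_win.cc. Digging further, the open-source Chromium project contains no HEVC hardware decoding implementation there at all, which suggests that Edge added its HEVC support by heavily modifying the Chromium media module from some point in time. Along the way I noticed a few interesting things: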

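Edge does not use D3D11VideoDecoder for decoding at all and uses VDAVideoDecoder instead; my guess is that the aim is to promote Microsoft's own Media Foundation.
Edge's color handling is off (Edge 102). Unlike Chrome, for an HDR video whose transfer function is PQ, Edge applies no tone mapping, so PQ videos look "overexposed" in Edge.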

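Edge needs the AV1 Video Extension installed before it can decode AV1 (software and hardware). Chrome implements AV1 D3D11VA hardware decoding as well as the Gav1VideoDecoder and DAV1dVideoDecoder software decoders, so without any AV1 plugin it hardware-decodes on supported GPUs and software-decodes on unsupported ones.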

All right, for all my grumbling about Edge, it really is the only browser on Windows that supports HEVC hardware decoding (though not for much longer).

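Back to the implementation in dxva_video_decode_accelerator_win.cc. Reasoning backwards from the fact that Edge needs a plugin to decode AV1: if we implement HEVC the same way as AV1, will it work? The answer is yes.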

Look at the supported profiles, then add the HEVCPROFILE_MAIN and HEVCPROFILE_MAIN10 that we need:

// media/gpu/windows/dxva_video_decode_accelerator_win.cc

// As on macOS, the formats VDAVideoDecoder supports are all listed in the supported profiles.
// As is plain to see below, VDAVideoDecoder originally supports H264, VP8, VP9, and AV1
constexpr VideoCodecProfile kSupportedProfiles[] = {
    H264PROFILE_BASELINE,    H264PROFILE_MAIN,        H264PROFILE_HIGH,
    VP8PROFILE_ANY,          VP9PROFILE_PROFILE0,     VP9PROFILE_PROFILE2,
    AV1PROFILE_PROFILE_MAIN, AV1PROFILE_PROFILE_HIGH, AV1PROFILE_PROFILE_PRO,
    // Add the two profiles we need to support
    HEVCPROFILE_MAIN,        HEVCPROFILE_MAIN10,
 };

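Then, following the AV1 logic, add the HEVC codec. It is worth noting that the resolution must be set before calling SetOutput (this pitfall cost me about a day of debugging). The code: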

// media/gpu/windows/dxva_video_decode_accelerator_win.cc

// Excerpts from several places in the file; "..." marks omitted code
...
if (config.profile == VP9PROFILE_PROFILE2 ||
      config.profile == VP9PROFILE_PROFILE3 ||
      config.profile == H264PROFILE_HIGH10PROFILE) {
    // Input file has more than 8 bits per channel.
    use_fp16_ = true;
    decoder_output_p010_or_p016_ = true;
...
    // the OS for VP9 which is why it works and AV1 doesn't.
    HRESULT hr = CreateAV1Decoder(IID_PPV_ARGS(&decoder_));
    RETURN_ON_HR_FAILURE(hr, "Failed to create decoder instance", false);
  } else if (profile >= HEVCPROFILE_MAIN && profile <= HEVCPROFILE_MAIN10) {

    codec_ = kCodecHEVC;
    clsid = CLSID_MSH265DecoderMFT;
    // The resolution must be set in advance here, otherwise SetOutput fails
    using_ms_vpx_mft_ = true;

    // After much experimenting, only version 1.0.31823 of the HEVC Video
    // Extensions turned out to be free of jitter. Every other version,
    // including the latest one (1.0.51361.0), still skips frames when decoding.
    // Evidently version 1.0.51361 needs extra configuration to decode
    // properly, but the official documentation does not describe it
    HRESULT hr = CreateHEVCDecoder(IID_PPV_ARGS(&decoder_));
    RETURN_ON_HR_FAILURE(hr, "Failed to create hevc decoder instance", false);
  } else {
    if (!decoder_dll)
      RETURN_ON_FAILURE(false, "Unsupported codec.", false);
...
  HRESULT hr = MFCreateMediaType(&media_type);
  RETURN_ON_HR_FAILURE(hr, "MFCreateMediaType failed", false);

  // Set the major type, see: https://docs.microsoft.com/en-us/windows/win32/medfound/mf-mt-major-type-attribute
  hr = media_type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
  RETURN_ON_HR_FAILURE(hr, "Failed to set major input type", false);

  // Set the subtype, see: https://docs.microsoft.com/en-us/windows/win32/medfound/video-subtype-guids
  if (codec_ == kCodecHEVC) {
    hr = media_type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_HEVC);
  } else if (codec_ == kCodecH264) {
    hr = media_type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
  } else if (codec_ == kCodecVP9) {
    hr = media_type->SetGUID(MF_MT_SUBTYPE, MEDIASUBTYPE_VP90);
...

Next, implement the logic that obtains the HEVC decoder:

// media/gpu/windows/dxva_video_decode_accelerator_win.cc

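// Unlike H264 and similar formats, HEVC is supported through an optional
// extension, so loading the DLL directly does not work.
// Following the AV1 implementation and Microsoft's documentation, calling
// ::MFTEnumEx eventually yields the HEVC decoder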
HRESULT CreateHEVCDecoder(const IID& iid, void** object) {
  MFT_REGISTER_TYPE_INFO type_info = {MFMediaType_Video, MFVideoFormat_HEVC};

  base::win::ScopedCoMem<IMFActivate*> acts;
  UINT32 acts_num;
  HRESULT hr =
      ::MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_SORTANDFILTER,
                  &type_info, nullptr, &acts, &acts_num);
  if (FAILED(hr))
    return hr;

  if (acts_num < 1)
    return E_FAIL;

  hr = acts[0]->ActivateObject(iid, object);
  for (UINT32 i = 0; i < acts_num; ++i)
    acts[i]->Release();
  return hr;
}

Also add the HEVC Main and Main10 profiles to supported_profile_helpers.cc:

// media/gpu/windows/supported_profile_helpers.cc

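// Add HEVC hardware decoding support for Windows 10 1709 and later
// (note: the check below uses WIN10_RS2, which corresponds to version 1703):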
if (base::win::GetVersion() >= base::win::Version::WIN10_RS2) {
    if (profile_id == D3D11_DECODER_PROFILE_HEVC_VLD_MAIN) {
        supported_resolutions[HEVCPROFILE_MAIN] = GetResolutionsForGUID(
            video_device.Get(), profile_id, kModernResolutions, DXGI_FORMAT_NV12);
        continue;
    }
    if (profile_id == D3D11_DECODER_PROFILE_HEVC_VLD_MAIN10) {
        supported_resolutions[HEVCPROFILE_MAIN10] = GetResolutionsForGUID(
            video_device.Get(), profile_id, kModernResolutions, DXGI_FORMAT_P010);
        continue;
    }
}

Finally, the routing logic must be changed to force HEVC content onto VDAVideoDecoder instead of D3D11VideoDecoder:

// media/mojo/services/gpu_mojo_media_client_win.cc

...
// Force HEVC-encoded video to be decoded by VDAVideoDecoder
std::unique_ptr<VideoDecoder> CreatePlatformVideoDecoder(
    const VideoDecoderTraits& traits) {
   if (!ShouldUseD3D11VideoDecoder(*traits.gpu_workarounds) || (
      config.profile() >= HEVCPROFILE_MAIN &&
      config.profile() <= HEVCPROFILE_MAIN10)) {
    if (traits.gpu_workarounds->disable_dxva_video_decoder)
      return nullptr;
    return VdaVideoDecoder::Create(
...

Some code that plumbs the profile through is omitted after this point.

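After these steps everything is in place and the implementation is basically complete. The HEVC Video Extensions handle most of the decoding logic for us, so the implementation is quite simple.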

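But here is the problem! Versions of the HEVC Video Extensions after 1.0.31823 suffer from jitter, and although 1.0.50361 is said to fix the jitter, the official documentation does not spell out how to configure the decoder to resolve it (note: contributions of a working configuration are welcome). So if we go with the HEVC Video Extensions approach, users have to be pinned to version 1.0.31823 locally.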

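To that end I tried writing an nsh script that, when the user's machine has a version of the HEVC Video Extensions other than 31823, forcibly uninstalls it and reinstalls version 1.0.31823. But the Windows Store auto-updates Appx extensions by default, so keeping the extension from updating means forcing users to turn off Store auto-updates. That makes this approach half-baked at best and probably means switching to another technical approach, yet we really do not have many options.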

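Just as I was about to give up, it suddenly occurred to me: why not follow Chromium and implement the D3D11VA approach? Media Foundation is by no means the only answer! This was mid-February 2022. Just to give it a try, I opened source.chromium.org to learn how D3D11VA decoding works for other formats, and in the media folder I happened to spot a file named d3d11_h265_accelerator.cc. What is this? How could that be? Hadn't nobody implemented HEVC hardware decoding on Windows? I immediately checked the commit date and found that the file had only been merged into Chromium on February 8! Thanks to the author @Jianlin Qiu (from Intel), who had largely implemented D3D11 hardware-accelerated decoding on Windows.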

Hardware Decoding with D3D11VA

Trace 了下 @Jianlin Qiu 實現相關的 crbug（https://bugs.chromium.org/p/chromium/issues/detail?id=1286132#c18），合入其代碼，簡單做了下測試發現一半的視頻可以播（早期版本有些小問題，目前均已解決）。

遂觀察其實現邏輯，發現 Windows 的硬解實現邏輯與 macOS 完全不同，在 macOS，盡管我會對 SPS / PPS / VPS / Slice Header 進行 Parse，但是實際上，最終調用CMVideoFormatDescriptionCreateFromHEVCParameterSets 方法創建解碼 Format 時，傳給 VT 的參數是包含了 VPS, SPS, PPS 的 Nalu Data 的數組，也就是說理論上如果我們不計算 POC，不 Reorder，直接將 Nalu Data 塞給 VideoToolbox，也可以解碼，只是幀組會抖動罷了。

但到了 Windows 和 Linux 這里，實現起來要麻煩的多。

根據 Mircosoft 官網，可知 D3D11VA 硬解的實際流程可參考這篇文章（https://docs.microsoft.com/en-us/windows/win32/medfound/supporting-direct3d-11-video-decoding-in-media-foundation#open-a-device-handle），總結起來其實主要工作在于對每一個視頻幀的圖片參數拼裝。

GPU 是否支持硬解檢測

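To decide whether the GPU supports hardware decoding, the logic first assumes that HEVC Main / Main10 is unsupported. It then calls the GetVideoDecoderConfig method of the D3D11 video device to enumerate the decoder configurations the driver offers for the requested decoder description. If a suitable configuration is found for HEVC, the GPU is considered to support it: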

// media/gpu/windows/d3d11_video_decoder.cc

D3D11_VIDEO_DECODER_CONFIG dec_config = {};
bool found = false;

for (UINT i = 0; i < config_count; i++) {
// Each call returns the i-th decoder configuration that D3D11 supports
hr = video_device_->GetVideoDecoderConfig(
    decoder_configurator_->DecoderDescriptor(), i, &dec_config);
if (FAILED(hr))
    return {D3D11Status::Codes::kGetDecoderConfigFailed, hr};

if (dec_config.ConfigBitstreamRaw == 1 &&
    (config_.codec() == VideoCodec::kVP9 ||
        config_.codec() == VideoCodec::kAV1 ||
        config_.codec() == VideoCodec::kHEVC)) {
    // DXVA HEVC, VP9, and AV1 specifications say ConfigBitstreamRaw
    // "shall be 1".
    // If the codec is HEVC and ConfigBitstreamRaw == 1, the GPU supports hardware decoding
    found = true;
    break;
}

if (config_.codec() == VideoCodec::kH264 &&
    dec_config.ConfigBitstreamRaw == 2) {
    // ConfigBitstreamRaw == 2 means the decoder uses DXVA_Slice_H264_Short.
    found = true;
    break;
}
}
if (!found)
    return D3D11Status::Codes::kDecoderUnsupportedConfig;
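
The acceptance rule in the loop above can be isolated from D3D11 and exercised on plain integers. The following is a minimal sketch under that assumption: Codec, IsRawModeAcceptable, and FindUsableConfig are hypothetical names, and the vector stands in for the ConfigBitstreamRaw values that successive GetVideoDecoderConfig calls would report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for media::VideoCodec.
enum class Codec { kH264, kHEVC, kVP9, kAV1 };

// True when one reported ConfigBitstreamRaw value is usable for `codec`,
// mirroring the two checks in the listing above.
bool IsRawModeAcceptable(Codec codec, unsigned config_bitstream_raw) {
  if (codec == Codec::kH264)
    return config_bitstream_raw == 2;  // Short slice format for H.264.
  return config_bitstream_raw == 1;    // HEVC, VP9, AV1: "shall be 1".
}

// Returns the index of the first usable configuration, or -1 if none is.
int FindUsableConfig(Codec codec, const std::vector<unsigned>& raw_modes) {
  for (std::size_t i = 0; i < raw_modes.size(); ++i) {
    if (IsRawModeAcceptable(codec, raw_modes[i]))
      return static_cast<int>(i);
  }
  return -1;
}
```

A return value of -1 corresponds to the kDecoderUnsupportedConfig branch in the original code.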

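Understanding the DXVA HEVC spec

As the flow above shows, the DXVA HEVC spec requires us to parse the picture params it needs ahead of time in order to decode correctly. Take the HEVC picture params below as an example: not a single field may be left out, which is a fair amount of work in itself. Fortunately, @Jeffery had already done most of it when implementing the H265 decoder and H265 parser for Linux, so @Jianlin's work was mainly about correctly assembling and computing these already-parsed params and handing them to D3D11.

// Admittedly, if every application had to implement this assembly logic, the cost would be absurdly high
// Compared with the macOS API design, the DXVA spec is very complex
// But I trust there are good reasons for it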

typedef struct _DXVA_PicEntry_HEVC {
  union {
    struct {
      UCHAR  Index7Bits  :7;
      UCHAR  AssociatedFlag   :1;
    };
    UCHAR  bPicEntry;
  };
} DXVA_PicEntry_HEVC, *PDXVA_PicEntry_HEVC;

typedef struct _DXVA_PicParams_HEVC {
  USHORT             PicWidthInMinCbsY;
  USHORT             PicHeightInMinCbsY;
  union {
    struct {
      USHORT chroma_format_idc  :2;
      USHORT separate_colour_plane_flag   :1;
      USHORT bit_depth_luma_minus8   :3;
      USHORT bit_depth_chroma_minus8  :3;
      USHORT log2_max_pic_order_cnt_lsb_minus4  :4;
      USHORT NoPicReorderingFlag   :1;
      USHORT  NoBiPredFlag   :1;
      USHORT ReservedBits1     :1;
    };
    USHORT  wFormatAndSequenceInfoFlags;
  };
  DXVA_PicEntry_HEVC CurrPic;
  UCHAR              sps_max_dec_pic_buffering_minus1;
  UCHAR              log2_min_luma_coding_block_size_minus3;
  UCHAR              log2_diff_max_min_luma_coding_block_size;
  UCHAR              log2_min_transform_block_size_minus2;
  UCHAR              log2_diff_max_min_transform_block_size;
  UCHAR              max_transform_hierarchy_depth_inter;
  UCHAR              max_transform_hierarchy_depth_intra;
  UCHAR              num_short_term_ref_pic_sets;
  UCHAR              num_long_term_ref_pics_sps;
  UCHAR              num_ref_idx_l0_default_active_minus1;
  UCHAR              num_ref_idx_l1_default_active_minus1;
  CHAR               init_qp_minus26;
  UCHAR              ucNumDeltaPocsOfRefRpsIdx;
  USHORT             wNumBitsForShortTermRPSInSlice;
  USHORT             ReservedBits2;
  union {
    struct {
      UINT32 scaling_list_enabled_flag  :1;
      UINT32 amp_enabled_flag  :1;
      UINT32 sample_adaptive_offset_enabled_flag  :1;
      UINT32 pcm_enabled_flag   :1;
      UINT32 pcm_sample_bit_depth_luma_minus1   :4;
      UINT32 pcm_sample_bit_depth_chroma_minus1     :4;
      UINT32 log2_min_pcm_luma_coding_block_size_minus3    :2;
      UINT32 log2_diff_max_min_pcm_luma_coding_block_size  :2;
      UINT32 pcm_loop_filter_disabled_flag  :1;
      UINT32 long_term_ref_pics_present_flag   :1;
      UINT32 sps_temporal_mvp_enabled_flag  :1;
      UINT32 strong_intra_smoothing_enabled_flag   :1;
      UINT32 dependent_slice_segments_enabled_flag    :1;
      UINT32 output_flag_present_flag   :1;
      UINT32 num_extra_slice_header_bits    :3;
      UINT32 sign_data_hiding_enabled_flag  :1;
      UINT32 cabac_init_present_flag  :1;
      UINT32 ReservedBits3    :5;
    };
    UINT32             dwCodingParamToolFlags;
  };
  union {
    struct {
      UINT32 constrained_intra_pred_flag  :1;
      UINT32 transform_skip_enabled_flag  :1;
      UINT32 cu_qp_delta_enabled_flag  :1;
      UINT32 pps_slice_chroma_qp_offsets_present_flag  :1;
      UINT32 weighted_pred_flag  :1;
      UINT32 weighted_bipred_flag  :1;
      UINT32 transquant_bypass_enabled_flag  :1;
      UINT32 tiles_enabled_flag   :1;
      UINT32 entropy_coding_sync_enabled_flag   :1;
      UINT32 uniform_spacing_flag    :1;
      UINT32 loop_filter_across_tiles_enabled_flag   :1;
      UINT32 pps_loop_filter_across_slices_enabled_flag  :1;
      UINT32 deblocking_filter_override_enabled_flag  :1;
      UINT32 pps_deblocking_filter_disabled_flag  :1;
      UINT32 lists_modification_present_flag  :1;
      UINT32 slice_segment_header_extension_present_flag  :1;
      UINT32 IrapPicFlag  :1;
      UINT32 IdrPicFlag     :1;
      UINT32 IntraPicFlag   :1;
      UINT32 ReservedBits4     :13;
    };
    UINT32   dwCodingSettingPicturePropertyFlags;
  };
  CHAR               pps_cb_qp_offset;
  CHAR               pps_cr_qp_offset;
  UCHAR              num_tile_columns_minus1;
  UCHAR              num_tile_rows_minus1;
  USHORT             column_width_minus1[19];
  USHORT             row_height_minus1[21];
  UCHAR              diff_cu_qp_delta_depth;
  CHAR               pps_beta_offset_div2;
  CHAR               pps_tc_offset_div2;
  UCHAR              log2_parallel_merge_level_minus2;
  INT                CurrPicOrderCntVal;
  DXVA_PicEntry_HEVC RefPicList[15];
  UCHAR              ReservedBits5;
  INT                PicOrderCntValList[15];
  UCHAR              RefPicSetStCurrBefore[8];
  UCHAR              RefPicSetStCurrAfter[8];
  UCHAR              RefPicSetLtCurr[8];
  USHORT             ReservedBits6;
  USHORT             ReservedBits7;
  UINT               StatusReportFeedbackNumber;
} DXVA_PicParams_HEVC, *PDXVA_PicParams_HEVC;
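
To make the DXVA_PicEntry_HEVC union more concrete, here is a minimal sketch that packs and unpacks bPicEntry with explicit bit operations. It assumes the bit-field layout produced by MSVC (Index7Bits in the low seven bits, AssociatedFlag in the top bit) and assumes that 0xFF marks an unused entry, as I understand the DXVA convention. PackPicEntry, IndexOf, AssociatedFlagOf, and IsUnused are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed marker for an unused reference entry.
constexpr std::uint8_t kUnusedPicEntry = 0xFF;

// Builds bPicEntry from a 7-bit surface index and the 1-bit associated flag.
constexpr std::uint8_t PackPicEntry(std::uint8_t index7, bool associated_flag) {
  return static_cast<std::uint8_t>((index7 & 0x7F) |
                                   (associated_flag ? 0x80 : 0x00));
}

// Extracts the low seven bits (Index7Bits).
constexpr std::uint8_t IndexOf(std::uint8_t entry) {
  return static_cast<std::uint8_t>(entry & 0x7F);
}

// Extracts the top bit (AssociatedFlag).
constexpr bool AssociatedFlagOf(std::uint8_t entry) {
  return (entry & 0x80) != 0;
}

constexpr bool IsUnused(std::uint8_t entry) {
  return entry == kUnusedPicEntry;
}

static_assert(PackPicEntry(5, true) == 0x85, "flag occupies the top bit");
```

Writing the packing out by hand avoids relying on anonymous structs inside unions, which are a compiler extension in C++.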

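Filling in the default picture params

Hardware-accelerated decoding does not require implementing the decoding logic itself, so the job of the H265 accelerator is mainly to assemble the picture params DXVA needs and submit them correctly. The first step is to fill in the default picture params: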

// media/gpu/windows/d3d11_h265_accelerator.cc

void D3D11H265Accelerator::FillPicParamsWithConstants(
    DXVA_PicParams_HEVC* pic) {
  // According to DXVA spec section 2.2, this optional 1-bit flag
  // has no meaning when used for CurrPic so always configure to 0.
  pic->CurrPic.AssociatedFlag = 0;

  // num_tile_columns_minus1 and num_tile_rows_minus1 will only
  // be set if tiles are enabled. Set to 0 by default.
  pic->num_tile_columns_minus1 = 0;
  pic->num_tile_rows_minus1 = 0;

  // Host decoder may set this to 1 if sps_max_num_reorder_pics is 0,
  // but there is no requirement that NoPicReorderingFlag must be
  // derived from it. So we always set it to 0 here.
  pic->NoPicReorderingFlag = 0;

  // Must be set to 0 in absence of indication whether B slices are used
  // or not, and it does not affect the decoding process.
  pic->NoBiPredFlag = 0;

  // Shall be set to 0 and accelerators shall ignore its value.
  pic->ReservedBits1 = 0;

  // Bit field added to enable DWORD alignment and should be set to 0.
  pic->ReservedBits2 = 0;

  // Should always be set to 0.
  pic->ReservedBits3 = 0;

  // Should be set to 0 and ignored by accelerators
  pic->ReservedBits4 = 0;

  // Should always be set to 0.
  pic->ReservedBits5 = 0;

  // Should always be set to 0.
  pic->ReservedBits6 = 0;

  // Should always be set to 0.
  pic->ReservedBits7 = 0;
}

Extracting picture params from the SPS and elsewhere

What follows is mostly tedious: use the output of the H265 parser to fill in the picture params:

// media/gpu/windows/d3d11_h265_accelerator.cc

#define ARG_SEL(_1, _2, NAME, ...) NAME
#define SPS_TO_PP1(a) pic_param->a = sps->a;
#define SPS_TO_PP2(a, b) pic_param->a = sps->b;
#define SPS_TO_PP(...) ARG_SEL(__VA_ARGS__, SPS_TO_PP2, SPS_TO_PP1)(__VA_ARGS__)
void D3D11H265Accelerator::PicParamsFromSPS(DXVA_PicParams_HEVC* pic_param,
                                            const H265SPS* sps) {
  // Refer to formula 7-14 and 7-16 of HEVC spec.
  int min_cb_log2_size_y = sps->log2_min_luma_coding_block_size_minus3 + 3;
  pic_param->PicWidthInMinCbsY =
      sps->pic_width_in_luma_samples >> min_cb_log2_size_y;
  pic_param->PicHeightInMinCbsY =
      sps->pic_height_in_luma_samples >> min_cb_log2_size_y;
  // wFormatAndSequenceInfoFlags from SPS
  SPS_TO_PP(chroma_format_idc);
  SPS_TO_PP(separate_colour_plane_flag);
  SPS_TO_PP(bit_depth_luma_minus8);
  SPS_TO_PP(bit_depth_chroma_minus8);
  SPS_TO_PP(log2_max_pic_order_cnt_lsb_minus4);

  // HEVC DXVA spec does not clearly state which slot
  // in sps->sps_max_dec_pic_buffering_minus1 should
  // be used here. However section A4.1 of HEVC spec
  // requires the slot of highest tid to be used for
  // indicating the maximum DPB size if level is not
  // 8.5.
  int highest_tid = sps->sps_max_sub_layers_minus1;
  pic_param->sps_max_dec_pic_buffering_minus1 =
      sps->sps_max_dec_pic_buffering_minus1[highest_tid];

  SPS_TO_PP(log2_min_luma_coding_block_size_minus3);
  SPS_TO_PP(log2_diff_max_min_luma_coding_block_size);

  // DXVA spec names them differently with HEVC spec.
  SPS_TO_PP(log2_min_transform_block_size_minus2,
            log2_min_luma_transform_block_size_minus2);
  SPS_TO_PP(log2_diff_max_min_transform_block_size,
            log2_diff_max_min_luma_transform_block_size);

  SPS_TO_PP(max_transform_hierarchy_depth_inter);
  SPS_TO_PP(max_transform_hierarchy_depth_intra);
  SPS_TO_PP(num_short_term_ref_pic_sets);
  SPS_TO_PP(num_long_term_ref_pics_sps);

  // dwCodingParamToolFlags extracted from SPS
  SPS_TO_PP(scaling_list_enabled_flag);
  SPS_TO_PP(amp_enabled_flag);
  SPS_TO_PP(sample_adaptive_offset_enabled_flag);
  SPS_TO_PP(pcm_enabled_flag);

  // A bug was found here
  // (fix: https://chromium-review.googlesource.com/c/chromium/src/+/3538144)
  // Filling these fields incorrectly corrupts the picture for videos shot on some DSLR cameras
  if (sps->pcm_enabled_flag) {
    SPS_TO_PP(pcm_sample_bit_depth_luma_minus1);
    SPS_TO_PP(pcm_sample_bit_depth_chroma_minus1);
    SPS_TO_PP(log2_min_pcm_luma_coding_block_size_minus3);
    SPS_TO_PP(log2_diff_max_min_pcm_luma_coding_block_size);
    SPS_TO_PP(pcm_loop_filter_disabled_flag);
  }
  SPS_TO_PP(long_term_ref_pics_present_flag);
  SPS_TO_PP(sps_temporal_mvp_enabled_flag);
  SPS_TO_PP(strong_intra_smoothing_enabled_flag);
}
#undef SPS_TO_PP
#undef SPS_TO_PP2
#undef SPS_TO_PP1
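
As a worked example of the PicWidthInMinCbsY / PicHeightInMinCbsY calculation at the top of PicParamsFromSPS, the sketch below reproduces the shift on a plain struct. SpsDims, MinCbCounts, and PicSizeInMinCbs are hypothetical names; the assertion reflects the HEVC requirement that both picture dimensions be multiples of the minimum coding block size.

```cpp
#include <cassert>

// Hypothetical subset of the SPS fields used by the calculation.
struct SpsDims {
  int pic_width_in_luma_samples;
  int pic_height_in_luma_samples;
  int log2_min_luma_coding_block_size_minus3;
};

struct MinCbCounts {
  int width;
  int height;
};

// Converts luma-sample dimensions into counts of minimum coding blocks.
MinCbCounts PicSizeInMinCbs(const SpsDims& sps) {
  const int min_cb_log2_size_y = sps.log2_min_luma_coding_block_size_minus3 + 3;
  const int min_cb_size_y = 1 << min_cb_log2_size_y;
  // Both dimensions must be integer multiples of the minimum block size.
  assert(sps.pic_width_in_luma_samples % min_cb_size_y == 0);
  assert(sps.pic_height_in_luma_samples % min_cb_size_y == 0);
  return {sps.pic_width_in_luma_samples >> min_cb_log2_size_y,
          sps.pic_height_in_luma_samples >> min_cb_log2_size_y};
}
```

For a 1920x1080 stream with 8x8 minimum coding blocks this gives 240 by 135 blocks; with 16x16 blocks the coded height has to be padded to 1088, giving 120 by 68.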

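The picture params also need to be filled from the PPS, the slice header, the computed ref pic list, and the picture itself. That part is too tedious to show here and is omitted for now; the overall idea can be summed up as parameter assembly.

Handling abrupt changes in resolution and bit depth

Real-world video, especially video produced in WebRTC scenarios, may change resolution or bit depth mid-stream, so handling this case is critical when implementing the decoder. Handled poorly, it leads to corrupted or green frames at best, and at worst to a lost D3D11 device context and ultimately a GPU process crash.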

// media/gpu/h265_decoder.cc

switch (curr_nalu_->nal_unit_type) {
      // Decode every video frame
      case H265NALU::BLA_W_LP:  // fallthrough
      case H265NALU::BLA_W_RADL:
      case H265NALU::BLA_N_LP:
      case H265NALU::IDR_W_RADL:
      case H265NALU::IDR_N_LP:
      case H265NALU::TRAIL_N:
      case H265NALU::TRAIL_R:
      case H265NALU::TSA_N:
      case H265NALU::TSA_R:
      case H265NALU::STSA_N:
      case H265NALU::STSA_R:
      case H265NALU::RADL_N:
      case H265NALU::RADL_R:
      case H265NALU::RASL_N:
      case H265NALU::RASL_R:
      case H265NALU::CRA_NUT:
        if (!curr_slice_hdr_) {
          curr_slice_hdr_.reset(new H265SliceHeader());
          // Parse the slice header for every video frame
          par_res = parser_.ParseSliceHeader(*curr_nalu_, curr_slice_hdr_.get(),
                                             last_slice_hdr_.get());
          ....

          state_ = kTryPreprocessCurrentSlice;
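          // Detect whether this is an IRAP frame. (Previously the SPS id was used
          // to decide whether anything had changed, which crashed on some videos.)
          // So IRAP is used as the condition: for an IRAP frame, check whether the
          // resolution, color space, and other parameters it references have changed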
          if (curr_slice_hdr_->irap_pic) {
            bool need_new_buffers = false;
            if (!ProcessPPS(curr_slice_hdr_->slice_pic_parameter_set_id,
                            &need_new_buffers)) {
              SET_ERROR_AND_RETURN();
            }

            // If something changed, need_new_buffers is set to true, kConfigChange
            // is returned, and the D3D11Decoder is recreated
            if (need_new_buffers) {
              curr_pic_ = nullptr;
              return kConfigChange;
            }
          }
        }

        ....

// The actual detection logic: if the profile, bit depth, or resolution changes, need_new_buffers is set to true
bool H265Decoder::ProcessPPS(int pps_id, bool* need_new_buffers) {
  DVLOG(4) << "Processing PPS id:" << pps_id;

  const H265PPS* pps = parser_.GetPPS(pps_id);
  // Slice header parsing already verified this should exist.
  DCHECK(pps);

  const H265SPS* sps = parser_.GetSPS(pps->pps_seq_parameter_set_id);
  // PPS parsing already verified this should exist.
  DCHECK(sps);

  if (need_new_buffers)
    *need_new_buffers = false;

  gfx::Size new_pic_size = sps->GetCodedSize();
  gfx::Rect new_visible_rect = sps->GetVisibleRect();
  if (visible_rect_ != new_visible_rect) {
    DVLOG(2) << "New visible rect: " << new_visible_rect.ToString();
    visible_rect_ = new_visible_rect;
  }
  if (!IsYUV420Sequence(*sps)) {
    DVLOG(1) << "Only YUV 4:2:0 is supported";
    return false;
  }

  // Equation 7-8
  max_pic_order_cnt_lsb_ =
      std::pow(2, sps->log2_max_pic_order_cnt_lsb_minus4 + 4);

  VideoCodecProfile new_profile = H265Parser::ProfileIDCToVideoCodecProfile(
      sps->profile_tier_level.general_profile_idc);
  uint8_t new_bit_depth = 0;
  if (!ParseBitDepth(*sps, new_bit_depth))
    return false;
  if (!IsValidBitDepth(new_bit_depth, new_profile)) {
    DVLOG(1) << "Invalid bit depth=" << base::strict_cast<int>(new_bit_depth)
             << ", profile=" << GetProfileName(new_profile);
    return false;
  }
  if (pic_size_ != new_pic_size || dpb_.max_num_pics() != sps->max_dpb_size ||
      profile_ != new_profile || bit_depth_ != new_bit_depth) {
    if (!Flush())
      return false;
    DVLOG(1) << "Codec profile: " << GetProfileName(new_profile)
             << ", level(x30): " << sps->profile_tier_level.general_level_idc
             << ", DPB size: " << sps->max_dpb_size
             << ", Picture size: " << new_pic_size.ToString()
             << ", bit_depth: " << base::strict_cast<int>(new_bit_depth);
    profile_ = new_profile;
    bit_depth_ = new_bit_depth;
    pic_size_ = new_pic_size;
    dpb_.set_max_num_pics(sps->max_dpb_size);
    if (need_new_buffers)
      *need_new_buffers = true;
  }

  return true;
}
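
The comparison at the heart of ProcessPPS can be sketched as a pure function, together with Equation 7-8 written as an integer shift instead of std::pow. This is an illustration only: StreamConfig, NeedsNewBuffers, and MaxPicOrderCntLsb are hypothetical names, and the 0 to 12 range check is my reading of the HEVC spec for log2_max_pic_order_cnt_lsb_minus4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the values ProcessPPS compares.
struct StreamConfig {
  int coded_width;
  int coded_height;
  int profile;
  std::uint8_t bit_depth;
  int max_dpb_size;
};

// True when any property that affects the output buffers has changed.
bool NeedsNewBuffers(const StreamConfig& current, const StreamConfig& next) {
  return current.coded_width != next.coded_width ||
         current.coded_height != next.coded_height ||
         current.profile != next.profile ||
         current.bit_depth != next.bit_depth ||
         current.max_dpb_size != next.max_dpb_size;
}

// Equation 7-8: MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4).
int MaxPicOrderCntLsb(int log2_max_pic_order_cnt_lsb_minus4) {
  assert(log2_max_pic_order_cnt_lsb_minus4 >= 0 &&
         log2_max_pic_order_cnt_lsb_minus4 <= 12);
  return 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4);
}
```

The shift keeps the result an exact integer, whereas std::pow goes through floating point before being converted back.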

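As you can see, after kConfigChange is returned a new D3D11Decoder is created. The user notices nothing on the front end: creation is very fast and playback stays continuous without the slightest stutter, which is a better experience than VLC's handling.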

// media/gpu/windows/d3d11_video_decoder.cc

...
} else if (result == media::AcceleratedVideoDecoder::kConfigChange) {
    // Ignore the initial config change
    const auto new_bit_depth = accelerated_video_decoder_->GetBitDepth();
    const auto new_profile = accelerated_video_decoder_->GetProfile();
    const auto new_coded_size = accelerated_video_decoder_->GetPicSize();
    if (new_profile == config_.profile() &&
        new_coded_size == config_.coded_size() &&
        new_bit_depth == bit_depth_ && !picture_buffers_.size()) {
        continue;
    }

    // Update the config.
    MEDIA_LOG(INFO, media_log_)
        << "D3D11VideoDecoder config change: profile: "
        << static_cast<int>(new_profile) << " coded_size: ("
        << new_coded_size.width() << ", " << new_coded_size.height() << ")";
    profile_ = new_profile;
    config_.set_profile(profile_);
    config_.set_coded_size(new_coded_size);

    // If something changed, recreate the D3D11Decoder
    auto video_decoder_or_error = CreateD3D11Decoder();
    if (video_decoder_or_error.has_error()) {
        return NotifyError(std::move(video_decoder_or_error).error());
    }
    DCHECK(set_accelerator_decoder_cb_);
    set_accelerator_decoder_cb_.Run(
        std::move(video_decoder_or_error).value());
    picture_buffers_.clear();
} else if (result == media::AcceleratedVideoDecoder::kTryAgain) {
...

Handling profiles other than HEVC Main / Main10

According to the 2021 HEVC spec, HEVC defines 11 profiles in total, and the profile a video uses can be determined from general_profile_idc in the SPS. Chromium previously did not define the other 8 profiles, so they were treated as Main profile and the fallback to FFmpegVideoDecoder failed. This CL (https://chromium-review.googlesource.com/c/chromium/src/+/3552293) adds the remaining profiles and fixes the problem.

The 11 HEVC profiles:

// media/mojo/mojom/stable/stable_video_decoder_types.mojom

// Maps to |media.mojom.VideoCodecProfile|.
[Stable, Extensible]
enum VideoCodecProfile {
  // Keep the values in this enum unique, as they imply format (h.264 vs. VP8,
  // for example), and keep the values for a particular format grouped
  // together for clarity.
  // Next version: 2
  // Next value: 37
  // Skipped
  ...,
  kHEVCProfileMin = 16,
  // The next three are the base profiles defined in HEVC version 1
  // HEVC Main profile: up to 8 bit, YUV420
  // Older iPhones without Dolby Vision support record in this profile
  kHEVCProfileMain = kHEVCProfileMin,
  // HEVC Main10 profile: up to 10 bit, YUV420
  // HDR video from newer iPhones with Dolby Vision (HLG 8.4) support uses this profile
  kHEVCProfileMain10 = 17,
  // An almost mythical profile: I have never come across a video that uses it
  // Transcoding with `ffmpeg -i bear-1280x720.mp4 -vcodec hevc -profile:v mainstillpicture bear-1280x720-hevc-msp.mp4`
  // does not produce this profile either
  kHEVCProfileMainStillPicture = 18,
  kHEVCProfileMax = kHEVCProfileMainStillPicture,
  // Skipped
  ...,
  // The 8 newly added profiles
  [MinVersion=1] kHEVCProfileExtMin = 29,
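  // Format range extensions (added in HEVC version 2)
  // The 4:2:2 10-bit HEVC shot on newer Canon, Sony, and Nikon cameras uses this profile; up to 16 bit, YUV444
  // Hardware decodable on M1 Macs under macOS at 10 bit and below
  // Hardware decodable on Windows machines with Intel GPUs, because Intel extended the DXVA spec on its own to add this capability
  // (supported by VLC, but not yet by Chromium)
  [MinVersion=1] kHEVCProfileRext = kHEVCProfileExtMin,
  // The remaining 7 profiles exist in the spec, but I have never found sample clips; all I know is that none of them can be hardware decoded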
  [MinVersion=1] kHEVCProfileHighThroughput = 30,
  [MinVersion=1] kHEVCProfileMultiviewMain = 31,
  [MinVersion=1] kHEVCProfileScalableMain = 32,
  [MinVersion=1] kHEVCProfile3dMain = 33,
  [MinVersion=1] kHEVCProfileScreenExtended = 34,
  [MinVersion=1] kHEVCProfileScalableRext = 35,
  [MinVersion=1] kHEVCProfileHighThroughputScreenExtended = 36,
  [MinVersion=1] kHEVCProfileExtMax = kHEVCProfileHighThroughputScreenExtended,
};

How the profile is assigned:

// media/ffmpeg/ffmpeg_common.cc

int hevc_profile = -1;
// Chrome does not pull in FFmpeg's hevc_ps code, so we have to parse this ourselves:
// obtain the HEVCDecoderConfigurationRecord and read general_profile_idc
if (codec_context->extradata && codec_context->extradata_size) {
  mp4::HEVCDecoderConfigurationRecord hevc_config;
  if (hevc_config.Parse(codec_context->extradata,
                        codec_context->extradata_size)) {
    hevc_profile = hevc_config.general_profile_idc;
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
    if (!color_space.IsSpecified()) {
      // Since the hevc_ps code is not included, when the color space cannot be
      // obtained from the container, extract it from the SPS manually
      color_space = hevc_config.GetColorSpace();
    }
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  }
}
// The values of general_profile_idc are taken from the HEVC standard, see
// the latest https://www.itu.int/rec/T-REC-H.265/en
switch (hevc_profile) {
    case 1:
        profile = HEVCPROFILE_MAIN;
        break;
    case 2:
        profile = HEVCPROFILE_MAIN10;
        break;
    case 3:
        profile = HEVCPROFILE_MAIN_STILL_PICTURE;
        break;
    case 4:
        profile = HEVCPROFILE_REXT;
        break;
    case 5:
        profile = HEVCPROFILE_HIGH_THROUGHPUT;
        break;
    case 6:
        profile = HEVCPROFILE_MULTIVIEW_MAIN;
        break;
    case 7:
        profile = HEVCPROFILE_SCALABLE_MAIN;
        break;
    case 8:
        profile = HEVCPROFILE_3D_MAIN;
        break;
    case 9:
        profile = HEVCPROFILE_SCREEN_EXTENDED;
        break;
    case 10:
        profile = HEVCPROFILE_SCALABLE_REXT;
        break;
    case 11:
        profile = HEVCPROFILE_HIGH_THROUGHPUT_SCREEN_EXTENDED;
        break;
    default:
        // Always assign a default if all heuristics fail.
        profile = HEVCPROFILE_MAIN;
        break;
}
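
The same mapping can also be expressed as a lookup table, which makes the "unknown values fall back to Main" rule explicit. This is a minimal sketch, not the Chromium implementation: kHevcProfileNames, HevcProfileName, and IsDxvaHardwareDecodable are hypothetical names, the display strings are my own labels, and the hardware-decodable rule only encodes the article's statement that DXVA covers Main and Main10.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Display names indexed by general_profile_idc - 1 (values 1..11).
constexpr std::array<std::string_view, 11> kHevcProfileNames = {
    "Main",
    "Main 10",
    "Main Still Picture",
    "Range Extensions",
    "High Throughput",
    "Multiview Main",
    "Scalable Main",
    "3D Main",
    "Screen Extended",
    "Scalable Range Extensions",
    "High Throughput Screen Extended",
};

// Mirrors the switch above: anything outside 1..11 is treated as Main.
constexpr std::string_view HevcProfileName(int general_profile_idc) {
  if (general_profile_idc < 1 || general_profile_idc > 11)
    return kHevcProfileNames[0];
  return kHevcProfileNames[static_cast<std::size_t>(general_profile_idc - 1)];
}

// Assumption taken from the text: only Main and Main10 are covered by DXVA,
// so every other profile should fall back to software decoding on Windows.
constexpr bool IsDxvaHardwareDecodable(int general_profile_idc) {
  return general_profile_idc == 1 || general_profile_idc == 2;
}

static_assert(HevcProfileName(2) == "Main 10", "idc 2 is Main 10");
```

A table keeps the eleven cases in one place, at the cost of requiring the idc values to stay contiguous.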

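Once the profile definitions are complete, profiles that hardware decoding does not support can automatically fall back to software decoding with FFmpegVideoDecoder. This ensures that every HEVC profile we have seen so far plays correctly (hardware decoding where possible, software decoding otherwise).

Handling color space extraction

Earlier versions of Chromium obtained the color space either through FFmpeg's avcodec_parameters_to_context, which ultimately relies on FFmpeg's parsing of the mov or mp4 container, or by reading the FOURCC_COLR box during demuxing. This works for standard mov and mp4 videos, but many encoders do not write color space information into the container, so the old logic could not extract the color space of such HEVC videos correctly.

We therefore use the parsed HEVCDecoderConfigurationRecord to parse the SPS during demuxing and extract sps->vui_parameters->colour_primaries, sps->vui_parameters->transfer_characteristics, sps->vui_parameters->matrix_coeffs, and sps->vui_parameters->video_full_range_flag to build a VideoColorSpace.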

// media/formats/mp4/hevc.cc

#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
VideoColorSpace HEVCDecoderConfigurationRecord::GetColorSpace() {
  // Parse the SPS using the HVCCNALArray entries of the HEVCDecoderConfigurationRecord
  if (!arrays.size()) {
    DVLOG(1) << "HVCCNALArray not found, fallback to default colorspace";
    return VideoColorSpace();
  }

  std::vector<uint8_t> buffer;

  for (size_t j = 0; j < arrays.size(); j++) {
    for (size_t i = 0; i < arrays[j].units.size(); ++i) {
      buffer.insert(buffer.end(), kAnnexBStartCode,
                    kAnnexBStartCode + kAnnexBStartCodeSize);
      buffer.insert(buffer.end(), arrays[j].units[i].begin(),
                    arrays[j].units[i].end());
    }
  }

  H265Parser parser;
  H265NALU nalu;
  parser.SetStream(buffer.data(), buffer.size());
  while (true) {
    H265Parser::Result result = parser.AdvanceToNextNALU(&nalu);

    if (result != H265Parser::kOk)
      return VideoColorSpace();

    switch (nalu.nal_unit_type) {
      case H265NALU::SPS_NUT: {
        int sps_id = -1;
        result = parser.ParseSPS(&sps_id);
        if (result != H265Parser::kOk) {
          DVLOG(1) << "Could not parse SPS, fallback to default colorspace";
          return VideoColorSpace();
        }

        const H265SPS* sps = parser.GetSPS(sps_id);
        DCHECK(sps);
        return sps->GetColorSpace();
      }
      default:
        break;
    }
  }
}
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
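
The buffer-building step in GetColorSpace (prefixing every NAL unit from the hvcC arrays with a start code so the Annex B parser can walk them) can be sketched in isolation. The sketch assumes a four-byte start code 00 00 00 01; NalUnit, ToAnnexB, and NalUnitType are hypothetical names. The type extraction follows the two-byte HEVC NAL header, where nal_unit_type sits in bits 1 to 6 of the first byte.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

using NalUnit = std::vector<std::uint8_t>;

// Concatenates all NAL units, each preceded by an Annex B start code.
std::vector<std::uint8_t> ToAnnexB(
    const std::vector<std::vector<NalUnit>>& arrays) {
  static constexpr std::uint8_t kStartCode[] = {0, 0, 0, 1};
  std::vector<std::uint8_t> out;
  for (const auto& array : arrays) {
    for (const auto& unit : array) {
      out.insert(out.end(), std::begin(kStartCode), std::end(kStartCode));
      out.insert(out.end(), unit.begin(), unit.end());
    }
  }
  return out;
}

// Reads nal_unit_type from the first header byte (VPS = 32, SPS = 33, PPS = 34).
int NalUnitType(const NalUnit& unit) {
  assert(!unit.empty());
  return (unit[0] >> 1) & 0x3F;
}
```

With this layout an SPS NAL unit starts with the byte 0x42, which is why SPS data is often recognizable in a hex dump as 00 00 00 01 42.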

// media/formats/mp4/box_definitions.cc

case FOURCC_HEV1:
case FOURCC_HVC1: {
  DVLOG(2) << __func__ << " parsing HEVCDecoderConfigurationRecord (hvcC)";
  std::unique_ptr<HEVCDecoderConfigurationRecord> hevcConfig(
      new HEVCDecoderConfigurationRecord());
  RCHECK(reader->ReadChild(hevcConfig.get()));
  video_codec = VideoCodec::kHEVC;
  // Call it here to obtain the color space
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  video_color_space = hevcConfig->GetColorSpace();
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  video_codec_profile = hevcConfig->GetVideoProfile();
  ...
case FOURCC_DVH1:
case FOURCC_DVHE: {
  DVLOG(2) << __func__ << " reading HEVCDecoderConfigurationRecord (hvcC)";
  std::unique_ptr<HEVCDecoderConfigurationRecord> hevcConfig(
      new HEVCDecoderConfigurationRecord());
  RCHECK(reader->ReadChild(hevcConfig.get()));
  // Call it here to obtain the color space
#if BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  video_color_space = hevcConfig->GetColorSpace();
#endif  // BUILDFLAG(ENABLE_HEVC_PARSER_AND_HW_DECODER)
  ...
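
The "container first, SPS second" rule and the role of transfer_characteristics can be illustrated with a small sketch. ColorInfo, TransferKind, ClassifyTransfer, and ResolveColorInfo are hypothetical names, not Chromium types; the numeric codes (16 for PQ, 18 for HLG) are the transfer characteristics values used by the HEVC VUI.

```cpp
#include <cassert>
#include <optional>

// Hypothetical holder for the four VUI values named above.
struct ColorInfo {
  int primaries;
  int transfer;
  int matrix;
  bool full_range;
};

enum class TransferKind { kSdr, kPq, kHlg };

// Classifies a transfer_characteristics code: 16 is PQ, 18 is HLG.
constexpr TransferKind ClassifyTransfer(int transfer_characteristics) {
  switch (transfer_characteristics) {
    case 16:
      return TransferKind::kPq;
    case 18:
      return TransferKind::kHlg;
    default:
      return TransferKind::kSdr;
  }
}

// Prefers the color info written in the container; if it is absent, falls
// back to the values parsed from the SPS. Empty when neither source has it.
std::optional<ColorInfo> ResolveColorInfo(
    const std::optional<ColorInfo>& from_container,
    const std::optional<ColorInfo>& from_sps) {
  return from_container.has_value() ? from_container : from_sps;
}
```

Without the second source, a file whose container omits the color box would resolve to nothing and be treated as SDR, which is the failure mode described for HDR content below.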

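A comparison with Edge shows that Edge reads the color space only from the container and has no logic for reading it from the SPS. As a result, HDR video cannot be tone mapped correctly and renders incorrectly, whereas everything works in Chromium.

(Left: Edge's tone mapping problem when handling HLG video)

Summary

After the steps above, hardware decoding is mostly complete. All CLs and fixes have been merged into Chromium 104 (main branch), and the Windows implementation details and code diffs can be traced through this crbug (https://bugs.chromium.org/p/chromium/issues/detail?id=1286132).

Comparison and testing against Edge / Safari

A pile of implementation details can get dull, so here comes the fun part: comparing against the competition. To keep it fair, I built a basic test page with plain HTML and the native video tag, free of outside interference, and collected 28 test cases covering different profiles, HDR / non-HDR, and different bit depths (test material sourced from the web: https://lf3-cdn-tos.bytegoofy.com/obj/tcs-client/resources/video_demo_hevc.html). Let's begin:

HDR tests

We start with HDR capability, using several HEVC videos with PQ and HLG transfer functions.

PQ on an SDR display

Not everyone has an HDR display; in fact one could say 99.99% of users are still on SDR displays, so whether HDR video displays correctly on an ordinary SDR display matters a great deal. Converting HDR video to SDR is generally called tone mapping, so the tests below mainly check whether the browser supports tone mapping.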
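
To give a feel for what tone mapping involves, here is a minimal sketch that decodes a PQ signal to luminance with the SMPTE ST 2084 EOTF and then compresses it with an extended Reinhard curve. The Reinhard operator is a swapped-in textbook technique chosen for brevity; it is not what Chromium, Edge, or Safari actually use. PqEotfNits and ReinhardToneMap are hypothetical names, and the SDR white and peak values are parameters the caller has to choose.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// SMPTE ST 2084 (PQ) EOTF: non-linear signal in [0, 1] to luminance in cd/m^2.
double PqEotfNits(double signal) {
  constexpr double m1 = 2610.0 / 16384.0;
  constexpr double m2 = 2523.0 / 4096.0 * 128.0;
  constexpr double c1 = 3424.0 / 4096.0;
  constexpr double c2 = 2413.0 / 4096.0 * 32.0;
  constexpr double c3 = 2392.0 / 4096.0 * 32.0;
  const double e = std::pow(std::clamp(signal, 0.0, 1.0), 1.0 / m2);
  const double num = std::max(e - c1, 0.0);
  const double den = c2 - c3 * e;
  return 10000.0 * std::pow(num / den, 1.0 / m1);
}

// Extended Reinhard: maps [0, peak_nits] to [0, 1], with peak_nits landing
// exactly on 1.0. Illustrative only; real pipelines use more elaborate curves.
double ReinhardToneMap(double nits, double sdr_white_nits, double peak_nits) {
  assert(sdr_white_nits > 0.0 && peak_nits >= sdr_white_nits);
  const double x = std::max(nits, 0.0) / sdr_white_nits;
  const double w = peak_nits / sdr_white_nits;
  return std::min(x * (1.0 + x / (w * w)) / (1.0 + x), 1.0);
}
```

A browser that skips this step, or applies it with the wrong transfer function because the color space was not detected, produces the washed-out or overexposed frames seen in the comparisons that follow.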

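(Left: Edge 102 on Windows; right: Chromium 104 on Windows)

On Windows, Edge shows broken tone mapping for HDR video that uses the PQ curve, while Chromium tone maps correctly. Chromium wins this round.

(Left: Safari 15.3 on macOS; right: Chromium 104 on macOS)

On macOS, Safari handles tone mapping of PQ HDR video very well, and Chromium 104 does equally well; the two look identical. Moreover, because macOS supports EDR (introduction: https://developer.apple.com/videos/play/wwdc2021/10161/), the result looks better than on Windows even on an SDR display (the Mac extends the highlights appropriately). The two browsers tie this round.

(EDR overview; image from Apple)

PQ on an HDR display

Next we switch the display to HDR mode and enable HDR output in the operating system.

(Left: Edge 102 on Windows; right: Chromium 104 on Windows)

If the video is a mov, the color space is usually written into the container, and the two browsers differ little: both display PQ HDR content in HDR on an HDR display. If the video is an mp4, however, the color space is usually not written into the container, and Edge displays PQ video incorrectly. Chromium wins this round.

Next, macOS. Playing HDR video on macOS requires no setup because it supports EDR. We tested on a Mac with an HDR-capable XDR display (the new M1 Pro / Max MacBook Pro). (Note: to force HDR on an external display, use a supported display and enable the "High Dynamic Range" option in Settings > Displays.)

(Left: Safari 15.3 on macOS; right: Chromium 104 on macOS)

On macOS, Safari handles PQ HDR video very well and Chromium 104 does equally well, so the two look identical (macOS uses EDR by default, with no extra setup). This round is a tie.

HLG on an SDR display

(Left: Edge 102 on Windows; right: Chromium 104 on Windows)

When handling HLG video, Edge cannot read the color space if it was not written into the container, which causes a color cast. Chromium reads the color space from the container first and, if it is absent, falls back to reading it from the SPS, which ensures that all HLG videos are tone mapped correctly. Chromium wins this round.

(Left: Safari 15.3 on macOS; right: Chromium 104 on macOS)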

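On macOS, Safari handles tone mapping of HLG HDR video very well, and Chromium 104 does equally well; the two look identical. Moreover, because macOS supports EDR, the result is better than on Windows even on an SDR display (the Mac does not forcibly compress the highlights). The two browsers tie this round.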

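HLG on an HDR display

(Color space not written to the container. Left: Edge 102 on Windows; right: Chromium 104 on Windows)

(Color space written to the container. Left: Edge 102 on Windows; right: Chromium 104 on Windows)

Edge does not activate HDR output when displaying HLG video, whereas Chromium outputs HDR perfectly (the screenshot differs from what the eye sees: the screenshot is overly bright, while the actual display looks normal). In addition, even when the container does carry the color space, video processed by Edge is overexposed. Chromium wins this round decisively.

(Left: Safari 15.3 on macOS; right: Chromium 104 on macOS)

On macOS, Safari fully supports HLG HDR video and Chromium 104 does well too. The two look identical, and both display HLG HDR properly in HDR on an HDR display. This round is a tie.

Recap

Based on the test results above: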

	PQ (SDR Display)	PQ (HDR Display)	HLG (SDR Display)	HLG (HDR Display)
Chromium macOS	Yes (EDR)	Yes	Yes (EDR)	Yes
Safari macOS	Yes (EDR)	Yes	Yes (EDR)	Yes
Chromium Windows	Yes	Yes	Yes	Yes
Edge Windows	No	Partial	Partial	No

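On macOS, both Safari and Chromium perform well with HDR and support EDR.

On Windows, if you want to watch HDR content there is no other choice: of the browsers tested here, Chromium is the only one that fully supports HDR.

Rext profile test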

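On Windows, neither Edge nor Chromium supports HEVC Rext, because the DXVA spec does not define any profile other than Main / Main10 (Intel and NVIDIA later implemented hardware decoding of 4:2:2 and 4:4:4 Rext on their own, but that is outside the spec).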

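(Left: Safari 15.3 on macOS with an Intel Mac; right: Chromium 104 on macOS)

(Left: Safari 15.3 on macOS with an M1 Mac; right: Chromium 104 on macOS)

As shown, on macOS Safari supports HEVC Rext only partially (not at all on Intel Macs, and partially on M1 Macs), while Chromium 104 supports HEVC Rext through hardware or software decoding (Apple Silicon Macs hardware decode 10-bit Rext; Intel Macs software decode Rext). Chromium wins on compatibility.

8K support test

On Windows, Chromium and Edge tie this round: both play every 8K video I could find, so no screenshots are included for now.

(Left: Safari 15.3 on macOS; right: Chromium 104 on macOS)

On macOS, Safari admittedly does support 8K, which can be verified with 8K HEVC videos on Bilibili. But because it is picky about formats, the handful of 8K videos on the test page all failed to play, so Chromium wins this round.

Format compatibility test

On Windows, Chromium and Edge tie this round: every Main / Main10 profile HEVC video I could find plays in both, so no screenshots are included for now.

(Left: Safari 15.3 on macOS; right: Chromium 104 on macOS)

On macOS, Safari lost outright: about half of the 40 test videos failed to play properly. Chromium wins this one decisively.

Real-world test

Bilibili is a site that currently supports HEVC officially. Using a User-Agent switcher extension to impersonate Safari on macOS makes Bilibili prefer HEVC for playback, at up to 8K.

As shown below, Chromium with hardware decoding enabled plays 8K 60p HEVC video smoothly:

Opening chrome://media-internals confirms that the video is indeed HEVC at 7680*4320.

How is the performance?

To find out, I took a Lenovo ThinkPad T14 with an HD620 GPU and tried playing 8K 60p video on Bilibili:

(Chromium 104: GPU decode utilization on the HD620 reached 100% while playing 8K 60p video)

The HD620 integrated GPU can just about manage 8K 60p video (with an occasional dropped frame), and system CPU usage during 8K playback is only 16%. The performance gain from hardware decoding is substantial.

Using a User-Agent switcher extension to impersonate Edge 18.19041 unlocks Bilibili's HDR mode, at up to 4K. We picked a video with HDR support and compared Edge and Chromium.

(Left: Edge 102 on Windows; right: Chromium 104 on Windows)

On Windows, Chromium is the only one of the two that handles Bilibili's HDR10 well; Edge cannot display it properly.

(Left: Safari 15.3 on macOS; right: Chromium 104 on macOS)

On macOS, both Safari and Chromium handle Bilibili's HDR10 well; there is no difference between them in HDR support.

Summary

After the evaluation above it should be clear that, here in 2022, Chromium can finally support HEVC hardware decoding in full. Compared with Safari and Edge, it performs well in all four areas of HDR support, format compatibility, performance, and platform support, and even pulls slightly ahead.

Running and verifying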

Chrome Canary

Chrome Canary can be downloaded directly (https://www.google.com/chrome/canary/) for testing (launch flag: --enable-features=PlatformHEVCDecoderSupport).

Mac

On macOS, one way to enable HEVC hardware decoding is to launch from the command line:

// Pass the feature switch directly to the Chrome Canary binary
/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --enable-features=PlatformHEVCDecoderSupport

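If you dislike the command line, you can also create an Automator launch action to open it.

How do you verify that it took effect? Open chrome://gpu; if the text circled in red in the screenshot below appears, it worked.

Find an HEVC video and play it, then open chrome://media-internals. If the video decoder shows VDAVideoDecoder, hardware decoding is working:

Open Activity Monitor and search for VTDecoderXPCService. If the process's CPU usage rises during playback, that also indicates hardware decoding is working:

Windows

For Chrome on Windows, pass the launch flag through the desktop shortcut.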

"C:\Users\Admin\AppData\Local\Google\Chrome SxS\Application\chrome.exe" --enable-features="PlatformHEVCDecoderSupport"

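How do you verify that it took effect? Open chrome://gpu; if the text circled in red in the screenshot below appears, it worked.

Find an HEVC video and play it, then open chrome://media-internals. If the video decoder shows D3D11VideoDecoder, hardware decoding is working:

Open Windows Task Manager > Performance > GPU and watch the Video Decode graph during playback. If utilization rises, hardware decoding is working; if it stays at 0%, hardware decoding has failed.

On AMD GPUs the graph is labeled "Video Codec".

Android

It works similarly: passing the launch flag is enough. Since I do not use an Android device, the exact steps are not included yet.

ChromeOS

The latest test builds have it integrated into the OS natively; no launch flag is needed.

Prebuilt version

If passing flags feels cumbersome, you can also download a prebuilt version that needs no launch flag (https://github.com/StaZhu/enable-chromium-hevc-hardware-decoding/releases) for testing (if you need Google services, only Chrome Canary will do).

Stable version

HEVC hardware decoding is currently experimental and may ship in Chrome 105 stable at the earliest.

Integrating into Electron

You may want to build this feature into a framework such as CEF or Electron. Electron 20 (based on Chromium 104, still in beta at the time of writing) already includes HEVC hardware decoding for Mac and Windows; call app.commandLine.appendSwitch('enable-features', 'PlatformHEVCDecoderSupport') at startup to enable it. For versions below Electron 20, you have to port the changes manually by copying them over.

Finally

You can follow these two issues (Windows: https://bugs.chromium.org/p/chromium/issues/detail?id=1286132, macOS: https://bugs.chromium.org/p/chromium/issues/detail?id=1300444) to track further progress.

If you need HEVC playback, give Chrome Canary or the prebuilt version a try, and if you hit a bug, please report it at crbug.com.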
