Faster-Whisper 1.1.0: Analysis of the onnxruntime Thread Affinity Crash and How to Work Around It

2025-05-14 15:05:05 Author: 劳婵绚Shirley

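Background

Faster-Whisper is an efficient speech recognition library built on the Whisper model. In the recently released version 1.1.0, some users have run into crashes related to onnxruntime thread affinity. The problem shows up mainly in environments with an NVIDIA A40 GPU (4 CPU cores, 48 GB VRAM, and 16 GB RAM).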

Symptoms

When a user tries to transcribe audio with Faster-Whisper 1.1.0, the following error is raised:

pthread_setaffinity_np failed for thread: 785, index: 1, mask: {2, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.

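The process then crashes and exits with code -6, which means it was terminated by signal 6 (SIGABRT). This usually indicates that native code called abort() after a fatal error, and such aborts are often associated with memory problems.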
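To make the meaning of a negative exit code concrete, it can be mapped back to a signal name. The sketch below assumes a POSIX system, where Python's subprocess module reports a child killed by signal N as return code -N. The helper name `describe_exit_code` is made up for illustration and is not part of Faster-Whisper.

```python
import signal
import subprocess
import sys


def describe_exit_code(code: int) -> str:
    """Return a readable description of a subprocess return code."""
    if code < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        return f"terminated by {signal.Signals(-code).name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # Spawn a child that aborts itself, as native code does on a fatal error.
    child = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    print(child.returncode, describe_exit_code(child.returncode))
```

Wrapping a transcription script in a parent process like this is one way to tell an abort in native code apart from an ordinary Python exception, which would produce a positive exit status instead.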

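Root cause analysis

The problem originates in onnxruntime failing to set CPU thread affinity. Specifically:

By default, onnxruntime tries to bind its compute threads to specific CPU cores (thread affinity) to improve performance.
In some system configurations, especially containerized environments, this binding fails. Error code 22 is EINVAL, which pthread_setaffinity_np returns when the requested mask contains no CPU that the thread is permitted to run on.
When setting the thread affinity fails, onnxruntime does not handle the error correctly, which leads to subsequent memory access problems.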
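One way to check whether a process is exposed to this failure is to compare the CPUs the machine reports with the CPUs the process is actually allowed to use. The following is a minimal diagnostic sketch, assuming Linux (where `os.sched_getaffinity` is available). The helper name `cpu_visibility` is hypothetical and is not part of Faster-Whisper or onnxruntime.

```python
import os


def cpu_visibility() -> dict:
    """Compare the CPUs the system reports with those this process may use."""
    total = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the set of CPU indices this process is allowed to run on.
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # Fallback for other platforms: assume every reported CPU is usable.
        allowed = list(range(total))
    return {
        "reported": total,
        "allowed": allowed,
        # If fewer CPUs are allowed than reported, pinning a thread to an
        # arbitrary core index can be rejected with EINVAL (error code 22).
        "restricted": len(allowed) < total,
    }


if __name__ == "__main__":
    print(cpu_visibility())
```

If `restricted` comes back true inside the container, a library that picks core indices from the reported total rather than from the allowed set is a plausible source of the error shown earlier.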

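Solutions

Temporary workarounds

Users who need a fix right away can use one of the following temporary workarounds:

Environment variable approach: set the following environment variables at the very start of the program, before faster_whisper is imported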

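import os

# Set these before importing faster_whisper or onnxruntime so they take effect.
# Intended to keep onnxruntime from pinning threads to specific cores.
os.environ["ORT_DISABLE_CPU_AFFINITY"] = "1"
# Cap the thread pools of the numeric backends at the 4 available CPU cores.
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["OPENBLAS_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"
os.environ["VECLIB_MAXIMUM_THREADS"] = "4"
os.environ["NUMEXPR_NUM_THREADS"] = "4"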
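The value "4" matches the 4-core machine described in this article. On other machines the thread count could be derived from the CPUs actually available to the process instead of being hard-coded. The sketch below works under that assumption; `available_cpus` and `thread_limit_env` are hypothetical helpers, and the variable names are the thread-count variables from the environment variable workaround.

```python
import os

# Thread-count variables used by the environment variable workaround.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def available_cpus() -> int:
    """Number of CPUs this process may use (falls back to os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def thread_limit_env(n_threads=None) -> dict:
    """Build the environment settings for a given thread count."""
    n = n_threads if n_threads is not None else available_cpus()
    if n < 1:
        raise ValueError("thread count must be at least 1")
    return {name: str(n) for name in THREAD_VARS}


if __name__ == "__main__":
    # Apply the limits before importing faster_whisper or any numeric library.
    os.environ.update(thread_limit_env())
```

Returning a dictionary instead of writing to `os.environ` directly keeps the helper easy to test and lets the same settings be passed to a subprocess through its `env` argument.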

Monkey patch approach: override the initialization behavior of SileroVADModel

import faster_whisper.vad
from faster_whisper.vad import SileroVADModel

class PatchedSileroVADModel(SileroVADModel):
    def __init__(self, encoder_path, decoder_path):
        import onnxruntime
        opts = onnxruntime.SessionOptions()
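        # An explicit thread count keeps onnxruntime from setting affinity,
        # as the error message advises; 1 is the safest value.
        opts.inter_op_num_threads = 1
        opts.intra_op_num_threads = 1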
        opts.log_severity_level = 3
        
        self.encoder_session = onnxruntime.InferenceSession(
            encoder_path,
            providers=["CPUExecutionProvider"],
            sess_options=opts,
        )
        self.decoder_session = onnxruntime.InferenceSession(
            decoder_path,
            providers=["CPUExecutionProvider"],
            sess_options=opts,
        )

# Swap in the patched class before the first transcription that uses VAD.
faster_whisper.vad.SileroVADModel = PatchedSileroVADModel