Analyzing Tensor Size Mismatch Errors in Multithreaded YOLOv5 Inference

2025-05-01 10:34:59, by 宣聪麟

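When running YOLOv5 image detection across multiple threads, developers often hit an error such as "RuntimeError: The size of tensor a (24) must match the size of tensor b (20) at non-singleton dimension 2". The problem looks simple, but it touches several key aspects of deep learning inference.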

The Nature of the Problem

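At its core, the error is a tensor size mismatch raised in one of the model's layers during inference. When images are processed concurrently, different threads may feed tensors of different sizes into the model at the same time, so an intermediate computation ends up combining tensors whose dimensions disagree.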

Root Causes

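Inconsistent input sizes: Even if every image has been scaled to a width of 640 pixels, the height still depends on the original aspect ratio, so the tensors that actually reach the model differ in size.
Shared state across threads: When several threads share one model instance, state that the model caches between calls (such as the coordinate grids that YOLOv5's detection head keeps for the most recent input size) can be overwritten by one thread while another thread, working on a different input size, is still using it.
Inconsistent preprocessing: Different threads may preprocess images differently, for example with different padding or cropping.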
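
To see how aspect ratio alone can produce the 24-versus-20 mismatch quoted in the error message, the arithmetic can be sketched as follows. The stride of 16 and the padding of the height up to a multiple of 32 are assumptions about a typical YOLOv5 configuration, and `feature_rows` is a hypothetical helper, not part of YOLOv5.

```python
import math

def feature_rows(orig_h, orig_w, target_w=640, stride=16, pad_multiple=32):
    """Rows of the feature map at `stride` after scaling the width to target_w."""
    scaled_h = orig_h * target_w / orig_w
    # Assumption: the height is padded up to a multiple of the largest stride.
    padded_h = math.ceil(scaled_h / pad_multiple) * pad_multiple
    return padded_h // stride

print(feature_rows(1080, 1920))  # 16:9 frame -> 24
print(feature_rows(960, 1920))   # 2:1 frame  -> 20
```

Both frames are 640 pixels wide after scaling, yet one produces 24 feature-map rows and the other 20, which is the kind of disagreement the error reports in dimension 2.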

Solutions

Standardize the Input Size

Make sure every image goes through exactly the same preprocessing pipeline before it reaches the model:

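import cv2
import numpy as np

def preprocess_image(image):
    # Scale to fit inside 640x640 while preserving aspect ratio, then pad
    h, w = image.shape[:2]
    scale = min(640 / h, 640 / w)
    # round() avoids losing a pixel to float truncation; clamp to at least 1
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    resized = cv2.resize(image, (new_w, new_h))

    # Create a black 640x640 canvas
    padded = np.zeros((640, 640, 3), dtype=np.uint8)
    # Place the resized image at the center of the canvas
    top = (640 - new_h) // 2
    left = (640 - new_w) // 2
    padded[top:top+new_h, left:left+new_w] = resized

    return padded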
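
Because the padding scales and shifts the image, box coordinates predicted on the 640x640 canvas have to be mapped back before they can be drawn on the original image. Below is a minimal sketch of that inverse mapping, assuming centered padding like that in `preprocess_image` and a resize that rounds to the nearest pixel. `letterbox_params` and `to_original` are hypothetical helper names, not part of YOLOv5.

```python
def letterbox_params(h, w, size=640):
    """Return (scale, top, left) for a centered letterbox onto a size x size canvas."""
    scale = min(size / h, size / w)
    new_h, new_w = round(h * scale), round(w * scale)
    return scale, (size - new_h) // 2, (size - new_w) // 2

def to_original(box, h, w, size=640):
    """Map an (x1, y1, x2, y2) box from the padded canvas to the original image."""
    scale, top, left = letterbox_params(h, w, size)
    x1, y1, x2, y2 = box
    # Undo the padding offset first, then undo the scaling.
    return ((x1 - left) / scale, (y1 - top) / scale,
            (x2 - left) / scale, (y2 - top) / scale)

# A 1080x1920 frame is scaled by 1/3 and padded with 140 rows above and below.
print(letterbox_params(1080, 1920))
print(to_original((0, 140, 640, 500), 1080, 1920))
```

The box in the last line covers exactly the non-padded region of the canvas, so it maps back to approximately the full original frame.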

Thread Isolation

Use thread-local storage to give each thread its own model instance:

import threading

import torch

class Detector:
    def __init__(self):
        # Attributes set on a threading.local object are private to each thread
        self.local = threading.local()

    def get_model(self):
        # Load the model lazily, once per thread
        if not hasattr(self.local, "model"):
            self.local.model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
        return self.local.model

    def detect(self, image):
        model = self.get_model()
        return model(image)
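
With thread-local models, every worker thread loads its own copy of the weights, so the number of threads determines how many copies sit in memory. The sketch below illustrates this with a bounded thread pool. To keep it runnable without downloading weights, it uses a stand-in factory instead of `torch.hub.load`; `PerThread` and `fake_model_factory` are hypothetical names introduced only for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class PerThread:
    """Lazily builds one object per thread by calling `factory`."""
    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        if not hasattr(self._local, "obj"):
            self._local.obj = self._factory()
        return self._local.obj

created = []
lock = threading.Lock()

def fake_model_factory():
    # Stand-in for an expensive model load; records every instance built.
    model = object()
    with lock:
        created.append(model)
    return model

models = PerThread(fake_model_factory)

with ThreadPoolExecutor(max_workers=4) as pool:
    # 32 tasks run on at most 4 worker threads, so at most 4 models are built.
    ids = list(pool.map(lambda _: id(models.get()), range(32)))

print(len(created), "models built for 32 tasks")
```

Capping `max_workers` is what keeps memory use predictable here: tasks reuse the model already attached to whichever worker thread runs them.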

Batching

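If the hardware allows, consider combining several images into one batch and running a single inference call instead of using multiple threads: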

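import numpy as np
import torch

# Load the model once and reuse it; reloading on every call is slow
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def batch_detect(images):
    # Preprocess every image to 640x640 (assumed BGR, HWC, uint8 as read by cv2)
    processed = [preprocess_image(img) for img in images]
    # Stack into one (N, 640, 640, 3) array and convert BGR to RGB
    batch = torch.from_numpy(np.stack(processed)[..., ::-1].copy())
    # Reorder to NCHW and scale to floats in [0, 1], as tensor input expects
    batch = batch.permute(0, 3, 1, 2).float() / 255.0
    # Single forward pass over the whole batch
    return model(batch)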
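
When there are more images than fit in memory as a single batch, one option is to split the input into fixed-size chunks and run each chunk as its own batch. The sketch below shows only the chunking step. `chunked` is a hypothetical helper, and the file names stand in for loaded images.

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical file names standing in for loaded images.
frames = [f"frame_{i}.jpg" for i in range(10)]
for batch in chunked(frames, 4):
    print(len(batch), batch[0])
```

Each yielded list could then be passed to a function such as `batch_detect`, with the batch size chosen to fit the available GPU or CPU memory.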