How to Run Inference Tests with a TensorRT Model in the Roboflow RF-DETR Project

2025-07-06 00:01:40 Author: 牧宁李

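TensorRT is NVIDIA's high-performance deep learning inference optimizer, and it can significantly speed up model inference on NVIDIA GPUs. This article explains how to run inference tests in the Roboflow RF-DETR project with a model that has been converted to TensorRT.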

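Overview of the TensorRT Inference Pipeline

Inference with a TensorRT model usually involves the following key steps:

Model loading: load the converted TensorRT engine file into memory
Input preprocessing: convert the raw input data into the format the model expects
Inference execution: run the model on the GPU
Output postprocessing: convert the model output into readable results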

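Python Implementation

To call a TensorRT model for inference from Python, follow the steps below:

1. Environment Setup

First make sure the required dependencies are installed:

TensorRT runtime library
PyCUDA (for GPU memory management)
OpenCV or another image processing library (for input preprocessing)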
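Before going further, it can help to confirm that the packages above actually import in the current environment. The following is a minimal sketch; the helper name check_dependencies is hypothetical, and the module names passed to it (tensorrt, pycuda, cv2, numpy) are the import names these packages normally use.

```python
import importlib


def check_dependencies(module_names):
    """Return {module name: version string, or None if the import fails}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    report = check_dependencies(["tensorrt", "pycuda", "cv2", "numpy"])
    for name, version in report.items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

A missing entry here usually points to an installation problem rather than a problem with the engine file, so it is worth checking first.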

2. Loading the Model

import tensorrt as trt

def load_engine(engine_file_path):
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

3. Creating the Execution Context

def create_execution_context(engine):
    context = engine.create_execution_context()
    return context

4. Allocating Memory

import pycuda.driver as cuda
import pycuda.autoinit

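def allocate_buffers(engine):
    # Note: this uses the binding-based API of TensorRT 8.x.
    # TensorRT 10 removed these calls in favor of name-based tensor APIs.
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for binding in engine:
        # Engines built from ONNX use explicit batch, so the binding shape
        # already includes the batch dimension. Dynamic (-1) dimensions are
        # not handled here.
        size = trt.volume(engine.get_binding_shape(binding))
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Page-locked host buffer plus a device buffer of the same size
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # TensorRT receives the device pointers as plain integers
        bindings.append(int(device_mem))

        if engine.binding_is_input(binding):
            inputs.append({'host': host_mem, 'device': device_mem})
        else:
            outputs.append({'host': host_mem, 'device': device_mem})

    return inputs, outputs, bindings, stream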
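The size of each buffer allocated above is the number of elements in the tensor times the size of one element. The sketch below reproduces that arithmetic with NumPy only, which can be useful for sanity-checking how much memory an engine will need. The helper name buffer_nbytes and the 1x3x560x560 input shape are illustrative assumptions, not values read from an actual RF-DETR engine.

```python
import numpy as np


def buffer_nbytes(shape, dtype):
    """Bytes needed for a tensor with the given static shape and dtype."""
    if any(dim < 0 for dim in shape):
        # A -1 marks a dynamic dimension that must be resolved before allocating
        raise ValueError(f"dynamic dimension in shape {shape}; resolve it first")
    count = int(np.prod(shape, dtype=np.int64))
    return count * np.dtype(dtype).itemsize


# Hypothetical float32 input of shape (batch, channels, height, width)
print(buffer_nbytes((1, 3, 560, 560), np.float32))  # 3763200
```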

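5. Input Preprocessing

For an object detection model such as RF-DETR, input preprocessing typically includes:

Resizing the image
Normalization
Channel order conversion (BGR to RGB)
Data layout conversion (HWC to CHW)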

import cv2
import numpy as np

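def preprocess_image(image_path, input_shape):
    # input_shape is (channels, height, width)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # cv2.resize takes the target size as (width, height)
    image = cv2.resize(image, (input_shape[2], input_shape[1]))
    image = image.astype(np.float32) / 255.0  # Scale pixel values to [0, 1]
    image = np.transpose(image, [2, 0, 1])  # HWC to CHW
    image = np.expand_dims(image, axis=0)   # Add batch dimension
    return image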
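The function above only scales pixels to [0, 1]. Many detection models trained on ImageNet-pretrained backbones additionally expect per-channel mean and standard deviation normalization; whether a given RF-DETR export does is an assumption to verify against the preprocessing used when the model was exported. The sketch below shows that variant with NumPy only. The function name normalize_to_nchw is hypothetical, and the constants are the standard ImageNet statistics.

```python
import numpy as np

# Assumed values: the standard ImageNet statistics, in RGB order
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_to_nchw(image_rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Convert a uint8 HWC RGB image to a float32 array of shape (1, 3, H, W)."""
    scaled = image_rgb.astype(np.float32) / 255.0
    normalized = (scaled - mean) / std  # broadcasts over the last (channel) axis
    chw = np.transpose(normalized, (2, 0, 1))
    # TensorRT reads the buffer linearly, so make the memory layout contiguous
    return np.ascontiguousarray(chw[np.newaxis, ...])
```

If the predictions look plausible in position but the scores are uniformly poor, a mismatch in this normalization step is one of the first things worth checking.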

6. Running Inference

def inference(context, bindings, inputs, outputs, stream):
    # Transfer input data to the GPU
    for inp in inputs:
        cuda.memcpy_htod_async(inp['device'], inp['host'], stream)

    # Run inference (execute_async_v2 is the TensorRT 8.x call;
    # TensorRT 10 replaced it with execute_async_v3)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

    # Transfer every output back from the GPU
    for out in outputs:
        cuda.memcpy_dtoh_async(out['host'], out['device'], stream)

    # Wait for the copies and the inference to finish
    stream.synchronize()

    # Only the first output is returned; any others remain in outputs[i]['host']
    return outputs[0]['host']
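The host buffers filled by inference are flat one-dimensional arrays, so they need to be reshaped before individual detections can be indexed. Here is a minimal sketch of that step. The helper name reshape_outputs is hypothetical, and the shapes in the usage lines (300 queries, 4 box values, 91 classes) are placeholders; the real shapes should be read from the engine.

```python
import numpy as np


def reshape_outputs(flat_buffers, shapes):
    """Pair each flat host buffer with its tensor shape and return reshaped copies."""
    if len(flat_buffers) != len(shapes):
        raise ValueError("need exactly one shape per output buffer")
    # np.array copies the data, so the results survive the next inference call
    # overwriting the page-locked host buffers
    return [np.array(buf).reshape(shape) for buf, shape in zip(flat_buffers, shapes)]


# Placeholder buffers standing in for outputs[i]['host']
fake_boxes = np.zeros(300 * 4, dtype=np.float32)
fake_logits = np.zeros(300 * 91, dtype=np.float32)
boxes, logits = reshape_outputs([fake_boxes, fake_logits], [(1, 300, 4), (1, 300, 91)])
print(boxes.shape, logits.shape)  # (1, 300, 4) (1, 300, 91)
```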

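7. Output Postprocessing

The output of the RF-DETR model needs the following processing:

Parsing the bounding box coordinates
Applying non-maximum suppression (NMS)
Filtering out low-confidence detections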

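def postprocess(output, confidence_threshold=0.5, iou_threshold=0.4):
    # Parse the model output. The layout is an assumption: N rows of
    # [x1, y1, x2, y2, score, class]. The buffer returned by inference() is
    # flat, so reshape it before indexing.
    boxes = np.asarray(output, dtype=np.float32).reshape(-1, 6)

    # Filter out low-confidence detections
    keep = boxes[:, 4] > confidence_threshold
    boxes = boxes[keep]
    if len(boxes) == 0:
        return boxes

    # Apply NMS. cv2.dnn.NMSBoxes expects boxes as [x, y, width, height],
    # so convert from corner format first.
    xywh = boxes[:, :4].copy()
    xywh[:, 2:] -= xywh[:, :2]
    indices = cv2.dnn.NMSBoxes(
        xywh.tolist(),
        boxes[:, 4].tolist(),
        confidence_threshold,
        iou_threshold
    )

    if len(indices) > 0:
        return boxes[np.asarray(indices).flatten()]
    return boxes[:0]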
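The function above assumes the engine already emits rows of [x1, y1, x2, y2, score, class]. DETR-style models often emit something different: one tensor of normalized (cx, cy, w, h) boxes and one tensor of per-class logits. Whether a particular RF-DETR export follows that layout is an assumption to check against the engine's output bindings. The sketch below decodes that layout into the row format used above. The name decode_detr_outputs is hypothetical, and picking the best class per query is a simplification of the top-k selection some implementations use.

```python
import numpy as np


def decode_detr_outputs(boxes_cxcywh, logits, image_width, image_height,
                        score_threshold=0.5):
    """Decode (N, 4) normalized cxcywh boxes and (N, C) logits.

    Returns an (M, 6) array of [x1, y1, x2, y2, score, class] in pixels.
    """
    boxes_cxcywh = np.asarray(boxes_cxcywh, dtype=np.float32)
    logits = np.asarray(logits, dtype=np.float32)

    # Per-class sigmoid, then keep the best class for each query
    scores_all = 1.0 / (1.0 + np.exp(-logits))
    class_ids = scores_all.argmax(axis=1)
    scores = scores_all.max(axis=1)

    # Center format to corner format, then scale to pixel coordinates
    cx, cy, w, h = boxes_cxcywh.T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    xyxy *= np.array([image_width, image_height, image_width, image_height],
                     dtype=np.float32)

    keep = scores > score_threshold
    return np.column_stack([xyxy[keep], scores[keep],
                            class_ids[keep].astype(np.float32)])
```

With this layout, each query yields at most one box, so the confidence filter does most of the work and NMS may remove few or no additional detections.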

Complete Inference Example

def run_inference(engine_path, image_path, input_shape):
    # 1. Load the engine
    engine = load_engine(engine_path)
    
    # 2. Create the execution context
    context = create_execution_context(engine)
    
    # 3. Allocate buffers
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    
    # 4. Preprocess the input image and copy it into the host input buffer
    input_data = preprocess_image(image_path, input_shape)
    np.copyto(inputs[0]['host'], input_data.ravel())
    
    # 5. Run inference
    output = inference(context, bindings, inputs, outputs, stream)
    
    # 6. Postprocess
    detections = postprocess(output)
    
    return detections
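Since the point of converting to TensorRT is faster inference, an inference test usually includes a latency measurement as well as a correctness check. The helper below is a minimal, generic timing sketch; the name benchmark and its default warm-up and run counts are arbitrary choices for illustration.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Call fn() repeatedly and return (mean, median) latency in milliseconds."""
    # Warm-up calls are excluded from the timings, since first runs are often slower
    for _ in range(warmup):
        fn()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(timings_ms), statistics.median(timings_ms)
```

To time only the GPU step, one option is to load the engine and allocate buffers once, then pass a callable such as lambda: inference(context, bindings, inputs, outputs, stream). Because inference synchronizes the stream before returning, the measured time covers the host-to-device copy, the execution, and the device-to-host copy.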