Controlling Horizontal Flips for Specific Image Groups Within a Batch in NVIDIA DALI

2025-06-07 06:21:57 Author: 胡唯隽

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Project repository: https://gitcode.com/gh_mirrors/da/DALI

Overview

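When augmenting data for deep learning, it is sometimes necessary to apply the same transformation to a specific group of images within a batch. This article shows how to use NVIDIA DALI (Data Loading Library) to control horizontal flipping per group of images in a batch, in particular for video frame sequences and other cases where every member of a group must receive the same transformation.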

Background

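In video or image-sequence processing, the frames of one clip are often handled as a group. When an augmentation such as a horizontal flip is applied, every frame of the same clip should usually receive the same transformation so that temporal continuity is preserved. DALI's random operators, however, draw a separate value for each sample in the batch, so by default a random flip is decided independently for every image.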
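To put a number on the problem, the short Python sketch below computes how likely it is that independent per-frame coin flips happen to agree across a whole clip. The helper name prob_group_consistent is made up for this illustration; the group size of 8 and the flip probability of 0.5 match the example used later in this article.

```python
from fractions import Fraction


def prob_group_consistent(group_size, p=Fraction(1, 2)):
    """Probability that independent flips with probability p agree on every frame."""
    # Either all frames are flipped (p ** n) or none is ((1 - p) ** n).
    return p ** group_size + (1 - p) ** group_size


print(prob_group_consistent(8))  # 1/128
```

With 8 frames per clip, fewer than 1% of clips would be flipped consistently by chance.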

Solution

Combining DALI's fn.random.coin_flip and fn.permute_batch operators makes it possible to apply one flip decision to each group of images in the batch. The key steps are:

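Create group indices: assign a group identifier to every image in the batch. For example, for a batch of 24 images in groups of 8, build the index array [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,...].
Generate random flip decisions: call fn.random.coin_flip, which by default produces an independent decision for every image in the batch.
Unify the decisions within each group: pass the decisions through fn.permute_batch with the group indices. Output sample i receives the decision of input sample indices[i], so every image in group g reuses the decision originally drawn for sample g.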
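The three steps can be sketched in plain Python, without DALI, to show why the group indices double as source positions. The function permute_batch_sim below is a hypothetical stand-in that only mirrors the indexing rule of fn.permute_batch (output[i] = input[indices[i]]); it is not DALI's implementation, and group_indices is likewise an illustrative helper.

```python
import random


def group_indices(batch_size, group_size):
    """Return [0, 0, ..., 1, 1, ...]: one group id per sample."""
    if batch_size % group_size != 0:
        raise ValueError("batch_size must be a multiple of group_size")
    return [i // group_size for i in range(batch_size)]


def permute_batch_sim(batch, indices):
    """Stand-in for the indexing rule of fn.permute_batch: out[i] = batch[indices[i]]."""
    return [batch[i] for i in indices]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
per_sample = [rng.randint(0, 1) for _ in range(24)]  # step 2: independent decisions
indices = group_indices(24, 8)                       # step 1: group ids
per_group = permute_batch_sim(per_sample, indices)   # step 3: unify within groups

for g in range(3):
    print(g, per_group[g * 8:(g + 1) * 8])
```

Group g ends up with the decision drawn for sample g, so the three groups still receive independent decisions while the 8 frames inside each group agree.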

Implementation

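import numpy as np
from nvidia.dali import fn, pipeline_def

@pipeline_def(batch_size=24, enable_conditionals=True)
def VideoPipe(total_picture, file_list):
    # Read the files and peek at the image shapes without decoding
    input = fn.readers.file(file_list=file_list, random_shuffle=False)
    shapes = fn.peek_image_shape(input[0])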
    
    # Build the group indices (8 images per group); total_picture is
    # expected to equal the batch size (24 here)
    num_clips = total_picture // 8
    indices = np.repeat(np.arange(num_clips), 8).tolist()

    # Generate random crop windows and share one window per group
    crop_anchor, crop_shape = fn.random_crop_generator(shapes, random_area=[0.2, 1.0])
    crop_anchor = fn.permute_batch(crop_anchor, indices=indices)
    crop_shape = fn.permute_batch(crop_shape, indices=indices)

    # Decode the cropped region and preprocess
    images = fn.decoders.image_slice(input[0], crop_anchor, crop_shape, device="mixed", axis_names="HW")
    images = fn.resize(images, resize_x=300, resize_y=300, device="gpu")
    frames = fn.transpose(images, perm=[2, 0, 1])
    
    # Random resized crop (this operator draws its crop window per sample)
    gc_frame1 = fn.random_resized_crop(frames, size=224, device="gpu", random_area=[0.4, 1.0])
    
    # Key step: unify the flip decision within each group
    coin = fn.random.coin_flip(probability=0.5)
    coin = fn.permute_batch(coin, indices=indices)  # every image in a group gets the same decision
    
    # Conditional flip
    if coin:
        gc_frame1 = fn.flip(gc_frame1, horizontal=1, device="gpu")
    else:
        gc_frame1 = gc_frame1
    
    return gc_frame1
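
One way to sanity-check the pipeline is to return coin as an extra output and confirm that the flags are constant within each group. The helper below is a minimal sketch of that check in plain Python; the name groups_are_uniform and the idea of passing the flags as a flat list are assumptions made for this illustration, not part of DALI.

```python
def groups_are_uniform(flags, group_size):
    """Return True if every consecutive block of group_size flags holds one value."""
    if len(flags) % group_size != 0:
        raise ValueError("number of flags must be a multiple of group_size")
    return all(
        len(set(flags[start:start + group_size])) == 1
        for start in range(0, len(flags), group_size)
    )


# Flags as they might look after permute_batch: constant within each group of 8.
unified = [1] * 8 + [0] * 8 + [1] * 8
# Flags drawn per sample without permute_batch: mixed inside a group.
independent = [1, 0, 1, 1, 0, 0, 1, 0] + [0] * 16

print(groups_are_uniform(unified, 8))      # True
print(groups_are_uniform(independent, 8))  # False
```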

Key Technical Points

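Group index creation: NumPy builds the repeated group indices so that all images in a group share the same index value.
Decision unification: permute_batch rearranges the randomly generated flip decisions according to the group indices, so all images in a group end up with the same decision.
Conditional execution: the enable_conditionals=True argument of DALI's @pipeline_def decorator enables support for if statements in the pipeline definition, so the flip is applied only to the samples whose decision says to flip.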