Analysis of a Data Type Issue in InvokeAI's Flux Rendering on MPS Devices

2025-05-07 18:00:25 Author: 管翌锬

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI and serves as the foundation for multiple commercial products.

Project repository: https://gitcode.com/GitHub_Trending/in/InvokeAI

Background

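In InvokeAI v5.4.3rc2, using Flux rendering on an Apple Silicon device with the MPS backend raises a data type mismatch error. During the Flux Denoise step, the attention computation expects the query, key, and value tensors to share a single dtype, but instead it receives a mix of float (float32) and BFloat16 tensors.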

Error symptoms

The system log reports the following error:

Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::BFloat16 instead.

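The error is raised inside the attention computation, in the call to torch.nn.functional.scaled_dot_product_attention. The stack trace shows that it originates in the forward pass of the Flux model's custom double-stream block processor (CustomDoubleStreamBlockProcessor).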
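The failure itself does not depend on Flux or on MPS. The following is a minimal sketch, assuming only a PyTorch 2.x installation, that calls scaled_dot_product_attention on the CPU with float32 query and key tensors and a BFloat16 value tensor; the shapes are arbitrary and chosen purely for illustration.

```python
import torch
import torch.nn.functional as F

# Arbitrary (batch, heads, seq_len, head_dim) shapes, for illustration only
q = torch.randn(1, 2, 4, 8, dtype=torch.float32)
k = torch.randn(1, 2, 4, 8, dtype=torch.float32)
v = torch.randn(1, 2, 4, 8, dtype=torch.bfloat16)  # value left in bf16

try:
    F.scaled_dot_product_attention(q, k, v)
    error_message = None
except RuntimeError as exc:
    # PyTorch rejects q, k and v that do not share one dtype
    error_message = str(exc)

print(error_message)
```

On recent PyTorch versions the printed message should resemble the one in the log above; the exact wording may vary between releases.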

Technical analysis

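A closer look at the code traces the root cause to the apply_rope function, which applies rotary position embeddings (RoPE) to the query and key. In the affected version, the tensors it returns are not cast back to the dtype of its inputs. If the query and key arrive in BFloat16 while the RoPE frequency tensor freqs_cis is float32 (which the error message suggests), PyTorch's type promotion turns the rotated query and key into float32. The value never passes through apply_rope and stays in BFloat16, which is exactly the combination reported in the error.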
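The promotion effect can be sketched as follows. This assumes the pre-fix function differed from the fixed version only by the missing type_as calls and that freqs_cis is float32; apply_rope_without_cast is a hypothetical name, and the random freqs_cis values only serve to show the resulting dtypes.

```python
import torch
from torch import Tensor

# Hypothetical reconstruction of the pre-fix behaviour: same math as the
# fixed apply_rope, but the results are NOT cast back with type_as
def apply_rope_without_cast(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
    xq_ = xq.view(*xq.shape[:-1], -1, 1, 2)
    xk_ = xk.view(*xk.shape[:-1], -1, 1, 2)
    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
    xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
    return xq_out.view(*xq.shape), xk_out.view(*xk.shape)

B, H, L, D = 1, 2, 4, 8
xq = torch.randn(B, H, L, D, dtype=torch.bfloat16)
xk = torch.randn(B, H, L, D, dtype=torch.bfloat16)
# Assumption: the RoPE table is float32, shaped (B, 1, L, D/2, 2, 2)
freqs_cis = torch.randn(B, 1, L, D // 2, 2, 2, dtype=torch.float32)

q_out, k_out = apply_rope_without_cast(xq, xk, freqs_cis)
print(torch.promote_types(torch.bfloat16, torch.float32))  # torch.float32
print(q_out.dtype, k_out.dtype)  # torch.float32 torch.float32
```

Feeding q_out and k_out together with a BFloat16 value into scaled_dot_product_attention reproduces the mismatch from the log.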

PyTorch's MPS backend is stricter about dtype consistency: when running on Apple Silicon through Metal Performance Shaders (MPS), mixed-precision code has to handle dtype conversions explicitly.
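As an illustration of handling the conversion explicitly, here is a defensive sketch that selects the MPS device when available and casts the query and key to the value's dtype right before attention. attention_with_matched_dtypes is a hypothetical helper, not part of InvokeAI, and BFloat16 on MPS is assumed to be supported by the installed PyTorch and macOS versions.

```python
import torch
import torch.nn.functional as F
from torch import Tensor

# Use Apple's MPS backend when present, otherwise fall back to the CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

def attention_with_matched_dtypes(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
    """Hypothetical helper: explicitly cast q and k to v's dtype before SDPA."""
    target = v.dtype
    return F.scaled_dot_product_attention(q.to(target), k.to(target), v)

q = torch.randn(1, 2, 4, 8, device=device, dtype=torch.float32)
k = torch.randn(1, 2, 4, 8, device=device, dtype=torch.float32)
v = torch.randn(1, 2, 4, 8, device=device, dtype=torch.bfloat16)

out = attention_with_matched_dtypes(q, k, v)
print(out.dtype)  # torch.bfloat16
```

A guard like this only hides the symptom at the call site; fixing the dtype where it changes, as in the solution below, keeps the whole pipeline in the intended precision.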

Solution

The fix is simple: make apply_rope return tensors with the same dtype as its inputs by casting the results back with type_as. The modified function is:

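from torch import Tensor

def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
    # Split the last (head) dimension into pairs: (..., D) -> (..., D/2, 1, 2)
    xq_ = xq.view(*xq.shape[:-1], -1, 1, 2)
    xk_ = xk.view(*xk.shape[:-1], -1, 1, 2)
    # Rotate each pair with the 2x2 RoPE matrices in freqs_cis; if freqs_cis is
    # float32, bf16 inputs are promoted to float32 by this arithmetic
    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
    xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
    # The fix: cast back to the input dtypes so q and k match v in attention
    return xq_out.view(*xq.shape).type_as(xq), xk_out.view(*xk.shape).type_as(xk)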
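To check the fix end to end, the sketch below runs the fixed apply_rope on BFloat16 inputs and passes the result to scaled_dot_product_attention. make_rope_table is a hypothetical helper that builds a float32 table of 2x2 rotation matrices laid out as [[cos, -sin], [sin, cos]], which is the layout the indexing in apply_rope expects; the real Flux code computes its own table, so treat this only as a stand-in.

```python
import torch
import torch.nn.functional as F
from torch import Tensor

def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
    # Fixed version: outputs are cast back to the input dtypes
    xq_ = xq.view(*xq.shape[:-1], -1, 1, 2)
    xk_ = xk.view(*xk.shape[:-1], -1, 1, 2)
    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
    xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
    return xq_out.view(*xq.shape).type_as(xq), xk_out.view(*xk.shape).type_as(xk)

def make_rope_table(seq_len: int, head_dim: int, theta: float = 10000.0) -> Tensor:
    # Hypothetical helper: float32 rotation matrices [[cos, -sin], [sin, cos]]
    # for every (position, pair), shaped (1, 1, L, D/2, 2, 2)
    inv_freq = 1.0 / theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos, sin = angles.cos(), angles.sin()
    table = torch.stack([cos, -sin, sin, cos], dim=-1).view(seq_len, head_dim // 2, 2, 2)
    return table[None, None]

B, H, L, D = 1, 2, 4, 8
q = torch.randn(B, H, L, D, dtype=torch.bfloat16)
k = torch.randn(B, H, L, D, dtype=torch.bfloat16)
v = torch.randn(B, H, L, D, dtype=torch.bfloat16)

q_rot, k_rot = apply_rope(q, k, make_rope_table(L, D))
out = F.scaled_dot_product_attention(q_rot, k_rot, v)  # dtypes now agree
print(q_rot.dtype, k_rot.dtype, out.dtype)  # torch.bfloat16 torch.bfloat16 torch.bfloat16
```

The rotation itself is still computed in float32 because of type promotion; only the final results are cast back, so precision is lost at most once, at the type_as step.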