Technical Issues and Solutions for the Speech-to-Text Feature in LibreChat

2025-05-07 08:37:42 Author: 龚格成


Project repository: https://gitcode.com/GitHub_Trending/li/LibreChat

Background

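LibreChat is an open-source chat application that recently ran into compatibility problems in its speech-to-text (STT) feature. When OpenAI released its new transcription models gpt-4o-transcribe and gpt-4o-mini-transcribe, the development team found that they are stricter about audio formats than the earlier whisper-1 model.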

Problem Analysis

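The core of the problem is a mismatch between the format the browser records in and the file extension attached to the recording. Chrome records audio/webm by default, yet the recording was saved as a .wav file. Safari, by contrast, records audio/mp4 by default.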

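whisper-1 tolerates this mismatch and transcribes the audio anyway. gpt-4o-transcribe and gpt-4o-mini-transcribe, however, check the audio format strictly and return the following error when it does not match: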

{
  "error": {
    "message": "This model does not support the format you provided.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "unsupported_format"
  }
}
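To tell this failure apart from other API errors on the client side, one option is to check the `code` field of the response body. The sketch below is illustrative only: the helper names `isUnsupportedFormatError` and `describeSttError` are hypothetical, not part of LibreChat, and the user-facing wording is an assumption.

```javascript
// Hypothetical helper: recognizes the unsupported_format error body shown above.
// It only inspects parsed JSON; fetching the response is left to the caller.
function isUnsupportedFormatError(body) {
  const err = body && typeof body === 'object' ? body.error : undefined;
  return Boolean(err) && err.code === 'unsupported_format';
}

// Hypothetical helper: turns an API error body into a message for the UI.
function describeSttError(body) {
  if (isUnsupportedFormatError(body)) {
    return 'The transcription model rejected the audio format. ' +
      'Check that the file extension and MIME type match the recorded data.';
  }
  const message = body && body.error && body.error.message;
  return typeof message === 'string' ? message : 'Unknown transcription error';
}

const sample = {
  error: {
    message: 'This model does not support the format you provided.',
    type: 'invalid_request_error',
    param: 'messages',
    code: 'unsupported_format',
  },
};
console.log(isUnsupportedFormatError(sample)); // true
```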

Technical Details

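The root cause lies in the processing logic in useSpeechToTextExternal.ts:

The browser records with the MediaRecorder API, and the default format differs from browser to browser.
When recording stops, the code merges all audio chunks into a single Blob and forces its type to audio/wav.
When building the FormData, the file is named audio.wav, even though its actual content may be WebM or MP4.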
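The flawed flow can be approximated as follows. This is a hypothetical reconstruction, not the actual code of useSpeechToTextExternal.ts: the function name `buildUploadLegacy` and the form field name `audio` are assumptions, and the single chunk stands in for what MediaRecorder delivers. It runs in Node.js 18 or later, where `Blob` and `FormData` are globals.

```javascript
// Hypothetical reconstruction of the flawed upload logic described above.
function buildUploadLegacy(chunks) {
  // The declared type is just a label; the merged bytes are not converted.
  const audioBlob = new Blob(chunks, { type: 'audio/wav' });
  const formData = new FormData();
  // The file name claims WAV no matter what the browser recorded.
  formData.append('audio', audioBlob, 'audio.wav');
  return formData;
}

// Simulate a Chrome recording: the chunk starts with the WebM (EBML) header.
const webmChunk = new Blob([new Uint8Array([0x1a, 0x45, 0xdf, 0xa3])], { type: 'audio/webm' });
const form = buildUploadLegacy([webmChunk]);
const file = form.get('audio');
console.log(file.name, file.type); // audio.wav audio/wav
```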

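Declaring a type on a Blob does not convert the audio; it only changes the label. The bytes remain WebM or MP4 while both the MIME type and the file name claim WAV, and the new models reject this inconsistency.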
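One way to confirm that relabeling leaves the data untouched is to inspect the leading bytes (the container's "magic number") instead of trusting the MIME type or file name. A minimal sketch, assuming only the four signatures below matter; `sniffContainer` is a hypothetical helper, and real container detection handles more cases.

```javascript
// Identifies an audio container from its leading bytes, ignoring any label.
// Simplified: checks only the WebM, Ogg, WAV and MP4 signatures.
function sniffContainer(bytes) {
  const ascii = (start, end) => String.fromCharCode(...bytes.slice(start, end));
  if (bytes.length >= 4 &&
      bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return 'webm'; // EBML header, shared by WebM and Matroska
  }
  if (bytes.length >= 4 && ascii(0, 4) === 'OggS') return 'ogg';
  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'wav';
  if (bytes.length >= 8 && ascii(4, 8) === 'ftyp') return 'mp4'; // ISO BMFF (MP4/M4A)
  return 'unknown';
}

const chromeRecording = new Uint8Array([0x1a, 0x45, 0xdf, 0xa3, 0x9f, 0x42, 0x86, 0x81]);
console.log(sniffContainer(chromeRecording)); // webm
```

In a browser, the bytes would come from `new Uint8Array(await blob.arrayBuffer())`; a Chrome recording relabeled as audio/wav would still report `webm`.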

Solution

The development team made the following changes:

Detect the best MIME type at runtime: ask MediaRecorder which audio formats the browser supports and pick the most suitable one.

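function getBestMimeType() {
    // Candidate formats, checked in order; the first one this browser's
    // MediaRecorder reports as supported is used
    const types = [
        'audio/webm',
        'audio/webm;codecs=opus',
        'audio/mp4',
        'audio/ogg;codecs=opus',
        'audio/ogg',
        'audio/wav'
    ];

    for (const type of types) {
        if (MediaRecorder.isTypeSupported(type)) {
            return type;
        }
    }

    // Browser-specific fallback based on the user agent string
    const ua = navigator.userAgent.toLowerCase();
    if (ua.indexOf('safari') !== -1 && ua.indexOf('chrome') === -1) {
        return 'audio/mp4';   // Safari (its UA contains "safari" but not "chrome")
    } else if (ua.indexOf('firefox') !== -1) {
        return 'audio/ogg';
    } else {
        return 'audio/webm';  // Chrome and other Chromium-based browsers
    }
}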
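Because getBestMimeType reads `MediaRecorder` and `navigator` directly, it only runs in a browser. A common way to unit-test such logic is to pass the browser dependencies in as parameters. The sketch keeps the same candidate list and fallbacks; `pickMimeType` and its parameters are hypothetical names, not LibreChat code.

```javascript
// Same candidates and fallbacks as getBestMimeType, with browser APIs injected.
const CANDIDATE_TYPES = [
  'audio/webm',
  'audio/webm;codecs=opus',
  'audio/mp4',
  'audio/ogg;codecs=opus',
  'audio/ogg',
  'audio/wav',
];

function pickMimeType(isTypeSupported, userAgent) {
  // First candidate the (injected) recorder claims to support
  const supported = CANDIDATE_TYPES.find((type) => isTypeSupported(type));
  if (supported) return supported;
  // Fall back on the user agent, exactly as in getBestMimeType
  const ua = userAgent.toLowerCase();
  if (ua.includes('safari') && !ua.includes('chrome')) return 'audio/mp4';
  if (ua.includes('firefox')) return 'audio/ogg';
  return 'audio/webm';
}

// In a browser it would be called as:
// pickMimeType((t) => MediaRecorder.isTypeSupported(t), navigator.userAgent);
```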

Derive the file extension from the MIME type: make sure the extension matches the actual format.

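function getFileExtension(mimeType) {
    // Match on the container name; codec parameters such as
    // ";codecs=opus" do not affect the result
    if (mimeType.includes('mp4')) {
        return 'm4a';   // audio-only MP4
    } else if (mimeType.includes('ogg')) {
        return 'ogg';
    } else if (mimeType.includes('wav')) {
        return 'wav';
    } else {
        return 'webm';  // default, matching the WebM fallback above
    }
}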
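With both helpers in place, the upload can keep the recorder's real MIME type and use the matching extension. A minimal sketch, assuming the form field is called `audio` (a hypothetical placeholder, not necessarily LibreChat's field name) and that `buildUpload` is an illustrative name; getFileExtension is repeated so the snippet runs on its own in Node.js 18 or later.

```javascript
// Copy of getFileExtension from above, so this snippet is self-contained.
function getFileExtension(mimeType) {
  if (mimeType.includes('mp4')) return 'm4a';
  if (mimeType.includes('ogg')) return 'ogg';
  if (mimeType.includes('wav')) return 'wav';
  return 'webm';
}

// Label the merged recording with its real type and a matching file name.
function buildUpload(chunks, mimeType) {
  const audioBlob = new Blob(chunks, { type: mimeType });
  const formData = new FormData();
  formData.append('audio', audioBlob, `audio.${getFileExtension(mimeType)}`);
  return formData;
}

const chunk = new Blob([new Uint8Array([0x1a, 0x45, 0xdf, 0xa3])]);
const form = buildUpload([chunk], 'audio/webm;codecs=opus');
const file = form.get('audio');
console.log(file.name, file.type); // audio.webm audio/webm;codecs=opus
```

In the recording hook, `mimeType` would be the value passed to `new MediaRecorder(stream, { mimeType })`; reading `recorder.mimeType` once recording has started is an alternative that reflects the type the recorder is actually using.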