Analyzing and Fixing JSON Parsing Problems in the GraphRAG Project

2025-05-08 08:42:48 Author: 姚月梅Lane

Background

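In the GraphRAG project, users ran into two key problems with Local Search responses and with Global Search:

Local Search returns an empty response: in local_search/search.py, the response generated by the LLM is an empty string, even though the search_messages passed in are correct.
Global Search fails to parse JSON: in global_search/search.py, search_response is an empty string, so the subsequent JSON parsing fails with a json.decoder.JSONDecodeError exception.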
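The Global Search error follows directly from the empty string: json.loads has nothing to parse and fails at the very first character. A minimal sketch, assuming the caller would rather see a descriptive error than a bare JSONDecodeError, is to check for an empty reply before parsing. load_search_response is a hypothetical helper for illustration, not a GraphRAG function.

```python
import json
from typing import Any


def load_search_response(search_response: str) -> Any:
    """Hypothetical helper (not part of GraphRAG): reject empty replies before parsing."""
    if not search_response.strip():
        # json.loads("") raises "Expecting value: line 1 column 1 (char 0)",
        # which hides the real problem: the LLM returned nothing at all.
        raise ValueError(
            "LLM returned an empty search_response; check the model service, "
            "request parameters and output-format settings"
        )
    return json.loads(search_response)


# Reproduce the bare error that an empty search_response produces
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(f"json.loads(''): {exc}")
```

The guard does not make the empty response go away; it only moves the failure to a point where the message names the actual cause.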

Technical Analysis

Root Cause

Both problems come down to the same thing: the data returned by the LLM interface is not in the expected format. Specifically:

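The response content is an empty string, possibly because of:
- a misconfiguration on the LLM service side
- request parameters that do not match what the model or service expects
- a model that does not support the requested output format
JSON parsing fails, mainly because:
- the returned content contains characters that are not valid in JSON
- the returned content is wrapped in Markdown formatting (for example a ```json code fence)
- escape characters are handled incorrectly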
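To make the three parsing failures concrete, the short script below feeds a few hand-written sample replies (invented for illustration, not captured from GraphRAG) to json.loads and reports which ones fail. The raw-newline case is worth noting: with its default strict=True, json.loads rejects control characters such as \n and \r inside string values, which is what the newline removal in clean_up_json works around.

```python
import json
from typing import Optional


def parse_error(text: str) -> Optional[str]:
    """Return json.loads's error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


# Hand-written sample replies, one per failure mode listed above, plus a valid one
samples = {
    "raw newline inside a string value": '{"answer": "line one\nline two"}',
    "wrapped in a Markdown fence": '```json\n{"answer": "ok"}\n```',
    "escaped quotes": '{\\"answer\\": \\"ok\\"}',
    "valid JSON": '{"answer": "ok"}',
}

for label, text in samples.items():
    print(f"{label}: {parse_error(text) or 'parsed OK'}")
```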

Implementing the Fix

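For the JSON parsing problem, the relevant functions in graphrag/llm/openai/utils.py can be modified as follows. Note that this cleanup only helps with malformed JSON; it cannot recover an empty response, because clean_up_json("") is still an empty string and json.loads rejects it.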

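```python
import json
import logging

log = logging.getLogger(__name__)


def try_parse_json_object(input: str) -> dict:
    """Parse an LLM response into a dict, cleaning up common formatting problems first."""
    try:
        clean_json = clean_up_json(input)
        result = json.loads(clean_json)
    except json.JSONDecodeError:
        log.exception("error loading json, json=%s", input)
        raise
    else:
        # Callers expect a JSON object, so reject arrays, strings, numbers, etc.
        if not isinstance(result, dict):
            msg = f"expected a JSON object, got {type(result).__name__}"
            raise TypeError(msg)
        return result


def clean_up_json(json_str: str) -> str:
    """Strip characters and wrappers that commonly break json.loads on LLM output."""
    # Caution: these replacements are blunt. Removing every backslash also breaks
    # escaped quotes (\") inside string values, and collapsing "}}" breaks nested
    # objects such as {"a": {"b": 1}}.
    json_str = (
        json_str.replace("\\n", "")  # literal backslash-n escape sequences
        .replace("\n", "")  # raw newlines (invalid inside JSON strings)
        .replace("\r", "")
        .replace('"[{', "[{")  # an array of objects that arrived as a quoted string
        .replace('}]"', "}]")
        .replace("\\", "")  # leftover escape backslashes, e.g. {\"key\": ...}
        .replace("{{", "{")  # doubled braces such as {{ and }}
        .replace("}}", "}")
        .strip()
    )

    # Remove the Markdown code fence around the JSON
    if json_str.startswith("```json"):
        json_str = json_str[len("```json"):]
    if json_str.endswith("```"):
        json_str = json_str[: len(json_str) - len("```")]
    return json_str
```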
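clean_up_json applies every replacement to every reply, including replies that are already valid JSON. A more conservative ordering, sketched below under the assumption that callers want either a dict or a clear error, tries json.loads on the untouched text first and only then falls back to stripping a Markdown fence (tagged ```json or bare ```) or cutting out the outermost {...} span. parse_llm_json is a hypothetical helper, not part of GraphRAG; it also rejects empty replies up front, since no cleanup can recover them.

```python
import json
import re
from typing import Any

# Matches a whole reply wrapped in a ``` or ```json fence
_FENCE_RE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_llm_json(text: str) -> dict[str, Any]:
    """Hypothetical helper: parse an LLM reply, trying the least invasive fix first."""
    stripped = text.strip()
    if not stripped:
        # An empty reply is a model/service problem; no amount of cleanup can fix it
        raise ValueError("LLM returned an empty response")

    candidates = [stripped]  # 1. the reply exactly as returned
    fence = _FENCE_RE.match(stripped)
    if fence:
        candidates.append(fence.group(1))  # 2. the content inside a Markdown fence
    start, end = stripped.find("{"), stripped.rfind("}")
    if 0 <= start < end:
        candidates.append(stripped[start : end + 1])  # 3. the outermost {...} span

    for candidate in candidates:
        try:
            result = json.loads(candidate)
        except json.JSONDecodeError:
            continue  # fall through to the next, more aggressive candidate
        if not isinstance(result, dict):
            raise TypeError(f"expected a JSON object, got {type(result).__name__}")
        return result
    raise ValueError(f"could not parse a JSON object from: {text[:200]!r}")


fenced = '```json\n{"points": [{"description": "x", "score": 80}]}\n```'
print(parse_llm_json(fenced))
print(parse_llm_json('{"a": {"b": 1}}'))
```

Because valid JSON is parsed before any cleanup runs, a nested object such as {"a": {"b": 1}} passes through intact. The trade-off is that escaped replies like {\"key\": ...}, which clean_up_json repairs by deleting backslashes, are not handled by this sketch.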