Analysis and Fix for a Platform API Data Parsing Problem in UltimaScraper

2025-06-15 00:38:13 Author: 贡沫苏Truman

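Background

UltimaScraper is an open-source content scraping tool. Users recently started hitting a KeyError: 'source' error. The cause is that the content platform changed the structure of its API data, which broke the existing parsing logic.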

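Root Cause Analysis

When parsing the media data returned by the platform, the original code assumed that the structure contained a "source" key. After the platform update, the key fields of the media data changed:

The original "source" field was removed
The media URL is now stored under the "files.full.url" path
The location of the preview URL changed as well

Because of this structural change, the parser can no longer find the media URL and raises a KeyError.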
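
The failure can be reproduced with two hand-written payloads. Both dictionaries below are hypothetical, trimmed-down shapes inferred from the description above, not captured API responses; in particular, the nested layout of the old "source" field is an assumption, and legacy_lookup only stands in for the original parser.

```python
from typing import Any

# Hypothetical pre-change media item: the URL lived under "source".
old_item: dict[str, Any] = {
    "type": "photo",
    "canView": True,
    "source": {"source": "https://example.com/media/1.jpg"},
}

# Hypothetical post-change media item: the URL lives under files.full.url.
new_item: dict[str, Any] = {
    "type": "photo",
    "canView": True,
    "files": {"full": {"url": "https://example.com/media/1.jpg"}},
}


def legacy_lookup(media_item: dict[str, Any]) -> str:
    """Mimic a parser that assumes the "source" key is always present."""
    return media_item["source"]["source"]


def try_lookup(media_item: dict[str, Any]) -> str:
    """Run the legacy lookup and report the outcome as a string."""
    try:
        return legacy_lookup(media_item)
    except KeyError as exc:
        # str() of a KeyError is the repr of the missing key.
        return f"KeyError: {exc}"


print(try_lookup(old_item))  # the URL from the old layout
print(try_lookup(new_item))  # KeyError: 'source'
```

Any unconditional subscript such as media_item["source"] fails this way as soon as the key disappears, which is why the fix below checks for "files" before reading from it.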

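Implementing the Fix

To handle this change, the core code in UltimaScraper that processes the platform's API responses has to be modified. The main change is in the SiteContent class in ultima_scraper_api/apis/content_platform/__init__.py.

URL Picker Changes

The original url_picker method assumed the media data had a "source" key. It now needs to read the URL from "files.full":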

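# Needs at module level: from typing import Any
#                        from urllib.parse import urlparse
def url_picker(self, media_item: dict[str, Any], video_quality: str = ""):
    # Retained from the original method; the lookup below does not use them.
    authed = self.get_author().get_authed()
    video_quality = (
        video_quality or self.author.get_api().get_site_settings().video_quality
    )
    # Skip media the current account is not allowed to view.
    if not media_item.get("canView"):
        return None
    # The media URL now lives under files.full.url instead of "source".
    files = media_item.get("files")
    if not files:
        return None
    # Tolerate a missing or null "full" entry instead of raising KeyError.
    source: dict[str, Any] = files.get("full") or {}
    url = source.get("url")
    return urlparse(url) if url else None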

Preview URL Picker Changes

The preview_url_picker method needs the same kind of adjustment:

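def preview_url_picker(self, media_item: dict[str, Any]):
    preview_url = None
    if "files" in media_item:
        files = media_item["files"]
        full = files.get("full") or {}
        # When a "preview" entry is present, the URL is read from
        # files.full.url; a missing "full" entry no longer raises KeyError.
        if "preview" in files and "url" in full:
            preview_url = full["url"]
    else:
        # Fallback for items that have no "files" block.
        preview_url = media_item.get("full")
    # Shared return: both branches yield a parsed URL or None.
    return urlparse(preview_url) if preview_url else None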
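
Both pickers follow the same pattern: walk a nested path and give up quietly if any step is missing. The sketch below isolates that pattern so it can be tried outside the SiteContent class. The helper names dig and pick_url are hypothetical and not part of UltimaScraper, and the legacy branch assumes that older items stored the URL as a plain string under the variant name.

```python
from typing import Any, Optional
from urllib.parse import ParseResult, urlparse


def dig(data: Any, *keys: str) -> Any:
    """Follow keys through nested dicts; return None if any step is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def pick_url(media_item: dict[str, Any], variant: str = "full") -> Optional[ParseResult]:
    """Return the parsed URL for a variant ("full" or "preview"), or None."""
    if "files" in media_item:
        # New layout: files.<variant>.url
        url = dig(media_item, "files", variant, "url")
    else:
        # Assumed legacy layout: the URL string sits directly under the variant key.
        url = media_item.get(variant)
    return urlparse(url) if isinstance(url, str) and url else None


sample = {"files": {"full": {"url": "https://example.com/a.mp4"}, "preview": {}}}
print(pick_url(sample))             # parsed URL of the full-size file
print(pick_url(sample, "preview"))  # None, because preview has no "url"
```

Returning None for every missing step keeps the caller's contract the same as in the methods above: a parsed URL or None, never an exception.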

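Deployment Notes

File location: the modified file lives in the virtual environment's site-packages directory. The exact path depends on the Python version and on where the virtual environment is located.
Docker deployment: when deploying with Docker, copy the modified file to the correct location while building the image: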

COPY .venv/lib/python3.10/site-packages/ultima_scraper_api/apis/content_platform/__init__.py /usr/src/app/.venv/lib/python3.10/site-packages/ultima_scraper_api/apis/content_platform/__init__.py
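
Since the site-packages path varies between environments, one way to find the file to patch is to ask Python where it would import the package from. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name find_module_file is hypothetical, and the demo call uses the standard json package only so that the snippet runs anywhere.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_module_file(module_name: str) -> Optional[Path]:
    """Return the source file Python would import for module_name, if any."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    if spec.origin in ("built-in", "frozen"):
        # Not backed by a file on disk.
        return None
    return Path(spec.origin)


# For a package, the result is the path of its __init__.py.
print(find_module_file("json"))
```

Run inside the project's virtual environment with the package path named in this article, "ultima_scraper_api.apis.content_platform", it should print the __init__.py that the COPY instruction needs as its source.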