AI Structured Output in Practice: Building an E-commerce Product Information Extraction System with AgentScope

2026-04-21 11:18:45 | Author: 魏侃纯Zoe

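In e-commerce data processing, the product information an AI model returns is often inconsistently formatted, which makes data cleaning slow and laborious. How can this be fixed? AgentScope's structured output feature uses Pydantic models to turn unstructured text into typed, validated data, which removes most of the hand-written parsing code from the development workflow. Using product information extraction as an example, this article shows a practical application of the AgentScope development guide.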

🔍 Problem diagnosis: three pain points in e-commerce data extraction

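AI-generated product information usually comes back as free text, which creates several problems for downstream processing:

Inconsistent format: key details such as price, stock, and specifications are scattered through the text with no uniform structure
Type errors: prices arrive as strings rather than numbers, and stock counts contain non-numeric characters
Missing validation: product categories fall outside the predefined set, and attribute values exceed reasonable ranges

As a result, developers end up writing large amounts of exception-handling code, and data processing becomes slow and error-prone.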
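
To make these pain points concrete, here is a minimal sketch of the kind of hand-written parsing that free-text output tends to require. The `parse_price` helper and its regular expression are hypothetical and written only for illustration; they are not part of AgentScope.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Naive price parser: expects digits followed by the word 'yuan'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*yuan", text)
    return float(match.group(1)) if match else None


# Works for the phrasing the regex was written for.
print(parse_price("Now priced at 1999 yuan, only 25 left"))  # 1999.0

# A different phrasing yields nothing at all.
print(parse_price("Price: ¥1,999 (25 in stock)"))  # None

# A thousands separator silently yields a wrong number.
print(parse_price("1,999 yuan"))  # 999.0
```

Every new phrasing needs another pattern, and a wrong value such as the last one passes through unnoticed unless a separate check catches it.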

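💡 Solution: core techniques behind AgentScope structured output

AgentScope addresses these pain points with three technical components:

Pydantic model definition

Following Python data validation best practices, a strongly typed model defines the structure of the product data: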

from pydantic import BaseModel, Field
from typing import Literal, Optional, List

class ProductModel(BaseModel):
    """Structured model for e-commerce product information"""
    name: str = Field(description="Product name")
    price: float = Field(description="Product price, to two decimal places", gt=0)
    category: Literal["electronics", "clothing", "home", "beauty"] = Field(description="Product category")
    stock: int = Field(description="Units in stock", ge=0)
    tags: List[str] = Field(description="List of product tags")
    # default=None makes the field truly optional under Pydantic v2
    description: Optional[str] = Field(default=None, description="Product description")
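
To see which constraints a model like this carries, Pydantic can export it as a JSON Schema. The sketch below assumes Pydantic v2 (`model_json_schema`) and repeats the model, with `description` given a default of `None`, so that it runs on its own. It only shows what Pydantic exposes; how AgentScope passes this information to the LLM is not shown here.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class ProductModel(BaseModel):
    """Structured model for e-commerce product information."""

    name: str = Field(description="Product name")
    price: float = Field(description="Product price", gt=0)
    category: Literal["electronics", "clothing", "home", "beauty"] = Field(
        description="Product category"
    )
    stock: int = Field(description="Units in stock", ge=0)
    tags: List[str] = Field(description="List of product tags")
    description: Optional[str] = Field(default=None, description="Product description")


# Export the model as a JSON Schema dictionary (Pydantic v2 API).
schema = ProductModel.model_json_schema()

required = schema["required"]                      # fields without a default
price_rule = schema["properties"]["price"]         # carries the gt=0 bound
category_rule = schema["properties"]["category"]   # carries the allowed values

print(required)
print(price_rule)
print(category_rule)
```

The field descriptions, numeric bounds, and allowed category values all end up in the schema, which is why it pays to write the `Field` descriptions carefully.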

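Smart formatting engine

The model structure is automatically converted into a prompt the AI can understand, so that the output matches the expected format: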

import os

from agentscope.agent import ReActAgent
from agentscope.model import DashScopeChatModel
from agentscope.formatter import DashScopeChatFormatter

agent = ReActAgent(
    name="ProductExtractor",
    sys_prompt="You are a professional e-commerce data extraction assistant",
    model=DashScopeChatModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY"),
        model_name="qwen-max",
    ),
    formatter=DashScopeChatFormatter(),
)

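Data validation

The AI output is automatically checked against the model's constraints, and invalid values produce readable error messages: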

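from pydantic import ValidationError

# ai_response is assumed to be a dict of fields returned by the model.
# Pydantic raises ValidationError when validation fails.
try:
    product = ProductModel(**ai_response)
except ValidationError as e:
    print("Data validation failed:", e)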
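
As a worked example of this check, the sketch below feeds a hand-written bad response into the model and collects the names of the fields that failed. The `bad_response` and `good_response` dictionaries and the `failed_fields` helper are made up for illustration, and the code assumes Pydantic v2 in its default (lax) mode.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ProductModel(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Product price", gt=0)
    category: Literal["electronics", "clothing", "home", "beauty"] = Field(
        description="Product category"
    )
    stock: int = Field(description="Units in stock", ge=0)
    tags: List[str] = Field(description="List of product tags")
    description: Optional[str] = Field(default=None, description="Product description")


def failed_fields(data: dict) -> List[str]:
    """Return the sorted names of the fields that fail validation."""
    try:
        ProductModel(**data)
    except ValidationError as exc:
        # Each error dict has a "loc" tuple whose first item is the field name.
        return sorted({str(err["loc"][0]) for err in exc.errors()})
    return []


bad_response = {
    "name": "Wireless earbuds",
    "price": -10,        # violates gt=0
    "category": "toys",  # not one of the allowed categories
    "stock": "many",     # cannot be parsed as an integer
    "tags": ["new"],
}

good_response = {
    "name": "Wireless earbuds",
    "price": "1999",     # numeric string, coerced to float in lax mode
    "category": "electronics",
    "stock": 25,
    "tags": ["new"],
}

print(failed_fields(bad_response))   # ['category', 'price', 'stock']
print(failed_fields(good_response))  # []
```

Note that a numeric string such as `"1999"` is accepted and converted, so the "price as string" case is handled by coercion rather than rejection.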

🚀 Practical guide: the complete product information extraction workflow

Environment setup

Clone the project repository

git clone https://gitcode.com/GitHub_Trending/ag/agentscope
cd agentscope

Install dependencies
```
pip install -e .
```

Set the API key

export DASHSCOPE_API_KEY="your_api_key_here"
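
If the variable is missing, `os.environ.get` returns `None` and the problem only surfaces later, when the model is called. A small guard like the following fails early instead. `require_api_key` is a hypothetical helper, not an AgentScope function.

```python
import os


def require_api_key(var_name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or empty."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; export it before running."
        )
    return key
```

The result can then be passed as `api_key=require_api_key()` when constructing the model.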

Full implementation

import os
from pydantic import BaseModel, Field, ValidationError
from typing import Literal, Optional, List
from agentscope.agent import ReActAgent
from agentscope.model import DashScopeChatModel
from agentscope.formatter import DashScopeChatFormatter
from agentscope.message import Msg

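# 1. Define the product data model
class ProductModel(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Product price", gt=0)  # price must be positive
    category: Literal["electronics", "clothing", "home", "beauty"] = Field(description="Product category")
    stock: int = Field(description="Units in stock", ge=0)  # stock cannot be negative
    tags: List[str] = Field(description="List of product tags")
    # default=None makes the field truly optional under Pydantic v2
    description: Optional[str] = Field(default=None, description="Product description")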

# 2. Create the structured-output agent
agent = ReActAgent(
    name="ProductExtractor",
    sys_prompt="You are a professional e-commerce data extraction assistant. Extract structured information from product descriptions.",
    model=DashScopeChatModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY"),
        model_name="qwen-max",
    ),
    formatter=DashScopeChatFormatter(),
)

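# 3. Process a product description and extract its information
async def extract_product_info(description: str) -> Optional[ProductModel]:
    query_msg = Msg(
        "user",
        f"Extract the product information from the following description: {description}",
        "user",
    )

    # Pass the structured model and await the agent's reply
    response = await agent(query_msg, structured_model=ProductModel)

    try:
        # The reply is a Msg; in AgentScope 1.x the structured fields
        # are carried in its metadata dict, so validate that
        return ProductModel(**response.metadata)
    except (ValidationError, TypeError) as e:
        # TypeError covers a reply whose metadata is missing (None)
        print(f"Data validation failed: {e}")
        return None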

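# 4. Run the example
if __name__ == "__main__":
    import asyncio

    product_description = """
    [New arrival] Apple AirPods Pro (2nd generation) wireless Bluetooth earbuds with active noise cancellation,
    up to 30 hours of battery life, and a water- and sweat-resistant design. Now 1999 yuan, only 25 left in stock.
    Suitable for sports, office work, and more. Available in white and starry gray.
    """

    result = asyncio.run(extract_product_info(product_description))
    if result:
        print("Extraction result:")
        # model_dump_json is the Pydantic v2 replacement for the deprecated .json()
        print(result.model_dump_json(indent=2))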

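Expected output (the exact tags and description wording may vary between runs)

{
  "name": "Apple AirPods Pro (2nd generation) wireless Bluetooth earbuds",
  "price": 1999.0,
  "category": "electronics",
  "stock": 25,
  "tags": ["new arrival", "active noise cancellation", "water- and sweat-resistant", "wireless Bluetooth"],
  "description": "Active noise cancellation, up to 30 hours of battery life, suitable for sports, office work, and more, available in white and starry gray"
}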
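
Because `extract_product_info` returns `None` when validation fails, a caller may want to try again before giving up. The following is a minimal sketch of a generic retry wrapper. `extract_with_retry` and the `flaky_extract` stand-in are hypothetical names used only for illustration, and the stand-in replaces the real agent call so that the snippet runs without an API key.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def extract_with_retry(
    extract: Callable[[str], Awaitable[Optional[T]]],
    text: str,
    max_attempts: int = 3,
) -> Optional[T]:
    """Call `extract` until it returns a non-None result or attempts run out."""
    for _ in range(max_attempts):
        result = await extract(text)
        if result is not None:
            return result
    return None


# Stand-in extractor: fails on the first call, succeeds on the second.
calls = {"count": 0}


async def flaky_extract(text: str) -> Optional[dict]:
    calls["count"] += 1
    if calls["count"] < 2:
        return None
    return {"name": text}


result = asyncio.run(extract_with_retry(flaky_extract, "demo"))
print(result, "after", calls["count"], "attempts")
```

With the real pipeline, `extract_product_info` would be passed in place of `flaky_extract`. Whether retrying with the same agent instance is appropriate depends on how its conversation memory is configured, which this sketch does not address.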