Models Supported by CUA: The Complete List, from Claude to UI-TARS

2026-02-04 05:13:39 Author: 江焘钦

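Still struggling to pick the right computer-use model? The CUA framework supports a broad range of models: cloud APIs from major vendors and locally hosted open-source models, full-featured agents and specialized click-prediction models. This article walks through every model type CUA supports, along with its use cases and configuration.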

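📋 Model Categories at a Glance

CUA-supported models fall into four categories, each with its own strengths and use cases:

Model type	Core capability	Typical models	Best suited for
Full-featured agents	Autonomous task planning + execution	Claude series, OpenAI CUA	Complex multi-step task automation
Unified vision-language models	End-to-end visual understanding	UI-TARS, GLM-4.5V	Vision-heavy tasks
Composed agents	Planning and execution separated	GTA1+LLM, OmniParser+LLM	Cost optimization + high precision
Specialized click prediction	Precise coordinate localization	GTA1, dedicated grounding models	Exact UI element localization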

🚀 Full-Featured Computer-Use Agents

Anthropic Claude Series

Claude models offer strong computer-use capabilities and support fully autonomous task execution:

# Claude 4.1 series: newest and most capable
model="anthropic/claude-opus-4-1-20250805"

# Claude 4 series: stable, high performance
model="anthropic/claude-opus-4-20250514"
model="anthropic/claude-sonnet-4-20250514"

# Claude 3.7 series: good price-performance
model="anthropic/claude-3-7-sonnet-20250219"

# Claude 3.5 series: established and reliable
model="anthropic/claude-3-5-sonnet-20241022"

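Usage example:

import asyncio

from agent import ComputerAgent
from computer import Computer


async def main():
    async with Computer(os_type="linux") as computer:
        agent = ComputerAgent(
            model="anthropic/claude-3-5-sonnet-20241022",
            tools=[computer],
            max_trajectory_budget=5.0,  # budget cap for this trajectory
        )

        # Automate a complex multi-step task
        task = "Open Firefox, go to github.com, and search for the 'computer-use' project"
        async for result in agent.run(task):
            # Handle each execution result here
            pass


asyncio.run(main())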
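The usage example leaves the handling of each result empty. Here is a minimal sketch of one way to fill it in, assuming each `result` is a dict with an `output` list whose `message` items carry `content` parts with a `text` field. That shape is an assumption, not something this article confirms, and `extract_text` is a hypothetical helper name.

```python
from typing import Any


def extract_text(result: dict[str, Any]) -> list[str]:
    """Collect text parts from message items in one agent result.

    Assumed shape (hypothetical): result["output"] is a list of dicts, and
    message items look like {"type": "message", "content": [{"text": "..."}]}.
    Items of any other type are skipped.
    """
    texts: list[str] = []
    for item in result.get("output", []):
        if item.get("type") != "message":
            continue  # e.g. tool calls or screenshots
        for part in item.get("content", []):
            text = part.get("text") if isinstance(part, dict) else None
            if text:
                texts.append(text)
    return texts
```

Inside the loop you could then call `for line in extract_text(result): print(line)` instead of `pass`.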

OpenAI Computer-Use Preview

OpenAI's computer-use preview model combines strong visual understanding with the ability to operate the UI:

model="openai/computer-use-preview"

🎯 Unified Vision-Language Models

UI-TARS 1.5 Series

ByteDance's open-source unified vision-language model, optimized for computer-use scenarios:

# Local Hugging Face deployment
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"

# TGI endpoint (requires a running TGI service)
model="huggingface/ByteDance-Seed/UI-TARS-1.5-7B"

# MLX-optimized build (Apple Silicon)
model="mlx/mlx-community/UI-TARS-1.5-7B-6bit"

# Ollama deployment
model="ollama_chat/0000/ui-tars-1.5-7b"

GLM-4.5V Series

Zhipu AI's vision-language model, with strong computer-use capabilities:

# Via OpenRouter
model="openrouter/z-ai/glm-4.5v"

# Local Hugging Face deployment
model="huggingface-local/zai-org/GLM-4.5V"

🔄 Composed Agent Architecture

Composed agents use a "planning model + execution model" architecture to balance cost against capability:

Syntax

model="grounding_model+thinking_model"
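Note the order in the syntax: the grounding (execution) model comes first and the thinking (planning) model second, joined by `+`. The sketch below builds and takes apart such strings. `compose_model` and `split_model` are hypothetical helpers written for this article, not part of CUA.

```python
from typing import Optional, Tuple


def compose_model(grounding_model: str, thinking_model: str) -> str:
    """Join two model identifiers in grounding-first order."""
    if "+" in grounding_model or "+" in thinking_model:
        raise ValueError("each part must be a single model identifier")
    return f"{grounding_model}+{thinking_model}"


def split_model(model: str) -> Tuple[str, Optional[str]]:
    """Return (grounding_model, thinking_model).

    thinking_model is None when the string names a single model.
    """
    grounding, sep, thinking = model.partition("+")
    return grounding, (thinking if sep else None)
```

For example, `compose_model("omniparser", "openai/gpt-4o")` yields the string `omniparser+openai/gpt-4o`, which can be passed as `model=`.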

Supported Planning Models (Thinking Model)

# Anthropic
"anthropic/claude-3-5-sonnet-20241022"
"anthropic/claude-3-opus-20240229"

# OpenAI
"openai/gpt-5"
"openai/gpt-o3"
"openai/gpt-4o"

# Google
"gemini/gemini-1.5-pro"
"vertex_ai/gemini-pro-vision"

# Local models (any Hugging Face vision-language model)
"huggingface-local/your-vision-model"

Supported Execution Models (Grounding Model)

# OmniParser (OCR-focused)
"omniparser"

# GTA1 series (specialized click prediction)
"huggingface-local/HelloKKMe/GTA1-7B"
"huggingface/HelloKKMe/GTA1-32B"
"vllm_hosted/HelloKKMe/GTA1-72B"

# UI-TARS (unified model)
"huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"

# Full-featured models can also serve as the execution component
"claude-3-5-sonnet-20241022"
"openai/computer-use-preview"

Composition Examples

# GTA1 for click grounding + GPT-5 for planning
model="huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5"

# GTA1 + Claude 3.5 Sonnet (good price-performance)
model="huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-3-5-sonnet-20241022"

# UI-TARS + GPT-4o (two vision models combined)
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B+openai/gpt-4o"

# OmniParser + a local model (fully offline)
model="omniparser+ollama_chat/mistral-small3.2"

🎯 Specialized Click-Prediction Models

GTA1 Series

Specialized models optimized for locating UI elements, with strong results on the GUI Agent Grounding Leaderboard:

# 7B (lightweight and efficient)
model="huggingface-local/HelloKKMe/GTA1-7B"

# 32B (higher accuracy)
model="huggingface/HelloKKMe/GTA1-32B"

# 72B (top performance)
model="vllm_hosted/HelloKKMe/GTA1-72B"

Using click prediction:

agent = ComputerAgent("huggingface-local/HelloKKMe/GTA1-7B", tools=[computer])

# Locate the coordinates of specific UI elements
login_coords = agent.predict_click("locate the login button")
search_coords = agent.predict_click("find the search input box")
menu_coords = agent.predict_click("identify the hamburger menu icon")

print(f"Login button coordinates: {login_coords}")
print(f"Search box coordinates: {search_coords}")
print(f"Menu icon coordinates: {menu_coords}")
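Before acting on a predicted coordinate, it can help to check that it is usable. The sketch below assumes `predict_click` yields an `(x, y)` pixel pair, or `None` when nothing is found. That return shape is an assumption on my part, and `is_valid_click` is a hypothetical helper.

```python
from typing import Optional, Tuple


def is_valid_click(coords: Optional[Tuple[int, int]], width: int, height: int) -> bool:
    """Return True if coords is an (x, y) pair inside a width x height screen.

    Assumes pixel coordinates with the origin at the top-left corner.
    """
    if coords is None:
        return False  # the model found no matching element
    x, y = coords
    return 0 <= x < width and 0 <= y < height
```

A caller might skip or retry a step whenever `is_valid_click(login_coords, 1024, 768)` is false.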

🏠 Local Deployment Options

Hugging Face Transformers

# Any Hugging Face model
model="huggingface-local/<model-name>"

MLX (Apple Silicon optimized)

# MLX community builds
model="mlx/mlx-community/<model-name>"

Ollama

# Local Ollama model
model="ollama_chat/<model-name>"
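The three local options differ only in the prefix of the model string. The sketch below turns that into a lookup. The prefixes come from the examples above, while `LOCAL_PREFIXES` and `local_backend` are hypothetical names introduced for illustration.

```python
from typing import Optional

# Prefixes taken from the local deployment examples in this article.
LOCAL_PREFIXES = {
    "huggingface-local/": "Hugging Face Transformers",
    "mlx/": "MLX (Apple Silicon)",
    "ollama_chat/": "Ollama",
}


def local_backend(model: str) -> Optional[str]:
    """Return the local backend implied by the model prefix, or None if not local."""
    for prefix, backend in LOCAL_PREFIXES.items():
        if model.startswith(prefix):
            return backend
    return None
```

Note that `huggingface/...` (the TGI endpoint form) is deliberately not treated as local here, since it points at a served endpoint.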

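📊 Model Selection Guide

Choosing by Task Complexity

flowchart TD
    A[Task type] --> B{Simple click task}
    A --> C{Medium-complexity task}
    A --> D{Complex multi-step task}
    
    B --> E[Specialized click model<br/>GTA1 series]
    C --> F[Unified vision model<br/>UI-TARS/GLM-4.5V]
    D --> G[Full-featured agent<br/>Claude/OpenAI CUA]
    
    E --> H[Cost: $<br/>Precision: ⭐⭐⭐⭐⭐]
    F --> I[Cost: $$<br/>Capability: ⭐⭐⭐⭐]
    G --> J[Cost: $$$<br/>Capability: ⭐⭐⭐⭐⭐]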
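The flowchart can be expressed as a small lookup. This is only a sketch: the `Complexity` enum and `recommend_model` are hypothetical, and the specific model string chosen for each branch is one illustrative pick from the lists earlier in this article, not an official default.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE_CLICK = "simple_click"
    MEDIUM = "medium"
    MULTI_STEP = "multi_step"


# One illustrative model per branch of the flowchart.
RECOMMENDED = {
    Complexity.SIMPLE_CLICK: "huggingface-local/HelloKKMe/GTA1-7B",
    Complexity.MEDIUM: "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B",
    Complexity.MULTI_STEP: "anthropic/claude-opus-4-1-20250805",
}


def recommend_model(complexity: Complexity) -> str:
    """Map a task complexity level to a model string, following the flowchart."""
    return RECOMMENDED[complexity]
```

Swapping an entry in `RECOMMENDED`, for example to `openai/computer-use-preview` for multi-step tasks, changes the recommendation without touching the calling code.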

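Choosing by Deployment Environment

Requirement	Recommended model	Advantage
Fully offline	Local UI-TARS + Ollama	No network dependency, data stays local
Cost-sensitive	GTA1 + lightweight LLM	Specialized clicking + inexpensive planning
High performance	Claude 4.1 series	Strongest capability, fastest response
Development and testing	OpenAI CUA preview	Rapid prototyping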

Choosing by Precision Requirements

pie title Click-prediction accuracy comparison
    "GTA1-72B" : 35
    "GTA1-32B" : 30
    "GTA1-7B" : 25
    "UI-TARS" : 10

⚙️ Installation and Configuration

Basic Installation

# Install everything
pip install "cua-agent[all]"

# Install only what you need
pip install "cua-agent[openai]"        # OpenAI support
pip install "cua-agent[anthropic]"     # Anthropic support
pip install "cua-agent[omni]"          # OmniParser support
pip install "cua-agent[uitars]"        # UI-TARS support
pip install "cua-agent[uitars-mlx]"    # UI-TARS + MLX support
pip install "cua-agent[uitars-hf]"     # UI-TARS + Hugging Face support
pip install "cua-agent[glm45v-hf]"     # GLM-4.5V support
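To decide which extras a given model string needs, one option is a small heuristic like the sketch below. The mapping is my own reading of the comments in the install list, not an official rule, and the helpers `extras_for` and `pip_command` are hypothetical. Anything without an obvious extra (for example GTA1 or OpenRouter models) falls back to `all`.

```python
def extras_for(model: str) -> list[str]:
    """Guess the cua-agent extras for a (possibly composed) model string."""
    extras: set[str] = set()
    for part in model.split("+"):
        name = part.strip().lower()
        if name.startswith("anthropic/"):
            extras.add("anthropic")
        elif name.startswith("openai/"):
            extras.add("openai")
        elif name == "omniparser":
            extras.add("omni")
        elif "ui-tars" in name and name.startswith("mlx/"):
            extras.add("uitars-mlx")
        elif "ui-tars" in name and name.startswith("huggingface-local/"):
            extras.add("uitars-hf")
        elif "glm-4.5v" in name and name.startswith("huggingface-local/"):
            extras.add("glm45v-hf")
        else:
            extras.add("all")  # no specific extra listed; install everything
    if "all" in extras:
        return ["all"]
    return sorted(extras)


def pip_command(model: str) -> str:
    """Build the pip install command for the guessed extras."""
    joined = ",".join(extras_for(model))
    return f'pip install "cua-agent[{joined}]"'
```

For a composed string such as `omniparser+openai/gpt-4o`, this produces a single command that installs both the `omni` and `openai` extras.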

Environment Variables

# API keys
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENAI_API_KEY="your-openai-key"
export OPENROUTER_API_KEY="your-openrouter-key"

# Computer instance settings
export CUA_CONTAINER_NAME="your-container-name"
export CUA_API_KEY="your-cua-api-key"
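A missing API key usually surfaces only when the first request fails, so it can be worth checking up front. The sketch below maps the provider prefixes used in this article to the environment variables listed above. The prefix-to-variable pairing is an assumption based on the variable names, and `missing_keys` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Provider prefix -> environment variable, as named in this article.
REQUIRED_KEYS = {
    "anthropic/": "ANTHROPIC_API_KEY",
    "openai/": "OPENAI_API_KEY",
    "openrouter/": "OPENROUTER_API_KEY",
}


def missing_keys(model: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List the API key variables a model string needs that are unset or empty."""
    env = os.environ if env is None else env
    missing: list[str] = []
    for part in model.split("+"):  # handle composed "grounding+thinking" strings
        for prefix, var in REQUIRED_KEYS.items():
            if part.startswith(prefix) and not env.get(var) and var not in missing:
                missing.append(var)
    return missing
```

Calling `missing_keys(model)` before constructing the agent lets a script stop early with a clear message instead of failing mid-task.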

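🎯 Best-Practice Examples

Enterprise Automation Workflow

# Assumed import path for the budget callback; check your installed cua-agent version
from agent.callbacks import BudgetManagerCallback

# Use Claude 4.1 for a complex business workflow
agent = ComputerAgent(
    model="anthropic/claude-opus-4-1-20250805",
    tools=[computer],
    callbacks=[BudgetManagerCallback(max_budget=50.0)],
    trajectory_dir="enterprise_automation"
)

# Automate an expense-reimbursement workflow
task = """
1. Log in to the company management system
2. Open the expense claim module
3. Fill in the claim: transportation 200 yuan, meals 150 yuan
4. Upload the invoice attachments
5. Submit for approval
6. Confirm the submission succeeded and save a screenshot
"""

# Run inside an async function; log_trajectory is a user-defined helper
async for result in agent.run(task):
    # Monitor and record each step
    log_trajectory(result)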

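High-Precision UI Test Automation

# Use the specialized GTA1 model for precise UI testing
agent = ComputerAgent(
    model="huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-3-5-sonnet-20241022",
    tools=[computer]
)

# Test the login flow step by step
test_cases = [
    "locate the username field and type 'testuser'",
    "locate the password field and type 'password123'",
    "locate the login button and click it",
    "verify the welcome page shown after a successful login"
]

# execute_click and validate_result are user-defined helpers
for test_case in test_cases:
    coords = agent.predict_click(test_case)
    execute_click(coords)
    validate_result()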