A Complete Guide to Processing PDF Files with the Google Gemini Python SDK

2025-05-18 16:33:18 Author: 何举烈Damon

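Google Gemini is a multimodal large language model that can read PDF documents directly. This article explains how to pass a PDF file to a Gemini model through the Python SDK, and compares the two API paths available for doing so along with how to implement each.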

Differences Between the Gemini API and the Vertex AI API

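Google provides two Python SDKs for accessing Gemini models, each aimed at a different use case:

Gemini API SDK (google-generativeai): open to all developers; it only needs an API key and does not require a Google Cloud account
Vertex AI SDK (google-cloud-aiplatform): designed for Google Cloud Platform users and tightly integrated with GCP services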

Recommendation: if your project already runs on GCP, use the Vertex AI SDK; if you only need quick access to Gemini, the Gemini API SDK is simpler.
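
The recommendation above can be sketched as a small helper. This is only an illustration: the function name choose_sdk is hypothetical, and treating the GOOGLE_CLOUD_PROJECT and GOOGLE_API_KEY environment variables as signals is an assumption about how a project might be configured, not a requirement of either SDK.

```python
import os
from typing import Mapping, Optional


def choose_sdk(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the pip package name of the SDK that fits the environment."""
    env = os.environ if env is None else env
    # Assumption: a configured GCP project ID means the app already runs on GCP.
    if env.get("GOOGLE_CLOUD_PROJECT"):
        return "google-cloud-aiplatform"  # Vertex AI SDK
    # Assumption: otherwise an API key is enough for the Gemini API SDK.
    if env.get("GOOGLE_API_KEY"):
        return "google-generativeai"  # Gemini API SDK
    raise RuntimeError("Set GOOGLE_CLOUD_PROJECT or GOOGLE_API_KEY first")
```

Passing a plain dictionary instead of relying on os.environ makes the choice easy to check in isolation.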

Processing PDF Files with Vertex AI

Vertex AI can read a PDF file directly from Cloud Storage:

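import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Initialize the SDK with your project and region
vertexai.init(project="YOUR_PROJECT_ID", location="YOUR_LOCATION")

# Select the model
model = GenerativeModel("gemini-1.5-pro-preview-0409")

# Cloud Storage URI of the PDF file
gcs_path = "gs://YOUR_BUCKET_NAME/FILE_NAME.pdf"

# Send the request: the PDF part followed by the text prompt
response = model.generate_content([
    Part.from_uri(gcs_path, mime_type="application/pdf"),
    "Please summarize the main content of this document"
])

# Print the generated text
print(response.text)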

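Key points:

The file must first be uploaded to Google Cloud Storage
Use the Part.from_uri method to specify the file URI and MIME type
GCP authentication must be configured in the environment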
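
The first key point, getting the file into Cloud Storage, could look roughly like the sketch below. It assumes the separate google-cloud-storage package, which the original example does not use, and that Application Default Credentials are already set up. The function names gcs_uri and upload_pdf_to_gcs are hypothetical.

```python
def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Build the gs:// URI that is passed to Part.from_uri."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_pdf_to_gcs(local_path: str, bucket_name: str, blob_name: str) -> str:
    """Upload a local PDF to Cloud Storage and return its gs:// URI."""
    # Imported lazily so gcs_uri stays usable without the package installed.
    # Assumption: installed with `pip install google-cloud-storage`.
    from google.cloud import storage

    client = storage.Client()  # picks up Application Default Credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path, content_type="application/pdf")
    return gcs_uri(bucket_name, blob_name)
```

The returned string can then be used as gcs_path in the request shown earlier.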

Processing PDF Files with the Gemini API

The Gemini API provides a file upload interface, but check that the model you choose supports PDF input:

import google.generativeai as genai

# Configure the API key
genai.configure(api_key="YOUR_API_KEY")

# Upload the file
uploaded_file = genai.upload_file(
    path="/path/to/file.pdf",
    display_name="Sample PDF"
)

# Create the model instance
model = genai.GenerativeModel("models/gemini-1.5-pro-latest")

# Generate content
try:
    response = model.generate_content([
        "Please analyze this PDF",
        uploaded_file
    ])
    print(response.text)
finally:
    # Delete the uploaded file, even if the request failed
    genai.delete_file(uploaded_file.name)
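
Before uploading, a cheap local check can catch a file that is not actually a PDF. The sketch below only inspects the file signature. It does not confirm that a given model accepts PDF input, and the helper name looks_like_pdf is hypothetical.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF signature."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        # PDF files begin with the bytes "%PDF-" followed by a version number.
        return handle.read(5) == b"%PDF-"
```

One way to use it is to call looks_like_pdf on the local path and skip the upload call when it returns False.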