The MapReduce Design Pattern in PocketFlow-Typescript

2025-06-19 22:14:30 Author: 裘晴惠Vivianne

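What is the MapReduce pattern

MapReduce is a classic distributed computing model, originally proposed by Google for processing large datasets. In the PocketFlow-Typescript project, MapReduce is implemented as a design pattern that covers two situations:

Large inputs (for example, many files that need to be processed)
Large outputs (for example, many forms that need to be filled in)

The core idea is to break a complex task into smaller subtasks that, ideally, can run independently of one another.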

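How MapReduce works

The MapReduce pattern has two main stages:

Map stage: a BatchNode splits the large task into many small tasks
Reduce stage: the results of the Map stage are aggregated into a single result

Input data → Map (split) → Intermediate results → Reduce (aggregate) → Final result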
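
The two stages can be sketched in plain TypeScript without any framework. This is only an illustration of the data flow shown above; `mapReduce` and the word-count example are hypothetical names introduced here and are not part of PocketFlow-Typescript.

```typescript
// Generic map-then-reduce helper. Illustrative only, not a PocketFlow API.
function mapReduce<In, Mid, Out>(
  inputs: In[],
  mapFn: (item: In) => Mid,
  reduceFn: (intermediate: Mid[]) => Out
): Out {
  // Map: each input is processed independently of the others.
  const intermediate = inputs.map(mapFn);
  // Reduce: all intermediate results are aggregated into one final value.
  return reduceFn(intermediate);
}

// Example: count the words in each document (map), then sum the counts (reduce).
const sampleDocs = ["the quick brown fox", "jumps over", "the lazy dog"];
const totalWords = mapReduce(
  sampleDocs,
  (doc) => doc.split(/\s+/).filter((w) => w.length > 0).length,
  (counts) => counts.reduce((sum, n) => sum + n, 0)
);
console.log(totalWords);
```

Because the map function never looks at more than one input, the per-item work could in principle be distributed or parallelized; only the reduce step needs to see everything at once.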

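A practical example: a document summarization system

A document summarization system is a convenient way to see how MapReduce is implemented in PocketFlow-Typescript.

Scenario

Suppose we have a set of document files and need to do the following:

Generate a separate summary for each file
Merge the summaries of all files into one combined summary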

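Code walkthrough

1. Define the shared storage structure

First, define the data structure that holds the input, the intermediate results, and the final result: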

type SharedStorage = {
  files?: Record<string, string>;          // original files: filename -> content
  file_summaries?: Record<string, string>; // per-file summaries: filename -> summary
  all_files_summary?: string;              // combined summary of all files
};

2. Map stage: summarize each file

The Map stage is implemented with a BatchNode that generates a summary for each file:

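// Assumes the library is installed under the package name "pocketflow".
import { BatchNode } from "pocketflow";
// callLLM(prompt: string): Promise<string> is assumed to be a helper you provide.

class SummarizeAllFiles extends BatchNode<SharedStorage> {
  // prep: convert the file map into an array of [filename, content] pairs
  async prep(shared: SharedStorage): Promise<[string, string][]> {
    return Object.entries(shared.files || {});
  }

  // exec: generate the summary for a single file
  async exec([filename, content]: [string, string]): Promise<[string, string]> {
    const summary = await callLLM(`Summarize the following file:\n${content}`);
    return [filename, summary];
  }

  // post: store the summaries of all files in the shared storage
  async post(shared: SharedStorage, _: [string, string][], summaries: [string, string][]): Promise<string> {
    shared.file_summaries = Object.fromEntries(summaries);
    return "summarized";
  }
}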
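
To make the prep / exec / post split in the Map stage easier to follow, here is a minimal, framework-free sketch of the same sequence. It is a simplified model written for this article, not the PocketFlow-Typescript implementation: `runBatchStep`, `fakeSummarize`, and the `Shared` type are hypothetical, and `fakeSummarize` stands in for `callLLM` so the sketch runs offline.

```typescript
// Simplified model of a batch step. NOT the PocketFlow implementation.
type Shared = {
  files?: Record<string, string>;
  file_summaries?: Record<string, string>;
};

// Hypothetical stand-in for callLLM: keeps the first three words of the text.
async function fakeSummarize(content: string): Promise<string> {
  return content.split(/\s+/).slice(0, 3).join(" ") + "...";
}

async function runBatchStep(shared: Shared): Promise<string> {
  // prep: turn the shared store into a list of independent work items.
  const items = Object.entries(shared.files ?? {});

  // exec: called once per item; results are collected in input order.
  const results: [string, string][] = [];
  for (const [filename, content] of items) {
    results.push([filename, await fakeSummarize(content)]);
  }

  // post: receives the whole result list and writes it back to the store.
  shared.file_summaries = Object.fromEntries(results);
  return "summarized";
}

const demoShared: Shared = {
  files: { "a.txt": "one two three four", "b.txt": "five six" },
};
const batchDone = runBatchStep(demoShared).then((action) => {
  console.log(action, demoShared.file_summaries);
  return action;
});
```

The point of the sketch is the shape of the data: exec sees one item at a time, while post sees the complete list of results, which is what lets it write a single `file_summaries` map.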

3. Reduce stage: combine the summaries

The Reduce stage is implemented with a regular Node that merges the per-file summaries:

// Assumes the library is installed under the package name "pocketflow".
import { Node } from "pocketflow";

class CombineSummaries extends Node<SharedStorage> {
  // prep: read the per-file summaries produced by the Map stage
  async prep(shared: SharedStorage): Promise<Record<string, string>> {
    return shared.file_summaries || {};
  }

  // exec: merge the summaries with a single LLM call
  async exec(summaries: Record<string, string>): Promise<string> {
    // One labelled section per file, e.g. "file1.txt summary:\n..."
    const text_list = Object.entries(summaries).map(
      ([fname, summ]) => `${fname} summary:\n${summ}\n`
    );

    return await callLLM(
      `Combine these file summaries into one final summary:\n${text_list.join("\n---\n")}`
    );
  }

  // post: store the final combined summary
  async post(shared: SharedStorage, _: Record<string, string>, finalSummary: string): Promise<string> {
    shared.all_files_summary = finalSummary;
    return "combined";
  }
}
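
It can help to look at the exact text the Reduce stage sends to the model. The sketch below pulls the prompt construction from `exec` into a pure function so it can be inspected without an LLM call. `buildCombinePrompt` is a hypothetical helper introduced here for illustration; the article's node builds the same string inline.

```typescript
// Hypothetical helper: builds the same prompt text as CombineSummaries.exec,
// without calling an LLM, so the reduce input can be inspected directly.
function buildCombinePrompt(summaries: Record<string, string>): string {
  // One labelled section per file.
  const sections = Object.entries(summaries).map(
    ([fname, summ]) => `${fname} summary:\n${summ}\n`
  );
  // Sections are separated by a "---" line.
  return `Combine these file summaries into one final summary:\n${sections.join("\n---\n")}`;
}

const combinePrompt = buildCombinePrompt({
  "file1.txt": "Alice is bored.",
  "file2.txt": "Other text.",
});
console.log(combinePrompt);
```

Note that this prompt grows with the number of files, so for very large inputs the reduce step itself may need to be split into several passes.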

4. Build and run the flow

Connect the two nodes and run the flow:

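// Assumes the library is installed under the package name "pocketflow".
import { Flow } from "pocketflow";

const batchNode = new SummarizeAllFiles();
const combineNode = new CombineSummaries();
// Run combineNode when batchNode's post() returns "summarized"
batchNode.on("summarized", combineNode);

const flow = new Flow(batchNode);

// Keep a reference to the shared storage so the results can be read afterwards
const shared: SharedStorage = {
  files: {
    "file1.txt": "Alice was beginning to get very tired of sitting by her sister...",
    "file2.txt": "Some other interesting text ...",
  },
};

// Assumes flow.run returns a Promise; top-level await needs an ES module context
await flow.run(shared);
console.log(shared.all_files_summary);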
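
The connection `batchNode.on("summarized", combineNode)` relies on the string returned by `post` to choose the next node. The following sketch models that idea in isolation. It is a deliberately simplified stand-in written for this article, not the library's implementation: `MiniNode`, `runMiniFlow`, and `Store` are hypothetical names, and the nodes here are synchronous to keep the example short.

```typescript
// Simplified model of action-based routing. NOT the PocketFlow implementation.
type Store = Record<string, unknown>;

class MiniNode {
  private successors: Record<string, MiniNode> = {};
  private work: (shared: Store) => string;

  constructor(work: (shared: Store) => string) {
    this.work = work;
  }

  // Register the node to run when this node returns `action`.
  on(action: string, next: MiniNode): void {
    this.successors[action] = next;
  }

  // Do this node's work and return the action string.
  run(shared: Store): string {
    return this.work(shared);
  }

  nextFor(action: string): MiniNode | undefined {
    return this.successors[action];
  }
}

// Walk the graph from `start` until a node returns an action with no successor.
function runMiniFlow(start: MiniNode, shared: Store): string[] {
  const actions: string[] = [];
  let current: MiniNode | undefined = start;
  while (current) {
    const action: string = current.run(shared);
    actions.push(action);
    current = current.nextFor(action);
  }
  return actions;
}

const mapStep = new MiniNode((s) => {
  s.partials = [1, 2, 3];
  return "summarized";
});
const reduceStep = new MiniNode((s) => {
  s.total = (s.partials as number[]).reduce((a, b) => a + b, 0);
  return "combined";
});
mapStep.on("summarized", reduceStep);

const miniStore: Store = {};
const actionTrace = runMiniFlow(mapStep, miniStore);
console.log(actionTrace, miniStore.total);
```

In this model the flow stops after "combined" because no node is registered for that action, which mirrors how the two-node summarization flow ends after the Reduce stage.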