Convert PPT to Markdown in 3 Steps! A Complete Guide to File Processing in Bisheng

2026-02-04 05:18:42 Author: 滕妙奇

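Are tedious PPT-to-text conversion, slow file uploads, and expired download links still slowing you down? This article walks through Bisheng's file-processing features and shows how to get from upload to format conversion efficiently, so that document management stops being a chore. By the end, you will know how to convert PPT to Markdown in one step, how large uploads are handed off for background processing, and how to manage downloads of files in multiple formats.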

File Upload: The Complete Path from Frontend to Backend

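Bisheng provides upload endpoints for several scenarios. Whether a file belongs to a knowledge base or is a temporary chat attachment, it goes through the same service. The core upload logic is save_uploaded_file, which stores the file under a given folder. In the knowledge-base endpoint, the unique file name is generated beforehand by KnowledgeService.save_upload_file_original_name.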

Upload Endpoints

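Uploads are handled mainly by two endpoints, /knowledge/upload and /workstation/files. In the knowledge-base upload, for example, the frontend submits the file as FormData and the backend calls save_uploaded_file to store it: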

# [src/backend/bisheng/api/v1/knowledge.py](https://gitcode.com/GitHub_Trending/bi/bisheng/blob/79eef2140735143dbe71f7b31494775caa7ddb43/src/backend/bisheng/api/v1/knowledge.py?utm_source=gitcode_repo_files#L35-L47)
@router.post('/upload')
async def upload_file(*, file: UploadFile = File(...)):
    try:
        file_name = file.filename
        # Cache the file locally under a unique name
        uuid_file_name = KnowledgeService.save_upload_file_original_name(file_name)
        file_path = save_uploaded_file(file.file, 'bisheng', uuid_file_name)
        if not isinstance(file_path, str):
            file_path = str(file_path)
        return resp_200(UploadFileResponse(file_path=file_path))
    except Exception as exc:
        logger.exception(f'Error saving file: {exc}')
        raise HTTPException(status_code=500, detail=str(exc)) from exc
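The "unique name + target folder" idea can be illustrated with a minimal stdlib-only sketch. This is not Bisheng's save_uploaded_file: the helpers unique_name and save_upload are hypothetical, and the uuid-prefix naming scheme is an assumption made for illustration.

```python
import shutil
import uuid
from pathlib import Path
from typing import BinaryIO


def unique_name(original_name: str) -> str:
    """Prefix the original file name with a random hex id so uploads never collide."""
    # Path(...).name drops any directory parts a client put in the file name,
    # and keeping the original suffix (e.g. ".pptx") lets parsers dispatch on it.
    return f"{uuid.uuid4().hex}_{Path(original_name).name}"


def save_upload(stream: BinaryIO, folder: Path, original_name: str) -> Path:
    """Copy an uploaded stream into `folder` under a unique name and return the path."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / unique_name(original_name)
    with open(target, "wb") as out:
        shutil.copyfileobj(stream, out)  # copies in chunks instead of reading it all at once
    return target
```

Because only the final path component is kept, a client-supplied name such as ../../x.txt cannot escape the target folder.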

Tracking Upload Processing

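For large files, Bisheng separates uploading from processing. The /process endpoint passes uploaded files to KnowledgeService.process_knowledge_file, which adds them to the knowledge base and schedules the heavy work through FastAPI's BackgroundTasks. The request therefore returns without waiting for processing to finish: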

# [src/backend/bisheng/api/v1/knowledge.py](https://gitcode.com/GitHub_Trending/bi/bisheng/blob/79eef2140735143dbe71f7b31494775caa7ddb43/src/backend/bisheng/api/v1/knowledge.py?utm_source=gitcode_repo_files#L96-L105)
@router.post('/process')
async def process_knowledge_file(*,
                                 request: Request,
                                 login_user: UserPayload = Depends(get_login_user),
                                 background_tasks: BackgroundTasks,
                                 req_data: KnowledgeFileProcess):
    """ 上传文件到知识库内 """
    res = KnowledgeService.process_knowledge_file(request, login_user, background_tasks, req_data)
    return resp_200(res)
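The pattern behind this endpoint can be sketched with the standard library alone: return immediately, let the work continue in the background, and keep a status that can be queried. Everything here (FileProcessor, FileStatus, the status strings) is hypothetical. Bisheng itself hands the work to FastAPI's BackgroundTasks, and this sketch substitutes a thread pool.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, Dict


class FileStatus(str, Enum):
    PROCESSING = "processing"
    SUCCESS = "success"
    FAILED = "failed"


class FileProcessor:
    """Hypothetical tracker: submit() returns at once, the worker runs in a pool."""

    def __init__(self, worker: Callable[[str], None], max_workers: int = 2):
        self._worker = worker
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._status: Dict[str, FileStatus] = {}

    def submit(self, file_path: str) -> str:
        file_id = uuid.uuid4().hex
        with self._lock:
            # Record the status before scheduling so the worker's update always comes later.
            self._status[file_id] = FileStatus.PROCESSING
        self._pool.submit(self._run, file_id, file_path)
        return file_id

    def _run(self, file_id: str, file_path: str) -> None:
        try:
            self._worker(file_path)
            result = FileStatus.SUCCESS
        except Exception:
            result = FileStatus.FAILED  # a real system would also log the error
        with self._lock:
            self._status[file_id] = result

    def status(self, file_id: str) -> FileStatus:
        with self._lock:
            return self._status[file_id]

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A client polls status(file_id) until it leaves PROCESSING. This is why the upload request itself can stay fast even for large files.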

PPT to Markdown: Format Conversion in 3 Lines of Code

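Bisheng's pptx2md module converts PPT files to Markdown. It extracts text, converts images, and preserves layout. The core of the conversion is the convert function, which supports several output formats (plain Markdown, Wiki, Quarto, and others).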

Core Conversion Flow

Conversion happens in three steps: load the PPT file, parse the slide contents, and generate the target format. The core code is:

# [src/backend/bisheng/pptx2md/entry.py](https://gitcode.com/GitHub_Trending/bi/bisheng/blob/79eef2140735143dbe71f7b31494775caa7ddb43/src/backend/bisheng/pptx2md/entry.py?utm_source=gitcode_repo_files#L25-L51)
def convert(config: ConversionConfig):
    if config.title_path:
        config.custom_titles = prepare_titles(config.title_path)
    
    prs = load_pptx(config.pptx_path)
    logger.info("conversion started")
    ast = parse(config, prs)
    
    if str(config.output_path).endswith('.json'):
        with open(config.output_path, 'w') as f:
            f.write(ast.model_dump_json(indent=2))
        logger.info(f'presentation data saved to {config.output_path}')
        return
    
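    # Choose the formatter for the requested output format
    # (the Wiki/Quarto branches are omitted from this excerpt)
    out = outputter.MarkdownFormatter(config)  # default: plain Markdown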
    out.output(ast)
    logger.info(f'converted document saved to {config.output_path}')
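The output step of convert() is a simple dispatch on the output path: a .json target receives the parsed tree, and anything else goes to a formatter. The sketch below reproduces that dispatch using a hypothetical Slide type and a deliberately tiny Markdown renderer. It is not pptx2md's MarkdownFormatter, which handles much more (images, for example).

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Slide:
    """Hypothetical parsed slide: a title plus its text blocks."""
    title: str
    blocks: List[str] = field(default_factory=list)


def to_markdown(slides: List[Slide]) -> str:
    """Render each slide as a heading followed by a bullet per text block."""
    lines: List[str] = []
    for slide in slides:
        lines.append(f"# {slide.title}")
        lines.append("")
        lines.extend(f"- {block}" for block in slide.blocks)
        lines.append("")
    return "\n".join(lines)


def write_output(slides: List[Slide], output_path: Path) -> None:
    """Mirror convert(): .json dumps the intermediate tree, anything else becomes Markdown."""
    if output_path.suffix == ".json":
        data = [asdict(s) for s in slides]
        output_path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
        return
    output_path.write_text(to_markdown(slides), encoding="utf-8")
```

Keeping the parsed tree separate from rendering is what lets one parse feed several formatters, and it makes the JSON dump useful for debugging a conversion.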

Advanced Conversion Options

ConversionConfig controls the conversion parameters. For example, you can convert only one specific page or set the minimum text-block length:

# [src/backend/bisheng/pptx2md/types.py](https://gitcode.com/GitHub_Trending/bi/bisheng/blob/79eef2140735143dbe71f7b31494775caa7ddb43/src/backend/bisheng/pptx2md/types.py?utm_source=gitcode_repo_files#L28-L73)
class ConversionConfig(BaseModel):
    """Path to the pptx file to be converted"""
    pptx_path: Path
    """Path to the output file"""
    output_path: Path
    """The minimum character number of a text block to be converted"""
    min_char: int = 20
    """Only convert the specified page"""
    page: Optional[int] = None
    # ... more configuration fields
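To show how the two fields above might shape the output, here is a hypothetical filter. Based on the field docstrings, it assumes that text blocks shorter than min_char are skipped and that page is a 1-based slide number. FilterConfig and select_blocks are illustrative names and are not part of pptx2md.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FilterConfig:
    """Hypothetical stand-in for the min_char and page fields of ConversionConfig."""
    min_char: int = 20
    page: Optional[int] = None  # assumption: 1-based slide number


def select_blocks(slides: List[List[str]], cfg: FilterConfig) -> Dict[int, List[str]]:
    """Map slide number -> text blocks that survive the page and min_char filters."""
    kept: Dict[int, List[str]] = {}
    for number, blocks in enumerate(slides, start=1):
        if cfg.page is not None and number != cfg.page:
            continue  # only the requested slide is converted
        kept[number] = [b for b in blocks if len(b.strip()) >= cfg.min_char]
    return kept
```

Lowering min_char keeps short bullets such as single words. Raising it drops slide debris such as page numbers or footers.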

File Download: Secure and Efficient Access to Resources

Bisheng stores and serves files through a MinIO client and can generate signed temporary links, which keeps file access secure.

Generating Download Links

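There are two ways to access a stored file. download_minio reads an object directly from MinIO. minio_client.get_share_link generates a temporary access link with a configurable validity period, and the evaluation module's download endpoint returns such a link: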

# [src/backend/bisheng/utils/minio_client.py](https://gitcode.com/GitHub_Trending/bi/bisheng/blob/79eef2140735143dbe71f7b31494775caa7ddb43/src/backend/bisheng/utils/minio_client.py?utm_source=gitcode_repo_files#L159)
def download_minio(self, object_name: str):
    try:
        return self.client.get_object(self.bucket_name, object_name)
    except S3Error as e:
        logger.error(f"Error downloading {object_name}: {e}")
        raise

# [src/backend/bisheng/api/v1/evaluation.py](https://gitcode.com/GitHub_Trending/bi/bisheng/blob/79eef2140735143dbe71f7b31494775caa7ddb43/src/backend/bisheng/api/v1/evaluation.py?utm_source=gitcode_repo_files#L95-L104)
@router.get('/result/file/download')
async def get_download_url(*,
                          file_url: str = Query(..., description='File path'),
                          login_user: UserPayload = Depends(get_login_user)):
    download_url = minio_client.get_share_link(file_url)
    return resp_200(data={
        'url': download_url
    })
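A signed temporary link is a URL that carries an expiry time plus a signature over the path and that expiry. The server can then reject the link once it expires or if anyone edits it. MinIO's own presigned URLs use AWS Signature Version 4 (in the minio Python SDK, Minio.presigned_get_object takes an expires timedelta). Whether get_share_link wraps that call is an assumption. The stdlib sketch below is a simplified HMAC-SHA256 version of the same idea, not MinIO's algorithm. make_share_link and verify_share_link are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def _sign(path: str, expires: int, secret: bytes) -> str:
    """HMAC-SHA256 over the object path and the expiry timestamp."""
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_share_link(base_url: str, path: str, secret: bytes,
                    ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Return base_url + path with an expiry and a signature in the query string.

    `path` must start with "/" and be URL-safe; `now` is injectable for testing.
    """
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _sign(path, expires, secret)})
    return f"{base_url}{path}?{query}"


def verify_share_link(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Accept the link only if it is unexpired and its signature matches."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False  # the link has expired
    expected = _sign(parts.path, expires, secret)
    # compare_digest on bytes avoids timing leaks and tolerates non-ASCII input
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii"))
```

The secret never leaves the server, so a client can use a link but cannot extend its lifetime or point it at another file.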

Multi-Format File Support

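Files in formats such as PDF, DOCX, and PPTX are read through one unified path. getFileContent downloads the file locally, splits its text with knowledge_imp.read_chunk_text, and wraps the result for use in a prompt: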

# [src/backend/bisheng/api/v1/workstation.py](https://gitcode.com/GitHub_Trending/bi/bisheng/blob/79eef2140735143dbe71f7b31494775caa7ddb43/src/backend/bisheng/api/v1/workstation.py?utm_source=gitcode_repo_files#L284)
def getFileContent(filepath):
    """获取文件内容"""
    filepath_local, file_name = file_download(filepath)
    raw_texts, _, _, _ = knowledge_imp.read_chunk_text(
        filepath_local,
        file_name,
        ['\n\n', '\n'],
        ['after', 'after'],
        1000,
        0,
        excel_rule=ExcelRule()
    )
    return knowledge_imp.KnowledgeUtils.chunk2promt(''.join(raw_texts), {'source': file_name})
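read_chunk_text is called with the separators ['\n\n', '\n'], the rule 'after' for both, a chunk size of 1000, and an overlap of 0. One plausible reading of those arguments is a recursive splitter: try paragraph breaks first, fall back to line breaks, and keep each separator attached to the text before it. That reading is an assumption based on the arguments alone. The sketch below implements it with the standard library and is not knowledge_imp's code.

```python
from typing import List


def split_keep_after(text: str, sep: str) -> List[str]:
    """Split on `sep`, keeping each separator at the end of the piece before it."""
    parts = text.split(sep)
    pieces = [part + sep for part in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])
    return pieces


def chunk_text(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Greedily pack pieces into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in split_keep_after(text, sep):
        if len(piece) > chunk_size:
            # This piece alone is too big: flush, then split it with the next separator.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(chunk_text(piece, finer, chunk_size))
        elif len(current) + len(piece) <= chunk_size:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks always reproduces the original text, so no content is lost, and every chunk fits the size limit.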

Hands-On: The Complete PPT-to-Markdown Workflow

Here is the full process for converting a PPT file to Markdown with Bisheng, in just 3 steps:

Step 1: Upload the PPT file

Upload the PPT file through the knowledge-base upload endpoint to get its storage path:

curl -X POST "http://localhost:8000/api/v1/knowledge/upload" \
  -F "file=@/path/to/your/presentation.pptx"

The response contains the file's storage path:

{
  "code": 200,
  "data": {
    "file_path": "bisheng/20241022/xxx-pptx"
  },
  "message": "success"
}
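For scripted uploads without curl, the same request can be built with the standard library. build_multipart and post_upload are hypothetical helpers. The base URL and the data.file_path field follow the curl call and response above. Like that curl example, the sketch assumes the endpoint accepts the request without extra authentication headers.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from typing import Tuple
from urllib.request import Request, urlopen


def build_multipart(field: str, filename: str, data: bytes,
                    boundary: str) -> Tuple[bytes, str]:
    """Encode one file field as multipart/form-data, like `curl -F field=@file`."""
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The boundary must appear in the Content-Type header, which curl adds automatically.
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_upload(api_base: str, path: Path) -> str:
    """POST the file to {api_base}/knowledge/upload and return data.file_path."""
    body, content_type = build_multipart("file", path.name, path.read_bytes(), uuid.uuid4().hex)
    req = Request(f"{api_base}/knowledge/upload", data=body,
                  headers={"Content-Type": content_type}, method="POST")
    with urlopen(req) as resp:
        payload = json.load(resp)
    return payload["data"]["file_path"]
```

Usage: post_upload("http://localhost:8000/api/v1", Path("presentation.pptx")) returns the stored path for step 2.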

Step 2: Run the conversion

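Call the conversion function from the pptx2md module with the input and output paths. Because load_pptx reads from disk, pptx_path must point to a local .pptx file. If the path returned in step 1 refers to remote storage, download the file first: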

from bisheng.pptx2md.entry import convert
from bisheng.pptx2md.types import ConversionConfig

config = ConversionConfig(
    pptx_path="bisheng/20241022/xxx-pptx",
    output_path="output.md",
    min_char=10  # 调整最小文本块长度
)
convert(config)
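A quick local check before calling convert() gives a clearer error than a failure deep inside the parser. A .pptx file is a ZIP package whose main part conventionally lives at ppt/presentation.xml. The helper is_local_pptx below is hypothetical and relies on that convention rather than fully resolving the package relationships.

```python
import zipfile
from pathlib import Path


def is_local_pptx(path: str) -> bool:
    """Return True if `path` is an existing local file that looks like a .pptx package."""
    p = Path(path)
    if not p.is_file() or not zipfile.is_zipfile(p):
        return False
    with zipfile.ZipFile(p) as package:
        # Conventional location of the main presentation part in OOXML packages.
        return "ppt/presentation.xml" in package.namelist()
```

Calling it on config.pptx_path and raising FileNotFoundError when it returns False catches the common mistake of passing a remote storage path.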

Step 3: Download the result

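Request a download link for the converted Markdown file. The endpoint returns JSON containing a temporary URL, not the file itself, and it requires a logged-in user. The link is generated by MinIO, so output.md must first be stored in MinIO under that name (convert writes it to the local disk):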

curl "http://localhost:8000/api/v1/evaluation/result/file/download?file_url=output.md"

Summary and Advanced Tips

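Bisheng's file-processing module addresses common pain points in enterprise document management through unified upload and download endpoints and flexible format conversion. Its main strengths are: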

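Distributed storage: MinIO-based object storage that scales to large numbers of files
Asynchronous processing: file conversion runs as background tasks via Celery
Multi-format support: built-in parsers for PPTX, Excel, Word, and other formats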

For more advanced usage, see:

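With the methods in this article, you can convert PPT files to Markdown efficiently and manage files across their entire lifecycle. Give Bisheng a try and see how much it speeds up your document processing!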

bisheng

Bisheng is an open LLM devops platform for next generation AI applications.

Repository: https://gitcode.com/GitHub_Trending/bi/bisheng
