Analysis and Solution of a PDF Merging Problem in OpenPDF

2025-06-18 13:17:00 Author: 钟日瑜

OpenPDF is an open-source Java library for creating, editing, rendering, and encrypting PDF documents, as well as generating PDFs from HTML. It is licensed under the LGPL and MPL.

Project URL: https://gitcode.com/gh_mirrors/op/OpenPDF

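Developers using OpenPDF often need to merge PDF documents. A typical problem reported recently: when merging several PDF files into one, the program throws a "The document has no pages" exception. The problem looks simple, but it shows clearly how OpenPDF's Document, PdfWriter, and PdfCopy classes are meant to fit together.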

Symptoms

A developer merging several PDF files with OpenPDF got the exception "com.lowagie.text.ExceptionConverter: The document has no pages". The core of the code was:

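Document document = new Document(PageSize.A4.rotate());
FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
// Problem: a PdfWriter is attached to document and fileOutputStream...
PdfWriter pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
// ... other code
// ...and a PdfCopy is attached to the same document and the same stream
PdfCopy pdfCopy = new PdfCopy(document, fileOutputStream);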

Root Cause Analysis

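The root cause comes down to three related points:

Shared resources: the code creates both a PdfWriter and a PdfCopy, and both are bound to the same Document and the same FileOutputStream. Each one registers itself as a separate listener on the Document and writes its own PDF structure to the stream, so two writers compete for a single output.
Mixed responsibilities: PdfWriter and PdfCopy are two different output mechanisms in OpenPDF. PdfWriter builds a PDF from content added to the Document, while PdfCopy copies pages from existing PDFs. PdfCopy is itself a subclass of PdfWriter, so it is already a complete writer; adding a second one leaves the document structure inconsistent.
Page management conflict: although the code calls document.newPage() (in the elided part), PdfCopy manages its pages itself, and pages added with pdfCopy.addPage() reach only the PdfCopy output. When document.close() runs, the PdfWriter, which never received a page, most likely fails while writing its page tree, which is where "The document has no pages" comes from.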
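To see the failure in isolation, here is a minimal reproduction sketch. It assumes an OpenPDF version that still uses the com.lowagie packages (as the exception in the report does) and swaps the output file for an in-memory ByteArrayOutputStream; the class NoPagesRepro and its methods are hypothetical names. Following the analysis above, one would expect document.close() to fail, because the PdfWriter attached first never receives a page.

```java
import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import java.io.ByteArrayOutputStream;

// Hypothetical class: reproduces the PdfWriter + PdfCopy conflict in memory
public class NoPagesRepro {

    // Builds a one-page input PDF, using PdfWriter on its own (the correct use)
    public static byte[] onePagePdf() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        document.add(new Paragraph("Input page"));
        document.close();
        return out.toByteArray();
    }

    // Returns the message of the exception thrown by document.close(), or null if none
    public static String reproduce(byte[] inputPdf) throws Exception {
        Document document = new Document(PageSize.A4.rotate());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfWriter.getInstance(document, out);          // first writer listening to the document
        PdfCopy pdfCopy = new PdfCopy(document, out);  // second writer: same document, same stream
        document.open();
        PdfReader reader = new PdfReader(inputPdf);
        // The imported page is added to PdfCopy only; the PdfWriter never sees it
        pdfCopy.addPage(pdfCopy.getImportedPage(reader, 1));
        reader.close();
        try {
            document.close();
            return null;
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reproduce(onePagePdf()));
    }
}
```

Removing the PdfWriter.getInstance(...) line from reproduce() turns it into the PdfCopy-only pattern recommended below.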

Solution

The correct approach relies entirely on PdfCopy and does not create a PdfWriter at all. Here is the revised code:

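import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

public class PdfMerger { // class name is illustrative
    public static void mergePdfFiles(List<String> pdfFiles, String outputFile)
            throws IOException, DocumentException {
        // With no pages at all, document.close() fails with "The document has no pages"
        if (pdfFiles.isEmpty()) {
            throw new IllegalArgumentException("No input PDF files to merge");
        }
        Document document = new Document();
        // try-with-resources releases the file even if merging fails midway
        try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile)) {
            // PdfCopy is the only writer attached to the document: no PdfWriter.getInstance()
            PdfCopy pdfCopy = new PdfCopy(document, fileOutputStream);
            document.open();
            for (String pdfFile : pdfFiles) {
                PdfReader reader = new PdfReader(pdfFile);
                try {
                    // PDF page numbers start at 1
                    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                        PdfImportedPage page = pdfCopy.getImportedPage(reader, i);
                        pdfCopy.addPage(page);
                    }
                } finally {
                    reader.close();
                }
            }
            // Closing the document finalizes PdfCopy and writes the merged file
            document.close();
        }
    }
}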
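As a quick check of the PdfCopy-only approach, here is a self-contained sketch that works on byte arrays instead of file paths, which can be handy when the PDFs come from a database or an HTTP request rather than from disk. It also mirrors the split of responsibilities described above: PdfWriter is used only to create the sample inputs, and PdfCopy is used only to merge them. The class InMemoryPdfMerge and its methods are hypothetical names, and the code assumes OpenPDF's com.lowagie packages.

```java
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical class: in-memory variant of the PdfCopy-only merge
public class InMemoryPdfMerge {

    // Creates a sample PDF from scratch: this is PdfWriter's job
    public static byte[] createSamplePdf(int pages) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        for (int p = 1; p <= pages; p++) {
            if (p > 1) {
                document.newPage();
            }
            document.add(new Paragraph("Page " + p));
        }
        document.close();
        return out.toByteArray();
    }

    // Copies existing pages: this is PdfCopy's job, with no PdfWriter attached
    public static byte[] merge(List<byte[]> inputs) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, out);
        document.open();
        for (byte[] input : inputs) {
            PdfReader reader = new PdfReader(input);
            try {
                for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                    copy.addPage(copy.getImportedPage(reader, i));
                }
            } finally {
                reader.close();
            }
        }
        document.close();
        return out.toByteArray();
    }

    // Counts the pages of a PDF so the merge result can be checked
    public static int pageCount(byte[] pdf) throws Exception {
        PdfReader reader = new PdfReader(pdf);
        int n = reader.getNumberOfPages();
        reader.close();
        return n;
    }

    public static void main(String[] args) throws Exception {
        byte[] merged = merge(List.of(createSamplePdf(2), createSamplePdf(3)));
        System.out.println("Merged pages: " + pageCount(merged));
    }
}
```

Running main() merges a 2-page and a 3-page sample, so the merged document should report 5 pages. Writing the result to disk is then just a matter of passing the returned bytes to a FileOutputStream.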