PDF page count with Java

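I'm currently using iText to read the page count of a PDF. This takes a long time because the library seems to scan the entire file.

Is the page information located in the header of the PDF, or is the full file needed?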

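That's right. iText parses quite a bit of a PDF when it opens it (it doesn't read the contents of stream objects, but that's about it)…

Unless you use the PdfReader(RandomAccessFileOrArray) constructor, in which case it will only read the xrefs (mostly required), but won't parse anything else until you start requesting specific objects (directly or via various calls).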

The first PDF program I ever wrote did exactly this. It opened up a PDF and, doing the bare minimum amount of work necessary, read the number of pages. It didn't even parse the xrefs it didn't have to. I haven't thought about that program in years…

So while it is not as efficient as it could be, using RandomAccessFileOrArray will be much more efficient:

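int efficientPDFPageCount(String path) throws IOException {
  // Older iText API: open the file for random access instead of reading it all up front
  RandomAccessFileOrArray file = new RandomAccessFileOrArray(path, false, true);
  PdfReader reader = new PdfReader(file);
  try {
    return reader.getNumberOfPages();
  } finally {
    reader.close(); // release the reader even if the page count lookup fails
  }
}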

Update:

The iText API has undergone a bit of an overhaul. Now (as of version 5.4.x) the correct way to do this is via java.io.RandomAccessFile:

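import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import com.itextpdf.text.io.RandomAccessSourceFactory;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;

int efficientPDFPageCount(File file) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    RandomAccessFileOrArray pdfFile = new RandomAccessFileOrArray(
            new RandomAccessSourceFactory().createSource(raf));
    // Second argument is the owner password; an empty array means none is supplied
    PdfReader reader = new PdfReader(pdfFile, new byte[0]);
    int pages = reader.getNumberOfPages();
    reader.close();
    return pages;
}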
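
On the original question of whether the page count lives in the header: it does not. The header only carries the PDF version. A reader normally starts at the end of the file, where the `startxref` keyword gives the byte offset of the cross-reference data, and from there follows the trailer to the document catalog and the page tree, whose root holds the page count. Below is a minimal sketch of just that first step, using only the JDK. The class name `PdfTail`, its method names, and the 1024-byte tail size are assumptions made for illustration and are not part of iText; the sketch does not resolve the cross-reference table itself and has not been tried against unusual or damaged files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an iText class: locates the xref offset from the file tail.
final class PdfTail {
    // Assumed to be large enough to contain "startxref <offset> %%EOF".
    private static final int TAIL_SIZE = 1024;

    /** Returns the number that follows the last "startxref" keyword, or -1 if none is found. */
    static long parseStartXref(byte[] tail) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        // Skip the line break or spaces between the keyword and the offset.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return i > start ? Long.parseLong(text.substring(start, i)) : -1;
    }

    /** Reads only the last TAIL_SIZE bytes of the file and extracts the xref offset. */
    static long findStartXref(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            int size = (int) Math.min(TAIL_SIZE, length);
            byte[] tail = new byte[size];
            raf.seek(length - size);
            raf.readFully(tail);
            return parseStartXref(tail);
        }
    }
}
```

Calling `PdfTail.findStartXref(path)` touches at most the last kilobyte of the file, which is why a reader that defers parsing can report the page count without scanning the whole document.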