
Extracting whole-page text with pdfminer

Posted: 2018-07-12 10:14:26
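#! python2
# coding: utf-8

import sys
from cStringIO import StringIO
from pdfminer import pdfinterp
from pdfminer import pdfpage
from pdfminer import converter
from pdfminer import layout

# Assumption: the PDF path is passed as the first command-line argument
path = sys.argv[1]

with open(path, 'rb') as fp:
    # Shared resources (fonts, images) used while interpreting pages
    rsrcmgr = pdfinterp.PDFResourceManager()
    # In-memory buffer that TextConverter writes the extracted text into
    retstr = StringIO()
    codec = 'utf-8'
    # Default layout-analysis parameters (how characters are grouped into lines and boxes)
    laparams = layout.LAParams()
    device = converter.TextConverter(
        rsrcmgr, retstr, codec=codec, laparams=laparams)
    # Create a PDF interpreter object.
    interpreter = pdfinterp.PDFPageInterpreter(rsrcmgr, device)
    # Process each page contained in the document.
    pages = pdfpage.PDFPage.get_pages(fp)
    for page in pages:
        interpreter.process_page(page)
        # Text of the current page (UTF-8 encoded bytes)
        data = retstr.getvalue()
        print data
        # Empty the buffer so the next page is not appended to this one
        retstr.seek(0)
        retstr.truncate(0)
    device.close()
    retstr.close()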
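Why the buffer is emptied inside the loop: TextConverter only ever appends to retstr, so without a reset getvalue() returns everything extracted so far rather than just the current page. The sketch below reproduces that behavior with the standard library only; fake_render_page is a hypothetical stand-in for interpreter.process_page, and io.StringIO (with unicode literals, so it also runs on Python 2) stands in for cStringIO.

```python
from io import StringIO


def fake_render_page(buf, text):
    # Hypothetical stand-in for interpreter.process_page(page):
    # like TextConverter, it only appends to the shared output buffer.
    buf.write(text)


def per_page_texts(page_contents, reset=True):
    """Collect one string per page from a single shared buffer.

    With reset=False this reproduces the pitfall: getvalue() returns
    everything written so far, not just the current page.
    """
    buf = StringIO()
    results = []
    for content in page_contents:
        fake_render_page(buf, content)
        results.append(buf.getvalue())
        if reset:
            # Rewind and empty the buffer before the next page is rendered
            buf.seek(0)
            buf.truncate(0)
    buf.close()
    return results


pages = [u"page one\n", u"page two\n"]
print(per_page_texts(pages, reset=False))
print(per_page_texts(pages))
```

The first call yields the second page's text with the first page's text still in front of it; the second call yields each page on its own. Calling seek(0) before truncate(0) matters for io.StringIO, whose truncate() does not move the write position.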

Original post: https://www.cnblogs.com/Greenseer/p/9297885.html

© 2014 bubuko.com. All rights reserved.