Comments:
PyMuPDF is faster and more advanced.
PyPDF2
Pdfreader
It does not work.
How do I do this for all pages with fitz?
How can geometric shapes be extracted?
Is there a way to combine table and text extraction? I mean the result should be "text1, then a table [name, etc.], another text".
In the last part of the video it is said that a table of contents can be extracted with PyMuPDF, but I don't see anything like that in the code you are showing?
In a test, I had POOR results with pdfplumber: it failed to detect multiple columns and treated them as one row!
It also failed a number of times to detect the spaces between words, so they all get smushed together.
Copy-and-pasted appalling scan results:
Themovementofoceanwaterisoneofthetwoprinci- shapeofthebasininwhichthecurrentisrunning,extentand
pal sources of discrepancy between dead reckoned and location of land, and deflection by the rotation of the earth.
PyMuPDF, by contrast, did just fine.
Thanks for the video and the proper documentation. Appreciate your work, keep it up bro.
Thank you for the video 👍
Someone please tell me where the file.pdf used in this video is?
Thank you 🙏 so easy to understand and helpful
I hope you explain desktop applications
Hi, is there any way to make something that can identify how many pages in a PDF contain images and how many do not, using Python or any other language?
Thank you so much for this! I've been looking for a clear video on how to get information out of PDFs, and you provided a very good start.
The table has a line above it: "A sample table to extract". Is there a way I can extract that line along with the table, using pdfplumber or any other library?
Can you send me the link to download the PDF?