Comments:
What if we want to extract text from one particular page?
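One way to do this, as a minimal sketch: index into `reader.pages` instead of looping over it. The helper names `extract_page_text` and `demo` are made up for illustration, page numbers are assumed to be 1-based as shown in a PDF viewer, and `sample.pdf` is only a placeholder file name.

```python
def extract_page_text(reader, page_number: int) -> str:
    """Return the text of one page (1-based) from a PdfReader-like object."""
    pages = reader.pages
    if not 1 <= page_number <= len(pages):
        raise IndexError(f"page {page_number} is outside 1..{len(pages)}")
    # extract_text() can return nothing for image-only pages, so fall back to ""
    return pages[page_number - 1].extract_text() or ""


def demo(pdf_path: str = "sample.pdf", page_number: int = 2) -> None:
    """Hypothetical usage: print a single page of a PDF on disk."""
    import PyPDF2  # imported here so the helper above has no hard dependency

    with open(pdf_path, "rb") as pdf:
        reader = PyPDF2.PdfReader(pdf)
        print(extract_page_text(reader, page_number))
```

Calling `demo("sample.pdf", 2)` should print only the second page, assuming the file exists and has at least two pages.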
In some of the latest updates to PyPDF2 the class "PdfFileReader" got replaced with "PdfReader". Code still works fine with "PdfReader". :)
No idea how this is set up, kind of pointless. Where is pypdf, do I get it from inside my bum bum? And what is this program?
Hi sir, does it work on local languages like Telugu?
I am pretty sure there are over a thousand instances of the word "coffee" in the PDF. However, this seems to have counted only the number of pages on which the word appeared.
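The difference between the two counts can be sketched like this: count every match on every page and sum them, rather than checking once per page whether the word is present. The function name `count_word` is hypothetical, and it assumes you already have a list of per-page strings (for example from `page.extract_text()`).

```python
import re


def count_word(pages_text, word):
    """Return (total occurrences, number of pages containing the word)."""
    # \b keeps "coffee" from matching inside "coffeehouse"; matching ignores case
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    per_page = [len(pattern.findall(text or "")) for text in pages_text]
    total = sum(per_page)
    pages_with_word = sum(1 for n in per_page if n > 0)
    return total, pages_with_word
```

If the first number is much larger than the second, the original script was most likely reporting pages, not occurrences.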
The code did not work for me on a Windows 11 PC. I kept having ChatGPT analyze the code and error messages, and after many tries it fixed it:
import PyPDF2

def extract_text_from_pdf(pdf_file: str) -> list[str]:
    # Open the PDF file of your choice
    with open(pdf_file, 'rb') as pdf:
        reader = PyPDF2.PdfReader(pdf)
        pdf_text = []
        # Collect the text of each page while the file is still open
        for page in reader.pages:
            content = page.extract_text()
            pdf_text.append(content)
    return pdf_text

def main():
    extracted_text = extract_text_from_pdf('sample.pdf')
    for text in extracted_text:
        print(text)

# Run main() only when the file is executed as a script
if __name__ == '__main__':
    main()
I keep getting "SyntaxError: unmatched ')'" on line 4. I'm running Python 3.9; could that be the cause?
Helped me a lot. Thanks.
Please, the resolution of your screen is not clear.
Great as always.
Thank you for the awesome tutorial. I have a question about extracting articles and hope you can help. When extracting articles and reports, the output includes many references, table legends, and titles that are not needed. Would it be possible to remove all those references and table contents, including legends and titles, when extracting the PDF file?
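A rough heuristic, not a general solution: once the text is extracted, drop everything from a line that reads "References" or "Bibliography" onward, and skip lines that look like captions ("Table 1 ...", "Figure 2 ..."). The function name and the patterns below are assumptions for illustration; real PDFs vary a lot, and this sketch does not remove the cell contents of tables, only caption lines.

```python
import re

# Assumed patterns: a heading line on its own, and caption lines with a number
REFERENCES_HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE)
CAPTION_LINE = re.compile(r"^\s*(table|figure|fig\.)\s*\d+", re.IGNORECASE)


def strip_references_and_captions(text: str) -> str:
    """Drop caption-like lines and everything after a references heading."""
    kept = []
    for line in text.splitlines():
        if REFERENCES_HEADING.match(line):
            break  # assume the reference list runs to the end of the document
        if CAPTION_LINE.match(line):
            continue  # skip a single caption line
        kept.append(line)
    return "\n".join(kept)
```

This works best on the full document text joined together, since the references heading usually appears only once near the end.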
Oh, a name changed.
It's so helpful... loved it ❤
You are awesome 👏