Extract Text from any PDF File in Python 3.10 Tutorial

Indently

1 year ago

43,145 views

Comments:

Athar Khalid - 19.07.2023 11:50

What if we want to extract text from one particular page?
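
To get one page, you can index into `reader.pages` instead of looping over all of them. This is a minimal sketch, assuming a `PdfReader` from pypdf or a recent PyPDF2. `extract_page_text` is a hypothetical helper that takes 1-based page numbers, the way people usually count pages:

```python
from collections.abc import Sequence


def extract_page_text(pages: Sequence, page_number: int) -> str:
    """Return the text of a single page (hypothetical helper, 1-based numbering)."""
    # Reject 0 and negatives too, otherwise pages[-1] would silently return the last page
    if not 1 <= page_number <= len(pages):
        raise IndexError(f"page {page_number} is out of range (1-{len(pages)})")
    # reader.pages is indexed from 0, so shift the human-friendly number down by one
    return pages[page_number - 1].extract_text()
```

For example, `extract_page_text(PdfReader("sample.pdf").pages, 3)` would return the text of the third page (`sample.pdf` is a placeholder file name).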

Light Yagami - 28.06.2023 23:01

In some of the latest updates to PyPDF2, the class "PdfFileReader" was replaced with "PdfReader". The code still works fine with "PdfReader". :)
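
Because of that rename, an import that tries the newer names first avoids the problem. This is a minimal sketch, assuming either the maintained `pypdf` package (the successor of PyPDF2) or a recent PyPDF2 is installed; `read_all_pages` is a hypothetical helper name:

```python
try:
    # pypdf is the maintained successor of PyPDF2 and exposes PdfReader
    from pypdf import PdfReader
except ImportError:
    try:
        # Recent PyPDF2 releases expose PdfReader as well; older ones only had PdfFileReader
        from PyPDF2 import PdfReader
    except ImportError:
        PdfReader = None  # neither package is installed


def read_all_pages(path: str) -> list[str]:
    """Return the extracted text of every page, in page order."""
    if PdfReader is None:
        raise ImportError("install pypdf first, e.g. `pip install pypdf`")
    reader = PdfReader(path)
    # extract_text() may return an empty string for scanned (image-only) pages
    return [page.extract_text() for page in reader.pages]
```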

BakaOppai - 14.05.2023 22:15

No idea how this is set up; kinda pointless. Where is pypdf? Do I get it from inside my bum bum? And what is this program?

Sathish EduTech - 26.04.2023 21:28

Hi sir, does it work on local languages like Telugu?

John Wee - 30.03.2023 01:23

I am pretty sure there are over a thousand instances of the word "coffee" in the PDF. However, this seems to have counted only the number of pages on which the word appeared.
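
If the count comes from checking `"coffee" in text` once per page (an assumption, since that code isn't reproduced here), each page can contribute at most 1. Counting the matches inside each page's text gives the total instead. This is a minimal sketch; `count_word` is a hypothetical helper that takes the list of page texts returned by code like `extract_text_from_pdf`:

```python
import re


def count_word(pages: list[str], word: str) -> tuple[int, int]:
    """Return (total occurrences, number of pages containing the word)."""
    # \b keeps "coffee" from matching inside "coffeehouse"; IGNORECASE also counts "Coffee"
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    total = 0
    pages_with_word = 0
    for text in pages:
        hits = len(pattern.findall(text))
        total += hits
        if hits:
            pages_with_word += 1
    return total, pages_with_word
```

Words that the PDF layout hyphenates across a line break (for example "cof-" and "fee") will still be missed, because the extracted text keeps them split.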

Dave T - 23.02.2023 07:33

The code did not work for me on a Windows 11 PC. I kept having ChatGPT analyze the code and error messages, and after many tries it fixed it:

import PyPDF2


def extract_text_from_pdf(pdf_file: str) -> list[str]:
    # Open the PDF file of your choice in binary mode
    with open(pdf_file, 'rb') as pdf:
        # Recent PyPDF2 versions use PdfReader instead of PdfFileReader
        reader = PyPDF2.PdfReader(pdf)
        pdf_text = []

        # Collect the text of each page, in page order
        for page in reader.pages:
            content = page.extract_text()
            pdf_text.append(content)

        return pdf_text


def main():
    extracted_text = extract_text_from_pdf('sample.pdf')
    for text in extracted_text:
        print(text)


if __name__ == '__main__':
    main()

Zain Saqib - 10.02.2023 23:43

I keep getting "SyntaxError: unmatched ')'" on line 4. I'm running Python 3.9; could that be the cause?

オタヴィオルイス - 05.02.2023 17:39

Helped me a lot. Thanks!

Rania Rasmy - 27.11.2022 14:07

Please fix the resolution; your screen is not clear.

mehdi smaeili - 26.11.2022 14:58

Great as always.

Rauniyar Santosh - 03.11.2022 05:55

Thank you for the awesome tutorial. I have a question about extracting articles, and I hope you can help me. When extracting articles and reports, there are many references, table legends, and titles that are not needed. Would it be possible to remove all those references and table contents, including legends and titles, when extracting text from the PDF file?
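
`extract_text()` returns a plain string without any markers for captions or reference lists, so removing those parts usually means post-processing that string. This is a rough heuristic sketch, assuming the reference list starts after a line that reads "References" or "Bibliography", and that captions begin with "Table N" or "Figure N". `strip_references_and_captions` and both patterns are hypothetical and will need adjusting for each publisher's layout. The sketch does not remove the table cells themselves, which come out as ordinary lines of text:

```python
import re

# Hypothetical patterns: heading wording and caption prefixes vary between publishers
REFERENCES_HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE)
CAPTION_LINE = re.compile(r"^\s*(table|fig\.?|figure)\s+\d+", re.IGNORECASE)


def strip_references_and_captions(text: str) -> str:
    """Drop caption-like lines and everything after a references heading."""
    kept = []
    for line in text.splitlines():
        if REFERENCES_HEADING.match(line):
            break  # assume the rest of the text is the reference list
        if CAPTION_LINE.match(line):
            continue  # skip lines such as "Table 2: ..." or "Figure 3. ..."
        kept.append(line)
    return "\n".join(kept)
```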
