Extract Text from any PDF File in Python 3.10 Tutorial

Indently

1 year ago

43,145 views

Comments:

Athar Khalid - 19.07.2023 11:50

What if we want to extract text from one particular page?
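
One way to approach this, as a minimal sketch: `reader.pages` in PyPDF2 behaves like a sequence, so a single page can be picked by index before calling `extract_text()`. The helper name `text_of_page` and its 1-based `page_number` parameter are assumptions for illustration and are not part of the tutorial.

```python
from typing import Protocol, Sequence


class SupportsExtractText(Protocol):
    """Anything with an extract_text() method, such as a PyPDF2 page object."""

    def extract_text(self) -> str: ...


def text_of_page(pages: Sequence[SupportsExtractText], page_number: int) -> str:
    """Return the text of one page, counting pages from 1 (hypothetical helper)."""
    if not 1 <= page_number <= len(pages):
        raise ValueError(f"page_number must be between 1 and {len(pages)}")
    # Sequences are 0-indexed, so page 1 lives at index 0.
    return pages[page_number - 1].extract_text()


# Assumed usage with PyPDF2 installed and a local file named sample.pdf:
#     from PyPDF2 import PdfReader
#     reader = PdfReader("sample.pdf")
#     print(text_of_page(reader.pages, 3))
```

Taking the page sequence as an argument, rather than a file path, keeps the helper independent of how the PDF was opened.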

Light Yagami - 28.06.2023 23:01

In some of the latest updates to PyPDF2 the class "PdfFileReader" got replaced with "PdfReader". Code still works fine with "PdfReader". :)

BakaOppai - 14.05.2023 22:15

No idea how this is set up, kind of pointless. Where is pypdf? Do I get it from inside my bum bum? And what is this program?

Sathish EduTech - 26.04.2023 21:28

Hi sir, does it work on local languages like Telugu?

John Wee - 30.03.2023 01:23

I am pretty sure there are over a thousand instances of the word "coffee" in the PDF. However, this seems to have counted only the number of pages on which the word appeared.
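
The difference between the two counts can be sketched as follows, assuming the page texts are already in a list of strings (for example, one `page.extract_text()` result per page). The function names `count_pages_with_word` and `count_occurrences` are hypothetical, and the tutorial's own counting code is not reproduced here, so this only illustrates the two strategies.

```python
import re


def count_pages_with_word(page_texts: list[str], word: str) -> int:
    """Count pages on which the word appears at least once."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return sum(1 for text in page_texts if pattern.search(text))


def count_occurrences(page_texts: list[str], word: str) -> int:
    """Count every occurrence of the word across all pages."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    # findall returns one entry per match, so its length is the per-page count.
    return sum(len(pattern.findall(text)) for text in page_texts)


# Made-up sample pages, for illustration only.
pages = ["Coffee, coffee and more coffee.", "Tea only.", "One coffee please."]
print(count_pages_with_word(pages, "coffee"))  # 2
print(count_occurrences(pages, "coffee"))      # 4
```

Both functions match case-insensitively and also match inside longer words; wrapping the escaped word in `\b` anchors would restrict the count to whole words.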

Dave T - 23.02.2023 07:33

The code did not work for me on a Windows 11 PC. I kept having ChatGPT analyze the code and error messages, and after many tries it fixed it:

import PyPDF2


def extract_text_from_pdf(pdf_file: str) -> list[str]:
    """Return the extracted text of each page, in page order."""
    # Open the PDF file of your choice in binary mode
    with open(pdf_file, 'rb') as pdf:
        # Newer PyPDF2 releases use PdfReader instead of PdfFileReader
        reader = PyPDF2.PdfReader(pdf)
        pdf_text = []

        for page in reader.pages:
            content = page.extract_text()
            pdf_text.append(content)

    return pdf_text


def main():
    extracted_text = extract_text_from_pdf('sample.pdf')
    for text in extracted_text:
        print(text)


if __name__ == '__main__':
    main()

Zain Saqib - 10.02.2023 23:43

I keep getting "SyntaxError: unmatched ')'" on line 4. I'm running Python 3.9; could that be the cause?

オタヴィオルイス - 05.02.2023 17:39

Helped me a lot. Thanks.

Rania Rasmy - 27.11.2022 14:07

Please, the resolution of your screen recording is not clear.

mehdi smaeili - 26.11.2022 14:58

Great as always.

Rauniyar Santosh - 03.11.2022 05:55

Thank you for the awesome tutorial. I have a question about extracting articles, and I hope you can help me. When extracting articles and reports, there are many references, table legends, and titles that are not required. Would it be possible to remove all those references and table contents, including legends and titles, when extracting text from the PDF file?

ComputerZero - 08.08.2022 18:13

Oh, a name changed.

Akash Nath - 08.08.2022 13:56

It's so helpful...loved it ❤

#rs7 - 08.08.2022 13:56

U r awesome 👏
