Extract Text from PDFs & Images for LLMs Using Python

ZoumDataScience

11 months ago

15,921 views


Comments:

@valmirrastelyjunior9400 - 28.12.2023 00:14

Ok

@user-cb7yl4nr6h - 30.09.2023 16:07

All the methods for converting the files failed, and I don't know why.

@user-cb7yl4nr6h - 27.09.2023 15:01

What is the name of the web page where you write the code?

@user-sv3bk3jf5j - 20.09.2023 16:44

Hi there, I plan on using the EasyOCR library for some sensitive documents. Is it safe? Could any data leaks occur? Also, is there any documentation for the library that I can refer to?

Thanks!!

@shooby117 - 11.09.2023 20:18

I run into the following error when I try langchain's UnstructuredImageLoader:

TypeError: stat: path should be string, bytes, os.PathLike or integer, not JpegImageFile

@GavSP - 09.09.2023 18:21

Hello, when I try to run the same code you have, 'text_with_langchain_image = extract_text_with_langchain_image(convert_pdf_to_images)',
I get the error 'TypeError: stat: path should be string, bytes, os.PathLike or integer, not JpegImageFile'.
This happens even though I prepared the data using the same convert_pdf_to_images function. Any ideas?
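
A possible explanation for the TypeError quoted above, offered as an assumption rather than something the video's author has confirmed: the message says a JpegImageFile object reached a call that expects a file path, which suggests the loader is being handed in-memory PIL images instead of paths on disk. A minimal sketch of one workaround is to save each image to a file first and pass the resulting paths to the loader. The helper names save_images_to_files and extract_text_from_image_files are hypothetical, and the LangChain import path shown is an assumption that depends on the installed version.

```python
from pathlib import Path


def save_images_to_files(images, directory):
    """Write each in-memory image to disk and return the file paths.

    `images` is any iterable of objects with a PIL-style `save(path)` method.
    (Hypothetical helper, not part of the original tutorial.)
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(images):
        path = directory / f"page_{index:04d}.png"
        image.save(str(path))  # PIL infers the PNG format from the extension
        paths.append(str(path))
    return paths


def extract_text_from_image_files(paths):
    """Run UnstructuredImageLoader on file paths rather than image objects.

    (Hypothetical helper; requires LangChain and unstructured to be installed.)
    """
    # Assumed import path: older LangChain releases exposed the loader
    # under `langchain.document_loaders` instead.
    from langchain_community.document_loaders import UnstructuredImageLoader

    texts = []
    for path in paths:
        documents = UnstructuredImageLoader(path).load()
        texts.extend(doc.page_content for doc in documents)
    return "\n".join(texts)
```

Under these assumptions, the images produced by the PDF conversion step would go through save_images_to_files(images, "pages") first, and the returned list of paths would then be passed to extract_text_from_image_files.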

@SuiGio - 06.09.2023 21:56

Important question: if a PDF has a picture in it, then when the page is converted to an image, is the text in that picture extracted or is it skipped?
Secondly, is there a way to know that the extracted text came from an image within the PDF? Is there at least some sort of metadata that gives that information?
Thanks for the video. Nice content with good overall breadth. I hope you can answer my question.

@ibrahimkouma6751 - 19.08.2023 17:38

Hello, are you Malian? Great tutorial, thanks for sharing.

@zoumdatascience - 18.08.2023 06:28

Please subscribe and like the video to help me stay motivated to make awesome videos like this one. :)

@abhishekacharya8486 - 17.08.2023 17:05

Which of these has the best accuracy?

@AbdulAhad-Family - 16.08.2023 05:43

Fantastic tutorial, so nicely simplified... great job

@susmitsekhar5100 - 06.08.2023 03:58

Great work. Can we extract information from charts like histograms or bar plots?

@behardcorepeople - 03.08.2023 18:08

Great work, thank you! 😀

@anubhav963 - 01.08.2023 11:15

I am getting a "list index out of range" error with LangChain. Can you suggest something there?

@AIJasonZ - 30.07.2023 10:09

This is awesome, great work!

@jonyswe580 - 25.07.2023 09:47

Good one!
