NLP Data Import part 2 - Document parsing|How to parse pdf files in Python|Document parsing Python

Unfold Data Science

4 years ago

16,859 views

Comments:

@user-yv7fe8jp3t - 18.10.2023 13:34

If the PDF contains 50 pages and I want to parse/extract a particular value from each page, how would I do that? Please help.
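A minimal sketch for pulling one labeled value out of every page, assuming the page texts have already been extracted into a list of strings (one per page) and that the value always follows a fixed label. The label "Invoice No:" and the sample page texts are hypothetical; swap in your own regular expression.

```python
import re
from typing import Iterable, List, Optional


def extract_value_per_page(page_texts: Iterable[str], pattern: str) -> List[Optional[str]]:
    """Return the first captured group found on each page, or None if absent."""
    compiled = re.compile(pattern)
    results: List[Optional[str]] = []
    for text in page_texts:
        match = compiled.search(text or "")  # extracted page text may be empty or None
        results.append(match.group(1).strip() if match else None)
    return results


# Hypothetical page texts; in practice one string per PDF page.
pages = ["Invoice No: 1001\nTotal: 50", "Invoice No: 1002\nTotal: 75", "Notes only"]
values = extract_value_per_page(pages, r"Invoice No:\s*(\S+)")
print(values)  # ['1001', '1002', None]
```

Returning None for pages without the label keeps the result aligned with page numbers, so values[i] always refers to page i.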

@ImranKhan-jn6zh - 20.03.2023 08:04

Hello Aman,

Can you please let me know why you used != -1 in the if condition inside the second for loop?
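The != -1 check most likely comes from str.find, which returns the index where a substring starts, or -1 when the substring is absent (the video's exact line is not reproduced here, so treat that as an assumption). The snippet below shows why the comparison against -1 is needed rather than a plain truthiness test:

```python
line = "Invoice Number: 12345"

# str.find returns the start index of the substring, or -1 when it is absent.
print(line.find("Invoice"))  # 0 (found at the very start)
print(line.find("Total"))    # -1 (not found)


def contains(text: str, keyword: str) -> bool:
    # "!= -1" means "keyword occurs somewhere in text".
    # A bare "if text.find(keyword):" would be wrong: index 0 is falsy,
    # while -1 (not found) is truthy.
    return text.find(keyword) != -1


print(contains(line, "Invoice"), contains(line, "Total"))  # True False
```

The same test can be written more readably as keyword in line.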

@xxxsamxxx130 - 16.02.2023 05:49

What is the purpose of the "not equal to -1" check, sir?

@kishanbeesa4139 - 16.01.2023 17:44

Please make a video on extracting checkboxes from a Word document or PDF.

@AshwiniHindlekar - 01.10.2022 22:59

Hi, thanks, it's great learning. Do you also do freelancing?

@priyadharshinisivakumar4951 - 24.09.2022 17:04

It's very useful. Thank you, Aman.

@mohammedalshami3937 - 23.09.2022 18:24

I am really enjoying your NLP series. Thank you for making it look as simple as this.

@alfredoderodt6519 - 07.08.2022 00:17

Thank you so much! This is great. I have a question, though: how would you save this information in JSON format? :D
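One way to do this, sketched under the assumption that each parsed invoice has been collected into a dict; the field names and values below are made up for illustration. The standard json module writes a list of dicts directly (a temporary directory is used only so the sketch runs anywhere).

```python
import json
import os
import tempfile

# Hypothetical parsed results: one dict per invoice.
records = [
    {"file": "invoice1.pdf", "invoice_no": "1001", "date": "01/04/2020", "amount": "50"},
    {"file": "invoice2.pdf", "invoice_no": "1002", "date": "02/04/2020", "amount": "75"},
]


def save_as_json(rows, path):
    """Write the list of dicts to path as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English text readable in the file
        json.dump(rows, f, indent=2, ensure_ascii=False)


out_path = os.path.join(tempfile.mkdtemp(), "invoices.json")
save_as_json(records, out_path)
```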

@sahajanayak48 - 03.08.2022 03:27

How can I scrape a Kannada PDF to Unicode in Python?

@vishalgarg8423 - 25.06.2022 18:28

Dear Sir,
Thanks for this video. Is there any way to enter a word, search thousands of PDFs, and open the PDFs that contain that word?
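A rough sketch of the search part: extract each PDF's text once into an index, then look words up in that index instead of re-parsing thousands of files for every query. extract_text is a hypothetical placeholder for whatever PDF library you use, and the demo index below is made up.

```python
from pathlib import Path
from typing import Callable, Dict, List


def build_index(folder: str, extract_text: Callable[[Path], str]) -> Dict[Path, str]:
    """Extract every .pdf under folder once and keep its lower-cased text.

    extract_text is a placeholder for your PDF library's text extraction.
    """
    return {p: extract_text(p).lower() for p in Path(folder).rglob("*.pdf")}


def search(index: Dict[Path, str], word: str) -> List[Path]:
    """Return the files whose text contains word (case-insensitive)."""
    needle = word.lower()
    return sorted(path for path, text in index.items() if needle in text)


# Hypothetical, already-built index so the search can be tried directly.
demo_index = {
    Path("a.pdf"): "invoice no: 1001",
    Path("b.pdf"): "delivery note",
    Path("c.pdf"): "proforma invoice",
}
print(search(demo_index, "Invoice"))
```

To open a match, os.startfile(path) works on Windows only; on other systems you would launch the platform's opener (for example xdg-open on Linux or open on macOS) through subprocess.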

@brendensong8000 - 17.06.2022 07:06

Great video! Thank you for sharing!

@alvin3428 - 09.04.2022 14:59

How do I extract specific data from invoices that have different formats? Please help, sir.
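When layouts differ, one common approach is to keep several label patterns per field and take the first one that matches. The patterns and sample texts below are hypothetical examples, not a tested set for real invoices; you would extend FIELD_PATTERNS as you meet new vendors.

```python
import re
from typing import Dict, List, Optional

# Hypothetical label variants for different invoice layouts; extend per vendor.
FIELD_PATTERNS: Dict[str, List[str]] = {
    "invoice_no": [
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*(\w+)",
        r"Bill\s*No\.?\s*[:\-]?\s*(\w+)",
    ],
    "amount": [
        r"(?:Grand Total|Amount Due|Total)\s*[:\-]?\s*\$?([\d,]+(?:\.\d{2})?)",
    ],
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Try each field's patterns in order; keep the first match, else None."""
    result: Dict[str, Optional[str]] = {}
    for field, patterns in FIELD_PATTERNS.items():
        result[field] = None
        for pattern in patterns:
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                result[field] = match.group(1)
                break  # first matching layout wins
    return result


print(extract_fields("INVOICE NO. A123\nGrand Total: $1,250.00"))
print(extract_fields("Bill No: 77\nAmount Due 300"))
```

Fields that no pattern matches come back as None, which makes it easy to list the invoices that still need a new pattern.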

@sandipansarkar9211 - 29.01.2022 16:07

Finished watching.

@raghudharavath2299 - 10.12.2021 13:25

Please do it with pdfminer

@nakshatrasingh446 - 16.09.2021 17:30

Great video, sir. How do I save those values in a CSV file? My second question: how do I split on a new line rather than on ":"?
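A minimal sketch covering both questions, assuming the extracted text holds one "Label: value" pair per line (the sample text is hypothetical). splitlines() breaks the text at line boundaries, and csv.DictWriter writes the resulting dicts as CSV rows.

```python
import csv
import io

# Hypothetical extracted text with one "Label: value" pair per line.
raw_text = "Invoice No: 1001\nDate: 01/04/2020\nAmount: 50"

# splitlines() splits on line breaks ("\n", "\r\n", ...) instead of on ": ".
lines = raw_text.splitlines()

row = {}
for line in lines:
    # partition(":") splits at the first colon only, so a value such as
    # "10:30" stays whole; sep is "" for lines without a colon (skipped).
    label, sep, value = line.partition(":")
    if sep:
        row[label.strip()] = value.strip()


def rows_to_csv(rows, fieldnames) -> str:
    """Render a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv([row], ["Invoice No", "Date", "Amount"]))
```

When writing to a real file instead of a string buffer, open it with open(path, "w", newline="") as the csv module documentation recommends.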

@mujeebullahkhan5201 - 26.05.2021 14:37

Sir, my folder has various file types, like txt, doc, Excel, PDF, etc. What is the solution then? Can you make a separate video for them?
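A sketch of one way to handle a mixed folder: dispatch on the file extension to a reader function. Only a plain-text reader is implemented here; READERS is a hypothetical registry where you would add readers for PDF, Word, or Excel files using the libraries you choose.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def read_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


# Hypothetical registry: map more extensions (".pdf", ".docx", ".xlsx", ...)
# to reader functions built on the libraries of your choice.
READERS: Dict[str, Callable[[Path], str]] = {".txt": read_txt}


def read_folder(folder: str) -> Dict[str, str]:
    """Return {file name: text} for supported files and report the rest."""
    texts: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        reader = READERS.get(path.suffix.lower())
        if reader is None:
            print(f"Skipping unsupported file: {path.name}")
            continue
        texts[path.name] = reader(path)
    return texts


# Demo folder with one supported and one unsupported file.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "notes.txt").write_text("hello", encoding="utf-8")
(demo_dir / "report.pdf").write_bytes(b"%PDF-1.4")
result = read_folder(str(demo_dir))
```

Keeping the readers in a dict means adding a new format is one new function plus one registry entry, without touching the loop.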

@sandeyche - 05.05.2021 15:33

Could you please suggest an approach for the case where every invoice has a different format?

@yash422vd - 21.04.2021 16:02

Getting an error at this line ---> invoice_no = file_contents[i].split(': ')[1]
ERROR: IndexError: list index out of range
I replicated the same bill format in Word and saved the files as PDFs, using random values for the invoice number, date, and amount.
Please suggest!
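One likely cause: split(': ') only yields a second element when the line contains exactly a colon followed by a space. If the PDF text comes out as "Invoice No:1001", or the label and value land on different lines, the split returns a single element and [1] raises IndexError. Printing repr(file_contents[i]) shows what the extractor actually produced. A more tolerant sketch (value_after_colon is a hypothetical helper):

```python
import re
from typing import Optional


def value_after_colon(line: str) -> Optional[str]:
    """Return the text after the first colon, or None if there is no value.

    Unlike line.split(': ')[1], this tolerates "No:1001" and "No :  1001"
    and returns None instead of raising IndexError.
    """
    parts = re.split(r"\s*:\s*", line, maxsplit=1)
    if len(parts) < 2 or not parts[1].strip():
        return None
    return parts[1].strip()


for sample in ["Invoice No: 1001", "Invoice No:1001", "Invoice No :  1001 ", "Invoice No"]:
    print(repr(sample), "->", value_after_colon(sample))
```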

@prakharupadhyay9465 - 19.04.2021 22:23

for match in self._lang_vars.period_context_re().finditer(text):

TypeError: expected string or bytes-like object
This happens while performing tokenization. Please help.
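That traceback line appears to come from NLTK's Punkt sentence tokenizer, which expects a single string. The error usually means something else was passed in, such as a list of lines or None from a page with no extractable text. A hedged sketch of a normalizing helper (ensure_text is hypothetical) to apply before calling a tokenizer such as nltk.sent_tokenize:

```python
import re


def ensure_text(value) -> str:
    """Coerce common extractor outputs into one str for a tokenizer."""
    if value is None:
        return ""  # e.g. a page with no extractable text
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, (list, tuple)):
        return "\n".join(ensure_text(v) for v in value)
    return str(value)


# Reproducing the error class: regex matching on a list raises TypeError.
try:
    list(re.finditer(r"\.", ["a list, not a string"]))
    raised = False
except TypeError:
    raised = True

text = ensure_text(["First page.", None, b"Second page."])
# text is now a plain str and can be passed to e.g. nltk.sent_tokenize(text)
```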

@porudoryu - 13.04.2021 09:01

I'm still learning Python, and your simple teaching style is really helpful.
You've got yourself a subscriber, sir. Thanks!

@shreygrover3850 - 30.03.2021 17:46

Hi, I am getting this error 'PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]'. Any idea why that's happening?
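This is a PdfReadWarning, not an exception: the message itself says PyPDF2 corrects the object IDs and carries on, and it likely points to a PDF whose cross-reference table was written slightly off-spec. If the extracted text looks right, one option is to silence only this message. A sketch using the standard warnings module; the PyPDF2 call in the comment is an assumption about how the file is opened, and _noisy_open is a stand-in used only for the demo.

```python
import warnings


def call_without_xref_warning(func, *args, **kwargs):
    """Call func while ignoring only the 'Xref table not zero-indexed' warning."""
    with warnings.catch_warnings():
        # message is a regex matched against the start of the warning text
        warnings.filterwarnings("ignore", message="Xref table not zero-indexed")
        return func(*args, **kwargs)


# Assumed usage with PyPDF2 1.x (not executed here):
# reader = call_without_xref_warning(PyPDF2.PdfFileReader, open("file.pdf", "rb"))


def _noisy_open():
    """Stand-in that emits the same warning text as PyPDF2."""
    warnings.warn("Xref table not zero-indexed. ID numbers for objects will be corrected.")
    return "reader"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = call_without_xref_warning(_noisy_open)
```

Filtering by message rather than ignoring all warnings keeps other PDF problems visible.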

@kiranvanukuri9382 - 19.03.2021 15:28

Bro, please make a video on how to extract data from docs and PDFs and how to add those entities to a data frame.

@yitao_ - 02.03.2021 17:18

Very good, thank you.

@sandipansarkar9211 - 29.01.2021 14:08

But this is not working in my Google Colab:
import os
dir_Path = 'C://Users//server//Desktop'
os.chdir(dir_Path)
print(dir_Path)
The error I am getting is:
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-13a426d276e1> in <module>()
1 import os
2 dir_Path = 'C://Users//server//Desktop'
----> 3 os.chdir(dir_Path)
4 print(dir_Path)

FileNotFoundError: [Errno 2] No such file or directory: 'C://Users//server//Desktop'

Please guide me
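Colab runs the notebook on a remote Linux machine, so a Windows path such as C://Users//server//Desktop does not exist there; the files have to be uploaded to the Colab session or read from a mounted Google Drive (google.colab.drive.mount). A sketch with a clearer error message; change_to is a hypothetical helper and the Drive folder name in the comments is made up.

```python
import os
from pathlib import Path


def change_to(dir_path: str) -> Path:
    """chdir into dir_path, or fail with a hint if it does not exist here."""
    path = Path(dir_path).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(
            f"{path} does not exist on this machine (current directory: {Path.cwd()}). "
            "In Colab, upload the files or mount Google Drive first."
        )
    os.chdir(path)
    return path


# In Colab (not executed here; the folder name is hypothetical):
# from google.colab import drive
# drive.mount("/content/drive")
# change_to("/content/drive/MyDrive/invoices")

error_message = None
try:
    change_to("C://Users//server//Desktop")  # a Windows path, absent on Linux
except FileNotFoundError as exc:
    error_message = str(exc)
```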

@bharaths3167 - 14.01.2021 21:24

How can I parse a .doc file? It's proving very challenging with Python on Windows 10.

Thanks in advance
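python-docx (docx.Document(path).paragraphs) is the usual library for .docx files, but it does not read the older binary .doc format; those are commonly converted to .docx first, for example with LibreOffice (soffice --headless --convert-to docx file.doc). Because a .docx file is a ZIP archive containing word/document.xml, a simplified standard-library-only sketch looks like this; it reads body paragraphs only, since headers and footers live in separate parts of the archive.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source) -> str:
    """Return the body paragraphs of a .docx file (path or file object)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Build a tiny in-memory .docx-like archive to try the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "word/document.xml",
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:t>Invoice No: </w:t></w:r><w:r><w:t>1001</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Amount: 50</w:t></w:r></w:p></w:body></w:document>",
    )
sample = docx_text(buffer)
```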

@csprusty - 23.12.2020 08:54

The content is simple yet very useful to start with.

@thanzeersalim620 - 18.12.2020 09:09

Awesome, great!

@nicolasaraujo4757 - 28.10.2020 00:18

Is there a way to go through all the pages of the file? I don't want just the information on page 0.
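One approach is to loop over every page instead of indexing page 0. A sketch assuming the current pypdf library, whose PdfReader exposes .pages and page.extract_text(); older PyPDF2 1.x code used reader.numPages, reader.getPage(i), and page.extractText() instead. The _FakePage and _FakeReader stand-ins are hypothetical and only let the loop run without a PDF.

```python
def extract_all_pages(reader) -> list:
    """Return the text of every page, not just page 0.

    Assumes a pypdf-style reader exposing .pages and page.extract_text().
    """
    texts = []
    for page in reader.pages:
        texts.append(page.extract_text() or "")  # may be empty for scanned pages
    return texts


def open_pdf(path):
    from pypdf import PdfReader  # assumption: pypdf is installed (pip install pypdf)
    return PdfReader(path)


# Usage with a real file (not executed here):
# for i, text in enumerate(extract_all_pages(open_pdf("invoice.pdf"))):
#     print(i, text[:80])


# Stand-in objects so the loop can be tried without a PDF file.
class _FakePage:
    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


class _FakeReader:
    pages = [_FakePage("page one"), _FakePage(None), _FakePage("page three")]


demo_texts = extract_all_pages(_FakeReader())
```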

@kajalchaudhary6024 - 15.10.2020 13:45

I want to learn web scraping from basic to advanced. If you are providing online classes, please let me know, sir 🙏

@shresthmishra9329 - 10.10.2020 02:30

This is such a great simple playlist. Thank you.

@jexos_ - 14.08.2020 08:33

So useful! This helped me automate a huge amount of work for my company. Thank you very much

@Kumarsashi-qy8xh - 15.04.2020 08:07

Thanks for the information sir

@GopiKumar-ny3xx - 14.04.2020 17:37

Good presentation...

@EngRiadAlmadani - 14.04.2020 14:33

Nice work

@preranatiwary7690 - 14.04.2020 14:12

Nice video on Doc parsing.