Web Scraping with Python - Get URLs, Extract Data

John Watson Rooney

7 months ago

8,222 views


Comments:

@Mac_Edits1 - 30.11.2023 01:11

"parse_page(html)" from lesson 2 suddenly became "parse_search_page(html: HTMLParser):" in lesson 3 without any explanation. Anyway, great tutorial, and a great series as a whole. Very good for beginners.

@acharafranklyn5167 - 16.11.2023 19:16

Nice job. Is there a way to put all of this in a cron job or scheduler so it runs periodically?

@nuno2032 - 06.11.2023 19:38

Beautiful job. How can I find the code?

@playboipablo - 05.11.2023 23:05

can you stop smashing your keyboard

@KushalSharmatheOne - 04.11.2023 01:12

Man, your videos are great. Your videos on Playwright have really been helpful. I was able to follow them and then write my own Playwright script for my project, until I got stuck dealing with dynamic pop-ups. I am unable to get past those. I am supposed to enter a piece of data in those pop-ups (not captcha stuff), and I just can't make it work. It would help if you could cover dealing with dynamic pop-ups. Thanks.

@AliceShisori - 03.11.2023 22:43

If we combine Playwright with this, can we basically scrape any dynamic site (e.g., social media websites)?
Thank you so much, John. This series is very fulfilling.

@michaelscheider6414 - 02.11.2023 23:24

very very good

@daveys - 01.11.2023 13:45

Excellent video series, much appreciated. Thank you for posting.

@jaswanth333 - 31.10.2023 09:12

Also, kindly add a product URL column for each product and make it clickable when writing to CSV.

@bathuudamdin - 28.10.2023 11:04

Hi John, what is the fastest scraper for a webpage with dynamically loaded content? I am using Selenium and find it very slow. Are there any other options?

@milyastroc - 23.10.2023 23:51

This is very helpful! I appreciate it a lot.

@SkullTraill - 21.10.2023 01:38

Can you show how we can do this on websites where we have to log in first?

@Fabricio-mq2uk - 20.10.2023 21:48

Thank you very much big John!

@samoylov1973 - 19.10.2023 23:53

Based on one of your previous videos, I figured out how to get nested objects from tricky divs. Thank you!
Could you please advise how, in the function below, I can get not only the <p> elements but also the <h2>, <pre>, and <ul><li> elements?
Should it be some sort of pipe-like syntax, such as "div.article-formatted-body > div > p | h2 | pre | ul | li |"?

def read_article(html):
    article_body = html.css("div.article-formatted-body > div > p")
    paragraphs = [i.text() for i in article_body]
    print(*paragraphs, sep='\n')

@abdifatahabdi3939 - 19.10.2023 21:31

You are a genius, man. Thank you very much.

@Lorem04 - 19.10.2023 18:05

Thank you! We need more of this sh!t,
and I hope for a series like this on BeautifulSoup too.

@sallycakes472 - 19.10.2023 02:20

Thanks heaps for these, John. Can we please get the code in a pastebin or something? 🙏

@hreedaymishra7761 - 18.10.2023 20:38

Thank you, please continue this series.

@eduardop5487 - 18.10.2023 17:06

Excellent video, great learning experience

@muhammadhaddid9927 - 18.10.2023 16:34

Hi, kindly make a video on Python with Selenium, because there is no updated Chrome driver available, so I don't know how to run the script now.
Thanks

@bakasenpaidesu - 18.10.2023 16:25

Ohayou ❤
