Python Tutorial: Web Scraping with Requests-HTML

Corey Schafer

5 years ago

189,654 views

Comments:

Ohio Homes Inc - 27.08.2023 15:29

Thank you, Corey.

Your explanations are always complete and very helpful!

arcy - 25.06.2023 14:43

You are the best, Corey 🥳

Ahmed Ziada - 22.04.2023 20:32

If I could give you a billion likes for this video I would. This is top quality content.

Pardener - 05.02.2023 07:20

good video

Jean Wang - 12.01.2023 06:14

The video ID can be extracted with the regex (?<=embed\/).+(?=\?)
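A minimal sketch of that regex using Python's standard re module. It assumes an embed URL of the form .../embed/<id>?<query>, which is what the pattern itself implies; the URL below is a placeholder, extract_video_id is a hypothetical helper name, and because .+ is greedy the pattern expects a single "?" after the ID.

```python
import re
from typing import Optional

# (?<=embed\/) : lookbehind, so the match starts right after "embed/"
# .+           : the ID itself (greedy, so it runs up to the LAST "?" in the string)
# (?=\?)       : lookahead, so the match stops just before a "?" without consuming it
VIDEO_ID_PATTERN = re.compile(r"(?<=embed\/).+(?=\?)")


def extract_video_id(src: str) -> Optional[str]:
    """Return the video ID from an embed URL, or None when the pattern does not match."""
    match = VIDEO_ID_PATTERN.search(src)
    return match.group(0) if match else None


# Hypothetical embed URL; real iframe src values may carry different query parameters.
src = "https://www.youtube.com/embed/abc123XYZ_-?version=3&rel=1"
print(extract_video_id(src))  # abc123XYZ_-

# Without a "?" after the ID the lookahead never succeeds.
print(extract_video_id("https://www.youtube.com/embed/abc123XYZ_-"))  # None
```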

Sadasivam Eniasivam - 04.01.2023 16:59

Can you do a web scraping video with Python, Scrapy, XPath, and hidden APIs?

Алексей Соков - 22.12.2022 13:45

2022

10 - 17.09.2022 22:54

On the webpage/URL that I call session.get(url) on, there is a JavaScript script; one thing this script does is send a request of its own. How can I capture the response to that request?

Africa Plus - 30.07.2022 20:41

Could you please make another video on how to scrape an application?

sepehr soltani - 27.07.2022 13:26

Simply described and to the point! Thanks.

Quan Lien - 25.06.2022 09:42

I'm confused about this video versus the web scraping video with bs4. Can someone explain whether this video covers the same thing as the other one or something different?

אלי ששון - 03.06.2022 13:05

Hi Corey, can you make a tutorial about how to call a public JavaScript API from Python?

Gisle Berge - 04.04.2022 19:55

Very thorough and complete on the topic, thanks for the educational video 🙂

bayy420 - 23.03.2022 18:00

Hi, I'm new to coding and stuff.

I want to scrape the URL inside an <a> tag with the class .shortc-button, but as you know, one element can have multiple classes...

In my case there are:
.shortc-button medium green (for MediaFire URLs)
and
.shortc-button medium orange (for Zippyshare URLs)

I want to access '.shortc-button medium green', which contains the MediaFire links.

How should I write the code?
1. r.html.find('.shortc-button .medium .green')
2. r.html.find('.shortc-button medium green')
3.
shortc = r.html.find('.shortc-button')
for medium in shortc:
    medium.find('.medium')
    for green in medium:
        green.find('.green')
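For what it's worth: in CSS a space between selectors means "descendant of", so option 1 looks for a .green element nested inside a .medium element nested inside a .shortc-button element, and option 2 looks for <medium> and <green> tags. Chaining the classes with no spaces, as in '.shortc-button.medium.green', matches a single element that carries all three classes in any order; since requests-html's find() takes CSS selectors, r.html.find('.shortc-button.medium.green') is the form to try. The sketch below illustrates that all-classes-must-match rule using only the standard library's html.parser. The markup and URLs are made-up placeholders, and ClassLinkCollector is a hypothetical helper, not part of requests-html.

```python
from html.parser import HTMLParser


class ClassLinkCollector(HTMLParser):
    """Collect href values from <a> tags that carry ALL required classes,
    mirroring what the compound CSS selector a.shortc-button.medium.green matches."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute is a space-separated list; order does not matter.
        classes = set((attributes.get("class") or "").split())
        if self.required <= classes and attributes.get("href"):
            self.hrefs.append(attributes["href"])


# Made-up markup modelled on the comment above; URLs are placeholders.
sample = """
<a class="shortc-button medium green" href="https://example.com/mediafire-file">MediaFire</a>
<a class="shortc-button medium orange" href="https://example.com/zippyshare-file">Zippyshare</a>
"""

collector = ClassLinkCollector(["shortc-button", "medium", "green"])
collector.feed(sample)
print(collector.hrefs)  # ['https://example.com/mediafire-file']
```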

Imran Ullah - 09.03.2022 05:29

Sir, I get an empty list from soup.find_all("div", class_="some class"), although there are some children with this class.
What could be the reason?

ARIC KAJI - 13.01.2022 08:21

It's very useful, thank you so much 💯

D I - 08.01.2022 21:38

Hi @Corey, regarding the AsyncHTMLSession part of your tutorial:

I'm getting
"RuntimeError: This event loop is already running."

I checked the documentation but did not really see the reason for it. Could you please take a look and tell me whether that is expected? I'm running Windows 10 with Python 3.10.
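That message comes from asyncio itself: it is raised when run_until_complete() is called on a loop that is already running. A likely (but unconfirmed) cause is running the code in Jupyter or IPython, which keeps its own loop running; AsyncHTMLSession.run() appears to block on its loop with run_until_complete(), which would trigger exactly this error there. In that environment the usual approach is to await the call directly in a cell (for example r = await asession.get(url)) rather than going through a blocking runner. The stdlib-only sketch below reproduces the error and shows the await-based alternative; fetch_page is a hypothetical stand-in for a real request.

```python
import asyncio


async def fetch_page(url):
    """Hypothetical stand-in for an awaitable request (no real network I/O)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>{url}</html>"


async def inside_running_loop():
    loop = asyncio.get_running_loop()
    error = None
    coro = fetch_page("https://example.com")
    try:
        # Blocking on a loop from code that is already running on that loop fails.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        error = str(exc)
    finally:
        coro.close()  # the coroutine was never started; close it to avoid a warning
    # Inside a running loop, await the coroutine instead of blocking on it.
    page = await fetch_page("https://example.com")
    return error, page


error, page = asyncio.run(inside_running_loop())
print(error)  # This event loop is already running
print(page)   # <html>https://example.com</html>
```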

harshita nailwal - 29.12.2021 10:50

I tried scraping one, but got status code 406. Can you please help? I can't find a solution!

Jose Yubero 杜 - 26.12.2021 19:27

Daaamn, this is the greatest video I’ve ever seen about scraping. Nice, I was looking for this kind of explanation for a long time since I’m working on a project with Python 3.

Collins Saguru - 07.12.2021 20:27

If anyone understands what html = HTML(html=source) is doing, please assist. Any links to another video where it's explained would be welcome as well. Thanks!

Honkleton Donkleton - 25.10.2021 18:57

Top notch as usual, thank you.

Highering AI - 01.10.2021 03:59

How do you scrape innerHTML content?

helloworld - 26.08.2021 18:36

tysm

rick segal - 23.08.2021 10:37

12 minutes in, I can grab website information from this tutorial. Why is this a big deal? I know next to nothing about Python. Corey is high value in a very condensed time. Others would take hours to get to his 12-minute mark. Subscribed.

The Global Conflict - 19.08.2021 07:09

Do you have any series about asynchronous programming in Python???

prateek sarangi - 14.08.2021 03:40

Wow, detailed info!!
Please consider covering coroutines, asyncio, and async/await.

Ответить
Eric Li
Eric Li - 27.07.2021 16:19

What a great tutorial! I bet this is the first long tutorial I've ever watched nonstop.

Sulav Lohani - 05.07.2021 09:27

Corey, it was really informative. Can you clarify for me the difference between using BeautifulSoup and HTMLSession? That is, for which types of sites should we use BeautifulSoup, and for which should we use HTMLSession?

Eddie - 27.06.2021 19:05

Hi Corey, thanks for your video, it's really helpful.

I want to ask: if the website requires a login to see the data, how can we do that? I see there's a way to do it with the normal requests library but found none with requests-html. Thanks.

A A - 22.06.2021 00:11

Hi Corey. How does one find an element by its attribute and not by using a CSS selector?
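One way to select by attribute without CSS is XPath; requests-html's HTML objects also provide an xpath() method, so a call such as r.html.xpath("//a[@data-id='2']") should work on a fetched page. The sketch below illustrates the same attribute predicates using only the standard library's xml.etree.ElementTree, which supports a limited XPath subset and needs well-formed markup (real-world HTML usually needs an HTML-aware parser such as lxml, which requests-html builds on). The snippet and its data-id attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup; ElementTree is an XML parser, not an HTML parser.
markup = """
<div>
  <a href="/first" data-id="1">First</a>
  <a href="/second" data-id="2">Second</a>
  <a href="/third">Third</a>
</div>
"""

root = ET.fromstring(markup.strip())

# [@data-id] keeps elements that have the attribute at all...
with_attr = [a.get("href") for a in root.findall(".//a[@data-id]")]
# ...and [@data-id='2'] keeps only those whose attribute equals the given value.
exact = [a.text for a in root.findall(".//a[@data-id='2']")]

print(with_attr)  # ['/first', '/second']
print(exact)      # ['Second']
```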

tiger12506 - 25.05.2021 11:44

This is really cool. I was looking for the ability to scrape a website and found requests_html. Quickly ran headlong into a wall as the site is a React.js site. :( Thought maybe I could find some information on performing clicks and such with requests_html, but looks like that is not possible. Your tutorial on the subject is great though. Really well thought out and explained, Great presentation!

SHIVEN KHAJURIA - 08.05.2021 17:08

@coreyschaffer - Please provide the web scraping code for this video. The GitHub repo link doesn't contain the code files.

Stocks Unlocked - 26.04.2021 16:00

Great stuff. Quick question: I'm able to scrape links, but when they're output on the HTML page it's just the text, not a clickable hyperlink. Any ideas on how to fix this so I can have a clickable link?
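A scraped link is only a URL string; it becomes clickable when the page you generate wraps it in an <a href="..."> element. Below is a minimal standard-library sketch, assuming the links were collected as (text, url) pairs (a hypothetical shape; adapt it to however your scraper stores them). links_to_html is a hypothetical helper, and html.escape keeps characters such as & and quotes from breaking the generated markup.

```python
from html import escape


def links_to_html(links):
    """Render (text, url) pairs as an HTML list of clickable anchors."""
    items = []
    for text, url in links:
        # quote=True also escapes quotes, so the URL cannot break out of href="..."
        items.append(f'<li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Placeholder data standing in for scraped results.
scraped = [("Example page", "https://example.com/?a=1&b=2")]
print(links_to_html(scraped))
```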

nofyat hp - 13.04.2021 23:19

But how about a login page? Is it still worth it?

Troglodyte - 31.01.2021 09:45

Brilliant as usual! Salute!!!

Harman Hundal - 30.01.2021 16:25

Just a suggestion, Corey: can you please tag your videos 'Beginner', 'Intermediate', or 'Advanced' for the benefit of noobs like me? Thanks in advance. Keep the awesome stuff coming.

Jun Ouyang - 21.01.2021 16:45

How does AsyncHTMLSession work with concurrent.futures? I don’t want to write a function for each thread.
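With asyncio you normally don't need threads or concurrent.futures for this: one coroutine that takes the URL as a parameter is scheduled once per URL, and all of them share a single event loop. In requests-html, AsyncHTMLSession.run() appears to take coroutine functions rather than coroutine objects, so the same idea there would look like asession.run(*[lambda url=url: get_page(url) for url in urls]), where get_page is a hypothetical async function you write; treat that line as an assumption to check against your installed version. The sketch below shows the underlying pattern with asyncio.gather from the standard library; fetch is a hypothetical placeholder that sleeps instead of doing network I/O.

```python
import asyncio


async def fetch(url):
    """Hypothetical placeholder for one async request; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return url, len(url)


async def fetch_all(urls):
    # One coroutine function, called once per URL. gather() runs them concurrently
    # on the same event loop and returns the results in input order.
    return await asyncio.gather(*(fetch(url) for url in urls))


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(fetch_all(urls))
for url, size in results:
    print(url, size)
```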

curruption018 - 22.12.2020 07:39

Whenever I run .find(), the type that's returned is a list. For example, the variable you have named "headline" would be a list, so I can't run .find() again. Also, for some reason it's not recognizing .html as a method of the r object. I even explicitly declared the variable type, but it still cannot see .html as a method of whatever session.get returns. Any suggestions?

RATAN AGARWAL - 02.12.2020 15:29

nice tutorial

Joel Smith - 13.11.2020 01:53

I'm learning a ton about webscraping from this tutorial, but I'm not able to run the code. Like many folks, I've got a few Python versions installed. I ran the code in the Thonny IDE, but I get a traceback on 'no requests_html module found.' Did some research on it, and discovered that requests_html is only supported on Python 3.6 (and my Thonny default was 3.7). I reset Thonny to run 3.6.5, but got the same error. Now I'm installing 3.6 to see if requests_html will be imported in that version. Anyone else see a similar issue with a traceback? What was your workaround?

Pythusiast - 09.11.2020 04:45

Hello Corey, can you please make a full-on tutorial on web scraping using Scrapy? Thanks in advance.

Martin Gladis - 05.11.2020 01:21

In my code I get this error: "There is no current event loop in thread 'Thread-1'"
My code (I use Django):
from requests_html import HTMLSession

session = HTMLSession()
r = session.get(url)
r.html.render()
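asyncio does not create an event loop automatically in non-main threads, and Django's development server handles requests in separate threads by default, so code that asks for the current loop there fails with this message. render() in requests-html drives a headless browser through asyncio, which is presumably where the lookup happens. A commonly suggested workaround, offered here as an assumption rather than a verified fix for this setup, is to create and register a loop for the thread before calling render(). The stdlib-only sketch below shows the pattern; in_worker is a hypothetical stand-in for the request-handling thread.

```python
import asyncio
import threading


def in_worker(results):
    """Hypothetical worker standing in for a thread that serves a Django request."""
    try:
        # Non-main threads get no event loop automatically.
        asyncio.get_event_loop()
        results["error_before"] = None
    except RuntimeError as exc:
        results["error_before"] = str(exc)  # "There is no current event loop in thread ..."

    # Workaround: create a loop for this thread and register it as the current one.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # Code that looks up the current loop now finds this one.
        results["same_loop"] = asyncio.get_event_loop() is loop
        # Placeholder for the real work (e.g. r.html.render()) that needs a loop.
        results["value"] = loop.run_until_complete(asyncio.sleep(0, result="rendered"))
    finally:
        asyncio.set_event_loop(None)
        loop.close()


results = {}
worker = threading.Thread(target=in_worker, args=(results,))
worker.start()
worker.join()
print(results)
```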

STEPHEN ABORHEY - 01.11.2020 16:35

I really love this video, @Corey Schafer, but I would like to learn about using APIs to scrape data from social media like Facebook, Twitter, and the rest, so a video about that would be appreciated. Thank you!

Nick Palmieri - 28.10.2020 04:32

r.html only works in the terminal but not in my IDE. Help, please!

The Frustrated Programmer - 11.09.2020 14:03

Hi @corey schafer, I also ran into the same problem while working with requests_html: there is no prettify method, but we can work around that by using full_text instead of text.

Thanks, and I hope I helped you.
