Slow Web Scraper? Try this with ASYNC and Requests-html

John Watson Rooney

3 years ago

17,498 views


Comments:

Luca Lo Iacono - 20.09.2023 17:30

John, thank you so much for your video; everything is explained in the most understandable way.
I tried applying your teachings to scraping some HTML sites, with excellent results.
However, I have big problems adding await response.html.arender() to the function for scraping sites whose data is loaded by JavaScript.
After running the script, I get no results because it enters an infinite loop.
This is the complete function I use:

    async def name(session, url):
        response = await session.get(url)
        await response.html.arender()
        return response


When I use the same code at the top level, outside a function, it works fine:

session = AsyncHTMLSession()
response = await session.get(url)
await response.html.arender()

Can you please help me in fixing this problem?
Thanks a lot,
Luca


P.S. Congratulations on your coding channel; absolutely the best!
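[Editor's note] It is hard to diagnose Luca's hang without the full script, but a commonly reported cause is that arender() ends up on a different event loop than the one the session's Chromium was launched on, e.g. when the coroutines are driven with asyncio.run() rather than the session's own AsyncHTMLSession.run(...) method, which the requests-html README uses. Below is a minimal runnable sketch of the single-loop gather pattern; the stand-in coroutine replaces the real session.get()/arender() awaits so the sketch runs without a browser, and the function name and URLs are hypothetical.

```python
import asyncio

async def fetch_and_render(url):
    # Stand-in for the two awaits from the comment above, which must both
    # run on the SAME event loop:
    #   response = await session.get(url)
    #   await response.html.arender()
    await asyncio.sleep(0.01)          # placeholder for network + render work
    return f"rendered:{url}"

async def main(urls):
    # Schedule every fetch on one loop and wait for all of them together.
    tasks = [fetch_and_render(u) for u in urls]
    return await asyncio.gather(*tasks)

urls = ["https://example.com/a", "https://example.com/b"]
results = asyncio.run(main(urls))
print(results)
```

With requests-html itself, the equivalent move is to create one AsyncHTMLSession and drive all the coroutines through a single session.run(...) call instead of asyncio.run().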

Mihai Lazarescu - 27.06.2023 02:11

John Watson Rooney, can you explain, please, the method to render JavaScript using AsyncHTMLSession and asyncio?

Davi Uliana - 14.02.2023 19:40

Can this be incorporated with Scrapy?

GitGość - 06.09.2022 13:06

Another great video, as always.

A PRIORI PROGRAMMER - 17.08.2022 20:17

DUDE, WHY ARE YOU SO SERIOUS?

Creation Spam - 18.06.2022 19:11

Where's the code?

Long Pham Thanh - 24.04.2022 07:20

Thank you John for this tutorial.
Could you suggest how to approach getting the list of URLs in the first place?
Should I use another HTMLSession() to define a get_urls() function that returns the list of URLs? Or how should I use async/await in this case?
Again, thank you!
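[Editor's note] One answer, sketched below: no second session is needed; the same async session can first await the index page, then fan the discovered URLs out with gather. The stand-in coroutines replace the real session.get() calls (with requests-html, get_urls would typically return r.html.absolute_links after the request), and all URLs here are hypothetical.

```python
import asyncio

async def get_urls(index_url):
    # Stand-in for: r = await session.get(index_url); return r.html.absolute_links
    await asyncio.sleep(0.01)
    return [f"{index_url}/page/{n}" for n in range(1, 4)]

async def fetch(url):
    await asyncio.sleep(0.01)   # stand-in for the real request + parsing
    return url

async def main():
    urls = await get_urls("https://example.com")             # stage 1: collect
    return await asyncio.gather(*(fetch(u) for u in urls))   # stage 2: fan out

pages = asyncio.run(main())
print(pages)
```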

Christian Hetmann - 22.04.2022 23:40

Thanks a lot for your video. This was exactly what I was looking for.

Daniel Baković - 03.04.2022 00:47

Thank you for the nice tutorial.

What if you have, for example, a dynamic list of URLs? The list could be updated by the scraper itself if it finds pagination on the target page. How would you create or manage tasks for something like that?
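[Editor's note] A producer-consumer pattern with asyncio.Queue handles this: workers pull URLs, and any worker that discovers a pagination link pushes it back onto the queue, so the task list grows while the scrape runs. This is a runnable sketch with a hypothetical site map standing in for real fetching and parsing.

```python
import asyncio

# Hypothetical site map: each page links to the next page (pagination).
NEXT_PAGE = {"/list?p=1": "/list?p=2", "/list?p=2": "/list?p=3", "/list?p=3": None}

async def worker(queue, seen, results):
    while True:
        url = await queue.get()
        await asyncio.sleep(0.01)          # stand-in for fetch + parse
        results.append(url)
        nxt = NEXT_PAGE.get(url)           # pagination found on this page?
        if nxt is not None and nxt not in seen:
            seen.add(nxt)
            queue.put_nowait(nxt)          # grow the work list while running
        queue.task_done()

async def main(start):
    queue, seen, results = asyncio.Queue(), {start}, []
    queue.put_nowait(start)
    workers = [asyncio.create_task(worker(queue, seen, results)) for _ in range(3)]
    await queue.join()                     # done once every queued URL is handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results

pages = asyncio.run(main("/list?p=1"))
print(sorted(pages))
```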

Ervan Kurniawan - 07.03.2022 07:05

I've used this async approach, but it was only about 3 seconds faster than a plain loop with requests.

So it depends on the website's server too?
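[Editor's note] Yes, it largely does. Concurrency removes the waiting-in-line, but each response still takes as long as the server takes, so the total time approaches the latency of the slowest response rather than the sum. With few URLs or a slow server the gain is small. A runnable demonstration with simulated 0.1 s latencies:

```python
import asyncio
import time

async def fake_request(delay):
    await asyncio.sleep(delay)    # stands in for the server's response time
    return delay

async def main():
    delays = [0.1] * 5            # five requests, 0.1 s server latency each
    t0 = time.perf_counter()
    await asyncio.gather(*(fake_request(d) for d in delays))
    return time.perf_counter() - t0

elapsed = asyncio.run(main())
# Sequentially this would take ~0.5 s; concurrently it takes ~0.1 s,
# i.e. roughly the latency of the single slowest response.
print(f"{elapsed:.2f}s")
```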

Eduardo Coronado - 15.01.2022 00:36

Thank you for all your videos! I recently found your channel and wish I had seen it ages ago. These videos have been incredibly helpful for me. I would like to use this technique with a POST request to sign in to a site, but I can't figure out where to place it to preserve the sign-in throughout the session. Any suggestions on where or how to place this?
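[Editor's note] The usual pattern: await the login POST once, before any scraping starts, and then issue every subsequent GET from the same session object, since sessions keep cookies between requests and that is what preserves the sign-in. A runnable sketch with a hypothetical stand-in session (a real AsyncHTMLSession handles the cookie jar for you):

```python
import asyncio

# Hypothetical stand-in for an async HTTP session with a cookie jar.
class FakeSession:
    def __init__(self):
        self.cookies = {}

    async def post(self, url, data):
        await asyncio.sleep(0.01)
        self.cookies["sessionid"] = "abc123"   # server "sets" a login cookie

    async def get(self, url):
        await asyncio.sleep(0.01)
        # A real server would check the cookie; here we just report it.
        return {"url": url, "logged_in": "sessionid" in self.cookies}

async def main():
    session = FakeSession()
    # 1) Log in ONCE, and await it before any scraping starts.
    await session.post("https://example.com/login", data={"user": "me"})
    # 2) Only then fan out the page requests on the SAME session object.
    urls = [f"https://example.com/account/{n}" for n in range(3)]
    return await asyncio.gather(*(session.get(u) for u in urls))

pages = asyncio.run(main())
print(all(p["logged_in"] for p in pages))
```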

Hengky Ariputra - 27.12.2021 11:02

Can you make another, more detailed video about async and requests-html? I really need it, hahaha.

QIZIQARLI FAKTLAR - 27.11.2021 17:53

When a webpage is dynamically generated by JavaScript and we use render, it stays very slow, doesn't it? Or how should render be used correctly so that it works faster?
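[Editor's note] Rendering is inherently slow (a headless browser loads and executes the page), so the practical lever is to render concurrently but capped, so you overlap the waiting without spawning a Chromium tab per URL. A runnable sketch of capping concurrency with a semaphore; the sleep stands in for the slow render call:

```python
import asyncio

async def main(urls, max_renders=2):
    sem = asyncio.Semaphore(max_renders)   # at most 2 "renders" in flight

    async def render_page(url):
        async with sem:                    # wait for a free browser slot
            await asyncio.sleep(0.02)      # stand-in for the slow render work
            return url

    return await asyncio.gather(*(render_page(u) for u in urls))

done = asyncio.run(main([f"/p{i}" for i in range(6)]))
print(len(done))
```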

Almo - 19.10.2021 12:45

Hey John, great video once again. How can I use this async approach with proxies and different sessions?
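[Editor's note] One simple scheme: cycle through a proxy pool and assign one proxy per task when the tasks are created. The proxy addresses below are hypothetical and the coroutine is a stand-in for the real request; since requests-html is built on requests, passing a per-request proxies={"http": ..., "https": ...} mapping to session.get should work, though that detail is an assumption worth verifying against the docs.

```python
import asyncio
from itertools import cycle

# Hypothetical proxy pool.
PROXIES = cycle(["http://p1:8080", "http://p2:8080", "http://p3:8080"])

async def fetch(url, proxy):
    await asyncio.sleep(0.01)      # stand-in for the proxied request
    return (url, proxy)

async def main(urls):
    tasks = [fetch(u, next(PROXIES)) for u in urls]   # one proxy per task
    return await asyncio.gather(*tasks)

out = asyncio.run(main([f"/page{i}" for i in range(4)]))
print(out)
```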

RTX MAX - 13.09.2021 10:10

Why did you not use the session.render command after session.get(url)? I am working on a project and my program runs fine without threads or asyncio, but when I use threads or asyncio the render command doesn't work. Can you help me understand how to overcome this?

Ritik Jain - 10.08.2021 22:32

Thanks a lot for the video :)

IndirectThought - 10.08.2021 02:19

How could we use arender() to render JavaScript HTML asynchronously?

Cheerfulnag - 30.07.2021 03:29

This code does not close the sessions at the end and leaves a lot of Chromium processes open (not all, but with 980 links, for example, it went over a hundred). I hoped I would find a solution to that problem here, but it's the same thing. And by the way, this asyncio approach is not suitable if you need to render something (I mean if you use arender), because then it produces no results at all; the code simply ends, leaving Chromium processes open. But your other videos are good anyway.
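[Editor's note] The leaked-Chromium part has a standard remedy: close the session in a try/finally so cleanup runs even when a scrape raises (AsyncHTMLSession exposes an awaitable close() that shuts its browser down). A runnable sketch with a hypothetical stand-in session; only the try/finally shape is the point:

```python
import asyncio

# Hypothetical stand-in for an async session that owns a browser process.
class FakeSession:
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True     # real sessions would shut Chromium down here

async def main():
    session = FakeSession()
    try:
        # ... schedule and await the scraping tasks here ...
        raise RuntimeError("a scrape blew up")   # simulate a mid-run failure
    except RuntimeError:
        pass                                     # handle/log the failure
    finally:
        await session.close()  # guaranteed to run, success or failure
    return session

s = asyncio.run(main())
print(s.closed)
```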

Rick van Putten - 04.07.2021 20:29

Awesome, and thanks for the amazing videos. Does this also work with BeautifulSoup?

yopp - 23.06.2021 18:04

This was very smooth for an experienced programmer. Time flew and the information stuck.

Hayat Skills - 26.05.2021 08:37

Hi Sir

It is showing this error on my side: RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Kindly help me

Thanks!

KIRAN RAJ DHAKAL - 28.04.2021 10:43

I am stuck on a project.
If you could help me out, please contact me.

erdem günal - 25.04.2021 15:50

Is there a way to scrape data from a Cloudflare-protected website?

Art Abra - 11.04.2021 22:33

I watched your past video on the grequests module and it really helped me. Is this better than grequests?

5astelija - 11.04.2021 17:37

Just wanted to say this is the best coding channel I have ever stumbled upon. Teaching through actual examples, explaining only the necessary parts, trusting that the viewer has an actual brain for themselves. Perfect.

Arjun b - 11.04.2021 05:48

Thanks so much! Can you please upload the code to Git?

Muhammad Luay - 10.04.2021 22:22

Thank you. Excellent quality as always

TheKingbode - 10.04.2021 05:45

Excellent as usual, Thanks a lot

Linxx - 08.04.2021 14:58

Would you suggest this method over grequests?

Liu Zhihao - 08.04.2021 01:30

Could you show an example of how to scrape a page that has lazy loaders? For example: AliExpress. Thank you for all the free content.

Humayun butt - 07.04.2021 23:28

Very useful to speed up things.👍💖

Ryan Lynch - 07.04.2021 22:18

Thanks John, I always learn a thing or two from your videos.
Is there a high chance of an IP ban due to such a high volume of requests in such a small timeframe, without mitigating with proxy rotation, headers, etc.?
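[Editor's note] Hammering a site with unbounded concurrent requests can certainly trigger rate limits or bans; a simple mitigation is to send requests in small batches with a jittered pause between them (and set a sensible User-Agent). A runnable sketch; the header value, batch sizes, and delays are illustrative, and the coroutine stands in for the real request:

```python
import asyncio
import random

# Hypothetical header; rotate real User-Agent strings in practice.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; my-scraper/0.1)"}

async def polite_fetch(url):
    await asyncio.sleep(random.uniform(0.005, 0.02))  # jitter per request
    return url                                        # stand-in for the request

async def main(urls, batch=3, pause=0.02):
    results = []
    for i in range(0, len(urls), batch):              # fire small batches
        chunk = urls[i:i + batch]
        results += await asyncio.gather(*(polite_fetch(u) for u in chunk))
        await asyncio.sleep(pause)                    # breathe between batches
    return results

got = asyncio.run(main([f"/item/{i}" for i in range(7)]))
print(len(got))
```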

A - 07.04.2021 18:33

Hey, you're one of the best. Do you have Discord? I want to contact you about a business opportunity. Thanks.

Alex Tomas - 07.04.2021 18:20

John, thank you for your time making these videos. I appreciate it and know your time is worth it.

bharath babu - 07.04.2021 17:35

How can I extract all links on a JavaScript website using requests-html? Can I automate web scraping on a JavaScript-driven HTML website?
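[Editor's note] requests-html exposes r.html.links and r.html.absolute_links on a response, and after rendering the JavaScript those include dynamically inserted links too. To show what link collection amounts to, here is a runnable stdlib sketch that gathers every href from static HTML (the sample markup is made up); requests-html does this for you on the parsed page.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

sample = """<html><body>
<a href="/docs">Docs</a>
<a href="https://example.com/about">About</a>
<a>no href</a>
</body></html>"""

collector = LinkCollector()
collector.feed(sample)
print(collector.links)
```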

bharath babu - 07.04.2021 17:33

Thanks a lot
