Slow Web Scraper? Try this with ASYNC and Requests-html

John Watson Rooney

3 years ago

17,498 views


Comments:

Luca Lo Iacono - 20.09.2023 17:30

John, thank you so much for your video; everything is explained in the most understandable way.
I tried applying your teachings to scraping some HTML sites, with excellent results.
However, I have big problems adding await response.html.arender() to the function for scraping sites whose data is loaded by JavaScript.
After running the script, I get no results because it enters an infinite loop.
This is the complete function I use:

    async def name(session, url):
        response = await session.get(url)
        await response.html.arender()
        return response


When I use the same code at the top level, outside a function, it works fine:

session = AsyncHTMLSession()
response = await session.get(url)
await response.html.arender()

Can you please help me in fixing this problem?
Thanks a lot,
Luca


P.S. Congratulations on your coding channel; absolutely the best!
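[Editor's note] It is hard to diagnose Luca's hang without the full script, but a commonly reported cause is that arender() ends up on a different event loop than the one the session's Chromium was launched on, e.g. when the coroutines are driven with asyncio.run() rather than the session's own AsyncHTMLSession.run(...) method, which the requests-html README uses. Below is a minimal runnable sketch of the single-loop gather pattern; the stand-in coroutine replaces the real session.get()/arender() awaits so the sketch runs without a browser, and the function name and URLs are hypothetical.

```python
import asyncio

async def fetch_and_render(url):
    # Stand-in for the two awaits from the comment above, which must both
    # run on the SAME event loop:
    #   response = await session.get(url)
    #   await response.html.arender()
    await asyncio.sleep(0.01)          # placeholder for network + render work
    return f"rendered:{url}"

async def main(urls):
    # Schedule every fetch on one loop and wait for all of them together.
    tasks = [fetch_and_render(u) for u in urls]
    return await asyncio.gather(*tasks)

urls = ["https://example.com/a", "https://example.com/b"]
results = asyncio.run(main(urls))
print(results)
```

With requests-html itself, the equivalent move is to create one AsyncHTMLSession and drive all the coroutines through a single session.run(...) call instead of asyncio.run().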

Mihai Lazarescu - 27.06.2023 02:11

John Watson Rooney, can you explain, please, the method to render JavaScript using AsyncHTMLSession and asyncio?

Davi Uliana - 14.02.2023 19:40

Can this be incorporated with Scrapy?

GitGość - 06.09.2022 13:06

Another great video, as always.

A PRIORI PROGRAMMER - 17.08.2022 20:17

DUDE, WHY ARE YOU SO SERIOUS?

Creation Spam - 18.06.2022 19:11

Where's the code?

Long Pham Thanh - 24.04.2022 07:20

Thank you John for this tutorial.
Could you suggest how to approach getting the list of URLs in the first place?
Should I use another HTMLSession() to define a get_urls() function that returns the list of URLs? Or how should I use async/await in this case?
Again, thank you!
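[Editor's note] One answer, sketched below: no second session is needed; the same async session can first await the index page, then fan the discovered URLs out with gather. The stand-in coroutines replace the real session.get() calls (with requests-html, get_urls would typically return r.html.absolute_links after the request), and all URLs here are hypothetical.

```python
import asyncio

async def get_urls(index_url):
    # Stand-in for: r = await session.get(index_url); return r.html.absolute_links
    await asyncio.sleep(0.01)
    return [f"{index_url}/page/{n}" for n in range(1, 4)]

async def fetch(url):
    await asyncio.sleep(0.01)   # stand-in for the real request + parsing
    return url

async def main():
    urls = await get_urls("https://example.com")             # stage 1: collect
    return await asyncio.gather(*(fetch(u) for u in urls))   # stage 2: fan out

pages = asyncio.run(main())
print(pages)
```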

Christian Hetmann - 22.04.2022 23:40

Thanks a lot for your video. This was exactly what I was looking for.

Daniel Baković - 03.04.2022 00:47

Thank you for the nice tutorial.

What if you have, for example, a dynamic list of URLs? The list could be updated by the scraper itself if it finds pagination on the target page. How would you create or manage tasks for something like that?
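[Editor's note] A producer-consumer pattern with asyncio.Queue handles this: workers pull URLs, and any worker that discovers a pagination link pushes it back onto the queue, so the task list grows while the scrape runs. This is a runnable sketch with a hypothetical site map standing in for real fetching and parsing.

```python
import asyncio

# Hypothetical site map: each page links to the next page (pagination).
NEXT_PAGE = {"/list?p=1": "/list?p=2", "/list?p=2": "/list?p=3", "/list?p=3": None}

async def worker(queue, seen, results):
    while True:
        url = await queue.get()
        await asyncio.sleep(0.01)          # stand-in for fetch + parse
        results.append(url)
        nxt = NEXT_PAGE.get(url)           # pagination found on this page?
        if nxt is not None and nxt not in seen:
            seen.add(nxt)
            queue.put_nowait(nxt)          # grow the work list while running
        queue.task_done()

async def main(start):
    queue, seen, results = asyncio.Queue(), {start}, []
    queue.put_nowait(start)
    workers = [asyncio.create_task(worker(queue, seen, results)) for _ in range(3)]
    await queue.join()                     # done once every queued URL is handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results

pages = asyncio.run(main("/list?p=1"))
print(sorted(pages))
```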

Ervan Kurniawan - 07.03.2022 07:05

I've used this async approach, but it was only about 3 seconds faster than a plain loop with requests.

So it depends on the website's server too?
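[Editor's note] Yes, it largely does. Concurrency removes the waiting-in-line, but each response still takes as long as the server takes, so the total time approaches the latency of the slowest response rather than the sum. With few URLs or a slow server the gain is small. A runnable demonstration with simulated 0.1 s latencies:

```python
import asyncio
import time

async def fake_request(delay):
    await asyncio.sleep(delay)    # stands in for the server's response time
    return delay

async def main():
    delays = [0.1] * 5            # five requests, 0.1 s server latency each
    t0 = time.perf_counter()
    await asyncio.gather(*(fake_request(d) for d in delays))
    return time.perf_counter() - t0

elapsed = asyncio.run(main())
# Sequentially this would take ~0.5 s; concurrently it takes ~0.1 s,
# i.e. roughly the latency of the single slowest response.
print(f"{elapsed:.2f}s")
```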

Eduardo Coronado - 15.01.2022 00:36

Thank you for all your videos! I recently found your channel and wish I had seen it ages ago. These videos have been incredibly helpful for me. I would like to use this technique with a POST request to sign in to a site, but I can't figure out where to place it to preserve the sign-in throughout the session. Any suggestions on where or how to place this?
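[Editor's note] The usual pattern: await the login POST once, before any scraping starts, and then issue every subsequent GET from the same session object, since sessions keep cookies between requests and that is what preserves the sign-in. A runnable sketch with a hypothetical stand-in session (a real AsyncHTMLSession handles the cookie jar for you):

```python
import asyncio

# Hypothetical stand-in for an async HTTP session with a cookie jar.
class FakeSession:
    def __init__(self):
        self.cookies = {}

    async def post(self, url, data):
        await asyncio.sleep(0.01)
        self.cookies["sessionid"] = "abc123"   # server "sets" a login cookie

    async def get(self, url):
        await asyncio.sleep(0.01)
        # A real server would check the cookie; here we just report it.
        return {"url": url, "logged_in": "sessionid" in self.cookies}

async def main():
    session = FakeSession()
    # 1) Log in ONCE, and await it before any scraping starts.
    await session.post("https://example.com/login", data={"user": "me"})
    # 2) Only then fan out the page requests on the SAME session object.
    urls = [f"https://example.com/account/{n}" for n in range(3)]
    return await asyncio.gather(*(session.get(u) for u in urls))

pages = asyncio.run(main())
print(all(p["logged_in"] for p in pages))
```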

Hengky Ariputra - 27.12.2021 11:02

Can you make another, more detailed video about async and requests-html? I really need it, hahaha.

QIZIQARLI FAKTLAR - 27.11.2021 17:53

When a webpage is dynamically generated by JavaScript and we use render, it stays very slow, doesn't it? Or how should render be used correctly so that it works faster?
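[Editor's note] Rendering is inherently slow (a headless browser loads and executes the page), so the practical lever is to render concurrently but capped, so you overlap the waiting without spawning a Chromium tab per URL. A runnable sketch of capping concurrency with a semaphore; the sleep stands in for the slow render call:

```python
import asyncio

async def main(urls, max_renders=2):
    sem = asyncio.Semaphore(max_renders)   # at most 2 "renders" in flight

    async def render_page(url):
        async with sem:                    # wait for a free browser slot
            await asyncio.sleep(0.02)      # stand-in for the slow render work
            return url

    return await asyncio.gather(*(render_page(u) for u in urls))

done = asyncio.run(main([f"/p{i}" for i in range(6)]))
print(len(done))
```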

Almo - 19.10.2021 12:45

Hey John, great video once again. How can I use this async approach with proxies and different sessions?
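[Editor's note] One simple scheme: cycle through a proxy pool and assign one proxy per task when the tasks are created. The proxy addresses below are hypothetical and the coroutine is a stand-in for the real request; since requests-html is built on requests, passing a per-request proxies={"http": ..., "https": ...} mapping to session.get should work, though that detail is an assumption worth verifying against the docs.

```python
import asyncio
from itertools import cycle

# Hypothetical proxy pool.
PROXIES = cycle(["http://p1:8080", "http://p2:8080", "http://p3:8080"])

async def fetch(url, proxy):
    await asyncio.sleep(0.01)      # stand-in for the proxied request
    return (url, proxy)

async def main(urls):
    tasks = [fetch(u, next(PROXIES)) for u in urls]   # one proxy per task
    return await asyncio.gather(*tasks)

out = asyncio.run(main([f"/page{i}" for i in range(4)]))
print(out)
```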

RTX MAX - 13.09.2021 10:10

Why did you not use the session.render command after session.get(url)? I am working on a project and my program runs fine without threads or asyncio, but when I use threads or asyncio the render command doesn't work. Can you help me understand how to overcome this?

Ritik Jain - 10.08.2021 22:32

Thanks a lot for the video :)

IndirectThought - 10.08.2021 02:19

How could we use arender() to render JavaScript HTML asynchronously?

Cheerfulnag - 30.07.2021 03:29

This code does not close the sessions at the end and leaves a lot of Chromium processes open (not all, but with 980 links, for example, it went over a hundred). I hoped I would find a solution to that problem here, but it's the same thing. And by the way, this asyncio approach is not suitable if you need to render something (I mean if you use arender), because then it produces no results at all; the code simply ends, leaving Chromium processes open. But your other videos are good anyway.
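[Editor's note] The leaked-Chromium part has a standard remedy: close the session in a try/finally so cleanup runs even when a scrape raises (AsyncHTMLSession exposes an awaitable close() that shuts its browser down). A runnable sketch with a hypothetical stand-in session; only the try/finally shape is the point:

```python
import asyncio

# Hypothetical stand-in for an async session that owns a browser process.
class FakeSession:
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True     # real sessions would shut Chromium down here

async def main():
    session = FakeSession()
    try:
        # ... schedule and await the scraping tasks here ...
        raise RuntimeError("a scrape blew up")   # simulate a mid-run failure
    except RuntimeError:
        pass                                     # handle/log the failure
    finally:
        await session.close()  # guaranteed to run, success or failure
    return session

s = asyncio.run(main())
print(s.closed)
```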

Rick van Putten - 04.07.2021 20:29

Awesome, and thanks for the amazing videos. Does this also work with BeautifulSoup?

yopp - 23.06.2021 18:04

This was very smooth for an experienced programmer. Time flew and the information stuck.

Hayat Skills - 26.05.2021 08:37

Hi Sir

It is showing this error on my side: RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Kindly help me

Thanks!

KIRAN RAJ DHAKAL - 28.04.2021 10:43

I am stuck on a project.
If you could help me out, please contact me.

erdem günal - 25.04.2021 15:50

Is there a way to scrape data from a Cloudflare-protected website?

Art Abra - 11.04.2021 22:33

I watched your past video on the grequests module and it really helped me. Is this better than grequests?

5astelija - 11.04.2021 17:37

Just wanted to say this is the best coding channel I have ever stumbled upon. Teaching through actual examples, explaining only the necessary parts, trusting that the viewer has an actual brain for themselves. Perfect.

Arjun b - 11.04.2021 05:48

Thanks so much! Can you please upload the code to Git?

Muhammad Luay - 10.04.2021 22:22

Thank you. Excellent quality as always

TheKingbode - 10.04.2021 05:45

Excellent as usual, Thanks a lot

Linxx - 08.04.2021 14:58

Would you suggest this method over grequests?

Liu Zhihao - 08.04.2021 01:30

Could you show an example of how to scrape a page that has lazy loaders? For example: AliExpress. Thank you for all the free content.

Humayun butt - 07.04.2021 23:28

Very useful to speed up things.👍💖

Ryan Lynch - 07.04.2021 22:18

Thanks John, I always learn a thing or two from your videos.
Is there a high chance of an IP ban due to such a high volume of requests in such a small timeframe, without mitigating with proxy rotation, headers, etc.?
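[Editor's note] Hammering a site with unbounded concurrent requests can certainly trigger rate limits or bans; a simple mitigation is to send requests in small batches with a jittered pause between them (and set a sensible User-Agent). A runnable sketch; the header value, batch sizes, and delays are illustrative, and the coroutine stands in for the real request:

```python
import asyncio
import random

# Hypothetical header; rotate real User-Agent strings in practice.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; my-scraper/0.1)"}

async def polite_fetch(url):
    await asyncio.sleep(random.uniform(0.005, 0.02))  # jitter per request
    return url                                        # stand-in for the request

async def main(urls, batch=3, pause=0.02):
    results = []
    for i in range(0, len(urls), batch):              # fire small batches
        chunk = urls[i:i + batch]
        results += await asyncio.gather(*(polite_fetch(u) for u in chunk))
        await asyncio.sleep(pause)                    # breathe between batches
    return results

got = asyncio.run(main([f"/item/{i}" for i in range(7)]))
print(len(got))
```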

A - 07.04.2021 18:33

Hey, you're one of the best. Do you have Discord? I want to contact you about a business opportunity. Thanks.

Alex Tomas - 07.04.2021 18:20

John, thank you for your time making these videos. I appreciate it and know your time is worth it.

bharath babu - 07.04.2021 17:35

How can I extract all links on a JavaScript website using requests-html? Can I automate web scraping on a JavaScript-driven HTML website?
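[Editor's note] requests-html exposes r.html.links and r.html.absolute_links on a response, and after rendering the JavaScript those include dynamically inserted links too. To show what link collection amounts to, here is a runnable stdlib sketch that gathers every href from static HTML (the sample markup is made up); requests-html does this for you on the parsed page.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

sample = """<html><body>
<a href="/docs">Docs</a>
<a href="https://example.com/about">About</a>
<a>no href</a>
</body></html>"""

collector = LinkCollector()
collector.feed(sample)
print(collector.links)
```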

bharath babu - 07.04.2021 17:33

Thanks a lot
