Dynamic Javascript Scraping - Web scraping with Beautiful Soup 4 p.4

sentdex

7 years ago

161,375 views

Comments:

@choudhurysudip666 - 10.10.2018 21:01

Hey guys, please read the problem here: I usually use Selenium to scrape data, but now I'm facing a website that identifies Selenium and blocks its JavaScript functionality so as not to reveal the data I need. For the first 10 times it gives the proper data, then it just blacklists any approach with Selenium and gives no proper response.
So I used the BS4 module and the approach discussed here (with PyQt5, though), and the website worked only ONCE! After that it again gives just the 'source' HTML without any dynamic data. How is that possible??? Do websites recognize PyQt calls, etc.???
What do I do?? Please help, guys (especially sentdex, if you are still getting this!)

@atulanand2118 - 18.10.2018 23:48

Hi Sentdex, thanks for the great explanation, but I am not able to import PyQt4. I tried on both Windows and Linux. It seems PyQt5 is now available as well. I am able to install both, but I am not able to import them.

Could you please make a video on installing and importing PyQt4?

@KhalilYasser - 05.11.2018 09:11

Thanks a lot. I have encountered this error (I am using PyCharm):
ModuleNotFoundError: No module named 'PyQt5.QtWebEngineWidgets'
Any ideas?

@satishpatil115 - 05.11.2018 19:40

Works fine with PyQt5. Thanks for the tutorial.

@huongluu2632 - 15.11.2018 11:11

Hi there, I want to get all URLs from a domain, but I don't know how to do it. Can you suggest something? Thanks for reading!!!
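
One possible starting point for the question above, as a minimal sketch using only the standard library: collect every `<a href>` on a page and keep the links whose host matches the starting URL. The names `LinkCollector` and `same_domain_links`, and the sample HTML and URLs, are made up for illustration. A full crawl would download each discovered link and repeat the process, which this sketch does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, base_url):
    """Return the sorted absolute links in html that share base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute = urljoin(base_url, href).split("#")[0]
        if urlparse(absolute).netloc == host:
            links.add(absolute)
    return sorted(links)


# Hypothetical page content and URLs, for illustration only.
sample = (
    '<a href="/about">About</a> '
    '<a href="https://other.example/x">Elsewhere</a> '
    '<a href="post.html#top">Post</a>'
)
print(same_domain_links(sample, "https://example.com/blog/"))
```

If the links are injected by JavaScript, the HTML passed in would need to be the rendered source rather than the raw response.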

@sangitasable6919 - 21.12.2018 07:45

I have seen all your videos. Sir, I want to identify only sites about computer subjects. I want to build a tool that can recognise only those sites.

@alexhernandez8550 - 23.12.2018 05:08

pip install PyQt5 for Windows

@HarshPatel-ly3dh - 28.12.2018 08:58

It's WOW... I spent a whole lot of time trying to scrape dynamic content but couldn't. This was a very good idea.

@diogooliveira8046 - 07.01.2019 02:13

How do you get this code to run within a loop?
I've managed to make it work for a single URL, but once I put the bottom portion in a for loop, it only runs 2 times and then the Python shell restarts. Anyone?

@7208044878 - 02.02.2019 06:31

I was banging my head against all those headless browser methods to run JavaScript. This is so much simpler. Thanks, man! Appreciated!

@SiliconAddictTV - 08.03.2019 02:47

This is great. However, in my situation the website adds content every minute. How do I loop and reload just the page without reloading the PyQt client on every loop?

@atineshs - 25.03.2019 14:39

How about Selenium?

@HustonPetty94 - 26.03.2019 23:49

Has anyone tried this method but with a for loop to do the same work on multiple pages? It works on the first iteration and then gets stuck on the "self.loadFinished.connect(self.on_page_load)" part. I have tried everything to get it to work, but no luck.

@shreyaraj7brollno.286 - 09.05.2019 16:14

Seriously, what's the point of even using QApplication? Can I remove that? >_>

@shreyaraj7brollno.286 - 09.05.2019 16:20

Wait, WTF, we can do multiprocessing in Python? Damn, I didn't know. Thanks for telling us, bro!!!

@tuobraun - 19.05.2019 14:07

I installed PyQt5 for Python 3.7 (x64) but I'm getting this error in VS Code: "No module named 'PyQt5.QtWebKit'". Could you please suggest a solution?

@shyambutani8618 - 24.05.2019 21:05

You are GOD.. thank you

@abhishekkwatra1426 - 30.05.2019 18:16

I've installed PyQt5 and these statements aren't working for me:


from PyQt5.QtWebKitWidgets import QWebPage
from PyQt5.QtWebKitWidgets import QWebView
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl


Is there any solution to this?
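
A likely cause, offered as an assumption rather than a confirmed diagnosis: the PyQt5 builds installed by pip no longer bundle the WebKit modules, and the Chromium-based replacement, `PyQt5.QtWebEngineWidgets` (classes `QWebEngineView` and `QWebEnginePage`), is distributed in recent releases as the separate `PyQtWebEngine` package. The sketch below only checks which of the two modules can be found in the current environment; `find_qt_web_backend` is a hypothetical helper name, not part of PyQt.

```python
import importlib.util


def find_qt_web_backend():
    """Return the name of the first Qt web module that can be found, or None."""
    candidates = [
        "PyQt5.QtWebEngineWidgets",  # Chromium-based engine
        "PyQt5.QtWebKitWidgets",     # older WebKit engine used in the imports above
    ]
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ImportError:
            # Raised when the parent package (PyQt5 itself) is not installed.
            continue
    return None


backend = find_qt_web_backend()
if backend is None:
    print("No Qt web module found; try: pip install PyQt5 PyQtWebEngine")
else:
    print("Available:", backend)
```

If only the WebEngine module is found, the tutorial code would need to be ported to the `QWebEngine*` classes, whose API differs from `QWebPage` (for example, the page HTML is delivered through a callback).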

@shyambutani8618 - 09.06.2019 15:27

Hey sentdex, please help me.
In my case the HTML is generated dynamically through an AJAX call. With this code, I am not able to scrape the required data. Is there any way to wait until the AJAX call has been made? I have tried qWait but it did not work.

@HassanMalikTW - 13.06.2019 12:49

I think it's me, but I didn't understand a word. Selenium is much better than this.

@dataaholic - 11.08.2019 10:48

Is it possible to scrape the pinned location from an embedded Google Map that loads all its data using JavaScript?
The problem is that the data I want to fetch for a location only loads when we click on that particular location.

Thanks in advance

@datahat642 - 30.08.2019 13:31

Some of the words are actually hard to hear. It seems like your volume gradually drops before you finish a sentence. Nonetheless, your videos are really helpful.

@datahat642 - 30.08.2019 13:41

Could you please explain the reason for using PyQt here and not anything else? Please also mention the alternatives. Thank you.

@elahehosseini3933 - 31.08.2019 14:56

You can't imagine how useful your tutorials are to me. I'm really thankful and hope you continue making videos like this.

@fredericjuge9762 - 07.11.2019 16:24

How can I get the source code shown in this video? It would be faster than retyping it all :) Thanks

@lakshyanegi668 - 17.11.2019 09:49

How do I scrape content of pseudo elements like ::before and ::after?

@Roottech25 - 08.12.2019 20:36

Why not use Selenium?

@Victor_Marius - 13.01.2020 20:00

I did something similar yesterday with PyQt5. I combined HTML, JavaScript and Python into one app (with some CSS goodies).

@EndersupremE - 21.03.2020 07:31

I was just searching for a problem with this and BAM, you have an entire series on web scraping. I think it's the 5th time this has happened. Just saying, I really appreciate your channel.

@GlennMascarenhas - 30.04.2020 13:24

Selenium seems like a better option for scraping dynamic webpages

@computinghub9550 - 24.05.2020 17:12

It's better to use Selenium WebDriver (headless) instead of using PyQt to run JavaScript...

@PKrishnamaNaidu - 22.08.2020 16:46

Hi, I have been working a lot lately on web scraping tasks, and I was using Selenium because the work required interaction with the web page. My question: is there a generic or more common way to extract any web page's content instead of navigating and identifying the tags that hold the required information? If not, why?
I am also looking for a way to control how many requests I send to a server at a time while fetching the data, so that it does not stop accepting my requests.
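
For the second part of the question, one common approach is to enforce a minimum delay between requests. The following is a minimal sketch, assuming a fixed interval is acceptable to the target server; the `Throttle` class, the 0.2 second interval, and the URLs are all illustrative placeholders, and the loop prints instead of making real requests.

```python
import time


class Throttle:
    """Enforces a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the timing can be tested
        self._sleep = sleep
        self._last = None     # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to keep calls min_interval seconds apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=0.2)
for url in ["https://example.com/a", "https://example.com/b"]:  # placeholder URLs
    throttle.wait()
    print("would fetch", url)  # replace with the real request
```

A fixed delay is the simplest policy; whether it is enough depends on the site's own limits, which this sketch does not try to detect.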

@Londonwebfactory - 27.08.2020 12:41

Great Tutorial Chum! Many thanks.

@minurapunchihewa4592 - 06.09.2020 22:59

I tried the PyQt5 equivalent to this, but I am not getting the expected results. The dynamic content still cannot be extracted. Any suggestions?

@AD-qg9jd - 09.09.2020 21:35

Using PyQt5, I'm getting "Unresolved reference 'Client'". I have the same code as the tutorial.

@noelcovarrubias7490 - 22.11.2020 10:52

Could you please make an updated video on this? PyQt has had a few updates, or there are other modules to use. I'm trying to do it with Selenium because I feel it is the best fit for what I want, but I just can't get past the "verify your identity" bs since webdriver doesn't take headers, and I haven't found a different way to do it. Thank you!!!

@OBPagan - 09.04.2021 04:45

In 2021, I am unable to install PyQt4 on the latest version of Python, 3.9. I use PyCharm under Windows 10 and just can't figure out how to get it to install. Any ideas would be greatly appreciated.

@shelaraarti6082 - 19.04.2021 14:07

How do I resolve a content security error?
I'm scraping a LinkedIn page.

@CHAMP_GUY - 06.06.2021 07:05

Spyder is not launching after installing PyQt

@theglobalconflict6904 - 30.09.2021 19:49

But this isn't working with PyQt5, and I'm unable to install PyQt4. What's the solution???

@dieuhuyen0812 - 21.05.2022 10:55

Why can't you just parse the script tag instead of the p tag?
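
When the value is written literally inside the script, reading the script text can indeed work without running any JavaScript. Here is a minimal sketch of that idea, assuming the data is assigned to a variable as a JSON-compatible object; the HTML, the variable name `pageData`, and the helper `extract_json_variable` are invented for illustration. It does not help when the script computes the value or fetches it from a server, which is the case where a rendering engine is needed.

```python
import json
import re

# Hypothetical page source: the data sits in a JavaScript variable, not in <p>.
html = """
<html><body>
<p id="target">placeholder</p>
<script>
  var pageData = {"message": "rendered text", "count": 3};
  document.getElementById('target').innerHTML = pageData.message;
</script>
</body></html>
"""


def extract_json_variable(source, variable):
    """Return the JSON object assigned to `variable` in the source, or None."""
    # Non-greedy match up to the first "}" that is followed by ";".
    pattern = r"var\s+" + re.escape(variable) + r"\s*=\s*(\{.*?\})\s*;"
    match = re.search(pattern, source, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


data = extract_json_variable(html, "pageData")
print(data["message"])
```

The regular expression is deliberately simple and would break on objects whose string values contain "};", so treat it as a starting point only.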

@sajjadhossan7972 - 29.08.2022 13:30

If it were possible, I would like to give this video thousands of likes.
