Comments:
finally a talk on Web Scraping! good to see you again wesbos and scott!
Ok
Lol I've been watching every episode since CJ joined and yet I'm not subscribed 😅
Time to change that
If someone scrapes for indexing and links to your site to consume it I am totally cool with it, but if someone scrapes to bypass the site I'm not.
Awesome! On the same line, I’d love an episode on reverse engineering scrambled or minified webapps 😏
I just started scraping a few months ago and somehow managed to figure out most of the tricks you talked about. I was moved by contempt for a service I paid $200 for in an annual subscription: when the subscription expired, not only did they cut off the premium features, but they also blurred out over 20k data points I had previously processed in their platform while I had a premium subscription. I got it all back in JSON using their internal API. I wouldn’t have been able to do it without the help of ChatGPT.
Is there a course you recommend for this?
love this podcast and this episode since i’m also a scrape OG / automation panda :) side question: will the video format of the podcast ever cut to visual snapshots of what's being talked about? like when you mention the console, cut to a snapshot of that, or if a website is mentioned, then a screenshot of that, like wes did once during this video. i know this will add more work during editing, but it would be extra coolness if it was included as a standard. thanks, keep up the awesomeness 🎉👍
Working on a scraper rn.
Awesome! I was using puppeteer to scrape a site and converted it to pinging their API directly. So much faster, and no random errors when an element fails to load. Where would you host your scraping scripts that run every day, hour or minute? I used a package to run it as a service on Windows.
I never thought I’d hear XPath mentioned on a podcast. It’s really too bad XML became a 4 letter word. There were actually some cool things you could do with it that you can’t do with JSON. It also has a DOM, for one thing.
How would you alert if something was available? I want instant, attention ambushing feedback if my scraper finds something.
If I run a Cypress script in headless mode to check a site for tickets, say, and it finds one, I want a desktop alert somehow. Browser alerts work if I run it manually, but if I schedule it on a Mac, it runs in the background and I don't get any alerts.
Love you both from Sri Lanka...🇱🇰 ❤
Have you or anyone else extracted data from an interactive chart?
Great sode fellas
@syntaxfm
can i propose a challenge???
So I have been trying to create an export tool, to basically backup whatsapp messages including all media like images, videos, voice messages, emojis, etc etc...
The backup tool would have the ability to backup messages that go back up to a certain date, like 6 months ago, OR all messages for a given chat... This seems to be impossible, or at least SUPER DIFFICULT to do... because everything in WhatsApp WebApp is being done via websockets...and is encrypted...
I was looking for ways to reverse engineer their hidden APIs, but I don't think they have any classic APIs, I think it's all Websockets...
And the only way to really scrape the data is actually by using Puppeteer...or some other headless browser approach...
Would love to get some info from Wes and whoever else has any insight into this issue...
How does HasData handle scraping dynamic websites like those with a lot of JavaScript? Really curious about its efficiency compared to puppeteer.
I'm learning about web scraping and wonder if HasData can help with navigating protected routes. What do you guys use?