Comments:
actually if you'd like to take things up a step - adding in categories into your videos would speed this up.
selectolax wants me to install cython! It's not even python. That's already more. At least give us a warning lol.
why can't we have something where we can just import a pack, type in the websource to pull from, and just type in what we want to pull, and where it goes? Where's the template for that? Why so much extra stuff?
beautifulsoup really is clunky, because it doesn't auto-turn web script automatically into html. Having something already do that really helps!
truth be told - we really just need an interface - where people can click on the places on a website that would need to be copied, along with the direction of the copies and let it roll. Why code it when you can just get it to run?
I'm not convinced these are ideal, but do agree that beautifulsoup is honestly way too clunky for what it needs to do.
it's weird that it's working off of css - but then again - having the option of having html + css is really helpful.
Hello sir,
When I try to code the same as above, it's throwing an error: <built-in method text of selectolax.parser.Node object at 0x000001E78E494A40>
Please help me rectify this error.
Thanks
Are you in Seattle? Seattle fan? just noticed your shirt.
Updates on selectolax? How's it goin for you?
Great video, would be cool to see one on inspecting request/response headers without selenium
Your videos are terrific at encouraging me to try new things, but latency isn't a problem. I've never been successful converting your scripts to run on "real" websites without getting blocked for life, even when adding a time.sleep(60) after each pull. I think the html-world just doesn't like me. &^) That said, I haven't found a good example of using selectolax to parse tables. Gonna take another look through your videos. Also, I see selectolax has modest and lexbor engines. Wonder what the pros and cons are?
bs4 has built-in support for CSS selectors using soup.select() or soup.select_one()
Personally I don't mind BeautifulSoup latency, it serves as a request delay. If the parsing takes some time that's good, especially if I have a loop making multiple requests to the same website. Nice video of course 👍
Edit: I forgot to mention profiling: Python has the cProfile and pstats libs to profile and nicely display the time consumed by funcs and I/O. It may help you compare these new libraries, instead of comparing syntax only. From what I've tested so far, the requests connection takes some time (often > 10s), so in my understanding it's the requests library that takes the time, not the parsing :) Hope this helps.
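A minimal sketch of the cProfile/pstats idea from the comment above. The functions `fake_fetch` and `parse_items` are hypothetical stand-ins for a real request and a real parser, so the timings only illustrate how to read the report, not how any actual library performs:

```python
import cProfile
import io
import pstats


def fake_fetch() -> str:
    """Hypothetical stand-in for a network request; builds HTML locally."""
    return "<ul>" + "".join(f"<li>item {i}</li>" for i in range(50_000)) + "</ul>"


def parse_items(html: str) -> list[str]:
    """Hypothetical stand-in for the parsing step; extracts <li> contents."""
    return [chunk.split("</li>")[0] for chunk in html.split("<li>")[1:]]


def scrape() -> list[str]:
    return parse_items(fake_fetch())


# Record every function call made while scrape() runs.
profiler = cProfile.Profile()
profiler.enable()
items = scrape()
profiler.disable()

# Sort by cumulative time so the slowest call paths appear first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Swapping the stand-ins for a real fetch and a real parse call should show, per function, whether the time goes to the network or to parsing.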
Your videos are super helpful and you're also a handsome man :)
Just import pandas and do a pd.read_html
What is scraping used for in the industry? Most of the scraping videos I have seen focus on "home projects".
Selectolax looks cool tho!
Hey John, which parser is quickest? I've been using Python 'Requests' library with the 'regex' library. Anything faster than this?
Going to give this a try on a new script I'm writing for a client today! Thanks for everything you do 🙏🙏🙏
Is selectolax faster than scrapy?
As someone learning to code, your videos are a godsend. Keep up the great work. You're helping a lot of amateurs get their footing.
Can we use it for dynamic websites?
Awesome. I'm going to build a best chili dog scraper.
Thanks! That was so koool. Little correction: on line 14 in the selectolax .py file you need to add "()" to ".text" in order to call the method properly
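For anyone hitting the `<built-in method text of ...>` output mentioned earlier in the thread: that is not an exception, just what Python prints when a method is referenced without being called. A small sketch of the difference, where `Node` is a hypothetical stand-in class and not selectolax's own:

```python
class Node:
    """Hypothetical stand-in for a parsed HTML node; not selectolax's class."""

    def __init__(self, content: str) -> None:
        self._content = content

    def text(self) -> str:
        return self._content


node = Node("Hello")

uncalled = node.text    # a bound method object, not a string
called = node.text()    # the parentheses actually call the method

print(uncalled)  # prints the method's repr, e.g. <bound method Node.text of ...>
print(called)    # prints the text itself
```

With selectolax the same fix applies, as the comment above says: write `.text()` rather than `.text`.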
can't use selectolax to scrape items based on div style attributes like I can with beautifulsoup, unfortunate
I'm waiting for a tool with regular expressions built in (many sites generate dynamic classes), and I don't like the existing workarounds.
<div class = dhdhdh_rddhuud_hello_text>
Hello </div>
....
....
....
<div class = dhdhdh_3773_7372fb_hello_text>
World </div>
Wanna do some parser.find(p.*hello_text)
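One way to sketch the regex-on-class idea above using only the standard library. This swaps in `html.parser` and `re`; it is not a selectolax or BeautifulSoup feature, and `ClassPatternFinder` is a hypothetical helper name. It assumes quoted class values and well-formed nesting:

```python
import re
from html.parser import HTMLParser


class ClassPatternFinder(HTMLParser):
    """Collect the text of elements whose class attribute matches a regex."""

    def __init__(self, pattern: str) -> None:
        super().__init__()
        self.pattern = re.compile(pattern)
        self.open_tag = None    # tag name of the matching element being read
        self.depth = 0          # nesting depth of same-named tags inside it
        self.matches: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.open_tag is not None:
            if tag == self.open_tag:
                self.depth += 1
            return
        class_value = dict(attrs).get("class") or ""
        if self.pattern.search(class_value):
            self.open_tag = tag
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.open_tag is not None and tag == self.open_tag:
            self.depth -= 1
            if self.depth == 0:
                self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.matches[-1] += data


html = """
<div class="dhdhdh_rddhuud_hello_text">
Hello </div>
<div class="other">skip me</div>
<div class="dhdhdh_3773_7372fb_hello_text">
World </div>
"""

finder = ClassPatternFinder(r"hello_text$")
finder.feed(html)
finder.close()
texts = [match.strip() for match in finder.matches]
print(texts)
```

Standard CSS attribute selectors such as `[class$="hello_text"]` express the same suffix match; whether a particular parser supports them is worth checking in its documentation.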
Can you do a LinkedIn company scraper video?
Please, I need an aliexpress web scraping tutorial
UPDATE: I tried selectolax and it's really fast.... about 20x+
hi sir, months ago I had a web scraping project where only an XPath selector could get the exact element. Which library should I use that is nearly as fast as selectolax?
Please upload a video about how to solve a form based captcha.
Awesome! I'll try !
I really like BeautifulSoup because I can find elements in html using combinations, for example:
class + attributes
regex on attribute value
I confess that I'm still not that good at finding elements by the css selector
do you have any content about it? :D
Usually skip over sponsors but this is actually interesting 🧐 will check it out indeed.
John, great video, I would like to know your thoughts on a few things. First, how would you approach crawling a website that uses GraphQL and requires scrolling down the page to get more data? Is it possible to retrieve this data without using a huge library like Playwright or Selenium to crawl it? Can we still get the data we want with our authentication cookies?
Great if speed is key to scale as you say.
What about requests-html? It supports CSS and XPath.
Perfect timing. I’m going to create my own headline news scraper and this is perfect. Thank you!
You said "pure css selector(s)" multiple times in this video, I may have missed where you explain it, but what do you mean by "pure css selector"?
Selectolax does look pretty clean, for now don't really care about scalability, but as long as it's as readable (if not more readable than BS4), definitely looks like something I wanna give a go next time I need to do some html parsing. Thanks for introducing this!
BeautifulSoup does have CSS selectors:
soup.select_one("h1.className")
thanks for introducing this to us, john!
Nice! We are about to redesign our crawlers and I was starting to review parsers.
Gosh. It is really FAST.
can you make a series about neo vim configuration for webscraping? ;) - thanks for another great material!
Thanks for the introduction of the new parsing library, it is really worth a shot
I was using scrapy for everything 😅
So cool! Will experiment with it one day 🤌🏽