Python 3 Programming Tutorial - Parsing Websites with re and urllib

sentdex

9 years ago

196,800 views



Comments:

Jeff Rojas
Jeff Rojas - 27.12.2021 05:28

Hello Sentdex, can you give me a hint on where I should start searching? Here is my situation: I have a project where a user can post an image or text. Now another user likes the post and wants to share it to his or her own wall, just like Facebook. Please and thank you.

Reply
MangDalin
MangDalin - 09.02.2021 12:49

wow 1M sub

Reply
Dtomper
Dtomper - 09.11.2020 15:22

Thank you

Reply
Sam Mraz
Sam Mraz - 28.07.2020 15:00

Thank god

Reply
Yi Shao
Yi Shao - 27.05.2020 21:25

Amazing. Love you. You make parsing so easy to understand.

Reply
RAHIM ZAHI
RAHIM ZAHI - 18.05.2020 02:53

Thank you bruh ❤

Reply
mister tech
mister tech - 21.03.2020 14:46

Great Bruh

Reply
Lokesh Bhirud
Lokesh Bhirud - 06.03.2020 16:25

How do I replace spaces with a symbol in Python using regular expressions?
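
A minimal sketch of one way to do that with re.sub, using an underscore as the example symbol:

import re

text = "parsing websites with re and urllib"
replaced = re.sub(r'\s+', '_', text)  # replace every run of whitespace with '_'
print(replaced)  # parsing_websites_with_re_and_urllib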

Reply
Patrick
Patrick - 15.02.2020 10:07

I tried this script on a different URL and got a 403 Forbidden error... do some websites block parsing via scripts?
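
Some sites do reject the default urllib user agent. A minimal sketch of one common workaround, sending a browser-like User-Agent header (the URL and header value here are only placeholders):

import urllib.request

url = 'https://example.com/'
headers = {'User-Agent': 'Mozilla/5.0'}  # pretend to be a regular browser
req = urllib.request.Request(url, headers=headers)
resp_data = urllib.request.urlopen(req).read()
print(resp_data[:200])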

Reply
FrenchyFred
FrenchyFred - 23.11.2019 06:13

@sentdex Hi there!! Thanks for your great tutorial! I'm a newbie at Python and programming in general, and I have a problem right now that's kind of like what you show here. I've extracted a table from a website (using its API) and the results come in as text (CSV). I get around 20 different statistics (it's sports-related) and I only need 3 of them. So I would like to eliminate all the data that I don't need and just keep those 3. Would you recommend the same library modules (re and urllib) or another module for that? As I said, it looks to be the same kind of thing you're showing here, the difference being that I need to basically remove stats instead of text when I scrape it and just keep the ones I need. Thanks again for your great tutorials!!
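
For CSV text, the standard csv module is probably a better fit than re. A minimal sketch, assuming the response is already a string and the three wanted columns are called 'player', 'goals' and 'assists' (made-up names):

import csv
import io

csv_text = "player,goals,assists,shots\nSmith,2,1,5\nJones,0,3,2\n"  # sample data
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    print(row['player'], row['goals'], row['assists'])  # keep only the columns you need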

Reply
Hasti Bozorgi
Hasti Bozorgi - 02.09.2019 16:50

Hi,
Thanks for this series of tutorials. I am new to this field and need help. I'm trying to write code for scraping several web pages and don't know how I should start.
I tried several times but haven't had a successful run ☹ I hope you can help me 🙏
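
One common starting point is to put the single-page code from this video inside a loop over the page URLs. A minimal sketch, assuming the site uses a ?page=N pattern (the URL is a placeholder):

import re
import urllib.request

for page in range(1, 4):  # pages 1 to 3
    url = 'https://example.com/articles?page={}'.format(page)
    resp_data = urllib.request.urlopen(url).read().decode('utf-8', errors='ignore')
    paragraphs = re.findall(r'<p>(.*?)</p>', resp_data)
    print(url, len(paragraphs), 'paragraphs found')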

Reply
Suhas NM
Suhas NM - 15.07.2019 13:16

How do I save the data you have extracted to a file?
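
A minimal sketch of writing the scraped paragraphs to a text file, assuming they are already in a list called paragraphs:

paragraphs = ['first paragraph', 'second paragraph']  # whatever re.findall returned
with open('output.txt', 'w', encoding='utf-8') as f:
    for each_p in paragraphs:
        f.write(each_p + '\n')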

Reply
Logomonic Learning
Logomonic Learning - 10.07.2019 22:15

How do I get the full playlist? It's not in the user's profile. In fact, it is a totally different person, but I want this guy's!

Reply
Chengyao Zheng
Chengyao Zheng - 20.06.2019 09:39

import re
What was re, though? I'm trying to recall this part now and I can't remember what it is.
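
re is Python's built-in regular expressions module; findall is the function this series leans on. A tiny reminder sketch:

import re

print(re.findall(r'\d+', 'abc 12 def 345'))  # ['12', '345']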

Reply
Hama Hawlery
Hama Hawlery - 11.05.2019 02:17

It does not print anything in the terminal. I think it may be because of "eachp".
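
Python names are case-sensitive, so eachp and eachP are different variables; if the loop variable and the name used inside the loop don't match exactly, nothing useful comes out. A minimal sketch of how the loop should look:

import re

resp_data = '<p>first</p><p>second</p>'  # stand-in for the downloaded page
paragraphs = re.findall(r'<p>(.*?)</p>', resp_data)
for eachP in paragraphs:
    print(eachP)  # this name must match the loop variable exactly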

Reply
Yavor Daskaloff
Yavor Daskaloff - 12.02.2019 12:46

data = urllib.parse.urlencode(values)
data = data.encode('utf-8')

These two lines. You assign different values to the same variable. How does that work?
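
Each assignment just rebinds the name data to a new object: the first line builds the URL-encoded string, the second replaces it with the UTF-8 bytes of that string, which is what urlopen expects as POST data. A small sketch showing the intermediate values:

import urllib.parse

values = {'s': 'basics', 'submit': 'search'}
data = urllib.parse.urlencode(values)   # 's=basics&submit=search' (a str)
print(type(data), data)
data = data.encode('utf-8')             # b's=basics&submit=search' (bytes)
print(type(data), data)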

Reply
A Jim Fan
A Jim Fan - 07.02.2019 04:37

So how does regex code work exactly? Is it one after the other? Would '.*?' yield different results than '*.?' or '?*.'?
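
Order matters: '.*?' means any character (.), repeated (*), as few times as possible (?), while '*.?' and '?*.' are not valid patterns at all, because a quantifier like * or ? needs something before it to repeat. A small sketch comparing non-greedy and greedy matching:

import re

html = '<p>one</p><p>two</p>'
print(re.findall(r'<p>(.*?)</p>', html))  # non-greedy: ['one', 'two']
print(re.findall(r'<p>(.*)</p>', html))   # greedy: ['one</p><p>two']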

Reply
Alex Lasareishvili
Alex Lasareishvili - 12.01.2019 21:54

Thanks for your video.
I have one question: instead of specifying the sample URL in the code, would it be possible to provide it via input?
What I mean is, I work with web-based tools that contain the same data fields with different values, of course, like support tickets, let's say.
I want a script where I can paste my ticket URL and have it parsed for specific fields like ticket number, customer name, etc., and populate an Excel table with the parsed data.
I sometimes have a lot of tickets to deal with, and opening all the URLs in separate tabs is just not an option, so I'm trying to consolidate everything in an Excel file (for now) to quickly see which ticket is in what state, when they are scheduled, etc.
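
The URL can indeed come from input() instead of being hard-coded. A minimal sketch, assuming the parsing part stays as in the video (the actual regex and fields would have to match your ticket pages, which I can't see):

import re
import urllib.request

url = input('Paste the ticket URL: ')
resp_data = urllib.request.urlopen(url).read().decode('utf-8', errors='ignore')
paragraphs = re.findall(r'<p>(.*?)</p>', resp_data)
for each_p in paragraphs:
    print(each_p)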

Reply
Richard C
Richard C - 27.11.2018 05:32

How would I do this in Django?

Reply
HoodedWarrior
HoodedWarrior - 16.09.2018 09:57

It may work on <p>, but for scraping useful stuff like links it gets tricky, especially if you want to get the href and also the value inside the tags.
I did use a library for that before, but now I want to try without one.
EDIT: never mind, doing a second findall on the result of the first for further filtering does it. Also, you could use those URL results to traverse through all the results and filter those as well... hmm.
Thanks, good tutorial.
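
For what it's worth, a single findall with two capture groups can also pull the href and the link text in one pass; a minimal sketch (it assumes href is the first attribute, which is one reason people usually reach for an HTML parser instead):

import re

html = '<p>See <a href="https://example.com/a">first link</a> and <a href="https://example.com/b">second</a>.</p>'
for href, text in re.findall(r'<a href="(.*?)">(.*?)</a>', html):
    print(href, '->', text)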

Reply
Nitesh Jaiswal
Nitesh Jaiswal - 08.09.2018 15:10

Please show how to process JSON data using urllib and string slicing.
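
String slicing works but is fragile for JSON; the built-in json module is the usual route once urllib has fetched the text. A minimal sketch (the URL is a placeholder):

import json
import urllib.request

url = 'https://example.com/data.json'  # placeholder endpoint
raw = urllib.request.urlopen(url).read().decode('utf-8')
data = json.loads(raw)  # now an ordinary Python dict or list
print(data)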

Reply
SoldierGaming
SoldierGaming - 13.08.2018 01:38

content = []
paragraphs = re.findall(r'<p>(.*?)</p>', str(respData))
for eachP in paragraphs:          # loop over the list itself, not str(paragraphs)
    content.append(eachP)
sentence = ' '.join(content)      # join everything into one readable string

* This just cleans the output a little more, so you are not reading it in a downwards fashion.

Reply
Vijay Suresh
Vijay Suresh - 06.08.2018 21:55

Thanks for the vid. Can anyone help me with how to send a username and password to handle an authentication popup, so I can automate it in Chrome?

Reply
Nikunj Parmar
Nikunj Parmar - 25.07.2018 21:25

You are awesome!

Reply
Void Beats
Void Beats - 18.07.2018 00:46

@sentdex

values = {'s': 'basics',
          'submit': 'search'}

I have tried to put in some other links but it does not work; it only works with the link that you posted.

Reply
James Jemima
James Jemima - 07.07.2018 01:55

Instead of importing urllib.request and urllib.parse individually, is it possible to just import urllib as a whole library?
In the same respect, since in the last vid you said you mostly only use re.findall(), can we just import re.findall instead of the whole re library module?
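
In Python 3 a plain 'import urllib' does not automatically pull in the request and parse submodules, which is why they are imported by name; a single function, on the other hand, can be imported on its own. A small sketch of both:

import urllib.request   # the submodule must be named; plain 'import urllib' is not enough
import urllib.parse
from re import findall  # import just the one function from re

print(findall(r'\d+', 'answer: 42'))  # ['42']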

Reply
Josh Thomas
Josh Thomas - 05.07.2018 23:53

This is AWESOME! Thanks a lot!

Reply
Andreas Papadakis
Andreas Papadakis - 04.07.2018 20:29

Hi, great video!

I just have a question, when you do this it doesn't save the webpage as "Complete" but rather as "HTML, only". Is there a way to do Complete using urllib?

Reply
Rohan Naidu
Rohan Naidu - 24.06.2018 11:02

How can you do this with Google? I am not able to achieve this with Google; it's just blank after execution.
But I'm curious to read the paragraph data, or any normal English data, in the HTML source code of Google.

Reply
Hoora RM
Hoora RM - 05.05.2018 23:35

Hi and thank you for the great tutorial.
I have extracted my paragraphs as you said, but inside the <p> tags there are so many <a href="...">some stuff in between</a> tags!!
I want to somehow delete the <a href="..."> junk as well. I don't know how you didn't run into them in your work :D
Let me know if you have any comment on this.
Thanks in advance for all the great videos you have uploaded for everyone! :)
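
One rough way to drop those is re.sub: strip the opening <a ...> tag but keep the text inside it, then strip the closing </a>. A minimal sketch (an HTML parser would be more robust):

import re

p = 'Read <a href="https://example.com">this page</a> for more.'
cleaned = re.sub(r'<a href=".*?">', '', p)  # remove opening <a ...> tags
cleaned = re.sub(r'</a>', '', cleaned)      # remove closing tags
print(cleaned)  # Read this page for more.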

Reply
masteraravind
masteraravind - 25.03.2018 22:44

Could you explain how to parse HTML data which has two columns and has to go through a login authentication system?

Reply
Problem
Problem - 24.12.2017 18:47

Help me please. When I run the program it gives me this error: AttributeError: module 'urllib' has no attribute 'encode'
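
That error usually means the call went to the top-level urllib package instead of urllib.parse. A minimal sketch of those lines as they should look in Python 3:

import urllib.parse
import urllib.request

values = {'s': 'basics', 'submit': 'search'}
data = urllib.parse.urlencode(values)  # note: urllib.parse, not plain urllib
data = data.encode('utf-8')            # bytes, ready to pass to urlopen as POST data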

Reply
Walker Ward
Walker Ward - 21.12.2017 08:14

Awesome videos! Keep it up

Reply
Jagmohan Yadav
Jagmohan Yadav - 15.12.2017 18:27

Commendable contribution; I appreciate your effort to teach others.

Reply
Fernando Pinheiro
Fernando Pinheiro - 29.11.2017 22:27

thanks!!!!

Reply
whistler6318
whistler6318 - 26.11.2017 21:11

Thank you for taking the time to make these videos... You are a great teacher

Reply
Finn Buhse
Finn Buhse - 04.11.2017 20:33

Very good, but how do I integrate the fake ID info so I can get into Google with this?

Reply
bharath9190
bharath9190 - 26.08.2017 13:42

Usually everyone gives an introduction using a single-page website; what about a website which has 100 pages in it?? Try to make a tutorial on that!!!

Reply
We Rate Bikes
We Rate Bikes - 25.08.2017 06:12

What happens if there's no closing (</p>) tag on the page?
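
Then that paragraph simply never matches; findall only returns spans where both the opening and the closing tag are present. A tiny sketch:

import re

html = '<p>closed one</p><p>this one never closes'
print(re.findall(r'<p>(.*?)</p>', html))  # ['closed one'] - the unclosed <p> is dropped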

Reply
Jake Ambrose
Jake Ambrose - 19.08.2017 22:27

Been watching the entire series. No clue what's going on, lol. Hope I can make my own tutorials one day.

Reply
Mahfuz Shahin
Mahfuz Shahin - 05.08.2017 21:58

Super, boss!

Reply
JAIDEEP BOMMIDI
JAIDEEP BOMMIDI - 30.07.2017 19:26

Hi,

Great video. Wonderful explanation.

I have a small doubt.

I need to copy the website URL which is currently open in a browser using Python code, instead of manually copy-pasting the URL.

Then assign it to the url variable.

And use the code which is given by you in this video.

Please help me with the code to copy the URL using Python.

Regards,
Jaideep.

Reply
pulkit gupta
pulkit gupta - 24.07.2017 13:03

Please make detailed lectures on urllib.

Reply