Combining multiple datasets - Data Analysis with Python and Pandas p.5

sentdex

5 years ago

56,716 views


Comments:

@mihaisabadac9631 - 03.03.2019 21:24

Great intro to pandas, thanks a lot! Now I have a small problem, in case anyone has advice on what to look for: at the merge step I get an error that at some point says KeyError: 'County'. I looked at the structure of both DataFrames and they both have this column, and everything up to this point works well. I googled and found some discussions, but nothing that fixes it. Thanks

@rchuso - 03.03.2019 22:42

Will you also be showing how to train ConvLSTM2D models? I have such a need and haven't come across good instructions yet.

@alinabeliakova2220 - 04.03.2019 14:15

Hi Harrison, thank you so much for your tutorials. I noticed that there is someone from Istanbul plagiarizing your content. I don't know if you already know that and if it bothers you, but they LITERALLY copied your ML with Python tutorial course. I think you can find them easily through a Google search if you wish to claim your rights. All the best, gonna proceed watching your videos now :)

@user-no2mv1zv9r - 05.03.2019 09:06

woah, in the beginning, he did not import numpy and it still worked. hehhh????

@user-no2mv1zv9r - 05.03.2019 09:11

well, after the numpy incident, I'm starting to believe that sentdex is a god from above, I mean, he gives us riches(information about programming), he helps us with our problems, and, he seems to have some mythical power over technology.

@user-no2mv1zv9r - 05.03.2019 09:12

I'm convinced

@Jabranalibabry - 06.03.2019 12:12

Dude, ur my digital master! Much learning, i do.

@yousufazad6914 - 08.03.2019 20:47

eh! another mug!

@yueqian4700 - 12.03.2019 05:24

This is certainly very helpful! Can you make a video on processing big datasets efficiently, like 100M plus?

@ujjwalkumar9590 - 24.03.2019 22:20

Really, sir, your video is great. It helps me a lot in my studies.

@maryamehsani7867 - 26.03.2019 04:54

@sentdex Instead of:
    act_min_wage = act_min_wage.replace(0, np.nan).dropna(axis=1)
Try:
    act_min_wage = act_min_wage.loc[:, act_min_wage.mean() != 0]
Then you'll have data for get_min_wage(2015, 'Texas'), get_min_wage(2015, 'Florida'), and a bunch of other states you zeroed for the year 2015!
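
To see how the two filters differ, here is a minimal sketch on a made-up table (the values and the three states are invented for illustration; only the name act_min_wage comes from the comment above). The first filter drops a state if it has a zero in any year, while the second drops it only if every year is zero.

```python
import numpy as np
import pandas as pd

# Toy stand-in for act_min_wage: rows are years, columns are states.
act_min_wage = pd.DataFrame(
    {
        "Alaska": [7.0, 7.5, 8.0],
        "Texas": [6.5, 7.0, 0.0],    # a single zero, in the last year only
        "Alabama": [0.0, 0.0, 0.0],  # zero in every year
    },
    index=[2013, 2014, 2015],
)

# Original approach: any zero becomes NaN, and any column with a NaN is dropped.
dropped_any = act_min_wage.replace(0, np.nan).dropna(axis=1)

# Suggested approach: keep every column whose mean is non-zero.
dropped_all_zero = act_min_wage.loc[:, act_min_wage.mean() != 0]

print(list(dropped_any.columns))       # ['Alaska']
print(list(dropped_all_zero.columns))  # ['Alaska', 'Texas']
```

Note that with the second filter the remaining zeros (Texas in 2015 here) stay in the data as zeros, so they may still need handling later.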

@Jakob6174 - 31.05.2019 21:48

Fantastic topic, I was actually interested to see the results as well as to learn about pandas

@adempc - 18.06.2019 21:58

Muad'Dib no longer needs the weirding module I see... Pandas works for him without import!


Soon he will call the great worm and lead us to the spice.

@adempc - 19.06.2019 04:47

For some reason I kept getting:

KeyError: 'County'

from:

for df in [county_2015, pres16]:
    df.set_index(["County", "State"], inplace=True)


Couldn't figure out why... but a great video, as usual.
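
One common cause of this error, though not necessarily the one here, is running the cell twice: after the first inplace set_index, "County" and "State" live in the index and are no longer columns, so the second call cannot find them. A minimal sketch with a made-up one-row frame:

```python
import pandas as pd

# Toy frame; only the column names mirror the ones in the comment above.
county_2015 = pd.DataFrame(
    {"County": ["Autauga"], "State": ["Alabama"], "Rate": [5.2]}
)
keys = ["County", "State"]

county_2015.set_index(keys, inplace=True)  # first run: works

try:
    county_2015.set_index(keys, inplace=True)  # second run: keys are gone
    second_run_error = None
except KeyError as err:
    second_run_error = err

# Quick diagnostics before calling set_index:
missing = [k for k in keys if k not in county_2015.columns]
already_indexed = list(county_2015.index.names) == keys
print(missing, already_indexed)
```

If the keys are missing and the frame is not already indexed by them, printing df.columns usually shows the real problem, such as different capitalization or stray whitespace in a header.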

@wiktor_kubis - 25.06.2019 02:56

If you have a merge problem, use 'left_index=True' and 'right_index=True' instead of 'on'. And since the pandas dtype 'object' can hold mixed types, you need to convert pres16['County'] to string.
One way of doing this: "pres16['County'] = pres16['County'].astype(str)" before you set "State" and "County" as the index.
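
Put together, the suggestion might look like the sketch below. The frames are tiny made-up stand-ins, and the "pct" column and the integer county value are hypothetical, added only to show a mixed-type column.

```python
import pandas as pd

county_2015 = pd.DataFrame(
    {"County": ["Autauga", "Baldwin"],
     "State": ["Alabama", "Alabama"],
     "Rate": [5.2, 5.5]}
)
pres16 = pd.DataFrame(
    {"County": ["Autauga", "Baldwin", 123],  # mixed str/int, so dtype is object
     "State": ["Alabama", "Alabama", "Alabama"],
     "pct": [0.73, 0.77, 0.50]}              # hypothetical vote-share column
)

# Make the key column uniformly string before it becomes part of the index.
pres16["County"] = pres16["County"].astype(str)

for df in (county_2015, pres16):
    df.set_index(["County", "State"], inplace=True)

# Join on the shared (County, State) index instead of using on=...
merged = county_2015.merge(pres16, left_index=True, right_index=True)
print(merged)
```

The default how="inner" keeps only the (County, State) pairs present in both frames.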

@shyjoshi7158 - 30.07.2019 05:32

The reason Mississippi has a bunch of NaNs is that the min_wage data does not have Mississippi as a column; they left that state out for some reason. If you check min_wage.columns.unique(), you will find Mississippi does not exist.
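
A quick way to check this for any list of states, sketched on a made-up min_wage frame (the values and the two columns are invented; the real data has many more):

```python
import pandas as pd

# Toy stand-in: one row per year, one column per state.
min_wage = pd.DataFrame({"Alabama": [7.25], "Missouri": [7.65]}, index=[2015])

states_needed = ["Alabama", "Mississippi", "Missouri"]
missing_states = [s for s in states_needed if s not in min_wage.columns]
print(missing_states)  # ['Mississippi']
```

Any state that shows up in this list has no column to look values up from, which would explain NaNs after a lookup or mapping step.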

@jaykabra2587 - 03.11.2019 17:12

At what point is the state_abbv df created and saved? I can't find it anywhere. Also, where is it saved from? I am neither able to create such a df using the given data nor able to find it anywhere.

@oliviero1756 - 20.11.2019 01:32

IMPORT!!!

@gracemalcom7358 - 19.12.2019 23:41

the donut cup has always been one of my favorites

@shadid_io - 28.01.2020 07:31

What does inplace=True stand for?
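
In short, inplace=True asks pandas to modify the existing DataFrame instead of returning a new one. A small sketch with a made-up frame (note the capital T; lowercase true is not defined in Python):

```python
import pandas as pd

df = pd.DataFrame({"County": ["Autauga"], "State": ["Alabama"], "Rate": [5.2]})

# Without inplace: a new DataFrame is returned and df is left untouched.
new_df = df.set_index("County")
untouched = "County" in df.columns

# With inplace=True: df itself is changed and the call returns None.
result = df.set_index("County", inplace=True)

print(untouched, result, df.index.name)  # True None County
```

So `df = df.set_index("County")` and `df.set_index("County", inplace=True)` leave df in the same state; they differ only in whether a new object is returned.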

@monatamsi1429 - 13.03.2020 07:50

I love your sense of humor. You really love what you are doing and really appreciate your efforts mate! Thanks

@vipul8990 - 26.03.2020 20:33

FileNotFoundError: File b'datasets/state_abbv.csv' does not exist


I am getting this error. Please can anyone help me out.
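
The error means pandas could not find datasets/state_abbv.csv relative to the directory the script or notebook is running in. Where that file originally comes from is not stated in this thread, so the sketch below only checks the path; it does not recreate the file.

```python
from pathlib import Path

# Path taken from the error message above, relative to the working directory.
csv_path = Path("datasets") / "state_abbv.csv"

print("Working directory:", Path.cwd())
print("Looking for:", csv_path.resolve())
print("Exists:", csv_path.exists())
```

If "Exists" is False, either the file was never created or saved, or the working directory is not the folder that contains datasets/.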

@rohanaggarwal8718 - 30.04.2020 04:21

You did a great job, Harrison. I am having some trouble following along, though, even after going through your basics a few times and learning them from other places too. Any tips?

@daniellewagner7070 - 01.05.2020 10:41

If anyone recently had trouble importing the second dataset, try specifying an encoding when reading the file:
df = pd.read_csv(r"us-minimum-wage-by-state-from-1968-to-2017\Minimum Wage Data.csv", encoding='unicode_escape')
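
The underlying issue is usually that the file is not valid UTF-8, which is what read_csv assumes by default. A commonly used alternative to 'unicode_escape' is 'latin-1'; whether it is the right encoding for the real file is an assumption. The sketch below reproduces the failure on a small made-up CSV written to a temporary folder and then falls back to latin-1.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up CSV bytes containing a non-UTF-8 character (0xE9, "é" in latin-1).
raw = "Year,State,Footnote\n2017,Alabama,caf\xe9\n".encode("latin-1")

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "Minimum Wage Data.csv"
    csv_path.write_bytes(raw)

    try:
        df = pd.read_csv(csv_path)  # default encoding is UTF-8
        used_encoding = "utf-8"
    except UnicodeDecodeError:
        df = pd.read_csv(csv_path, encoding="latin-1")
        used_encoding = "latin-1"

print(used_encoding)
print(df)
```

The raw-string prefix (r"...") in the path above keeps the backslash from being read as an escape character; forward slashes also work on Windows.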

@sery152 - 01.05.2020 16:35

I get a "KeyError: 'County'" in the for loop.

Does anyone know how to fix it?

@Luke7389 - 03.05.2020 17:19

Wouldn't it have made sense to replace the Rate with the average rate grouped by State, then keep the unique values for State, and then map the new column, to speed things up?

@MoniqNansy - 29.06.2020 06:48

cool coffee cup... i am smiling watching this coz it is gonna save my day!

@segungtp - 14.07.2020 18:25

Your tutorials are really fantastic!!
and you have a wonderful mug collection

@anshshrivastava9107 - 07.09.2020 16:31

IMPORT

@KevinTempelx - 26.12.2020 08:46

Thank you!

@Denverse - 10.04.2021 06:48

My  M1 MBP took 33.2 seconds!

@nelohenriq - 25.08.2021 21:10

I'm having an issue at the very beginning with the ["Low.2018"] index.

There's no column with that name in the dataset, so there is nothing to rename:
"None of [Index(["Low.2018"], dtype='object')] are in the [columns]"

Can you help?
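
The message suggests the downloaded CSV has no column called Low.2018; the dataset may have changed since the video, though that is an assumption. A quick way to see which similar columns exist, sketched on an empty frame with made-up column names:

```python
import pandas as pd

# Hypothetical column names; check df.columns on the real file instead.
df = pd.DataFrame(columns=["Year", "State", "Low.Value", "Low.2020"])

wanted = "Low.2018"
has_wanted = wanted in df.columns
candidates = [c for c in df.columns if c.startswith("Low")]

print(has_wanted)   # False
print(candidates)   # ['Low.Value', 'Low.2020']
```

Whichever candidate holds the inflation-adjusted low value in the real file can then be used in place of "Low.2018" in the selection and rename steps.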

@mikederp9612 - 02.11.2021 02:06

Miss him my pres

@extremenoiseterron - 19.07.2022 16:40

As of today, the "result" file is a JSON. I can't seem to find a way to convert it to a CSV. Do you have any idea? Did anyone have the same problem?
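
The layout of that JSON file is not described here, so the following is only a sketch under the assumption that it is a list of flat records; the field names are invented. pandas can flatten such records with json_normalize and write them out with to_csv.

```python
import json

import pandas as pd

# Made-up JSON text standing in for the downloaded "result" file.
json_text = """
[
  {"county": "Autauga", "state": "Alabama", "votes": 100},
  {"county": "Baldwin", "state": "Alabama", "votes": 250}
]
"""

records = json.loads(json_text)
df = pd.json_normalize(records)   # one row per record, one column per key

csv_text = df.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```

For a file on disk, `pd.read_json(path)` may work directly if the structure is this simple; nested structures usually need json_normalize with its record_path argument.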

@Boringpenguin - 20.07.2022 02:21

To merge unemp_county and get_min_wage, I think the fastest method would be to use pd.merge.
The key idea is to first convert get_min_wage back to the long format with "Year" and "State" as (multi-)indices, then we can directly use them as the merge key (by setting right_index=True in pd.merge).


Something like

pd.merge(
    unemp_county,
    get_min_wage.stack().to_frame(),
    how='left',
    left_on=["Year", "State"],
    right_index=True
)

would work.
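
Here is a runnable sketch of that idea on made-up data. It assumes the wide table (years as rows, states as columns) is the DataFrame act_min_wage mentioned in an earlier comment, which is what get_min_wage in the snippet above appears to refer to; the "min_wage" column name and all values are invented.

```python
import pandas as pd

# Wide table: one row per year, one column per state.
act_min_wage = pd.DataFrame(
    {"Alabama": [7.0, 7.25], "Texas": [6.9, 7.1]},
    index=pd.Index([2014, 2015], name="Year"),
)
act_min_wage.columns.name = "State"

# Long table with one row per county and year.
unemp_county = pd.DataFrame(
    {
        "Year": [2014, 2015, 2015],
        "State": ["Alabama", "Texas", "Ohio"],  # Ohio has no wage column
        "County": ["Autauga", "Travis", "Franklin"],
        "Rate": [5.2, 3.4, 4.1],
    }
)

# stack() turns the wide table into a Series indexed by (Year, State).
min_wage_long = act_min_wage.stack().to_frame("min_wage")

merged = pd.merge(
    unemp_county,
    min_wage_long,
    how="left",
    left_on=["Year", "State"],
    right_index=True,
)
print(merged)
```

Rows whose (Year, State) pair is absent from the wage table, like Ohio here, get NaN in the new column because of how="left".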

@mustafaaldabbas711 - 20.07.2024 12:03

I don't understand why you said we'd merge votes with the unemployment rate and then merged it with minimum wage. Where did you get the set from?
