Python Pandas Tutorial (Part 8): Grouping and Aggregating - Analyzing and Exploring Your Data

Corey Schafer

4 years ago

412,163 views

Comments:

Stock Garjana Hindi
Stock Garjana Hindi - 27.11.2023 18:14

Hi Corey, I am using the df.median() method but it is showing me an error:
TypeError: could not convert string to float: 'I am a developer by profession'
Does anyone have any idea about it?

Prakhar Arora
Prakhar Arora - 25.11.2023 20:47

Hey, if your df.median() doesn't work and you're getting a TypeError or ValueError, you can do df.median(numeric_only=True).
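
A minimal sketch of both workarounds, assuming a DataFrame that mixes text and numeric columns like the survey data:

# Option 1: tell pandas to skip non-numeric columns
df.median(numeric_only=True)

# Option 2: select only the numeric columns before aggregating
df.select_dtypes(include='number').median()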

Shanghai Newbison
Shanghai Newbison - 24.11.2023 11:10

One way to calculate the percentage of people who know Python in each country:
def know_python(string):
    return 'python' in string.lower() if isinstance(string, str) else False

df["know_python"] = df["LanguageWorkedWith"].apply(know_python)
df.pivot_table(index="Country", values="know_python").sort_values(by="know_python", ascending=False).head(20)

Thota Rohith
Thota Rohith - 14.11.2023 22:40

Best method for finding the percentage of people using Python in each country:

import numpy as np

filt = df['LanguageWorkedWith'].str.contains('Python', na=False)
python_count = df.loc[filt]['Country'].value_counts()
python_count.rename('p_c', inplace=True)
python_count

total_count = country_grp['Country'].value_counts()
total_count.rename('t_c', inplace=True)
total_count

result_horizontal = pd.concat([total_count, python_count], axis=1)
result_horizontal.replace({'p_c': np.nan}, 0, inplace=True)
result_horizontal['perc'] = (result_horizontal['p_c'] / result_horizontal['t_c']) * 100
result_horizontal

Nishit Kekane
Nishit Kekane - 06.11.2023 20:45

For the practice question:
filt = (df['Country'] == 'India')
df.loc[filt]['LanguageWorkedWith'].str.contains('Python').value_counts(normalize=True) * 100

Charith Silva
Charith Silva - 04.11.2023 08:54

Hey, quick question... When I run
country_grp['ConvertedComp'].median(), I get 6222.0 for Afghanistan. But when I do
another_filter = df['Country'] == 'Afghanistan'
df.loc[filt]['ConvertedComp'].median()

I get a different result. Can you please explain why?
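
One possible explanation, assuming the snippet above is copied as written: the filter is assigned to another_filter, but the lookup uses an older filt variable, so the two medians come from different sets of rows. Using the same variable in both places should line up with the grouped result:

another_filter = df['Country'] == 'Afghanistan'
df.loc[another_filter, 'ConvertedComp'].median()  # should agree with country_grp['ConvertedComp'].median().loc['Afghanistan']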

Turkson Michael
Turkson Michael - 02.11.2023 21:16

Thank you for this. I have a clearer understanding of pandas than before. Wish you the very best.

Marvin Espejon
Marvin Espejon - 25.10.2023 17:55

How can I extract the number of people with the most common education level in each country?
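
A possible sketch (the education column is assumed to be named 'EdLevel' here, which may differ between survey years): count each education level within every country group and keep the top entry per country.

# count of respondents per (Country, EdLevel); value_counts sorts each group in descending order
edu_counts = df.groupby('Country')['EdLevel'].value_counts()
# keep the most common education level (and its respondent count) for each country
edu_counts.groupby(level=0).head(1)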

Soheyl Moheb
Soheyl Moheb - 23.10.2023 22:13

A more efficient way:

country_grp = df.groupby("Country")
print(country_grp["LanguageWorkedWith"].apply(lambda x: x.str.contains("Python").value_counts(normalize=True)).loc["Iran"])

Abdulkadir Güven
Abdulkadir Güven - 22.10.2023 12:52

For the percentage of programmers who use Python, I did it like this:
country_uses_python = country_grp['LanguageWorkedWith'].apply(lambda x : x.str.contains('Python').sum())
total_country_count = country_grp['Country'].agg('count')
(country_uses_python / total_country_count) * 100

OM J
OM J - 22.10.2023 11:04

Thanks a lot for your teaching! Here is my solution for the question at the end of the video:
# group_object['column'] is a Series, so the input of the function is a Series and the output of the function is a float
def percent_know_python_each_country(countrySeries):
    num_know_python = countrySeries.str.contains('Python').sum()
    num_all = len(countrySeries)
    percent = round((num_know_python / num_all * 100), 2)
    return percent

country_group['LanguageWorkedWith'].apply(percent_know_python_each_country).sort_values(ascending=False).head(30)

Halo Madhosh
Halo Madhosh - 20.10.2023 17:37

My solution:

country_grp["LanguageWorkedWith"].apply(lambda x: x.str.contains("Python").value_counts(normalize=True)).loc["United States"].loc[True]

An alternative that applies the method to the whole DataFrame and rounds the percentage to two decimals:

print(round(country_grp["LanguageWorkedWith"].apply(lambda x: x.str.contains("Python").value_counts(normalize=True)).to_frame().loc[pd.IndexSlice[:,True],:]*100,2))

Anthony Erdenetuguldur
Anthony Erdenetuguldur - 19.10.2023 04:36

Age and CodRevHhs have been approved as of the 2023 survey results.

Rohit Vishwakarma
Rohit Vishwakarma - 03.10.2023 20:50

The 🐐

Mohammed Bayan
Mohammed Bayan - 03.10.2023 05:12

country_group["LanguageHaveWorkedWith"].apply(lambda x:x.str.contains("Python").sum()*100/len(x))

Frank Vega
Frank Vega - 29.09.2023 16:53

Mr. Schafer, I am so happy I found your teaching. I have been on a journey to become a data analyst, and after completing the Google Analytics Course, I realized that I needed to learn much more. I am currently finishing a Python course through Coursera offered by IBM.

Not every professional, no matter how good they are, has the natural ability to teach. Your method and technique are so amazing and helped me to overcome some of the confusion I had with coding in Python. I learned so much from just this video alone.

I will definitely visit the site you referenced, and look forward to learning more from your videos.

Thank you so much!

Guangsheng Li
Guangsheng Li - 26.09.2023 11:04

Can the code below achieve the same result? Thanks.
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count())

Oscar Sibanda
Oscar Sibanda - 21.09.2023 08:41

Bravo Corey, Bravo!!!

Arty Gecko
Arty Gecko - 18.09.2023 20:06

If anyone's getting an error when using .median() on the whole data frame, add the numeric_only=True argument:

data_frame.median(numeric_only=True)

Mehdi Ezzine
Mehdi Ezzine - 01.09.2023 00:31

Hey, wouldn't it be smarter to use count() on the country_group['LanguageWorkedWith'] object and use that as our total population? Some people might not have answered that question, and we're counting them too with Corey's method.
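
A sketch of that idea, assuming country_grp is the groupby object from the video; count() ignores NaN, so the denominator only includes respondents who actually answered the language question:

# respondents per country who answered the question (NaN answers excluded)
answered = country_grp['LanguageWorkedWith'].count()
# respondents per country whose answer mentions Python
knows_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
(knows_python / answered) * 100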

hehe xd
hehe xd - 23.08.2023 02:23

Can replacing .sum() with .value_counts(normalize=True) work or not? I'm working with a different dataset involving integers rather than strings, so I'm not sure how to test it without asking you.
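
It can, but the result has a different shape; a small sketch assuming the grouped string column from the video (for an integer column, the same idea applies to any boolean condition):

# .sum() gives one number per country: how many answers mention Python
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
# .value_counts(normalize=True) gives the True/False share per country instead
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))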

Finn Collins
Finn Collins - 05.08.2023 19:06

Thanks so much for this series. I started from the first video two weeks ago and am now on the 8th. This series has helped me make a lot of progress so far. Thanks so much. May God bless you. Love from Sri Lanka...

Head of the Table
Head of the Table - 26.07.2023 09:55

Solution to the problem in one line:
group['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').sum())/group['LanguageHaveWorkedWith'].count()

Mohamad Osama
Mohamad Osama - 17.07.2023 19:12

I have a question, please: would it be right to say that "grouping is for categorical data, while aggregating is for numerical data"?
Thank you Corey, really don't know what to say 😊😊
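
Roughly, yes: you usually group on a categorical column and aggregate a numeric one, but aggregation works on text columns too; a tiny sketch assuming the survey columns from the video:

df.groupby('Country')['ConvertedComp'].median()       # group on a categorical column, aggregate a numeric one
df.groupby('Country')['LanguageWorkedWith'].count()   # count() is an aggregation on a text column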

Farshid
Farshid - 20.06.2023 08:50

a handy solution for the question:
country_grp["LanguageWorkedWith"].apply(lambda x:x.str.contains("Python").sum())/df["Country"].value_counts()*100

Rahul Gupta
Rahul Gupta - 31.05.2023 22:57

THANK YOU FOR GIVING US GOOD CONTENT

Jeremine
Jeremine - 06.05.2023 14:05

Just want to share here my solution for the practice question (but with the survey of 2022):
---
country_group['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))
---
And also give thanks to your wonderful videos, Corey!
It's been 3 years and they're still among the best tutorials.

kartikeya choudhary
kartikeya choudhary - 04.05.2023 19:03

This is how I did it:
country_grp['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').sum() / x.value_counts(dropna=False).sum())
I just used a lambda function, first to find the people who know Python and then to find the total count of people in the country group.
I also used the parameter dropna=False inside the value_counts method so that it will include NA values as well.

Taylor McCoy
Taylor McCoy - 26.04.2023 00:20

Here's a simple solution: country_grp['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count()). My name for the column is different because I used the most recent dataset as opposed to the 2016 one. Pretty sure this is the simplest solution, since the count function just gives us the total number of respondents in the survey and doesn't require you to alter that number any further. But I could be wrong.

Joel Pearman
Joel Pearman - 09.04.2023 00:47

For his practice problem, did anyone else just divide the result by country_grp.size()? Unfortunately, it works like his method where NaN values are treated as 0 rather than dropped, but it's pretty easy. I still prefer Felipe Gomez's solution of adding /len(x) to the lambda function, though.

country_grp['LanguageWorkedWith'].apply(lambda x : x.str.contains('Python').sum()) / country_grp.size()
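
For reference, a sketch of the len(x)-in-the-lambda variant mentioned above; len(x) counts every respondent in the group, including those who left the question blank:

country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum() / len(x) * 100)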

mohamed aboobacker
mohamed aboobacker - 02.04.2023 09:26

df_Python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
df_Python.value_counts(normalize=True)

Ananth Arjun
Ananth Arjun - 25.03.2023 19:41

Heyyy Coreyyy, I got tears in my eyes after watching the way you taught... you made my day❣️✨ Love you so much Corey

Adison Fryman
Adison Fryman - 18.03.2023 03:11

If anyone else is having issues with this tutorial, please let me know. I've read so many comments and I'm not seeing anyone having issues. But I usually can't use:

python_df.sort_values(by ='PercentknowPy', ascending = False).head(50)
python_df

or anything else like it... After trying for hours to look for some typo, some cell not being run correctly, or a kernel issue... (I don't really like Jupyter Notebook right now) ChatGPT finally told me I should save it to the original DataFrame if I want to see the changes.

like this:

python_df = python_df.sort_values(by ='PercentknowPy', ascending = False).head(50)
python_df

Bam, there it is... but I just can't figure out why it's working for Corey and not me. lol

Things like this keep happening in this tutorial. I just don't know if it's because I am making some mistake I am unaware of, or if one of the many components has changed in a new release. I'd love someone's thoughts!!!! Thanks!
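
That behavior is expected: sort_values returns a new, sorted DataFrame and leaves the original untouched unless you reassign the result or pass inplace=True. A minimal sketch, assuming a python_df with a 'PercentknowPy' column as above:

# either reassign the sorted result...
python_df = python_df.sort_values(by='PercentknowPy', ascending=False)
# ...or sort in place
python_df.sort_values(by='PercentknowPy', ascending=False, inplace=True)
python_df.head(50)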

Akshay Ghadi
Akshay Ghadi - 16.03.2023 23:28

Hello Corey Sir, I love your teaching a lot. You are the best. Thank you

Rohit Singh Majila
Rohit Singh Majila - 15.03.2023 03:03

Amazing videos. Thanks a lot for creating such an amazing series.

Brandi Booth
Brandi Booth - 13.03.2023 04:56

Thank you Corey!

zeko Mo
zeko Mo - 06.03.2023 19:04

.value_counts(normalize=True)

Craig Austin
Craig Austin - 06.03.2023 03:02

Brilliant!

Vijay Chopra
Vijay Chopra - 03.03.2023 18:35

Awesome video. Well-explained and to the point!

saad chaudhry
saad chaudhry - 24.02.2023 21:40

You teach really well, I am learning a lot... Thanks. I have also learned other topics from your videos... I was stuck on opening and reading a CSV file in Python until I saw your video and learned it... I am an absolute beginner... 😅... Thanks

karthick poovarasan
karthick poovarasan - 13.02.2023 20:58

I've solved the practice question in a slightly different manner

No_of_respondent = df1.groupby(['country'])['country'].count()
No_knows_python = df1.groupby(['country'])['languageworkedwith'].apply(lambda x : (x.str.contains('Python').sum()))

Percent = (No_knows_python / No_of_respondent)*100

Explanation for not using the sort method: I use groupby for both variables, so the output should be the same set of rows.

Aktanbek Aidarov
Aktanbek Aidarov - 07.02.2023 12:02

country_grp['LanguageWorkedWith'].apply(lambda x: (x.str.contains('Python').sum()/len(x)) * 100)

Kumar Shivam
Kumar Shivam - 29.01.2023 15:47

Nothing can match the clarity of your videos.

abraham alvarez
abraham alvarez - 23.01.2023 01:22

How do you save the new column that you created at minute 41 to the original file?
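
One way to do it (a sketch only; the column name and output filename here are placeholders, not the actual names from the video): add the column to df and then write the DataFrame back out with to_csv.

# hypothetical new column flagging respondents who know Python
df['know_python'] = df['LanguageWorkedWith'].str.contains('Python', na=False)
# write the updated DataFrame to a new CSV file
df.to_csv('survey_results_modified.csv', index=False)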

Zsolt Pal
Zsolt Pal - 19.01.2023 02:37

so many thanks for this!
