Comments:
Hi Corey, I am using the df.median() method, but it raises an error:
TypeError: could not convert string to float: 'I am a developer by profession'
Does anyone have any idea about it?
Hey, if your df.median() doesn't work and you're getting a TypeError or ValueError, you can use df.median(numeric_only=True)
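A minimal sketch of the fix suggested above, on a small invented frame (only the column names follow the survey; the rows are made up for illustration). As far as I know, pandas 2.0 and later no longer skip text columns silently in `DataFrame.median()`, which is why `numeric_only=True` is needed.

```python
import pandas as pd

# Invented sample rows; only the column names follow the survey.
df = pd.DataFrame({
    "MainBranch": [
        "I am a developer by profession",
        "I am a student",
        "I code as a hobby",
    ],
    "ConvertedComp": [50000.0, 30000.0, 70000.0],
    "Age": [30, 22, 41],
})

# Restrict the median to numeric columns so the text column is skipped.
medians = df.median(numeric_only=True)
print(medians)
```

The result is a Series indexed by the numeric column names only, so the text column never reaches the float conversion that raised the error.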
One way to calculate the percentage of people who know Python in each country:
def know_python(string):
    return 'python' in string.lower() if isinstance(string, str) else False
df["know_python"] = df["LanguageWorkedWith"].apply(know_python)
df.pivot_table(index="Country", values="know_python").sort_values(by="know_python", ascending=False).head(20)
Best method for finding the percentage of people using python in each country:
filt=df['LanguageWorkedWith'].str.contains('Python',na=False)
python_count=df.loc[filt]['Country'].value_counts()
python_count.rename('p_c',inplace=True)
python_count
--
# count respondents per country from the DataFrame itself so the index lines up with python_count
total_count=df['Country'].value_counts()
total_count.rename('t_c',inplace=True)
total_count
--
result_horizontal = pd.concat([total_count, python_count], axis=1)
import numpy as np
result_horizontal.replace({'p_c':np.nan},0,inplace=True)
result_horizontal['perc']=(result_horizontal['p_c']/result_horizontal['t_c'])*100
result_horizontal
For the practice question:
filt=(df['Country']=='India')
df.loc[filt]['LanguageWorkedWith'].str.contains('Python').value_counts(normalize=True)*100
Hey, quick question. When I run:
country_grp['ConvertedComp'].median(), I get 6222.0 for Afghanistan. But when I do
another_filter = df['Country'] == 'Afghanistan'
df.loc[filt]['ConvertedComp'].median()
I get a different value. Can you please explain why?
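One likely cause, judging only from the snippet in the question: the filter is stored in `another_filter`, but `df.loc` is called with `filt`, which may still hold an earlier filter (for example the India one). The sketch below uses invented numbers and assumes that leftover filter; it shows the grouped median and the filtered median agreeing once the same filter is used.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "India", "India", "India"],
    "ConvertedComp": [1000.0, 3000.0, 500.0, 700.0, 900.0],
})
country_grp = df.groupby("Country")

# Assumption: an earlier filter is still bound to the name `filt`.
filt = df["Country"] == "India"
another_filter = df["Country"] == "Afghanistan"

grouped_median = country_grp["ConvertedComp"].median().loc["Afghanistan"]
wrong_median = df.loc[filt, "ConvertedComp"].median()            # actually India's median
right_median = df.loc[another_filter, "ConvertedComp"].median()  # matches the grouped result

print(grouped_median, wrong_median, right_median)
```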
Thank you for this. I have a clearer understanding of pandas than before. Wish you the very best.
How can I extract the number of people with the most common education level in each country?
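One possible approach to the question above, as a sketch on invented rows. The column name `EdLevel` is an assumption on my part, so substitute whatever the education column is called in your copy of the survey. It relies on `value_counts()` sorting in descending order, so the first entry is the most common level.

```python
import pandas as pd

# Invented rows; `EdLevel` is an assumed column name.
df = pd.DataFrame({
    "Country": ["India", "India", "India", "Germany", "Germany"],
    "EdLevel": ["Bachelor", "Bachelor", "Master", "Master", None],
})
country_grp = df.groupby("Country")

# value_counts() sorts descending and drops missing values by default,
# so position 0 holds the most common level and its count.
# Note: a country with no answers at all would raise an IndexError here.
top_level = country_grp["EdLevel"].agg(lambda s: s.value_counts().index[0])
top_count = country_grp["EdLevel"].agg(lambda s: s.value_counts().iloc[0])

result = pd.DataFrame({"top_level": top_level, "respondents": top_count})
print(result)
```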
A more efficient way:
country_grp = df.groupby("Country")
print(country_grp["LanguageWorkedWith"].apply(lambda x: x.str.contains("Python").value_counts(normalize=True)).loc["Iran"])
For the percentage of programmers who use Python, I did it like this:
country_uses_python = country_grp['LanguageWorkedWith'].apply(lambda x : x.str.contains('Python').sum())
total_country_count = country_grp['Country'].agg('count')
(country_uses_python / total_country_count) * 100
Thanks a lot for your teaching! Here is my solution to the question at the end of the video:
# group object['column'] is a Series object, so the input of the function is a Series, and the output value of the function is a float
def percent_know_python_each_country(countrySeries):
    num_know_python = countrySeries.str.contains('Python').sum()
    num_all = len(countrySeries)
    percent = round((num_know_python / num_all * 100), 2)
    return percent
country_group['LanguageWorkedWith'].apply(percent_know_python_each_country).sort_values(ascending=False).head(30)
My solution:
country_grp["LanguageWorkedWith"].apply(lambda x: x.str.contains("Python").value_counts(normalize=True)).loc["United States"].loc[True]
An alternative that applies the method to the whole DataFrame and rounds the percentages to two decimals:
print(round(country_grp["LanguageWorkedWith"].apply(lambda x: x.str.contains("Python").value_counts(normalize=True)).to_frame().loc[pd.IndexSlice[:,True],:]*100,2))
Age and CodRevHhs have been approved as of the 2023 survey results.
The 🐐
country_group["LanguageHaveWorkedWith"].apply(lambda x:x.str.contains("Python").sum()*100/len(x))
Mr. Schafer, I am so happy I found your teaching. I have been on a journey to become a data analyst, and after completing the Google Analytics Course, I realized that I needed to learn much more. I am currently finishing a Python course through Coursera offered by IBM.
Not every professional, no matter how good they are, has the natural ability to teach. Your method and technique are amazing and helped me overcome some of the confusion I had with coding in Python. I learned so much from just this video alone.
I will definitely visit the site you referenced, and look forward to learning more from your videos.
Thank you so much!
Can the code below achieve the same result? Thanks.
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count())
Bravo Corey, Bravo!!!
If anyone's getting an error when using .median() on the whole data frame, add the numeric_only=True argument:
data_frame.median(numeric_only=True)
Hey, wouldn't it be smarter to use count on the country_group['LanguageWorkedWith'] object and use that as our total population? Some people might not have answered that question, and we're counting them too with Corey's method.
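To make the point above concrete, here is a small sketch on invented rows comparing the two denominators: `size()` counts every respondent in the group, while `count()` counts only those who answered. The `na=False` argument is my addition so that a missing answer is treated as "does not use Python".

```python
import pandas as pd

# Invented rows; one respondent left the language question blank.
df = pd.DataFrame({
    "Country": ["India", "India", "India", "India"],
    "LanguageWorkedWith": ["Python;SQL", "Java", None, "Python"],
})
country_grp = df.groupby("Country")

uses_python = country_grp["LanguageWorkedWith"].apply(
    lambda x: x.str.contains("Python", na=False).sum()
)

all_rows = country_grp["LanguageWorkedWith"].size()   # every respondent, blanks included
answered = country_grp["LanguageWorkedWith"].count()  # only non-missing answers

pct_of_all = uses_python / all_rows * 100
pct_of_answered = uses_python / answered * 100
print(pct_of_all, pct_of_answered)
```

Which denominator is right depends on the question being asked: share of all respondents, or share of those who answered the language question.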
Can replacing .sum with .value_counts(normalize=True) work or not? I'm working with a different dataset that contains integers rather than strings, so I'm not sure how to test it without asking you.
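On the question above, a small sketch with invented rows (an illustration under my own assumptions, not the video's code): `.sum()` on the boolean mask returns only the count of `True` per group, whereas `value_counts(normalize=True)` returns the share of both `True` and `False`, indexed by country and by the boolean value. The `na=False` argument is my addition so that missing answers count as `False`.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "Country": ["India", "India", "India", "India", "Germany", "Germany"],
    "LanguageWorkedWith": ["Python", "Java", "Python;SQL", "Python", "C++", "Python"],
})
lang_grp = df.groupby("Country")["LanguageWorkedWith"]

# One number per country: how many rows mention Python.
counts = lang_grp.apply(lambda x: x.str.contains("Python", na=False).sum())

# Two numbers per country: the share of True and the share of False.
shares = lang_grp.apply(
    lambda x: x.str.contains("Python", na=False).value_counts(normalize=True)
)

print(counts)
print(shares)
```

For an integer column the same idea should apply once a boolean mask is built with a comparison (for example `x > 0`) instead of `str.contains`.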
Thanks so much for this series. I started from the first video two weeks ago and am now on the 8th. This series has helped me make a lot of progress. Thanks so much. May God bless you. Love from Sri Lanka...
Solution to the problem in one line:
group['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').sum())/group['LanguageHaveWorkedWith'].count()
I have a question please: Would it be right if we said that "Grouping is for categorical data, while aggregating is for numerical ones?"
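A hedged illustration for the question above: the grouping key is usually a categorical column, but the aggregated column does not have to be numeric. The sketch below, on invented rows, applies a numeric aggregation (`median`) and a non-numeric one (`nunique`) to the same groups.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "Country": ["India", "India", "Germany", "Germany"],
    "ConvertedComp": [1000.0, 3000.0, 50000.0, 70000.0],
    "SocialMedia": ["WhatsApp", "YouTube", "Reddit", "Reddit"],
})
country_grp = df.groupby("Country")  # grouping on a categorical key

median_comp = country_grp["ConvertedComp"].median()      # numeric aggregation
distinct_platforms = country_grp["SocialMedia"].nunique()  # aggregation on text data

print(median_comp)
print(distinct_platforms)
```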
Thank you Corey, really don't know what to say 😊😊
a handy solution for the question:
country_grp["LanguageWorkedWith"].apply(lambda x:x.str.contains("Python").sum())/df["Country"].value_counts()*100
THANK YOU FOR GIVING US GOOD CONTENT
Just want to share here my solution for the practice question (but with the survey of 2022):
---
country_group['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))
---
And also give thanks to your wonderful videos, Corey!
It's been 3 years and they're still among the best tutorials.
This is how i did it:
country_grp['LanguageHaveWorkedWith'].apply(lambda x:x.str.contains('Python').sum()/x.value_counts(dropna=False).sum())
I used a lambda function both to find the people who know Python and to find the total count of people in the country group.
I also passed the parameter dropna=False to the value_counts method so that it includes NA values as well.
Here's a simple solution: country_grp['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count()). My name for the column is different because I used the most recent dataset as opposed to the 2016 one. I'm pretty sure this is the simplest solution, since the count function just gives us the total number of respondents in the survey and doesn't require you to alter that number any further. But I could be wrong.
For his practice problem, did anyone else just divide the result by country_grp.size()? Unfortunately, it works like his method where NaN values are treated as 0, rather than dropping them, but it's pretty easy. I still prefer Felipe Gomez's solution of adding /len(x) to the lambda function, though.
country_grp['LanguageWorkedWith'].apply(lambda x : x.str.contains('Python').sum()) / country_grp.size()
df_Python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
df_Python.value_counts(normalize=True)
Heyyy coreyyy I got drops in my eyes after watching the way you taught....you made my day❣️✨love you so much corey
If anyone else is having issues with this tutorial, please let me know. I've read so many comments and I'm not seeing anyone having issues. But I usually can't use:
python_df.sort_values(by ='PercentknowPy', ascending = False).head(50)
python_df
or anything else like it. After trying for hours to find a typo, a cell not being run correctly, or a kernel issue (I don't really like Jupyter Notebook right now), ChatGPT finally told me I should assign the result back to the original dataframe if I want to see the changes.
Like this:
python_df = python_df.sort_values(by ='PercentknowPy', ascending = False).head(50)
python_df
Bam, there it is. But I just can't figure out why it's working for Corey and not me. lol
Things like this keep happening in this tutorial. I just don't know if it's because I am making some mistake I am unaware of, or if one of the many components has changed in a new release. I'd love someone's thoughts! Thanks!
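A short sketch of what is probably happening here, assuming both lines were run in the same notebook cell: `sort_values` returns a new, sorted frame and leaves `python_df` unchanged, and Jupyter only displays the last expression in a cell, which in that case is the unsorted `python_df`. The percentages below are invented.

```python
import pandas as pd

# Invented percentages for illustration only.
python_df = pd.DataFrame(
    {"PercentknowPy": [30.0, 45.0, 38.0]},
    index=["India", "Germany", "Brazil"],
)

# sort_values returns a sorted copy; python_df itself keeps its order.
sorted_view = python_df.sort_values(by="PercentknowPy", ascending=False)
original_order = list(python_df.index)

print(original_order)           # order of python_df is untouched
print(list(sorted_view.index))  # order of the sorted copy

# To keep the sorted order, assign the result back (or pass inplace=True).
python_df = python_df.sort_values(by="PercentknowPy", ascending=False)
```

If the sorted output appears in the video without reassignment, a plausible explanation is that the `sort_values(...)` call was the last line of its cell, so its return value was what got displayed.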
Hello Corey Sir, I love your teaching a lot. You are the best. Thank you
Amazing videos. Thanks a lot for creating such an amazing series.
Thank you Corey!
.value_counts(normalize=True))
Brilliant!
Awesome video. Well-explained and to the point!
You teach really well, and I am learning a lot. Thanks. I have also learned other topics from your videos. I was stuck on opening and reading a CSV file in Python until I saw your video and learned it. I am an absolute beginner 😅. Thanks.
I've solved the practice question in a slightly different manner:
No_of_respondent = df1.groupby(['country'])['country'].count()
No_knows_python = df1.groupby(['country'])['languageworkedwith'].apply(lambda x : (x.str.contains('Python').sum()))
Percent = (No_knows_python / No_of_respondent)*100
Explanation for not using a sort method: I use groupby for both variables, so the two outputs share the same set of rows.
country_grp['LanguageWorkedWith'].apply(lambda x: (x.str.contains('Python').sum()/len(x)) * 100)
Nothing can match the clarity of your videos.
How do you save the new column that you created at minute 41 to the original file?
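One way to do this, sketched under assumptions: pandas does not write back to the CSV automatically, so the frame holding the new column has to be saved explicitly with `to_csv`. The column names, the sample rows, and the output file name below are all hypothetical, and the file is written to a temporary directory so the sketch does not overwrite anything.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-country frame with a newly computed column.
df = pd.DataFrame({
    "Country": ["India", "Germany"],
    "NumKnowsPython": [2, 1],
    "NumRespondents": [4, 2],
})
df["PctKnowsPython"] = df["NumKnowsPython"] / df["NumRespondents"] * 100

# Hypothetical output path; point this at your own file to overwrite it.
out_path = os.path.join(tempfile.mkdtemp(), "survey_with_pct.csv")
df.to_csv(out_path, index=False)  # index=False avoids writing the row numbers

# Reading the file back confirms the new column was saved.
reloaded = pd.read_csv(out_path)
print(reloaded.columns.tolist())
```

Writing to a new file first, rather than over the original survey CSV, keeps the raw data intact in case the new column needs to be recomputed.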
So many thanks for this!