Comments:
Kevin, your videos are super helpful! Thank you!!!
OMG I want to thank you SOOOO much 😊 I'd been stuck on this problem for days, and the way you explain it makes it so much easier than how I learned it in class. I was so happy not to see that error message 😂 Thank you
I'd love to have more videos like this
Thank you so much, you made my day. I finally found the line of code that I really needed to finish my task :) (code line 17)
ОтветитьThis is so helpful!
Pandas has the best duplicate handling, better than spreadsheets and SQL.
Thank you so much 💕 your videos are really amazing... can you tell us how to read a CSV (without a header on the first line) and set the first row with non-null values as the header?
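One possible approach to the question above (a minimal sketch, using a made-up CSV string as a stand-in for the real file): read with `header=None` so nothing is consumed as column names, then promote the first fully non-null row.

```python
import io
import pandas as pd

# Hypothetical CSV whose real header is not on the first line
csv_text = """,,
,,
name,age,city
Alice,30,NYC
Bob,25,LA
"""

# Read with no header so the junk rows are not consumed as column names
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Find the first row with no null values and promote it to the header
first_full = df.dropna(how='any').index[0]
df.columns = df.loc[first_full]
df.columns.name = None
df = df.loc[first_full + 1:].reset_index(drop=True)
```

Note that with `header=None` every column is read as `object`, so numeric columns may need a later conversion (e.g. `pd.to_numeric`).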
How do I remove just the duplicated names? For example, I have multiple rows with the same name, but each name has multiple heart rate measurements, and I just want each name to appear a single time. For example, imagine this is the table:
name heart rate
Aaron 79
Aaron 80
Aaron 90
I want each name to display only once.
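Two possible readings of the question above, sketched on the commenter's example table: keep only one row per name, or keep all measurements but blank out the repeated names for display.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Aaron', 'Aaron', 'Aaron'],
                   'heart_rate': [79, 80, 90]})

# Option 1: keep only the first measurement per name
one_per_name = df.drop_duplicates(subset='name')

# Option 2: keep all rows but blank out the repeated names (display only)
display_df = df.copy()
display_df.loc[display_df['name'].duplicated(), 'name'] = ''
```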
Just found your channel, watched this as my very first of your videos, and pressed subscribe!!! Your explanation of the idea as a whole is remarkable 😃 thanks a lot.
Beneficial videos. ❤
Thank you! You sound like Kamala Harris lol
It helps me a lot. Can you explain how we get the count of each duplicated value?
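One way to answer the counting question above (a sketch on made-up data): `value_counts` counts every value, and filtering it to counts above one leaves only the duplicated values.

```python
import pandas as pd

df = pd.DataFrame({'zip': ['111', '111', '222', '333', '222', '111']})

# Count occurrences of every value, then keep only the ones that repeat
counts = df['zip'].value_counts()
dup_counts = counts[counts > 1]
```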
Good lesson, but the data types have to match. I found I had to convert my pandas columns with .astype(str) before this worked.
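The commenter's point can be shown on a tiny example: the integer `1` and the string `'1'` display identically but don't compare equal, so they aren't flagged as duplicates until the dtype is normalized.

```python
import pandas as pd

# Mixed types: the integer 1 and the string '1' look the same when printed
df = pd.DataFrame({'id': [1, '1', 2]})
before = df['id'].duplicated().sum()               # no duplicates found

# Normalizing to a single dtype makes the values compare equal
after = df['id'].astype(str).duplicated().sum()    # one duplicate found
```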
How do you drop a column which contains 95% identical values in Python?
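One possible sketch for the question above (made-up data, and 0.95 as the threshold): if a column's most common value covers at least 95% of the rows, drop it.

```python
import pandas as pd

df = pd.DataFrame({
    'mostly_same': ['a'] * 19 + ['b'],   # 95% identical values
    'varied':      list(range(20)),
})

# Keep only columns whose most common value covers < 95% of rows
threshold = 0.95
keep = [col for col in df.columns
        if df[col].value_counts(normalize=True).iloc[0] < threshold]
df = df[keep]
```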
Thank you...!!!
That was so accurate, thanks a lot, genius!
If I have a dataframe with a million rows and 15 columns, how do I figure out whether any column in my dataframe has mixed data types?
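One way to check for the mixed-type situation asked about above (a sketch on toy data): columns stored as `object` dtype are the usual suspects, and mapping each value to its Python type reveals whether the types actually vary.

```python
import pandas as pd

df = pd.DataFrame({
    'clean': [1, 2, 3],
    'mixed': [1, 'two', 3.0],
})

# Object-dtype columns may hide several Python types; check each one
mixed_cols = [col for col in df.columns
              if df[col].dtype == object
              and df[col].map(type).nunique() > 1]
```

On a million-row frame `map(type)` still has to touch every value, but it is a single vectorized pass per object column.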
I love you, sir.
Thank you!
Brilliant video.
Thank you for this content! I have a question: how can we handle quasi-redundant values in different columns? (Imagine two different columns each containing 80% similar values.) Thanks a lot
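One simple reading of the question above, sketched on made-up data: measure how often the two columns agree, and drop one if the overlap is high enough. (The 0.8 threshold and the idea of dropping the second column are assumptions for illustration.)

```python
import pandas as pd

df = pd.DataFrame({
    'city':      ['NYC', 'LA', 'SF', 'NYC', 'LA'],
    'city_copy': ['NYC', 'LA', 'SF', 'NYC', 'Boston'],
})

# Fraction of rows where the two columns agree
overlap = (df['city'] == df['city_copy']).mean()

# If the overlap is high enough, one column may be redundant
if overlap >= 0.8:
    df = df.drop(columns='city_copy')
```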
HOW DO YOU KNOW WHAT I NEED? YOU ARE MY FAVORITE TEACHER FROM NOW ON
Hello, thank you for the video. I'm wondering if you could make some tutorials about API requests?
Thanks a lot. It was a great help. Much appreciated!
How do I access the IPython/Jupyter Notebook link? It is not available in the GitHub repository.
Thanks for the video
When I use the parameter keep=False, I get a number of rows less than the keep='first' and keep='last' counts combined. What is the reason for that??
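The likely explanation for the count mismatch above, shown on a toy example: when a value occurs three or more times, the middle copies are marked by BOTH `keep='first'` and `keep='last'`, so adding those two counts double-counts them, while `keep=False` marks each copy exactly once.

```python
import pandas as pd

s = pd.Series(['a', 'a', 'a', 'b'])

n_first = s.duplicated(keep='first').sum()  # marks copies 2 and 3
n_last  = s.duplicated(keep='last').sum()   # marks copies 1 and 2
n_all   = s.duplicated(keep=False).sum()    # marks all three copies

# The middle copy is counted by BOTH keep='first' and keep='last',
# so n_first + n_last exceeds n_all whenever a value repeats 3+ times
```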
Jeez, you just saved me so much work on a seemingly unsolvable project 🙏☕
Clean and informative!
Great video. But I'd like to find the duplicates in one column, then go to another column and find its duplicates, and so on, keeping only one row with the information I need.
Very methodical explanation
You have done a very good job explaining DataFrames, and you make understanding them very easy, even for people who work in Excel.
Best wishes from me
How can we efficiently find near-duplicates in a dataset?
This is the case of complete duplicates. So what should we do when we have to deal with partial duplicates? E.g., age, gender, and occupation are the same but zip is different.
Could you also make a video on that, please?
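The partial-duplicate case described above can be handled with the `subset` parameter: only the columns that should define identity are compared, and the differing zip is ignored. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'age':        [25, 25, 40],
    'gender':     ['F', 'F', 'M'],
    'occupation': ['doctor', 'doctor', 'lawyer'],
    'zip':        ['11111', '22222', '33333'],
})

# Rows 0 and 1 differ only in zip; treat them as duplicates by
# checking just the columns that should define identity
partial_dups = df.duplicated(subset=['age', 'gender', 'occupation'])
deduped = df.drop_duplicates(subset=['age', 'gender', 'occupation'])
```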
Wait Kevin, keep='first' means the rows marked as duplicates are the ones towards the bottom, meaning they have a much higher index. So keep='last' means...?? Oh man, I'm getting mixed up. Could someone please explain it to me? Kevin, please?
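To untangle the confusion above: `keep='last'` is the mirror image of `keep='first'`. Whichever occurrence you "keep" is the one NOT marked as a duplicate, so `keep='first'` marks the later copies and `keep='last'` marks the earlier ones.

```python
import pandas as pd

s = pd.Series(['x', 'x', 'x'])

# keep='first': the first occurrence survives, later copies are marked
first_marks = s.duplicated(keep='first').tolist()   # [False, True, True]

# keep='last': the last occurrence survives, earlier copies are marked
last_marks = s.duplicated(keep='last').tolist()     # [True, True, False]
```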
Amazing, and thanks bro, the right place for data queries
Great video. This helped me tremendously.
How would you go about finding duplicates case-insensitively on a certain field?
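One common answer to the case-insensitivity question above (a sketch on made-up data): normalize the field with `str.lower()` before checking for duplicates, while leaving the original values untouched.

```python
import pandas as pd

df = pd.DataFrame({'email': ['A@x.com', 'a@x.com', 'b@x.com']})

# Lowercase the field first, then check for duplicates on that
dups = df['email'].str.lower().duplicated()
deduped = df[~dups]
```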
Really, your teaching method is very good; your videos give a lot of knowledge. Thanks, Data School
Love you brother, you're changing so many lives, thank you... the best teacher award goes to Data School.
Wow! You were already teaching data science in 2014, when it wasn't even popular! Btw, your videos are really good; you speak slowly and clearly, which makes them easy to understand and follow. Kudos to you!
At the end, are you saying that "age" + "zip code" must TOGETHER be duplicated? Or are you saying "age" duplicates and "zip code" duplicates are each removed from their respective columns? Thanks
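On the question above: `subset` treats the listed columns together, so a row counts as a duplicate only when BOTH values match an earlier row. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'age': [25, 25, 25],
                   'zip': ['111', '111', '222']})

# subset treats the listed columns TOGETHER: only row 1 is a duplicate,
# because row 2 shares the age but not the zip
together = df.duplicated(subset=['age', 'zip'])
```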
Thanks for the awesome videos on Pandas. I was able to automate a few Excel reports at my work, but I'm stuck with something very complex (complex for me!). Could you please help with some complex Excel calculations using Python?
For example, suppose I have data in the format below:
db_instance Hostname Disk_group disk_path disk_size disk_used header_status
abc_cr host1 data01 dev/mapper/asm01 240 90 Member
abc_cr host1 data01 dev/mapper/asm02 240 100 Member
abc_cr host1 data01 dev/mapper/asm03 240 60 Member
abc_xy host1 data01 dev/mapper/asm01 240 90 Member
abc_xy host1 data01 dev/mapper/asm02 240 100 Member
abc_xy host1 data01 dev/mapper/asm03 240 60 Member
abc_cr host1 acfs01 dev/mapper/asm04 90 30 Member
abc_cr host1 acfs01 dev/mapper/asm05 90 60 Member
abc_xy host1 acfs01 dev/mapper/asm04 90 30 Member
abc_xy host1 acfs01 dev/mapper/asm05 90 60 Member
host1 unassigned dev/mapper/asm06 180 0 Candidate
host1 unassigned dev/mapper/asm07 180 0 Former
res_du host2 data01 dev/mapper/asm01 240 90 Member
res_du host2 data01 dev/mapper/asm02 240 100 Member
res_du host2 data01 dev/mapper/asm03 240 60 Member
res_hg host2 data01 dev/mapper/asm01 240 90 Member
res_hg host2 data01 dev/mapper/asm02 240 100 Member
res_hg host2 data01 dev/mapper/asm03 240 60 Member
res_pq host2 acfs01 dev/mapper/asm04 90 30 Member
res_pq host2 acfs01 dev/mapper/asm05 90 60 Member
res_mn host2 acfs01 dev/mapper/asm04 90 30 Member
res_mn host2 acfs01 dev/mapper/asm05 90 60 Member
host2 unassigned dev/mapper/asm06 180 0 Candidate
host2 unassigned dev/mapper/asm07 180 0 Former
As you can see, disk_path is duplicated for each host because of multiple db_instances. (Even though you see similar disk_paths for host1 & host2, they are actually different disks on the storage end; admins follow similar naming conventions when configuring disks on the host side, which results in similar disk_paths across hosts.)
My queries are:
1. How to remove duplicate disk_paths for each host? (Considering only the two columns Hostname & disk_path; that's how I remove duplicates in Excel. I am not worried about db_instance.)
2. Once we remove duplicates, how to calculate the total size of 'Member' disks, and also the total size of 'Candidate' and 'Former' disks combined?
3. How to add another column 'Percent used', which is the result of 'disk_used' / 'disk_size' * 100 for each row?
Thanks in advance!
Live long and prosper!
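The three steps asked about above can be sketched as follows, on a small made-up slice of the disk inventory (the real data would come from a file, and the exact numbers here are invented):

```python
import pandas as pd

# A small slice of the hypothetical disk inventory from the comment
df = pd.DataFrame({
    'db_instance':   ['abc_cr', 'abc_cr', 'abc_xy', 'abc_xy', None],
    'Hostname':      ['host1', 'host1', 'host1', 'host1', 'host1'],
    'disk_path':     ['asm01', 'asm02', 'asm01', 'asm02', 'asm06'],
    'disk_size':     [240, 240, 240, 240, 180],
    'disk_used':     [90, 100, 90, 100, 0],
    'header_status': ['Member', 'Member', 'Member', 'Member', 'Candidate'],
})

# 1. One row per (Hostname, disk_path), regardless of db_instance
unique_disks = df.drop_duplicates(subset=['Hostname', 'disk_path'])

# 2. Total size of Member disks, and Candidate/Former disks combined
member_total = unique_disks.loc[
    unique_disks['header_status'] == 'Member', 'disk_size'].sum()
free_total = unique_disks.loc[
    unique_disks['header_status'].isin(['Candidate', 'Former']),
    'disk_size'].sum()

# 3. Percent used for each remaining row
unique_disks = unique_disks.assign(
    percent_used=unique_disks['disk_used'] / unique_disks['disk_size'] * 100)
```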
How to remove leading and trailing spaces in a dataframe?
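One sketch for the whitespace question above (toy data): apply `str.strip()` to every string (object-dtype) column.

```python
import pandas as pd

df = pd.DataFrame({'name': ['  Alice ', 'Bob  '], 'age': [30, 25]})

# Strip whitespace from every string column, leaving numeric columns alone
for col in df.select_dtypes(include='object'):
    df[col] = df[col].str.strip()
```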
I have watched a lot of your videos, and I must say that the way you explain is really good. Just to inform you, I am new to programming, let alone Python.
I want to learn something new from you. Let me give you a brief. I am working on a dataset to predict app ratings from the Google Play Store. There is an attribute named "Rating" which has a lot of null values. I want to replace those null values using a median grouped by another attribute named "Reviews". But I want to categorize the attribute "Reviews" into multiple categories like:
1st category for reviews less than 100,000,
2nd category for reviews between 100,001 and 1,000,000,
3rd category for reviews between 1,000,001 and 5,000,000, and
4th category for anything more than 5,000,000.
Although I tried a lot, I failed to create multiple categories. I was able to create only 2 categories using the command below:
gps['Reviews Group'] = [1 if x <= 1000000 else 2 for x in gps['Reviews']]
(gps is the dataset.)
I replaced the null values using the command below:
gps['Rating'] = gps.groupby('Reviews Group')['Rating'].transform(lambda x: x.fillna(x.median()))
Please help me create multiple categories for "Reviews" as mentioned above and replace all the null values in "Rating".
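One possible answer to the binning question above: `pd.cut` assigns each review count to one of the four requested bins in a single step, after which the commenter's own `transform(... fillna ...)` line works unchanged. (The `gps` frame here is a tiny made-up stand-in; note that `pd.cut` bins are right-inclusive, so 100,000 lands in category 1.)

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the gps dataset from the comment
gps = pd.DataFrame({
    'Reviews': [50_000, 80_000, 500_000, 600_000, 8_000_000],
    'Rating':  [4.0, np.nan, 3.5, np.nan, 4.2],
})

# pd.cut assigns each review count to one of the four requested bins
bins = [0, 100_000, 1_000_000, 5_000_000, float('inf')]
gps['Reviews Group'] = pd.cut(gps['Reviews'], bins=bins, labels=[1, 2, 3, 4])

# Fill each null Rating with the median Rating of its group
gps['Rating'] = (gps.groupby('Reviews Group', observed=True)['Rating']
                    .transform(lambda x: x.fillna(x.median())))
```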
Lol, just when I felt you wouldn't handle the exact subject I was looking for, there came the bonus! Thanks!
You are the greatest teacher in the world
I was able to solve the duplicate data in my CSV file~~~ Thank you.
However, I suggest you could do a bit more in this video. I think you could show the resulting list after the deletion, such as:
>> new_data = df.drop_duplicates(keep='first')
>> new_data.head(24898)
If you add that, I think this video will be even better~~~
You're amazing, we need more videos on your channel
Very useful videos... can you please tell me how to find duplicates of just one specific row?
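One way to answer the single-row question above (a sketch on made-up data): compare every row against the target row and keep the rows where all columns match.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Aaron', 'Beth', 'Aaron'],
                   'age':  [30, 25, 30]})

# Compare every row against one specific row (here, row 0)
target = df.loc[0]
matches = df[(df == target).all(axis=1)]
```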