Featuring Engineering- Handle Categorical Features Many Categories(Count/Frequency Encoding)

Krish Naik

4 года назад

84,874 Просмотров

Скачать видео

Комментарии:

Bongiwe Khongolo - 10.12.2022 01:51

J.

John hip

Ответить

Elsa • Zone - 06.12.2022 19:33

how about adding like .000000001 or 1 value to that dictionary for similar values? does that work?

Ответить

Hazhir - 28.11.2022 19:19

dude, you are amazing. well done.

Ответить

The Algorithm - 18.09.2022 20:56

But it will create high weightage to the higher counts and model will overfit

Ответить

Devansh Goel - 27.06.2022 11:21

thank you sir

Ответить

Sandipan Sarkar - 21.10.2021 14:50

finished watching

Ответить

Tikendra Sahu - 10.10.2021 09:15

sir, since we are just assigning count numbers to the categorical values and even it may lead to problematic situation if the counts are same , why don't we use label encoding in each columns , YEs it may not be the ordinal data but it does better job then what this method that you are talking about is doing.the Goal is to assign numbers to strings .
-- I may be wrong but according to the info you provided what i said should work too , and it is easier.
CORRECT ME IF I AM WRONG @KRISH NAIK

Ответить

ThePresistence - 30.08.2021 10:34

singam kadhal vandhalle kalla redum thannale 😁😂, ennakum romba pudikum

Ответить

Sreerag Sasidharan - 02.07.2021 15:37

sir then what about categories which have similar count..???

Ответить

Ashit Debdas - 12.06.2021 17:54

similarly, i have IP_ADDRESS as a feature how can i encode them?

Ответить

Prabhat gupta - 18.05.2021 21:11

Hello Sir,
if the categories are not ordinal and,
If we replace each categories with their counts in that column. aren't we making it ordinal as different different category will have different numeric value...?

Can anyone please help me understand this?

Ответить

Rasel Uddin - 17.04.2021 13:32

Best tutorial

Ответить

Ashutosh Sinha - 16.04.2021 01:18

Thanks for sharing this informative lesson. I have a bank database and I need to identify categorical features in the table with column name
Customer Age,
Professional Experience,
Annual Income,
Family Size,
CC Avg monthly spend,
Education (1: Undergrad; 2: Graduate; 3: Advanced/Professional),
Mortgage Value,
Personal Loan (Yes/No),
Securities Account (Yes/No),
Credit Card (Yes/No). Can you help me with reasons for your selection.

Ответить

Akansha Bhatt - 06.03.2021 17:56

Can someone help me in explaining 2nd disadvantage of this algorithm?

Ответить

Anil Sunny - 15.01.2021 11:32

I thought we were to consider the top 10 most frequent labels and perform one hot encoding on them, but this is completely different :/. Somebody pls help, I'm a beginner at this and some lead on this would be greatly appreciated.

Ответить

Srishti Kumari - 27.12.2020 09:33

Thanks for sharing this video!
I was looking for this type of code for replacement of column with it's count column. I am learning feature engineering from kaggle couses. There, count_encoder() is used and I was trying to write code for steps used in the count_encoder() which I find in this video.
I have a doubt, the column in that tutorial was numerical and both the column (col and count_column) were used in predicting output and calculating the validation score. I calculated validation score using both the ways (i.e.included both col & count_col and then only count_col), I found score to be higher in the model using both(col & count_col). Can you please clarify, should we use the original column as well if it's numerical ?

Ответить

nivedita parab - 18.12.2020 14:50

in this case how to interpreat result for prediction coefficent with respect to encoded feature ?

Ответить

Trapti Gupta - 10.12.2020 16:40

so helpful 😁

Ответить

My Name - 02.10.2020 22:16

Sir, may you please recommend ML algorithms which can be effectively applied with this technique? Pros and cons?

Ответить

Mohammad Arif - 23.08.2020 15:36

what if the two diff categories have the same count

Ответить

Sandeep Nataraj - 17.08.2020 15:47

Thanks Krish, for sharing your knowledge. I want to know how to encode multi select categorical variables in Python. For example there is a field "Languages Known' which can hold multiple values like English, Kannada, Hindi or French, Telugu, English or Malayalam, Tamil, English etc. and assume the overall choice of unique languages is around 25.

Ответить

Vinoth Kumar Selvaraj - 28.07.2020 18:18

Friends,

What if we have a column (say "Location") which consists of more than 1000 categorical variables?? FYI, this column is an independent variable and one of the important parameter for predicting the label. Answer pls....

Thanks in advance

Ответить

Deepak Mishra - 07.06.2020 22:12

Sir can you share that feature engineering zip file again pls !

Ответить

Meharaj Begum A - 01.06.2020 10:14

Hi, Sir, Can you explain how to do encoding for the target feature with multiple labels for each instance separated by comma?

Ответить

kiran arun - 09.05.2020 21:29

How is this different than LabelEncoding...even there it counts the unique label and assigns accordingly..here it counts the frequency nd assigns ...if both are same y use this??

Ответить

Manjunath Angadi - 23.02.2020 15:01

where did you completed your data science course

Ответить

Omkar R - 21.01.2020 13:17

Thanks for sharing your research ideas... whenever I feel I am comfortable with knowing some type of concept in DS... .every video of yours adds an interesting additional perspective to my knowledge...thanks for taking the efforts to share everything that you do😀

Ответить

Arjya Basu - 23.12.2019 01:26

Sir is mean encoding and Frequency encoding same?? If not. please do make a video on it

Ответить

sunny savita - 17.12.2019 09:05

sir please make video on roc_auc

Ответить

muhammad zubair Baloch - 14.09.2019 16:55

Sir I am very impressed from mode of teaching. Easily understandable. Sir please some high level computer vision based video lectures. Thanks

Ответить

Ogbeide Nelson - 13.09.2019 13:56

Can we have a video on hypothesis testing and implementation in real world problem with python

Ответить

venkatesh sadagopan - 13.09.2019 10:58

Hi krish,
If we have a very large dataset with less features i.e number of features very very less compared to number of samples in the dataset, how can I approach the problem and what techniques i can apply to get reasonable result and how to avoid problems like over fitting in this case.
Thanks in Advance

Ответить

Shashwat Bilgrami - 12.09.2019 21:45

Thank you for the videos they help a lot.

Ответить

Shashwat Bilgrami - 12.09.2019 21:44

Bro can you please perform feature engineering technique in air flight price problem

Ответить

Vinay Kumar - 12.09.2019 16:29

Hi can you upload a video explaining lightGBM catBoost with simple examples

Ответить

Balaji varma - 12.09.2019 16:05

Hi Sir, Could you please complete your kaggle competition code. Waiting for next part. Thank so much you are best 👍💯

Ответить

Shaz-z - 12.09.2019 14:31

Hi Krish,
It's a good video, probably anyone knows about it, I haven't encountered this technique either, but I'm not sure how the distance-based algorithm will work and interpret the encoded categorical feature with this technique, please let me know if there are set of the algorithm to be used when performing regression or classification or this technique is applicable to all to all algorithm.

Ответить