Featuring Engineering- Handle Categorical Features Many Categories(Count/Frequency Encoding)

Krish Naik

4 years ago

84,874 views


Comments:

Bongiwe Khongolo - 10.12.2022 01:51

J.

John hip

Elsa • Zone - 06.12.2022 19:33

How about adding something like 0.000000001 or 1 to the values in that dictionary for categories with the same count? Does that work?

Hazhir - 28.11.2022 19:19

dude, you are amazing. well done.

The Algorithm - 18.09.2022 20:56

But it will give higher weightage to the categories with higher counts, and the model will overfit.

Devansh Goel - 27.06.2022 11:21

thank you sir

Sandipan Sarkar - 21.10.2021 14:50

finished watching

Tikendra Sahu - 10.10.2021 09:15

Sir, since we are just assigning count numbers to the categorical values, and this can even be problematic when two counts are the same, why don't we use label encoding on each column? Yes, the data may not be ordinal, but it does a better job than the method you are describing. The goal is to assign numbers to strings.
-- I may be wrong, but according to the info you provided, what I said should work too, and it is easier.
CORRECT ME IF I AM WRONG @KRISH NAIK

ThePresistence - 30.08.2021 10:34

singam kadhal vandhalle kalla redum thannale 😁😂, ennakum romba pudikum

Sreerag Sasidharan - 02.07.2021 15:37

Sir, then what about categories that have the same count?
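
To make the concern concrete, here is a minimal sketch of the collision in plain Python. The `colors` list is made-up illustration data, not taken from the video. When two categories occur equally often, count encoding maps both to the same number, so the model can no longer tell them apart on this feature alone.

```python
from collections import Counter

# Hypothetical toy column; "red" and "blue" both occur twice.
colors = ["red", "blue", "red", "blue", "green"]

counts = Counter(colors)
encoded = [counts[c] for c in colors]

# Group categories by their count to find the ones that collide.
by_count = {}
for category, n in counts.items():
    by_count.setdefault(n, []).append(category)
shared = {n: cats for n, cats in by_count.items() if len(cats) > 1}

print(encoded)  # [2, 2, 2, 2, 1]
print(shared)   # {2: ['red', 'blue']}
```

Whether this loss of information matters depends on the data; checking `shared` before training is a cheap way to see how many categories are affected.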

Ashit Debdas - 12.06.2021 17:54

Similarly, I have IP_ADDRESS as a feature. How can I encode it?

Prabhat gupta - 18.05.2021 21:11

Hello Sir,
If the categories are not ordinal and we replace each category with its count in that column, aren't we making it ordinal, since different categories will have different numeric values?

Can anyone please help me understand this?

Rasel Uddin - 17.04.2021 13:32

Best tutorial

Ashutosh Sinha - 16.04.2021 01:18

Thanks for sharing this informative lesson. I have a bank database and I need to identify the categorical features in a table with these column names:
Customer Age,
Professional Experience,
Annual Income,
Family Size,
CC Avg monthly spend,
Education (1: Undergrad; 2: Graduate; 3: Advanced/Professional),
Mortgage Value,
Personal Loan (Yes/No),
Securities Account (Yes/No),
Credit Card (Yes/No). Can you help me, with reasons for your selection?

Akansha Bhatt - 06.03.2021 17:56

Can someone help me by explaining the 2nd disadvantage of this algorithm?

Anil Sunny - 15.01.2021 11:32

I thought we were to consider the top 10 most frequent labels and perform one hot encoding on them, but this is completely different :/. Somebody pls help, I'm a beginner at this and some lead on this would be greatly appreciated.
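
For comparison, the approach this comment describes (one-hot encoding only the most frequent labels) can be sketched as below. This is an illustration in plain Python, not code from the video; `top_k_one_hot` and the `pets` data are hypothetical names. Every category outside the top `k` ends up as a row of zeros.

```python
from collections import Counter

def top_k_one_hot(values, k=10):
    """One-hot encode only the k most frequent categories in `values`."""
    top = [category for category, _ in Counter(values).most_common(k)]
    rows = [{f"is_{category}": int(v == category) for category in top}
            for v in values]
    return top, rows

# Hypothetical toy column, with k=2 to keep the output short.
pets = ["cat", "dog", "cat", "fish", "dog", "cat", "owl"]
top, rows = top_k_one_hot(pets, k=2)

print(top)      # ['cat', 'dog']
print(rows[0])  # {'is_cat': 1, 'is_dog': 0}
print(rows[3])  # {'is_cat': 0, 'is_dog': 0}  ("fish" is not in the top 2)
```

Count/frequency encoding differs in that it keeps a single numeric column instead of adding `k` new ones.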

Srishti Kumari - 27.12.2020 09:33

Thanks for sharing this video!
I was looking for this type of code for replacing a column with its count column. I am learning feature engineering from the Kaggle courses. There, count_encoder() is used, and I was trying to write code for the steps count_encoder() performs, which I found in this video.
I have a doubt: the column in that tutorial was numerical, and both columns (col and count_column) were used in predicting the output and calculating the validation score. I calculated the validation score both ways (i.e. with both col and count_col, and then with only count_col), and found the score to be higher for the model using both. Can you please clarify whether we should use the original column as well if it's numerical?
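
The replacement step discussed here can be sketched in a few lines. This is a minimal standard-library illustration, not the video's notebook and not the Kaggle count_encoder(); `count_encode`, `frequency_encode`, and the `cities` data are hypothetical names chosen for the example.

```python
from collections import Counter

def count_encode(values):
    """Replace each category with the number of times it occurs."""
    counts = Counter(values)
    return [counts[v] for v in values], dict(counts)

def frequency_encode(values):
    """Replace each category with its share of all rows (count / total)."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

# Hypothetical toy column.
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi"]

encoded, mapping = count_encode(cities)
print(mapping)  # {'Pune': 3, 'Delhi': 2, 'Mumbai': 1}
print(encoded)  # [3, 2, 3, 1, 3, 2]
print(frequency_encode(cities)[0])  # 0.5
```

Keeping the original column next to the encoded one, as described above, just means storing `encoded` as an additional feature rather than overwriting `cities`.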

nivedita parab - 18.12.2020 14:50

In this case, how do we interpret the prediction coefficient with respect to the encoded feature?

Trapti Gupta - 10.12.2020 16:40

so helpful 😁

My Name - 02.10.2020 22:16

Sir, could you please recommend ML algorithms that can be applied effectively with this technique? Pros and cons?

Mohammad Arif - 23.08.2020 15:36

What if two different categories have the same count?

Sandeep Nataraj - 17.08.2020 15:47

Thanks, Krish, for sharing your knowledge. I want to know how to encode multi-select categorical variables in Python. For example, there is a field "Languages Known" which can hold multiple values like "English, Kannada, Hindi" or "French, Telugu, English" or "Malayalam, Tamil, English", and assume there are around 25 unique languages overall.
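
One common option for a multi-select field like this is a multi-hot encoding: one 0/1 column per distinct value. The sketch below uses plain Python and assumes the values are comma-separated strings; `multi_hot` is a hypothetical helper, and scikit-learn's `MultiLabelBinarizer` offers a ready-made equivalent.

```python
def multi_hot(rows, sep=","):
    """Turn delimiter-separated multi-select strings into 0/1 indicator rows."""
    parsed = [{item.strip() for item in row.split(sep) if item.strip()}
              for row in rows]
    vocab = sorted(set().union(*parsed))  # one column per distinct value
    matrix = [[int(value in selected) for value in vocab]
              for selected in parsed]
    return vocab, matrix

# The example values from the comment above.
known = [
    "English, Kannada, Hindi",
    "French, Telugu, English",
    "Malayalam, Tamil, English",
]
vocab, matrix = multi_hot(known)

print(vocab)
# ['English', 'French', 'Hindi', 'Kannada', 'Malayalam', 'Tamil', 'Telugu']
print(matrix[0])  # [1, 0, 1, 1, 0, 0, 0]
```

With around 25 unique languages this gives 25 columns, which is usually still manageable.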

Vinoth Kumar Selvaraj - 28.07.2020 18:18

Friends,

What if we have a column (say "Location") with more than 1000 categories? FYI, this column is an independent variable and one of the important features for predicting the label. Answer please.

Thanks in advance
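
For a high-cardinality column such as "Location", count encoding keeps a single numeric column no matter how many categories there are. One practical detail is that the counts should be learned on the training data and reused on new data. The sketch below assumes unseen categories are mapped to 0; that default, the helper names, and the toy data are all assumptions for illustration.

```python
from collections import Counter

def fit_count_map(train_values):
    """Learn category -> count from the training column only."""
    return Counter(train_values)

def apply_count_map(values, count_map, default=0):
    """Encode a column with a fitted map; unseen categories get `default`."""
    return [count_map.get(v, default) for v in values]

# Hypothetical toy data; "Leh" never appears in the training column.
train_locations = ["Pune", "Delhi", "Pune", "Goa"]
test_locations = ["Delhi", "Leh", "Pune"]

count_map = fit_count_map(train_locations)
print(apply_count_map(test_locations, count_map))  # [1, 0, 2]
```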

Deepak Mishra - 07.06.2020 22:12

Sir, can you share that feature engineering zip file again, please?

Meharaj Begum A - 01.06.2020 10:14

Hi Sir, can you explain how to encode a target feature that has multiple comma-separated labels for each instance?

kiran arun - 09.05.2020 21:29

How is this different from label encoding? There it takes each unique label and assigns a number accordingly; here it counts the frequency and assigns that. If both are the same, why use this?
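
The two encodings are not the same, which a small side-by-side sketch can show. Here label encoding is written by hand as "one integer id per distinct category, in alphabetical order"; that ordering and the `sizes` data are assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical toy column.
sizes = ["small", "large", "small", "medium", "small", "large"]

# Label encoding: an arbitrary id per category (alphabetical order here).
label_map = {category: i for i, category in enumerate(sorted(set(sizes)))}
label_encoded = [label_map[s] for s in sizes]

# Count encoding: the value is how often the category occurs.
count_map = Counter(sizes)
count_encoded = [count_map[s] for s in sizes]

print(label_map)      # {'large': 0, 'medium': 1, 'small': 2}
print(label_encoded)  # [2, 0, 2, 1, 2, 0]
print(count_encoded)  # [3, 2, 3, 1, 3, 2]
```

The label ids carry no information beyond identity, while the count values reflect how common each category is. On the other hand, label ids are always distinct, whereas two categories with equal counts receive the same count-encoded value.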

Manjunath Angadi - 23.02.2020 15:01

Where did you complete your data science course?

Omkar R - 21.01.2020 13:17

Thanks for sharing your research ideas. Whenever I feel I am comfortable with some concept in DS, every video of yours adds an interesting additional perspective to my knowledge. Thanks for taking the effort to share everything that you do 😀

Arjya Basu - 23.12.2019 01:26

Sir, are mean encoding and frequency encoding the same? If not, please do make a video on it.
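
The two are different: frequency encoding looks only at the feature column, while mean (target) encoding replaces each category with the average of the target for that category. A minimal sketch with made-up data follows; `frequency_map`, `mean_map`, and the toy lists are hypothetical names for illustration.

```python
from collections import Counter, defaultdict

def frequency_map(categories):
    """category -> share of rows with that category (ignores the target)."""
    total = len(categories)
    return {c: n / total for c, n in Counter(categories).items()}

def mean_map(categories, target):
    """category -> mean of the target over rows with that category."""
    sums = defaultdict(float)
    counts = Counter(categories)
    for c, y in zip(categories, target):
        sums[c] += y
    return {c: sums[c] / counts[c] for c in counts}

# Hypothetical toy feature and binary target.
categories = ["a", "b", "a", "b", "a", "c"]
target = [1, 0, 1, 1, 0, 1]

print(frequency_map(categories)["a"])  # 0.5
print(mean_map(categories, target)["b"])  # 0.5
print(mean_map(categories, target)["c"])  # 1.0
```

Because mean encoding uses the target, it should be computed on training data only, otherwise target information leaks into the feature.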

sunny savita - 17.12.2019 09:05

Sir, please make a video on roc_auc.

muhammad zubair Baloch - 14.09.2019 16:55

Sir, I am very impressed by your mode of teaching. Easily understandable. Sir, please make some high-level computer vision video lectures. Thanks.

Ogbeide Nelson - 13.09.2019 13:56

Can we have a video on hypothesis testing and its implementation on a real-world problem with Python?

venkatesh sadagopan - 13.09.2019 10:58

Hi krish,
If we have a very large dataset with few features, i.e. the number of features is very small compared to the number of samples, how can I approach the problem, what techniques can I apply to get a reasonable result, and how do I avoid problems like overfitting in this case?
Thanks in Advance

Shashwat Bilgrami - 12.09.2019 21:45

Thank you for the videos, they help a lot.

Shashwat Bilgrami - 12.09.2019 21:44

Bro, can you please apply feature engineering techniques to the flight price problem?

Vinay Kumar - 12.09.2019 16:29

Hi, can you upload a video explaining LightGBM and CatBoost with simple examples?

Balaji varma - 12.09.2019 16:05

Hi Sir, could you please complete your Kaggle competition code? Waiting for the next part. Thank you so much, you are the best 👍💯

Shaz-z - 12.09.2019 14:31

Hi Krish,
It's a good video. Probably not everyone knows about this; I hadn't encountered this technique either. But I'm not sure how a distance-based algorithm will work with and interpret a categorical feature encoded this way. Please let me know whether there is a specific set of algorithms to use with it when performing regression or classification, or whether this technique is applicable to all algorithms.

Anand Acharya - 12.09.2019 14:14

Sir, please explain where we use the binomial distribution, Poisson distribution, normal distribution, ANOVA test, and one- and two-tailed tests in machine learning and data science.

BRIGHT SIDES - 12.09.2019 12:23

How many hours a week do data scientists work in service-based companies?

Sachin Borgave - 12.09.2019 12:10

Super sir

Gaurav Padawe - 12.09.2019 11:30

Do make a video on complete hypothesis testing.
