Comments:
J.
John hip
How about adding a small value like 0.000000001 (or 1) to that dictionary for categories with the same count? Does that work?
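On nudging equal counts apart: here is a minimal pandas sketch. It assumes the categories are strings and that breaking ties in alphabetical order is acceptable. The function name `count_encode_with_tiebreak` and the `eps` parameter are hypothetical and do not come from the video.

```python
import pandas as pd

def count_encode_with_tiebreak(series: pd.Series, eps: float = 1e-9) -> pd.Series:
    """Replace each category by its count, offsetting tied counts by multiples of eps."""
    counts = series.value_counts()
    mapping = {}
    # Group together the categories that share the same count.
    for count, tied in counts.groupby(counts):
        # Sort tied categories so the offsets are deterministic (alphabetical here).
        for rank, category in enumerate(sorted(tied.index)):
            mapping[category] = float(count) + rank * eps
    return series.map(mapping)

colors = pd.Series(["red", "blue", "red", "green", "blue", "yellow"])
encoded = count_encode_with_tiebreak(colors)
# blue -> 2.0, red -> 2.000000001, green -> 1.0, yellow -> 1.000000001
```

Adding exactly 1 instead of a tiny `eps` is risky. A tied category with count 1 would become 2 and collide with categories that genuinely occur twice. Even with a tiny `eps`, the tie-break order is arbitrary, so the encoding introduces an ordering that the data does not justify.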
Dude, you are amazing. Well done.
But it will give higher weight to the more frequent categories, and the model will overfit.
Thank you, sir.
Finished watching.
Sir, since we are just assigning counts to the categorical values, and this can even cause problems when two counts are the same, why don't we use label encoding on each column? Yes, the data may not be ordinal, but label encoding does a better job than the method you are describing. The goal is just to assign numbers to strings.
-- I may be wrong, but based on the info you provided, what I said should work too, and it is easier.
CORRECT ME IF I AM WRONG @KRISH NAIK
singam kadhal vandhalle kalla redum thannale 😁😂, ennakum romba pudikum
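On the label-encoding vs. count-encoding question above: the sketch below contrasts the two on a hypothetical `cities` column. `pd.factorize` stands in for label encoding here. It numbers categories by first appearance, whereas scikit-learn's `LabelEncoder` numbers them in sorted order.

```python
import pandas as pd

cities = pd.Series(["Delhi", "Mumbai", "Delhi", "Pune", "Delhi", "Mumbai"])

# Label encoding: each distinct category gets an arbitrary integer id.
label_codes, uniques = pd.factorize(cities)

# Count encoding: each category is replaced by how often it occurs.
count_codes = cities.map(cities.value_counts())

print(pd.DataFrame({"city": cities, "label": label_codes, "count": count_codes}))
```

Label codes are always one-to-one, but they carry no information beyond identity, and their order is arbitrary. Count codes can collide when two categories share a frequency. However, the value itself (how common a category is) may be informative. Neither encoding makes nominal data genuinely ordinal. Tree-based models tend to cope with either, while linear and distance-based models treat both as magnitudes.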
Sir, then what about categories that have the same count...???
Similarly, I have IP_ADDRESS as a feature. How can I encode it?
Hello Sir,
If the categories are not ordinal and we replace each category with its count in that column, aren't we making the column ordinal, since different categories will have different numeric values...?
Can anyone please help me understand this?
Best tutorial
Thanks for sharing this informative lesson. I have a bank database, and I need to identify the categorical features in a table with these columns:
Customer Age,
Professional Experience,
Annual Income,
Family Size,
CC Avg monthly spend,
Education (1: Undergrad; 2: Graduate; 3: Advanced/Professional),
Mortgage Value,
Personal Loan (Yes/No),
Securities Account (Yes/No),
Credit Card (Yes/No).
Can you help me with this and explain the reasons for your selection?
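For the bank-table question above, a simple first pass is to flag text/boolean columns plus integer columns with only a few distinct values. Below is a minimal sketch of that heuristic. The toy rows and the `max_unique=5` threshold are hypothetical, and only some of the listed column names are used.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype, is_integer_dtype, is_object_dtype, is_string_dtype,
)

# Hypothetical toy rows using a few of the column names listed above.
bank = pd.DataFrame({
    "Customer Age": [25, 40, 33, 51, 29, 46],
    "Annual Income": [49.0, 34.5, 11.2, 100.0, 45.0, 29.0],
    "Family Size": [4, 3, 1, 1, 4, 2],
    "Education": [1, 1, 1, 2, 2, 3],
    "Personal Loan": ["No", "No", "No", "No", "Yes", "No"],
    "Credit Card": ["No", "Yes", "No", "No", "Yes", "No"],
})

def candidate_categoricals(df: pd.DataFrame, max_unique: int = 5) -> list:
    """Flag text/bool columns, plus integer columns with few distinct values."""
    flagged = []
    for col in df.columns:
        s = df[col]
        if is_object_dtype(s) or is_string_dtype(s) or is_bool_dtype(s):
            flagged.append(col)
        elif is_integer_dtype(s) and s.nunique() <= max_unique:
            flagged.append(col)
    return flagged

print(candidate_categoricals(bank))
```

The heuristic only proposes candidates. Here it flags Education, which is categorical (ordinal) even though it is stored as 1/2/3. The Yes/No columns are binary categoricals. It also flags Family Size, which is better treated as a discrete numeric count. Domain knowledge has to make the final call.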
Can someone help me understand the second disadvantage of this algorithm?
I thought we were supposed to take the top 10 most frequent labels and perform one-hot encoding on them, but this is completely different :/. Somebody please help. I'm a beginner at this, and any lead would be greatly appreciated.
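The "top 10 most frequent labels" approach mentioned above is a different technique from count encoding. It one-hot encodes only the most common labels, and every other label gets all zeros. Here is a minimal pandas sketch. The helper name `one_hot_top_n` is hypothetical, and ties at the cut-off are broken by whatever order `value_counts` returns.

```python
import pandas as pd

def one_hot_top_n(series: pd.Series, n: int = 10) -> pd.DataFrame:
    """One-hot encode the n most frequent labels; other labels get 0 in every column."""
    top_labels = series.value_counts().head(n).index
    return pd.DataFrame(
        {f"{series.name}_{label}": (series == label).astype(int) for label in top_labels},
        index=series.index,
    )

x = pd.Series(["a", "b", "a", "c", "a", "b", "d"], name="x")
out = one_hot_top_n(x, n=2)  # columns: x_a, x_b; rows for "c" and "d" are all zeros
```

With `n=10`, this caps the number of new columns at 10 regardless of how many distinct labels the column has.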
Thanks for sharing this video!
I was looking for this kind of code for replacing a column with its count column. I am learning feature engineering from the Kaggle courses. There, count_encoder() is used, and I was trying to write code for the steps inside count_encoder(), which I found in this video.
I have a doubt: the column in that tutorial was numerical, and both columns (col and count_column) were used for predicting the output and calculating the validation score. I calculated the validation score both ways (i.e., first with both col and count_col, then with only count_col), and the score was higher for the model using both (col and count_col). Can you please clarify whether we should keep the original column as well when it's numerical?
In this case, how do we interpret the prediction coefficient for the encoded feature?
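For the Kaggle count-encoding question above, here is a plain-pandas sketch of the same idea. It is not the Kaggle helper itself. It adds a count column next to the original one, learns the counts from the training data only, and gives categories unseen in training a count of 0. The helper name `add_count_column` and the `app` column are hypothetical.

```python
import pandas as pd

def add_count_column(train: pd.DataFrame, test: pd.DataFrame, col: str):
    """Add a `<col>_count` column to both frames, keeping the original `col`.

    Counts come from the training frame only; test categories never seen
    in training get a count of 0.
    """
    counts = train[col].value_counts()
    train = train.assign(**{f"{col}_count": train[col].map(counts)})
    test = test.assign(**{f"{col}_count": test[col].map(counts).fillna(0).astype(int)})
    return train, test

tr, te = add_count_column(
    pd.DataFrame({"app": [3, 7, 3, 3, 9]}),
    pd.DataFrame({"app": [3, 9, 12]}),
    "app",
)
```

Keeping both columns lets the model use the raw value and its frequency. Whether that helps is an empirical question, so comparing validation scores with and without the original column, as described above, is a reasonable way to decide.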
So helpful 😁
Sir, could you please recommend ML algorithms that work well with this technique? Pros and cons?
What if two different categories have the same count?
Thanks, Krish, for sharing your knowledge. I want to know how to encode multi-select categorical variables in Python. For example, there is a field "Languages Known" that can hold multiple values, like "English, Kannada, Hindi" or "French, Telugu, English" or "Malayalam, Tamil, English", etc. Assume there are around 25 unique languages overall.
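For the multi-select "Languages Known" field above, one option is pandas' `Series.str.get_dummies`. It splits each entry on a separator and creates one 0/1 column per distinct value. Here is a minimal sketch using the example rows from the question.

```python
import pandas as pd

languages = pd.Series([
    "English, Kannada, Hindi",
    "French, Telugu, English",
    "Malayalam, Tamil, English",
], name="Languages Known")

# Normalize spacing around commas, then build one indicator column per language.
indicators = (
    languages.str.replace(r"\s*,\s*", ",", regex=True)
    .str.get_dummies(sep=",")
)
print(indicators)
```

The `str.replace` step means "English,Hindi" and "English, Hindi" produce the same columns. With around 25 distinct languages, this yields around 25 binary columns, which is usually manageable.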
Friends,
What if we have a column (say "Location") that has more than 1000 categories?? FYI, this column is an independent variable and one of the important parameters for predicting the label. Please answer....
Thanks in advance
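For a "Location" column with 1000+ categories, full one-hot encoding becomes impractical. Two common options are count/frequency encoding, as in the video, and grouping rare categories into a single "Other" label before encoding. Below is a minimal sketch of the grouping step followed by frequency encoding. The helper `group_rare` and the `min_count` threshold are hypothetical, and the threshold would need tuning.

```python
import pandas as pd

def group_rare(series: pd.Series, min_count: int) -> pd.Series:
    """Replace categories seen fewer than min_count times with the label 'Other'."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rare ones with "Other".
    return series.where(~series.isin(rare), "Other")

location = pd.Series(["A", "A", "A", "B", "B", "C", "D"])
grouped = group_rare(location, min_count=2)

# Frequency encoding: share of rows that fall in each (grouped) category.
freq = grouped.map(grouped.value_counts(normalize=True))
```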
Sir, can you please share that feature engineering zip file again!
Hi Sir, can you explain how to encode a target feature that has multiple comma-separated labels per instance?
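For a target with several comma-separated labels per row, one common route is scikit-learn's `MultiLabelBinarizer`. It turns each list of labels into a 0/1 row with one column per label. The label names below are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

raw_targets = ["sports,politics", "politics", "tech,sports"]
# Split each comma-separated string into a clean list of labels.
label_lists = [[t.strip() for t in row.split(",")] for row in raw_targets]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_lists)  # shape (n_samples, n_labels), 0/1 entries
print(mlb.classes_, Y, sep="\n")
```

You can pass `Y` to estimators that accept multi-label targets, for example a classifier wrapped in `sklearn.multioutput.MultiOutputClassifier`. `mlb.inverse_transform` maps predicted rows back to label tuples.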
How is this different from LabelEncoding... There it also goes over the unique labels and assigns numbers accordingly, and here it counts the frequency and assigns numbers... If both are the same, why use this??
Where did you complete your data science course?
Thanks for sharing your research ideas... Whenever I feel comfortable with some concept in DS, every video of yours adds an interesting new perspective to my knowledge... Thanks for taking the effort to share everything you do 😀
Sir, are mean encoding and frequency encoding the same?? If not, please do make a video on it.
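Mean encoding and frequency encoding are not the same. Frequency encoding ignores the target and uses how often each category appears. Mean (target) encoding uses the average target value within each category. The sketch below computes both on a hypothetical toy frame.

```python
import pandas as pd

df = pd.DataFrame({
    "city":   ["A", "A", "B", "B", "B", "C"],
    "target": [1,   0,   1,   1,   0,   0],
})

# Frequency encoding: share of rows belonging to each category (target not used).
df["city_freq"] = df["city"].map(df["city"].value_counts(normalize=True))

# Mean (target) encoding: average target value within each category.
df["city_mean"] = df["city"].map(df.groupby("city")["target"].mean())
print(df)
```

This version computes the mean encoding on the whole frame for illustration only. Because mean encoding uses the target, it is normally fitted on training folds only, to avoid leaking the target into the features.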
Sir, please make a video on roc_auc.
Sir, I am very impressed by your mode of teaching. Easily understandable. Sir, please make some advanced computer vision video lectures. Thanks
Can we have a video on hypothesis testing and its implementation in real-world problems with Python?
Hi Krish,
If we have a very large dataset with few features, i.e., the number of features is very small compared to the number of samples, how should I approach the problem? What techniques can I apply to get reasonable results, and how can I avoid problems like overfitting in this case?
Thanks in Advance
Thank you for the videos; they help a lot.
Bro, can you please apply feature engineering techniques to the flight price problem?
Hi, can you upload a video explaining LightGBM and CatBoost with simple examples?
Hi Sir, could you please complete your Kaggle competition code? Waiting for the next part. Thanks so much, you are the best 👍💯
Hi Krish,
It's a good video. Probably few people know about this; I hadn't encountered this technique either. But I'm not sure how distance-based algorithms will work with, and interpret, a categorical feature encoded this way. Please let me know whether there is a particular set of algorithms to use when performing regression or classification, or whether this technique applies to all algorithms.
Sir, please explain where we use the binomial distribution, Poisson distribution, normal distribution, ANOVA test, and one- and two-tailed tests in machine learning and data science?
How many hours a week do data scientists work in service-based companies?
Super, sir
Do make a video on Complete Hypothesis Testing.