Comments:
Sir, XGBoost is not installing on my system. Please suggest a solution.
Hello sir,
Can you explain dummy coding?
How powerful is your computer!
A very useful share! 😍😍😍
Hi Krish, would you mind sharing your email? I have some questions and need help understanding. I really appreciated the way you explained the problem statement and the feature engineering process. Your explanation was clear and insightful.
Has anyone tried to fit the model using the sklearn library? I am getting an error: float() argument must be a string or a real number, not 'method'.
I have changed the floating points too.
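A common cause of that exact error is forgetting the parentheses on `.mean()`, so `fillna` receives the bound method object instead of a number. A minimal sketch, assuming that is the cause (the column name is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"LotFrontage": [65.0, None, 80.0]})

# Bug: df["LotFrontage"].fillna(df["LotFrontage"].mean) fills NaN with the
# bound method itself, which later raises:
#   float() argument must be a string or a real number, not 'method'

# Fix: call the method so fillna receives an actual number
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].mean())
print(df["LotFrontage"].tolist())  # [65.0, 72.5, 80.0]
```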
Why is there no feature scaling in this code?
Why didn't you scale the feature data?
Rather than going feature by feature, go like this:
df.isnull().sum().sort_values(ascending=False).head(20)
PoolQC 1453
MiscFeature 1406
Fence 1179
FireplaceQu 690
GarageType 81
GarageFinish 81
GarageQual 81
GarageCond 81
GarageYrBlt 81
BsmtFinType2 38
BsmtExposure 38
BsmtCond 37
BsmtFinType1 37
BsmtQual 37
MasVnrArea 8
MasVnrType 8
Electrical 1
BsmtFullBath 0
Functional 0
HalfBath 0
dtype: int64
it will save time :)
This man is always there to help🙌
Thanks 🥺❤🙏
Man, what a great video, thank you very much. Greetings from Peru!
I might be missing something, but what is advanced about this?
In summary: the machine learning engineer is making a "guess" about the dataset he is working on, and performing multiple repetitive tasks like filling null values, feature extraction, and other work that could be automated.
This helped me a lot. Thanks!
Thanks so much, brother!!
Thanks a lot
Just need to know one major thing here.
For example, in most problems we are given train data of 8000 rows, test data of 4000 rows, and a sample submission file containing results for all 4000 test rows. While cleaning the train data, if I find NaN values/outliers in a few rows and remove them, the train data might shrink to 6500 rows. If I apply the same process to the test data and am left with, say, 3300 of the 4000 rows, the submission file still expects all 4000 rows of the test data. That creates a mismatch error, and the whole ML model will be a waste. So in the end I want to know: can we avoid removing any rows containing NaN values/outliers and only impute instead? How do I do it? I am getting confused; if anyone knows about this, please comment in reply. I would be glad to hear.
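Yes: on the test set you impute instead of dropping, so the row count (and the Id column the submission file expects) stays intact. A minimal sketch with made-up column names:

```python
import pandas as pd

test = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotFrontage": [60.0, None, 80.0, None],             # numeric with NaNs
    "GarageType": ["Attchd", None, "Detchd", "Attchd"],  # categorical with NaNs
})

# Impute rather than drop: no rows are removed, so every Id in the
# sample submission still has a corresponding prediction row.
test["LotFrontage"] = test["LotFrontage"].fillna(test["LotFrontage"].median())
test["GarageType"] = test["GarageType"].fillna(test["GarageType"].mode()[0])

assert len(test) == 4  # still all 4 rows, matching the sample submission
```

Outlier handling on the test set follows the same idea: cap or transform values rather than deleting the rows.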
you haven't done normalization?
This video was really helpful for me because I am just a fresher in the data science world and I don't even know how to deal with such real-world data science problems. So thanks, Krish sir, for this kind of video. Please make another video on other Kaggle competitions.
Has anybody used the public data from the upgini Python library on Kaggle? I am looking for feedback about this data enrichment library.
Thanks a lot
I don't understand one thing: why do we combine the train data with the test data for pd.get_dummies? Please reply.
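The usual reason is that a category may appear in only one of the two files; encoding them separately then produces different dummy columns, and the model's input shapes no longer match. A minimal sketch (the "FV" value is illustrative):

```python
import pandas as pd

train = pd.DataFrame({"MSZoning": ["RL", "RM"]})
test = pd.DataFrame({"MSZoning": ["RL", "FV"]})  # "FV" never appears in train

# Encoded separately, the column sets differ:
#   train -> MSZoning_RL, MSZoning_RM     test -> MSZoning_FV, MSZoning_RL

# Concatenating first guarantees both halves share identical dummy columns
combined = pd.concat([train, test], axis=0)
dummies = pd.get_dummies(combined, columns=["MSZoning"])
train_enc = dummies.iloc[:len(train)]
test_enc = dummies.iloc[len(train):]
assert list(train_enc.columns) == list(test_enc.columns)
```

An alternative that avoids touching the test set is to fit a one-hot encoder on train only and have it ignore unseen test categories.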
I really need help on this; I keep getting this issue: ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`. Invalid columns:
It is my exam project. Thanks a lot for the video!!!!
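That ValueError usually means some columns are still object/string dtype when they reach XGBoost. Two common fixes, sketched with an illustrative column (the commented-out model line assumes XGBoost 1.5+):

```python
import pandas as pd

X = pd.DataFrame({
    "GrLivArea": [1500, 1800],
    "MSZoning": ["RL", "RM"],   # object dtype -> triggers the ValueError
})

# Option 1: one-hot encode the string columns so everything is numeric
X_num = pd.get_dummies(X, columns=["MSZoning"])

# Option 2 (XGBoost >= 1.5): convert to pandas "category" dtype and pass
# enable_categorical=True to the model / DMatrix:
X_cat = X.copy()
X_cat["MSZoning"] = X_cat["MSZoning"].astype("category")
# model = xgboost.XGBRegressor(enable_categorical=True, tree_method="hist")

assert X_num.select_dtypes(include="object").empty
```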
Thank you! Good tutorial!
Brother, you explain very well. I watched at least 10 videos on the same problem, but your video is the good one.
What was the R² score of your final model? Mine never crosses 0.78 on this dataset.
It was a great explanatory video; however, there was some jumping all over the map. I am fine with it, but for a beginner it will be problematic.
In the description file it says that N/A values should be considered a value indicating the absence of the feature. So if there is a null value in any of the basement columns, it means the house doesn't have a basement, and so on.
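For those columns, that means filling NaN with an explicit "no feature" category rather than imputing with the mode. A minimal sketch under that assumption (column names taken from the missing-value listing above):

```python
import pandas as pd

df = pd.DataFrame({
    "BsmtQual": ["Gd", None, "TA"],   # NaN = house has no basement
    "PoolQC": [None, None, "Ex"],     # NaN = house has no pool
})

# Per the competition's data description, NA is informative here, so keep
# it as its own category instead of guessing the most frequent value.
for col in ["BsmtQual", "PoolQC"]:
    df[col] = df[col].fillna("None")

assert df["BsmtQual"].tolist() == ["Gd", "None", "TA"]
```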
This is not machine learning; you're just calling functions and sorting rows...
Thank you, it was helpful
Why isn't xgboost working in my notebook? It keeps telling me there is no module named xgboost.
👍👍👍
Hi Krish sir,
What is the purpose of concatenating the train and test datasets? Since the test dataset has no dependent feature "SalePrice", how can the extra categories help in predicting the sale price?
Kindly clear my doubt. Thank you
Hi Krish sir,
Thank you so much for taking your valuable time to share your knowledge with us.
Please answer my doubt, sir:
What is the purpose of concatenating the train and test datasets? Since the test dataset has no dependent feature "SalePrice", how can the extra categories help in predicting the sale price?
I hope you will clear my doubt, sir. Thanks in advance
I have learnt a plethora of things from this project.
Lastly, can you please tell me: what is the use of the 0 in mode()[0]?
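`.mode()` returns a Series rather than a scalar, because there can be ties for the most frequent value; the `[0]` picks the first entry so that `fillna` receives a single value. A minimal sketch with illustrative data:

```python
import pandas as pd

s = pd.Series(["Gd", "TA", "TA", None])

# .mode() returns a Series (there can be several equally frequent values)
print(s.mode())       # a one-element Series containing "TA"
# [0] selects the first mode so fillna gets a scalar, not a Series
s = s.fillna(s.mode()[0])
print(s.tolist())     # ['Gd', 'TA', 'TA', 'TA']
```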
Hi Krish,
I need to talk to you about this. How can I contact you? Please let me know.
Guys, I am getting an error on the 37th line with the train dataset: name 'columns' is not defined. What to do...
Hey Krish, thanks for your effort! Just one quick question: this data is heterogeneous data, isn't it?
Can we do the preprocessing on the whole data and then split it, instead of concatenating it?
GarageCars is a categorical variable, but the mean is used for fillna.
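Fair point: a discrete count like GarageCars can end up with a fractional fill value under the mean, whereas the mode keeps the values valid. A minimal sketch with illustrative numbers:

```python
import pandas as pd

garage_cars = pd.Series([2, 3, 3, None, 1])

# Mean fill gives a fractional car count (2.25 here), not a valid category
mean_filled = garage_cars.fillna(garage_cars.mean())
# Mode fill (most frequent value) keeps the discrete values intact
mode_filled = garage_cars.fillna(garage_cars.mode()[0])

assert mode_filled.tolist() == [2.0, 3.0, 3.0, 3.0, 1.0]
```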
Hi Krish, thank you for your videos. I have a question: if train and test are in the same file, do we need to encode the data after train_test_split or before? Your answer would be helpful for me.
Thank you very much, Mr Krish. You have given me a clear start.
Hi Krish! Thank you so much for this ML intro to Kaggle via house price prediction. I am a novice in the field and had a doubt; I would be grateful if you could help me out. In theory, isn't the test data supposed to be left untouched? How can we perform preprocessing on the new, untouched data without violating that concept?
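The usual leakage-free convention is: compute every preprocessing statistic (mean, std, medians, encodings) on the train split only, then apply those fixed statistics to the test split. A minimal standardization sketch with illustrative values:

```python
import pandas as pd

train = pd.DataFrame({"GrLivArea": [1200.0, 1500.0, 1800.0]})
test = pd.DataFrame({"GrLivArea": [1400.0]})

# Fit the statistics on TRAIN only...
mu = train["GrLivArea"].mean()    # 1500.0
sigma = train["GrLivArea"].std()  # 300.0

# ...then apply the SAME statistics to test: no test information is used
train["scaled"] = (train["GrLivArea"] - mu) / sigma
test["scaled"] = (test["GrLivArea"] - mu) / sigma
```

sklearn's `fit` on train followed by `transform` on test implements the same pattern.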
Sir, in this you combined the train and test datasets and then split them using the sklearn library. Then how will the Ids in the sample_submission.csv file (taken from the competition) match your predicted values?
Thanks for your videos; they helped a lot to clear doubts and learn new things, but I have a few queries:
1. How do we convert/inverse the predicted values back to the original format after a transformation? np.exp/np.expm1 is not giving the actual value.
2. Why do we have to save it to a new column during the transformation?
3. If the skewness is not reduced in 1 or 2 iterations, what should we do?