Kaggle Competition - House Prices: Advanced Regression Techniques Part1

Krish Naik

4 years ago

282,502 views

Comments:

@aboutbusiness. - 31.10.2023 17:04

Sir, XGBoost is not installing on my system. Please suggest a solution.

@edilmonica386 - 30.10.2023 10:27

Hello sir,

Can you explain the dummy coding?

@subparman6553 - 29.10.2023 19:25

How powerful your computer is!

@greenshadowooo - 27.10.2023 14:56

A very useful sharing !😍😍😍

@aakashgohil859 - 14.09.2023 05:31

Hi Krish, would you mind sharing your email? I have some questions and need help to understand. I really appreciated the way you explained the problem statement and the feature engineering process. Your explanation was clear and insightful.

@shwetasaini6892 - 04.07.2023 18:55

Has anyone tried to fit the model using the sklearn library? I am getting this error: float() argument must be a string or a real number, not 'method'
I have changed the floating points too.

@natures_soul763 - 04.07.2023 09:23

Why is there no feature scaling in this code?

@Arceus948 - 17.06.2023 17:01

Why didn't you scale the feature data?

@chiragpatel1491 - 19.05.2023 13:26

Rather than going feature by feature, do it like this:

df.isnull().sum().sort_values(ascending=False).head(20)

PoolQC 1453
MiscFeature 1406
Fence 1179
FireplaceQu 690
GarageType 81
GarageFinish 81
GarageQual 81
GarageCond 81
GarageYrBlt 81
BsmtFinType2 38
BsmtExposure 38
BsmtCond 37
BsmtFinType1 37
BsmtQual 37
MasVnrArea 8
MasVnrType 8
Electrical 1
BsmtFullBath 0
Functional 0
HalfBath 0
dtype: int64

it will save time :)

@bivekyadav08 - 10.05.2023 21:56

This man is always there to help🙌
Thanks 🥺❤🙏

@kevinmartinezperez4111 - 11.03.2023 20:05

Man, what a great video, thank you very much. Greetings from Peru.

@christiansetzkorn6241 - 19.02.2023 15:59

I might be missing something but what is advanced about this?

@Marcel-f1 - 06.02.2023 20:46

In summary: the machine learning engineer is making a "guess" about the dataset he is working on, and doing multiple repetitive tasks such as filling null values and feature extraction, all of which can be automated.

@Peter-ns6jg - 24.12.2022 04:32

this helped me a lot. thanks

@vivianjoseph822 - 25.10.2022 17:05

Thank you so much, brother!!

@nguyennhi8524 - 15.10.2022 21:23

Thank you a lot

@trashantrathore4995 - 28.08.2022 21:12

Just need to know one major thing here.
For example, in most problems we are given train data of 8000 rows, test data of 4000 rows, and a submission file containing a sample result for all 4000 test rows. Suppose that while cleaning the train data I find NaN values or outliers in some rows and remove them, so the train data shrinks from 8000 to 6500 rows. If I apply the same process to the test data and am left with 3300 of the 4000 rows, my predictions no longer match the 4000 rows of the sample submission, which causes a mismatch error and makes the whole ML model a waste. So can we avoid removing rows that contain NaN values or outliers and only impute instead? How should this be done? I am getting confused; if anyone knows, please reply. I would be glad to hear.
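
A minimal sketch of the impute-only approach asked about here, on toy data. The column names come from the House Prices dataset, but the values and the choice of mean and mode as fill values are illustrative assumptions, not necessarily what the video does. The fill values are learned from the training frame and applied to both frames, so the test set keeps every row.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for train.csv and test.csv (values are made up for illustration).
train = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "MasVnrType": ["BrkFace", None, "Stone", "BrkFace"],
    "SalePrice": [208500, 181500, 223500, 140000],
})
test = pd.DataFrame({
    "LotFrontage": [np.nan, 75.0, 60.0],
    "MasVnrType": ["Stone", None, "BrkFace"],
})

# Learn the fill values from the training data only.
num_fill = train["LotFrontage"].mean()    # mean of the non-missing values
cat_fill = train["MasVnrType"].mode()[0]  # most frequent category

# Fill instead of dropping, so no row is removed from either frame.
for df in (train, test):
    df["LotFrontage"] = df["LotFrontage"].fillna(num_fill)
    df["MasVnrType"] = df["MasVnrType"].fillna(cat_fill)

rows_kept = len(test)  # still one prediction per row of the sample submission
```

Because no test row is dropped, the prediction array has the same length as the Id column of the sample submission file.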

@me_debankan4178 - 15.08.2022 14:25

you haven't done normalization?

@sauravsrivastava2353 - 09.08.2022 15:04

This video was really helpful for me because I am just a fresher in the data science world and didn't even know how to deal with such real-world data science problems. So thanks, Krish sir, for this kind of video. Please make more videos on other Kaggle competitions.

@dsfromussr1116 - 20.06.2022 20:13

Has anybody used public data from the upgini Python library on Kaggle? I am looking for any feedback about this data enrichment library.

@mohammadhegazy1285 - 12.06.2022 17:28

thanks a lot

@dikshagupta3276 - 02.06.2022 13:07

I don't understand one thing: why do we combine the train data with the test data for pd.get_dummies? Please reply.
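
One likely reason, sketched on toy data: if a category appears in only one of the two files, encoding them separately produces different dummy columns, and a model trained on one layout cannot score the other. The MSZoning values below are illustrative assumptions; the concat-then-split pattern is a common approach and may differ in detail from the video's code.

```python
import pandas as pd

# Toy frames: "RM" appears only in train, "FV" only in test (made-up example).
train = pd.DataFrame({"MSZoning": ["RL", "RM", "RL"]})
test = pd.DataFrame({"MSZoning": ["RL", "FV"]})

# Encoding each frame on its own yields mismatched column sets.
sep_train = pd.get_dummies(train)
sep_test = pd.get_dummies(test)

# Encoding the stacked frame yields one shared column set.
combined = pd.concat([train, test], axis=0, ignore_index=True)
encoded = pd.get_dummies(combined)

# Split back by position: the first len(train) rows are the training rows.
enc_train = encoded.iloc[:len(train)]
enc_test = encoded.iloc[len(train):]
```

The test rows only contribute their category labels to the column layout; no SalePrice information is involved, since the test file has none.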

@nicholasjordan5661 - 27.05.2022 06:31

I really need help on this. I keep getting this issue: ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`. Invalid columns:
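
A hedged sketch of one way to track this down, assuming the error comes from text columns that were never encoded. The toy frame and the choice of one-hot encoding are assumptions; the sketch only uses pandas and does not call XGBoost itself.

```python
import pandas as pd

# Toy feature frame with one numeric and one text column (made-up values).
X = pd.DataFrame({"LotArea": [8450, 9600], "Street": ["Pave", "Grvl"]})

# Columns that are neither numeric nor boolean are the usual suspects.
bad_cols = X.select_dtypes(exclude=["number", "bool"]).columns.tolist()

# One-hot encode just those columns so every remaining dtype is numeric or bool.
X_encoded = pd.get_dummies(X, columns=bad_cols)

# Re-check: this list should now be empty.
remaining = X_encoded.select_dtypes(exclude=["number", "bool"]).columns.tolist()
```

Running the same `select_dtypes` check on the real training frame lists the columns that still need encoding before fitting.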

@oleholeynikov8659 - 26.05.2022 15:13

it is my exam project. Thanks a lot for the video!!!!

@jeremyheng8573 - 10.03.2022 20:18

Thank you! Good tutorial!

@shivadigitalsolutionsandam56 - 08.03.2022 15:59

Brother, you explain very well. I have watched at least 10 videos on the same problem, but your video is the good one.

@gtaunlimited007 - 03.02.2022 16:54

What was the R2 score of your final model? Mine never goes above 0.78 for this dataset.

@mohammadj.shamim9342 - 12.01.2022 13:21

It was a great explanatory video, however, there was some jumping over the map. I am fine with it, but for a beginner, it will be problematic.

@dennisbesseling9267 - 17.11.2021 15:57

In the description file it says that NA values should be treated as a value meaning the feature is absent. So if there is a null value in any of the basement columns, this means that the house doesn't have a basement, and so on.
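
A small sketch of that idea on toy data: pandas reads the literal string "NA" as a missing value by default, so these columns can be filled with an explicit label instead of the most frequent category. The two columns shown and the label "Missing" are illustrative choices, not taken from the video.

```python
import numpy as np
import pandas as pd

# Toy rows: the second house has no basement, the third has no garage.
df = pd.DataFrame({
    "BsmtQual": ["Gd", np.nan, "TA"],
    "GarageType": ["Attchd", "Detchd", np.nan],
})

# Columns where a missing value means "feature not present" (assumed subset).
absence_cols = ["BsmtQual", "GarageType"]

# Replace NaN with an explicit category so the absence stays visible to the model.
df[absence_cols] = df[absence_cols].fillna("Missing")
```

After one-hot encoding, "Missing" becomes its own dummy column, so houses without the feature are kept distinct from houses with the most common quality level.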

@ibrahimnada4702 - 09.11.2021 08:51

this is not machine learning, you're just calling functions and sorting rows .....

@thejswaroop5230 - 27.10.2021 19:15

Thank you it was helpful

@izike09 - 25.09.2021 13:55

Why isn't xgboost working in my notebook? It keeps telling me there is no module named xgboost.

@zac231 - 06.09.2021 17:02

👍👍👍

@ravikanth6534 - 30.08.2021 16:38

Hi Krish sir,
What is the purpose of concatenating the train and test datasets when the test dataset has no dependent feature "SalePrice"? How can the extra categories help in predicting the sale price?
Kindly clear my doubt. Thank you.

@ravikanth7179 - 30.08.2021 16:28

Hi Krish sir,
Thank you so much for taking your valuable time to share your knowledge with us.
Please answer my doubt, sir:
What is the purpose of concatenating the train and test datasets when the test dataset has no dependent feature "SalePrice"? How can the extra categories help in predicting the sale price?
I hope you will clear my doubt, sir. Thanks in advance.

@arrafihriday1333 - 14.08.2021 20:46

I have learnt a plethora of things from this project.
Lastly, can you please tell me what the 0 is for in mode()[0]?
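
A short illustration with made-up values: `Series.mode()` returns a Series rather than a single value, because several values can tie for most frequent, and `[0]` picks the first of them.

```python
import pandas as pd

# "TA" and "Gd" both appear twice, so there are two modes.
s = pd.Series(["TA", "Gd", "TA", "Gd", "Ex"])

modes = s.mode()     # a Series holding every tied mode, in sorted order
first_mode = modes[0]  # a single scalar value, usable as a fillna() argument
```

Passing the scalar `first_mode` to `fillna` fills every missing entry with the same category.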

@somashaker5753 - 04.08.2021 09:43

Hi Krish,
I need to talk to you about this. How can I contact you? Please let me know.

@athikurrahumans4740 - 03.08.2021 11:01

Guys, I am getting an error on the 37th line, in the train dataset, saying that name 'columns' is not defined. What should I do?

@eceerdinc3716 - 20.07.2021 21:34

Hey Krish, thanks for your effort! Just one quick question: this is heterogeneous data, isn't it?

@x_x3557 - 15.07.2021 15:22

Can we do the preprocessing for the whole data and then split it instead of concatenating it?

@anoopk4659 - 08.07.2021 17:16

GarageCars is a categorical variable, but the mean is used for fillna.

@sripuramneeraja - 07.07.2021 23:30

Hi Krish, thank you for your videos. I have a question: if train and test are in the same file, do we need to encode the data before or after train_test_split? Your answer would be helpful for me.

@katocharles1501 - 05.07.2021 22:33

Thank you very much Mr Krish you have given me a clear start

@ApurvaMishra9 - 29.06.2021 11:49

Hi Krish! Thank you so much for this ML intro to Kaggle via house price prediction. I am a novice in the field and had a doubt. I would be grateful if you could help me out. In theory, isn't testing data the data that is not touched at all? How can we perform preprocessing on the new, untouched data and not violate that concept?

@learnforfuture2611 - 26.06.2021 19:55

Sir, in this video you combined the train and test datasets and then split them using the sklearn library. How will the Id in the sample_submission.csv file (taken from the competition) then match your predicted values?

@Mayur586 - 03.06.2021 08:16

Thanks for your videos. They helped a lot to clear doubts and learn new things, but I have a few queries:
1. How do I convert predicted values back to the original scale after a transformation? np.exp/np.expm1 is not giving the actual value.
2. Why do we have to save the result in a new column when transforming?
3. What should we do if the skewness is not reduced after 1 or 2 iterations?
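
On the first query, a minimal sketch assuming the target was transformed with `np.log1p`: the inverse has to be the matching function, `np.expm1`, and mixing pairs (for example `np.log1p` with `np.exp`) leaves the result off by one. The prices are made-up values.

```python
import numpy as np

# Made-up sale prices standing in for the target column.
prices = np.array([208500.0, 181500.0, 223500.0])

# Forward transform: log(1 + x).
log_prices = np.log1p(prices)

# Matching inverse: exp(y) - 1 recovers the original values.
restored = np.expm1(log_prices)

# Mismatched inverse: exp(log(1 + x)) gives x + 1, not x.
mismatched = np.exp(log_prices)
```

If the values still look wrong after using the matching inverse, the transformation may have been applied more than once, in which case the inverse has to be applied the same number of times.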
