Train Test Split vs K Fold vs Stratified K fold Cross Validation

Krish Naik

5 years ago

63,918 views



Comments:

@baburamchaudhary159 - 23.11.2022 20:14

It would be best if you had provided a link to the dataset, for practice and confirmation.
Thanks.

@panosp5711 - 24.03.2022 20:37

Hello, very nice video, but one question: what is the train/test ratio in every iteration when you use stratified k-fold cross validation? I mean, somehow combining stratified k-fold cross validation with train test split.
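For what it's worth, a quick sketch (assuming scikit-learn's StratifiedKFold) shows the ratio: with k folds, each iteration tests on roughly 1/k of the rows and trains on the remaining (k-1)/k, with the class proportions preserved in both parts:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 100 rows, 30% positive class.
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 30 + [0] * 70)

skf = StratifiedKFold(n_splits=5)
for train_idx, test_idx in skf.split(X, y):
    # With k=5, each iteration uses 80 rows for training and 20 for
    # testing (an 80/20 ratio), and both parts keep the 30/70 class mix.
    print(len(train_idx), len(test_idx), y[test_idx].mean())
```

So with k folds the effective train/test ratio is (k-1):1 per iteration; no separate train_test_split is needed inside the loop.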

@abhi9029 - 11.10.2021 16:44

This video could have been better.

@ayushijmusic - 02.09.2021 20:15

You are a saviour!

@louerleseigneur4532 - 08.08.2021 07:49

We are living in a wonderful universe

@mohe4ever514 - 01.06.2021 15:37

Krish, in K-fold validation you fitted the classifier on different sets of X_train and y_train and got different accuracies. This is fine for evaluating the model, but you didn't mention which data we need to train the model on if we evaluate performance using k-fold. Are we going to train our classifier on the full data, i.e. X and y? Should the final model, which we want to use later on, be trained on the full dataset?
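The usual practice (a common recommendation, not something stated in the video) is exactly that: use k-fold purely to estimate how well the model generalises, then refit the final model on the full dataset. A minimal sketch with scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Step 1: k-fold CV estimates how well this model class generalises.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())

# Step 2: the final model for later use is fitted on the full dataset,
# since CV has already supplied the performance estimate.
final_model = clf.fit(X, y)
```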

@skasifali2202 - 27.05.2021 10:11

Hello Krish sir... What if I need to perform a custom prediction? I need to call classifier.predict(test), but my code shows a "feature name missing" error. I'm using the Pima Diabetes dataset from Kaggle.

@gebremedhnmehari8451 - 21.05.2021 09:53

how about precision, recall and f-measure?

@Aaronisification - 18.05.2021 21:21

I love your content, it is very helpful. You are a treasure. But this video would have been loads better if you had gone slowly enough for students to copy over the code.

@karan9837768555 - 15.04.2021 12:00

Sir, please upload the GitHub link also.

@banjiaderibigbe1415 - 14.04.2021 19:24

Is this video's notebook available on GitHub, @krish?

@saltanatkhalyk3397 - 05.04.2021 12:01

Such a clear explanation, thanks!

@saifulislamsanto6147 - 30.03.2021 02:52

How can I find the ROC curve and confusion matrix from this project, for all of Train Test Split vs K Fold vs Stratified K fold Cross Validation? Please give us a video on this.

@saifulislamsanto6147 - 25.03.2021 00:55

Y.iloc[number] is not working. The error is:
AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
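.iloc is a pandas accessor, so it disappears once Y has been converted to a NumPy array (e.g. via .values); a plain-bracket index does the same job. A minimal illustration:

```python
import pandas as pd

y_series = pd.Series([0, 1, 1, 0])
y_array = y_series.values          # now a numpy.ndarray

print(y_series.iloc[1])            # pandas: .iloc works
print(y_array[1])                  # numpy: plain indexing instead
# y_array.iloc[1] would raise AttributeError: 'numpy.ndarray'
# object has no attribute 'iloc'
```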

@maxwellpatten9227 - 26.02.2021 17:18

great video, thank you!

@dollylollapaloosa - 28.01.2021 05:36

I've been wondering about this topic for a while, very happy to find your content!!!

@pavanim6258 - 17.01.2021 18:18

Thanks for the very clear explanation, Krish. Can you please share the GitHub link also?

@tomstomsable - 05.01.2021 23:19

Higher bias does not necessarily mean good accuracy; the best case is low variance and low bias.

@sridhar6358 - 10.12.2020 15:28

With stratified k-fold, the only difference is that the classes of type Yes and No are also considered when choosing the test set, and the rest is the same as k-fold cross validation. Is that so?

@pratikramteke3274 - 22.11.2020 13:42

By selecting n_splits as 4, I got the highest accuracy in the 4th, i.e. the last, fold. Any idea how to extract the exact dataset fed to train/test so that I can replicate the output of the 4th split?
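One way to recover that exact split (a sketch, assuming plain KFold with the same n_splits and any shuffle/random_state settings reused): enumerate the generator returned by split() and keep the index arrays of the fold you want.

```python
import numpy as np
from sklearn.model_selection import KFold

# Dummy 20-row dataset standing in for the real one.
X = np.arange(20).reshape(-1, 1)
y = np.arange(20) % 2

kf = KFold(n_splits=4)  # reuse the same settings as the original run
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    if fold == 4:
        # These index arrays reproduce exactly what the 4th fold saw.
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        print(test_idx)  # [15 16 17 18 19]
```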

@ravindrachauhan4078 - 17.10.2020 02:08

How to get the confusion matrix and AUC-ROC curve after k-fold validation?

@ravindrachauhan4078 - 15.10.2020 07:19

How to get the confusion matrix after cross validation?
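One common approach (a sketch, not from the video): cross_val_predict gives every row an out-of-fold prediction, which can then be fed to confusion_matrix or roc_auc_score.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Each row is predicted by a model that did NOT train on that row.
y_pred = cross_val_predict(clf, X, y, cv=5)
print(confusion_matrix(y, y_pred))

# For the ROC curve / AUC, collect out-of-fold probabilities instead.
y_proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print(roc_auc_score(y, y_proba))
```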

@user-ur1oj7vj2z - 09.10.2020 11:50

Where can I get the dataset? 🙏

@kashyap2034 - 05.10.2020 15:26

May I have your mail ID, sir?

@kushagrak4903 - 17.09.2020 09:37

Hello sir can you share your Jupyter notebook, please.

@ishantguleria870 - 26.08.2020 13:34

The cross value is coming up; what does it mean?

@philipokiokio3000 - 14.08.2020 03:09

Thank you for teaching. Can I get a link to the notebook?

@kamenxxx9037 - 12.08.2020 05:33

Wow, this is very enlightening!!!! Thank you sir! One question though: what if we need the confusion matrix? I am using Repeated Stratified K-Fold, and I'm curious how to obtain a reasonable and easy-to-execute confusion matrix. Any suggestion on this?

@clashiverse - 09.08.2020 16:00

Isn't stratified validation included by default in cross_val_score?

@megalaramu - 04.08.2020 08:51

Hi Krish,
When we use cross_val_score and pass an int to the cv parameter (the number of folds), and the model is a classifier, it chooses StratifiedKFold by default, right, and not the plain KFold type of cross validation? I found this in the sklearn library:

cv : int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:

1) None, to use the default 5-fold cross validation,
2) int, to specify the number of folds in a (Stratified)KFold,
3) CV splitter,
4) An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

This is just a query. Please let me know if my understanding is wrong.
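That documented behaviour can be checked directly: for a classifier, an integer cv and an explicit StratifiedKFold with the same number of splits produce identical fold scores. A quick sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# For a classifier, an integer cv defaults to StratifiedKFold ...
scores_int = cross_val_score(clf, X, y, cv=5)
# ... so passing StratifiedKFold explicitly yields the same folds.
scores_skf = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5))

print(np.allclose(scores_int, scores_skf))  # True
```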

@supersql8406 - 29.07.2020 20:23

Thank you! Keep going with other video tutorials!!

@hasnain-khan - 25.07.2020 08:29

If I have 1000 rows in the dataset, how can I select the first 200 rows for testing and the last 800 rows for training, instead of splitting randomly?
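For a fixed, non-random split like that, plain slicing is enough (a sketch with dummy data standing in for the 1000-row dataset):

```python
import numpy as np

# Dummy data standing in for the real 1000-row dataset.
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000) % 2

# First 200 rows for testing, remaining 800 for training;
# no random shuffling involved.
X_test, y_test = X[:200], y[:200]
X_train, y_train = X[200:], y[200:]

print(len(X_train), len(X_test))  # 800 200
```

Alternatively, train_test_split(X, y, test_size=0.2, shuffle=False) also avoids shuffling, but it holds out the last 200 rows rather than the first 200.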

@HARDYBOY290988 - 22.06.2020 12:43

Hello Krish,

I have subscribed to your 799 plan; please let me know how I can add myself to your WhatsApp group.

@shantanupagare7141 - 15.06.2020 19:26

Sir your content is great, thanks for uploading such important and informational videos. These videos are very helpful. Keep making these, more power to you. <3

@21Gannu - 12.06.2020 05:33

So my understanding is that with cross-validation we can see what score is achievable; however, it lacks interpretability, like the ability to make a confusion matrix out of it.

@ajayvishwakarma6943 - 24.05.2020 21:04

thanks man

@amitagarahari8501 - 12.03.2020 10:24

Good sir, I like your videos very much. Sir, I have a question: in K-fold validation, after getting the score, how can I make the confusion matrix?

@mashalnabh2747 - 21.02.2020 18:51

Namaskar Krish ji! Great video, well done. My question is about imbalanced datasets and StratifiedKFold validation. Taking your churn example, let's say your churn rate is 1%, which means that in 50k total observations your churners number 500. Now, because your data is highly imbalanced with very rare events, suppose you want to do some balancing (over-, under-, or both) and then do the StratifiedKFold validation. How would StratifiedKFold validation work in this case? Will it take the test data (let's say 10%) without balancing and build the model on the balanced dataset (90%), so we would know the validation is done on real data? Or is even the validation done on balanced data? If the latter, we would need a separate test dataset to see how the model fits the real unbalanced data, wouldn't we? I hope it's clear. Thanks, Sachin
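The usual recommendation (a sketch, not from the video) is the former: resample only inside each training fold and leave the held-out fold untouched, so the score reflects the real, imbalanced distribution. Here naive random oversampling via sklearn.utils.resample stands in for whatever balancing method is used:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.randn(1000, 3)
y = (rng.rand(1000) < 0.05).astype(int)   # ~5% rare positive class

skf = StratifiedKFold(n_splits=5)
for train_idx, test_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # Oversample the minority class in the TRAINING fold only.
    pos = np.where(y_tr == 1)[0]
    pos_up = resample(pos, replace=True,
                      n_samples=int((y_tr == 0).sum()), random_state=0)
    keep = np.concatenate([np.where(y_tr == 0)[0], pos_up])
    clf = LogisticRegression().fit(X_tr[keep], y_tr[keep])

    # The test fold is left untouched, so the score is measured on the
    # real, imbalanced class distribution.
    score = clf.score(X[test_idx], y[test_idx])
```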

@HarishS12137 - 29.12.2019 20:45

How different is this from setting the stratify parameter (stratify=y) while splitting the data using train_test_split?
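Roughly: stratify=y gives one stratified hold-out split, while StratifiedKFold gives k of them, rotating the test fold so every row is held out exactly once. A quick sketch:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 20 + [0] * 80)

# ONE stratified split: 20% held out, class ratio preserved once.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(y_te.mean())  # 0.2, same as the full data

# FIVE stratified splits: every row lands in a test fold exactly once.
n_test_appearances = np.zeros(len(X))
for _, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    n_test_appearances[test_idx] += 1
print(n_test_appearances.min(), n_test_appearances.max())  # 1.0 1.0
```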

@manusharma8527 - 13.12.2019 00:17

Sir, in terms of skf you have missed line number 86: skf = StratifiedKFold()

@mahnabraja1086 - 11.10.2019 08:39

excellent work

@jawaharunited - 20.09.2019 09:33

Can you provide the link to the source code?

@siddharthwaghela7234 - 21.08.2019 05:35

Hello Sir, Stratified K-fold works only for categorical and multiclass target variables. What if the target variable is continuous? Is binning the target variable the solution?
Thanks.
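Binning is indeed a common workaround (a sketch, not an official recipe): discretise the continuous target, stratify on the bins, and still train a regressor on the original y.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = rng.randn(200)            # continuous regression target

# Bin the continuous target into quantiles, then stratify on the bins
# so every fold sees the full range of y values.
y_bins = pd.qcut(y, q=5, labels=False)

skf = StratifiedKFold(n_splits=5)
for train_idx, test_idx in skf.split(X, y_bins):
    # Train/evaluate a REGRESSOR on the original y here; the bins are
    # used only to build balanced folds.
    pass
```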

@mgcgv - 16.08.2019 10:09

Just awesome!!!!
But could you also share a Git repository for the code?

@babayaga626 - 09.08.2019 09:35

Hello Sir, thanks for the wonderful explanation. However, I have a naive question to ask: how are RandomizedSearchCV and GridSearchCV different from K-Fold and Stratified K-Fold?
