Machine Learning Classification: How to Deal with Imbalanced Data | Practical ML Project with Python

DecisionForest

3 years ago

19,718 views

Comments:

@subhajit20111 - 19.01.2023 05:25

Unfortunately, your website link and notebook link are not available here. Any suggestion?

@TrainingDay2001 - 24.11.2022 00:58

To truly evaluate the model, you need to test on an IMBALANCED test set. :) You can train on a balanced training set, but the hold-out set needs to keep the true imbalance. In the real world, the data you encounter will have the same imbalance, and that's what your performance metric needs to measure: how well you score on unseen imbalanced data.
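
A minimal sketch of the idea in this comment, using only the Python standard library. Random duplication stands in for SMOTE here, and the 900/100 label vector, the 20% hold-out fraction, and the helper names `stratified_split` and `random_oversample` are assumptions made for illustration, not the video's code.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), keeping each class's share in both parts."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx

def random_oversample(indices, labels, rng):
    """Duplicate minority-class indices until every class matches the largest one."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced

rng = random.Random(0)
labels = ["majority"] * 900 + ["minority"] * 100   # toy 9:1 label vector (hypothetical)

# Split first, so the hold-out set is never touched by resampling.
train_idx, test_idx = stratified_split(labels, 0.2, rng)
train_balanced = random_oversample(train_idx, labels, rng)

print(Counter(labels[i] for i in train_balanced))  # balanced training labels
print(Counter(labels[i] for i in test_idx))        # hold-out keeps the 9:1 ratio
```

The training indices end up balanced while the hold-out indices keep the original class ratio, which is the evaluation setup the comment asks for.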

@amansamsonmogos9608 - 17.09.2022 03:40

I am not sure this is the best way to deal with data imbalance, and it won't work in a real case. You used SMOTE to balance the dataset and then took your test dataset from the oversampled data, which is synthetic. To make sure your model is working well, you have to set aside part of the original imbalanced dataset as your test dataset and then apply SMOTE to the rest. That way your test dataset is a perfect representation of the original data. I am sure your F1 score will be very small. Some of the best methods are One-Class Support Vector Machine (OCSVM), Generalized One-class Discriminative Subspaces (GODS), One-Class CNN (OCCNN) and Deep SVDD (DSVDD).
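
Why the F1 score can fall on an imbalanced test set can be illustrated with simple arithmetic. The sketch below assumes a hypothetical classifier with a fixed 90% true-positive rate and 10% false-positive rate (both numbers, and the test-set sizes, are made up for illustration) and compares its minority-class F1 on a balanced and an imbalanced test set.

```python
def f1_from_rates(n_pos, n_neg, tpr, fpr):
    """F1 for the positive (minority) class, given class sizes and error rates."""
    tp = round(n_pos * tpr)   # minority rows correctly flagged
    fn = n_pos - tp           # minority rows missed
    fp = round(n_neg * fpr)   # majority rows wrongly flagged
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical classifier: 90% true-positive rate, 10% false-positive rate.
balanced_f1 = f1_from_rates(n_pos=100, n_neg=100, tpr=0.9, fpr=0.1)
imbalanced_f1 = f1_from_rates(n_pos=20, n_neg=1000, tpr=0.9, fpr=0.1)

print(f"balanced test set:   F1 = {balanced_f1:.3f}")
print(f"imbalanced test set: F1 = {imbalanced_f1:.3f}")
```

Under these assumptions the same classifier scores an F1 of 0.90 on the balanced set and about 0.26 on the imbalanced one, because the false positives from the large majority class swamp the few true positives.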

@philwebb59 - 19.02.2022 20:22

You do realize that in your pipeline, once you run the oversample step, you have 6 perfectly balanced groups with 900 samples in each group? There's no real majority class to sample from. When you then undersample from a perfectly balanced dataset, it appears to leave one group intact and resample the others. If you plot the data, it will look essentially the same as before, when you only oversampled, with some samples missing and other samples duplicated. The scores will be similar as well.
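
The count argument in this comment can be checked with a small standard-library sketch. It tracks labels only, uses plain duplication and random dropping in place of the video's samplers, and the five smaller class sizes are hypothetical (only the 900 comes from the comment).

```python
import random
from collections import Counter

def oversample_to_largest(labels, rng):
    """Add copies of each class's labels until all classes match the largest."""
    counts = Counter(labels)
    target = max(counts.values())
    out = list(labels)
    for cls, n in counts.items():
        out.extend([cls] * (target - n))
    rng.shuffle(out)
    return out

def undersample_to_smallest(labels, rng):
    """Randomly drop rows from each class until all classes match the smallest."""
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for cls in counts:
        idxs = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(rng.sample(idxs, target))
    return [labels[i] for i in sorted(kept)]

rng = random.Random(0)
# Hypothetical original class sizes; only the 900 is taken from the comment.
sizes = {"c1": 900, "c2": 400, "c3": 250, "c4": 120, "c5": 60, "c6": 30}
labels = [cls for cls, n in sizes.items() for _ in range(n)]

after_over = oversample_to_largest(labels, rng)
after_both = undersample_to_smallest(after_over, rng)

print(Counter(after_over))   # every class at 900 after oversampling
print(Counter(after_both))   # unchanged: nothing is left to undersample
```

Once every class has been raised to the largest size, the smallest and largest classes coincide, so an undersampler that targets the smallest class cannot change the counts.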

@Mustistics - 08.11.2021 20:47

I don't understand how you apply under and oversampling at the same time. One of them will balance the data, and the other one has nothing left to do...
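
One common way to make the two steps cooperate is to give each a partial target: oversample the minority only part of the way, then undersample the majority down to meet it. The sketch below is a standard-library illustration of that idea, not the video's pipeline; the class sizes, the target of 300, and the helper `resample_class` are hypothetical. In imbalanced-learn, such targets are normally set through each sampler's `sampling_strategy` parameter.

```python
import random

def resample_class(rows, target, rng):
    """Return `target` rows: duplicates added if too few, random subset if too many."""
    if target >= len(rows):
        return rows + rng.choices(rows, k=target - len(rows))
    return rng.sample(rows, target)

rng = random.Random(0)
data = {
    "majority": list(range(900)),        # row ids 0..899
    "minority": list(range(900, 1000)),  # row ids 900..999
}

# Step 1: oversample the minority part of the way (100 -> 300, a hypothetical target).
data["minority"] = resample_class(data["minority"], 300, rng)
# Step 2: undersample the majority down to meet it (900 -> 300).
data["majority"] = resample_class(data["majority"], 300, rng)

print({cls: len(rows) for cls, rows in data.items()})
```

With partial targets, each step has work to do: fewer duplicated minority rows are needed, and fewer majority rows are thrown away, than if either step balanced the data on its own.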

@kar2194 - 19.10.2021 10:11

Hi thanks for the content!

I am confused: instead of applying this method to the y variable, can I apply this technique to imbalanced predictors whose levels have large differences in sample size?

For example, class A: 900, class B: 100, class C: 2

Thanks!

@fatimak6440 - 26.08.2021 21:04

one of the best channels for ML!

@rahuldey6369 - 19.08.2021 11:06

When should we use under_sampling? As I see it, there's a potential risk of losing information.

@nickpgr10 - 30.05.2021 17:02

Can you suggest any techniques for handling an imbalanced image dataset?
Thank you.

@tahirullah4786 - 22.05.2021 16:20

Please, how can we get the Jupyter notebook code?

@mahdimed775 - 12.05.2021 21:56

Thanks for sharing this.
Could you please send me this code?
I need it.

@preeyaarawlani9948 - 16.01.2021 03:26

I think if we have to do both over- and undersampling, then we use combine.SMOTETomek. That's the main difference between SMOTE and SMOTETomek, right?
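
For context, SMOTETomek combines SMOTE oversampling with a cleaning step that removes Tomek links: pairs of points from different classes that are each other's nearest neighbor. The sketch below shows only the link detection, on six made-up 2-D points with made-up labels; `nearest` and `tomek_links` are illustrative helpers, not the imbalanced-learn API.

```python
import math

def nearest(i, points):
    """Index of the point closest to points[i], excluding itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )

def tomek_links(points, labels):
    """Pairs (i, j) with different labels that are mutual nearest neighbors."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links

# Hypothetical toy data: points 2 and 3 sit right on the class boundary.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0), (3.2, 3.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(tomek_links(points, labels))
```

Points 2 and 3 form the only link here, since they are mutual nearest neighbors with different labels. Removing such pairs cleans up the class boundary, so this kind of undersampling removes only a few borderline rows and does not aim to equalize class counts.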

@farisocta7466 - 29.12.2020 13:52

Are you sure the undersampling method works? The number is still the same: 900.

@ammarkamran4908 - 25.12.2020 05:34

Hello, thanks for the video.

However, I noticed that you applied SMOTE before running the train/test split. I am afraid this might be what makes the results improve so drastically, since the upsampled observations from the minority class might have entered the testing dataset. So basically your model learned and was tested on pretty much the same observations, which caused the results to improve.
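
The leakage described here can be made visible with a small standard-library sketch. It uses plain duplication in place of SMOTE (SMOTE produces interpolated near-copies, not exact duplicates, but the overlap problem is similar), and the row ids, class sizes, and 20% test fraction are hypothetical.

```python
import random

def split(rows, test_fraction, rng):
    """Shuffle a copy of rows and return (train, test)."""
    rows = list(rows)
    rng.shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

rng = random.Random(42)
minority = list(range(20))        # ids of 20 original minority rows
majority = list(range(20, 200))   # ids of 180 original majority rows

# Order criticized in the comment: oversample first, then split.
oversampled = majority + minority + rng.choices(minority, k=160)
train_a, test_a = split(oversampled, 0.2, rng)
leaked = set(train_a) & set(test_a)   # original rows present on both sides

# Safer order: split first, then oversample only the training part.
train_b, test_b = split(majority + minority, 0.2, rng)
train_min = [r for r in train_b if r < 20]
train_maj = [r for r in train_b if r >= 20]
train_b = train_b + rng.choices(train_min, k=len(train_maj) - len(train_min))
overlap_b = set(train_b) & set(test_b)

print(f"oversample then split: {len(leaked)} minority rows appear in both train and test")
print(f"split then oversample: {len(overlap_b)} rows appear in both train and test")
```

In the first ordering, copies of the same minority rows land on both sides of the split, so the test score partly measures memorization. In the second, the test rows never appear in training.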

Let me know what you think.
