Comments:
Unfortunately, your website link and notebook link are not available here. Any suggestions?
In order to truly evaluate, you need to test on an IMBALANCED test set. :) You can train on a balanced training set, but the hold-out needs to be a true imbalanced set, because in the real world the data you encounter will have the same imbalance, and that's what your performance metric needs to measure: how well you score on unseen imbalanced data.
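The protocol this comment describes can be sketched with the standard library alone. This is a minimal, hypothetical example: plain random duplication stands in for SMOTE, and the 90/10 class counts are made up. The key point is the order of operations: the imbalanced test set is carved off first and never resampled.

```python
import random

random.seed(0)

# Hypothetical imbalanced dataset: 90 majority (label 0) and
# 10 minority (label 1) samples.
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(10)]
random.shuffle(data)

# 1) Hold out a still-imbalanced test set FIRST, and never touch it.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# 2) Balance only the training set (random oversampling here,
#    standing in for SMOTE).
minority = [s for s in train if s[1] == 1]
majority = [s for s in train if s[1] == 0]
minority_upsampled = minority + [
    random.choice(minority) for _ in range(len(majority) - len(minority))
]
balanced_train = majority + minority_upsampled
```

After this, `balanced_train` has equal class counts, while `test` keeps the original minority rate, so metrics computed on it reflect performance on realistic data.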
I am not sure this is the best way to deal with data imbalance, and it won't work in a real case. You used SMOTE to balance the dataset and then drew your test set from the oversampled data, which is synthetic. To make sure your model is working well, you have to hold out part of the original imbalanced dataset as your test set and then apply SMOTE to the rest. That way your test set is a faithful representation of the original data. I am sure your F1 score will be very small. Some of the best methods are One-Class Support Vector Machine (OCSVM), Generalized One-class Discriminative Subspaces (GODS), One-Class CNN (OCCNN), and Deep SVDD (DSVDD).
You do realize that in your pipeline, once you run the oversample step, you have 6 perfectly balanced groups with 900 samples in each. There's no real majority class left to sample from. When you then undersample from a perfectly balanced dataset, it appears to leave one group intact and resamples the others. If you plot the data, it will look essentially the same as before, when you only oversampled, with some samples missing and others duplicated. The scores will be similar as well.
I don't understand how you apply under- and oversampling at the same time. One of them will balance the data, and the other one has nothing left to do...
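The two comments above raise the same point: chaining the two steps only does something when each one targets an *intermediate* ratio instead of full balance. A minimal stdlib-only sketch, with made-up 900/100 counts: oversample the minority to half the majority size, then undersample the majority down to the same level.

```python
import random

random.seed(1)

# Hypothetical counts: 900 majority vs. 100 minority samples.
majority = list(range(900))
minority = list(range(100))

# Step 1: oversample the minority up to half the majority size ...
target = len(majority) // 2  # 450
minority_res = minority + [
    random.choice(minority) for _ in range(target - len(minority))
]
# Step 2: ... then undersample the majority down to that same level.
majority_res = random.sample(majority, target)
```

Both classes end up at 450 samples, with fewer duplicated minority points and fewer discarded majority points than either step alone would produce. If instead the oversampler is allowed to reach full balance first (as in the pipeline the comments describe), the undersampler indeed has nothing meaningful left to do.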
Hi, thanks for the content!
I am confused: instead of applying this method to the y variable, can I apply this technique to imbalanced predictors whose levels have large differences in sample size?
For example, class A: 900, class B:100, class C: 2
Thanks!
One of the best channels for ML!
When should we use undersampling? As I see it, there's a potential risk of losing information.
Can you suggest any techniques for handling an imbalanced image dataset?
Thank you.
Please, how can we get the Jupyter notebook code?
Thanks for this share.
Could you please send me this code?
I need it.
I think if we have to do both over- and undersampling, then we use combine.SMOTETomek. That's the main difference between SMOTE and SMOTETomek, right?
Are you sure the undersampling method works? The number is still the same: 900.
Hello, thanks for the video.
However, I noticed that you ran SMOTE before the train/test split. I am afraid this might be causing the results to improve drastically, since upsampled observations from the minority class might have entered the test dataset. So basically your model learned and was tested on pretty much the same samples, which inflated the results.
Let me know what you think.
Excuse me, is there any way to find the original notebook file? I can't open the one in the description. Thank you.
It is not clear, sir, and I have a question: what technique did you use to sort out the class imbalance problem?
You said this deals with 'multi-class classification problems'. But what if we have imbalanced data and binary classification?
Can you apply SMOTE to text data?
Where do you edit your videos? DaVinci?
Great one!