Comments:
With alpha = 10,000 we get a tree with 3 leaves (say, tree 2).
When we increase alpha, do we start pruning from tree 2 or from the original tree?
From the video it looks like we start pruning from tree 2, but I don't quite get the intuition for why we start pruning from tree 2 and not the original tree.
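One way to see why the pruning sequence is nested: the Tree Score is SSR + alpha × (number of leaves), and each pruned tree trades a higher SSR for fewer leaves. A minimal sketch with made-up SSRs for the full tree and tree 2 (only the 2-leaf SSR, 19243.7, is taken from the video) shows that as alpha grows, the winner moves monotonically toward smaller trees, so each new cut continues from the previously winning tree:

```python
# Tree Score = SSR + alpha * (number of leaves).
# The SSRs for the 4-leaf and 3-leaf trees below are hypothetical;
# 19243.7 is the 2-leaf SSR mentioned in the video.
trees = {
    "full (4 leaves)":   {"ssr": 500.0,   "leaves": 4},
    "tree 2 (3 leaves)": {"ssr": 5000.0,  "leaves": 3},
    "tree 3 (2 leaves)": {"ssr": 19243.7, "leaves": 2},
}

def best_tree(alpha):
    """Return the name of the tree with the lowest Tree Score for this alpha."""
    return min(trees, key=lambda name: trees[name]["ssr"] + alpha * trees[name]["leaves"])

for alpha in (0, 10_000, 22_000):
    print(alpha, "->", best_tree(alpha))
```

With these numbers, alpha = 0 favors the full tree, alpha = 10,000 favors tree 2, and alpha = 22,000 favors the 2-leaf tree; a tree that has already lost to its pruned version never wins again at a larger alpha.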
Thanks, Josh, for this beautiful video. I have a question: in the StatQuest on classification trees using Python, you extract the alphas only from the training data, but in this StatQuest you extract the alphas from all the data (train and test). Why the difference?
Sir, I am learning ML from your videos, and every day I am compelled to comment on the beauty with which the concepts are explained. The best part is that you still clear our doubts even after 3 years. For those who don't know, he has also written a book, which is excellent.
Hey, Josh! Is it OK that we are using train+test data to find the alpha values? It feels like we are peeking into the future. Would we really know good thresholds by using the test sample (not only the train sample), or am I wrong? Thank you.
Still super confused about how we picked the best alpha to begin with. Great video nonetheless.
You are literally doing god's work.
The best way I can thank you is by advising you to learn and search about "ISLAM", and to be a Muslim (Muslim means one who submits to ALLAH).
If you want, there is a channel called Boby's perspective, by a person called Boby, who had a long journey through many beliefs before he reached the TRUTH; this journey is documented on his channel (Boby's perspective).
Hey Josh, how can we choose our alpha wisely, so that the tree with the minimum Tree Score will really work well on testing data too? Is there a specific rule of thumb?
So, a question. We learned previously that cross-validation is used to test the model on different "blocks" of the data. But in this case you are advocating using cross-validation for hyperparameter tuning. Does that mean the test sets remain constant?
Cost Complexity Pruning is otherwise known as post-pruning, while limiting the tree depth, enforcing a minimum number of samples per leaf/split, or a minimum impurity decrease is known as pre-pruning, correct? My question is: can you apply pre-pruning first, and then apply post-pruning to the pre-pruned tree? If yes, then I assume that the alpha values will be found from the pre-pruned tree, right? Not the initial full tree? And then cross-validation will also be performed with the pre-pruned tree, not the initial full tree, to determine the final optimal alpha?
And on a separate note, should only those alphas obtained from the tree trained on all the data be used when cross-validating, and why? Is there no chance that some other, random alpha might result in better performance? Considering that cross-validation is done on several different test/train splits, and there will be splits that do better with one alpha and splits that do better with other alphas, doesn't it make sense to try all possible alphas (from 0 to infinity) in the cross-validation, not only those that give the best Tree Scores for the full tree? Isn't there a chance that some other alpha will give, on average, a lower sum of squared residuals than those obtained from the full tree?
Josh, it's great that you keep in touch with us. There's one question I still can't understand: how do we choose alpha? OK, the first time it is always zero, but how do we get the next alpha? Do we pick it randomly, or how? I really can't understand it.
Great explanation. I really enjoyed the video, but I'm a bit confused. Why use all of the data to build the initial tree in step 1? If we do that, there is no test data left for testing the tree.
Is alpha = 0, 10000, 15000, 22000 arbitrary when you reduce the number of leaves?
Love the reference to Phoebe! Also, thank you; all your videos are very helpful.
You are awesome! Clear! To the point!
Thank you so much for this amazing video. Very amazing!
Thank you for these videos, Josh, I really love learning from them. Just one question: when we do the cross-validation, shouldn't the alphas be different from those found on the full training data, and also differ across the cross-validation sets? If yes, how should we decide which alpha gets the most votes, given that they are basically different for every training set?
Hi Josh, there is a small typo in the video at 9:45. The Tree Score for the tree with 2 leaves is 19243.7 + (10000 * 2), but it was written as 19243.7 + (10000 * 3).
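For reference, the corrected Tree Score from that comment is a one-line calculation (using the SSR and alpha shown in the video):

```python
# Tree Score = SSR + alpha * (number of leaves)
ssr = 19243.7     # SSR of the 2-leaf tree, from the video
alpha = 10_000
leaves = 2        # the slide mistakenly used 3 here
tree_score = ssr + alpha * leaves
print(tree_score)  # 39243.7
```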
I don't know why I spend so much time googling when I always end up watching StatQuest, hahaha.
smelly stat :D :D :D :D I love it
I think at the end, instead of saying we take the average of the lowest Tree Score values, you said the lowest average of the sum of squared residuals. Can you help me with this?
What beautiful content!
I'm not a native English speaker, but this video is more helpful than the Korean lectures provided by the college I attend.
🌲
Since the average alpha value is averaged across 10 folds, it won't be exactly equal to any of the discrete candidate values, so in the end do we pick the discrete value closest to our averaged value? Or is there a distinct counter for each value, with a tally of which one gave the lowest SSR most often? And on that note, are the alpha values across which we make the pruned trees continuous or discrete?
Apologies, lots of questions thrown in there.
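As I understand the video, no averaging of alphas is needed: for each candidate alpha you compute the test-set SSR of the winning subtree in every fold, average those SSRs per alpha, and pick the candidate with the lowest average, so the final alpha is always one of the discrete candidates. A minimal sketch with entirely made-up fold SSRs:

```python
# Hypothetical test-set SSRs for each candidate alpha across 3 folds (made up).
fold_ssr = {
    0:      [410.0, 395.0, 430.0],
    10_000: [310.0, 305.0, 320.0],   # lowest on average
    15_000: [330.0, 340.0, 335.0],
    22_000: [500.0, 480.0, 510.0],
}

def pick_alpha(fold_ssr):
    """Pick the candidate alpha with the lowest average SSR across folds."""
    return min(fold_ssr, key=lambda a: sum(fold_ssr[a]) / len(fold_ssr[a]))

print(pick_alpha(fold_ssr))  # 10000
```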
Hi Josh, I hope you answer my question. I have been searching for 3 days now and have found nothing.
I have 2 problems:
1. How do we determine alpha when there is more than one leaf at the bottom of the tree? (You said to increase alpha until pruning the leaf gives a lower score.) If I have more than one leaf at the last level of the tree, which one should I cut? Or should I check all subtrees every time I increase alpha? It seems like that would have high complexity.
2. In implementation, when I give the model the ideal alpha to build the decision tree, how does the model know, at every step it takes while building, that the step will lead to the subtree associated with this alpha?
Finally, you are amazing, and I really enjoyed every lesson I took from this channel.
Hello,
First of all, thanks for the great material you have produced and shared; it is certainly among the clearest and most effective I've come across.
My questions are about the cross-validation trees to determine the right alpha values.
As a premise, if I understood correctly, we first determine candidate alpha values by :
a) create a "full" tree from the full training+testing datasets
b) produce the corresponding family of "pruned" versions (and I guess assess their SSRs in preparation for the next step) based on the morphology of the "full" tree (meaning all possible pruned trees are considered - is that correct?)
c) identify the candidate alpha values as those by which the "full" tree's score becomes higher than one of the pruned versions.
Assuming the above is correct, when we move on to cross-validate in order to ultimately determine the right alpha, I understand that we resample a training set (and a corresponding test set) a number of times.
Each time, we build a new tree from the training set, along with its associated set of pruned versions (let me call these trees a "cross-validation family of trees" (CVFT)), and assess their SSRs on the test set for the current round, in order to contribute to ultimately calculating the actual alpha to use.
First question: how come every CVFTs in your slides has a number of members that equals the number of candidate values for alpha?
Couldn't a resampled training set give rise to trees with more or even fewer leaves (and corresponding pruned versions) than the tree that was used to identify the candidate alpha values? In that case, the candidate alpha values might be larger or smaller in number than the possible number of trees in the CVFT at hand.
I imagine a possible answer is that the number of members in a CVFT can actually differ from the number of candidate alphas, and that the pruned trees in a CVFT are actually identified through their Tree Scores when each of the candidate alpha values is applied. If so, I guess the issue is that perhaps this mechanism does not stand out 100% from the presentation.
Second question: if we assess the trees in each CVFT only by their SSRs, wouldn't the tree with the most leaves (therefore alpha = 0) always win?
Thanks much
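For what it's worth, scikit-learn automates both halves of the procedure discussed above: `cost_complexity_pruning_path` returns the effective alphas at which the grown tree would lose a subtree, and cross-validating the `ccp_alpha` parameter over those candidates picks the final value. A minimal sketch on synthetic data (the data and model settings here are made up, not from the video):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the video's Dosage example.
rng = np.random.default_rng(0)
X = rng.uniform(0, 40, size=(100, 1))
y = np.sin(X[:, 0] / 8) * 50 + rng.normal(0, 5, size=100)

# Step 1: candidate alphas come from the tree grown on the data.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
candidates = path.ccp_alphas

# Step 2: cross-validate each candidate and keep the best mean score.
scores = [
    cross_val_score(DecisionTreeRegressor(random_state=0, ccp_alpha=a), X, y, cv=5).mean()
    for a in candidates
]
best_alpha = candidates[int(np.argmax(scores))]
print(best_alpha)
```

Note that each CV fold grows its own tree, so its internal pruning sequence may differ from the full tree's; `ccp_alpha` simply prunes each fold's tree to the subtree that wins at that alpha, which is why every candidate can be evaluated in every fold.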
Hi Josh, do you have a copy of the dataset used in this example?
@StatQuest with Josh Starmer, I have purchased your book, but I didn't find these concepts (pruning, random forest, AdaBoost, gradient boosting) in it. Is there a way to access these presentation slides?
Perfect!
Well explained, thank you so much
The song cracked me up, perfect reference
Very clear. Thank you very much
I like his calculation sound, biboo-biboo-boobi :D
Teachers all over the world must learn from Josh, bro!
This video gets hard to understand at the end; I mean, the presentation is not clear there. If we ultimately have to use 10-fold CV, then what were the previous steps used for?
This guy is a living legend ❤
Sorry, I still have no idea: when we want to replace the 52.8% and 100% leaves with the 73.8% leaf, does the 73.8% come from taking the average of the data included in dosage <= 29?
LMAO at the awesome song, and, as always, thank you for the video. I must say, though, I do not dread the Terminology Alerts; quite the opposite, I actually get pumped when they show up, because I'm about to learn something I can use for SWAG later.
You got me at Smelly Stat!