Comments:
With alpha = 10,000 we get a tree with 3 leaves (say, tree 2).
When we increase alpha, do we start pruning from tree 2 or from the original tree?
From the video it looks like we start pruning from tree 2, but I don't quite get the intuition for why we start pruning from tree 2 and not the original tree.
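One way to see why the pruning sequence is nested: the Tree Score is SSR + alpha × (number of leaves), and each pruned tree trades a higher SSR for fewer leaves. A minimal sketch with made-up SSRs for the full tree and tree 2 (only the 2-leaf SSR, 19243.7, is taken from the video) shows that as alpha grows, the winner moves monotonically toward smaller trees, so each new cut continues from the previously winning tree:

```python
# Tree Score = SSR + alpha * (number of leaves).
# The SSRs for the 4-leaf and 3-leaf trees below are hypothetical;
# 19243.7 is the 2-leaf SSR mentioned in the video.
trees = {
    "full (4 leaves)":   {"ssr": 500.0,   "leaves": 4},
    "tree 2 (3 leaves)": {"ssr": 5000.0,  "leaves": 3},
    "tree 3 (2 leaves)": {"ssr": 19243.7, "leaves": 2},
}

def best_tree(alpha):
    """Return the name of the tree with the lowest Tree Score for this alpha."""
    return min(trees, key=lambda name: trees[name]["ssr"] + alpha * trees[name]["leaves"])

for alpha in (0, 10_000, 22_000):
    print(alpha, "->", best_tree(alpha))
```

With these numbers, alpha = 0 favors the full tree, alpha = 10,000 favors tree 2, and alpha = 22,000 favors the 2-leaf tree; a tree that has already lost to its pruned version never wins again at a larger alpha.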
Thanks, Josh, for this beautiful video. I have a question: in the StatQuest on classification trees using Python, you extract the alphas only from the training data, but in this StatQuest you extract the alphas from all the data (train and test). Why the difference?
Sir, I am learning ML from your videos, and every day I am compelled to comment on the beauty with which the concepts are explained. The best part is that you still clear our doubts even after 3 years. For those who don't know, he has also written a book, which is excellent.
Hey, Josh! Is it OK that we are using train+test data to find the alpha values? It feels like we are peeking into the future. Would we really know good thresholds by using the test sample (not only the train sample), or am I wrong? Thank you.
Still super confused about how we picked the best alpha to begin with. Great video nonetheless.
You are literally doing god's work.
The best way I can thank you is by advising you to learn and search about "ISLAM", and to be a Muslim (Muslim means one who submits to ALLAH).
If you want, there is a channel called Boby's perspective, by a person called Boby, who had a long journey through many beliefs before he reached the TRUTH; this journey is documented on his channel (Boby's perspective).
Hey Josh, how can we choose our alpha wisely, so that the tree with the minimum Tree Score will really work well on testing data too? Is there a specific rule of thumb?
So, a question. We learned previously that cross-validation is used to test the model on different "blocks" of the data. But in this case you are advocating using cross-validation for hyperparameter tuning. Does that mean the test sets remain constant?
Cost Complexity Pruning is otherwise known as post-pruning, while limiting the tree depth, enforcing a minimum number of samples per leaf/split, or a minimum impurity decrease is known as pre-pruning, correct? My question is: can you apply pre-pruning first, and then apply post-pruning to the pre-pruned tree? If yes, then I assume that the alpha values will be found from the pre-pruned tree, right? Not the initial full tree? And then cross-validation will also be performed with the pre-pruned tree, not the initial full tree, to determine the final optimal alpha?
And on a separate note, should only those alphas obtained from the tree trained on all the data be used when cross-validating, and why? Is there no chance that some other, random alpha might result in better performance? Considering that cross-validation is done on several different test/train splits, and there will be splits that do better with one alpha and splits that do better with other alphas, doesn't it make sense to try all possible alphas (from 0 to infinity) in the cross-validation, not only those that give the best Tree Scores for the full tree? Isn't there a chance that some other alpha will give, on average, a lower sum of squared residuals than those obtained from the full tree?
Josh, it's great that you keep in touch with us. There's one question I still can't understand: how do we choose alpha? OK, the first time it is always zero, but how do we get the next alpha? Do we pick it randomly, or how? I really can't understand it.
Great explanation. I really enjoyed the video, but I'm a bit confused. Why use all of the data to build the initial tree in step 1? If we do that, there is no test data left for testing the tree.
Is alpha = 0, 10000, 15000, 22000 arbitrary when you reduce the number of leaves?
Love the reference to Phoebe! Also, thank you; all your videos are very helpful.
You are awesome! Clear! To the point!
Thank you so much for this amazing video. Very amazing!
Thank you for these videos, Josh, I really love learning from them. Just one question: when we do the cross-validation, shouldn't the alphas be different from those found on the full training data, and also differ across the cross-validation sets? If yes, how should we decide which alpha gets the most votes, given that they are basically different for every training set?
Hi Josh, there is a small typo in the video at 9:45. The Tree Score for the tree with 2 leaves is 19243.7 + (10000 * 2), but it was written as 19243.7 + (10000 * 3).
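For reference, the corrected Tree Score from that comment is a one-line calculation (using the SSR and alpha shown in the video):

```python
# Tree Score = SSR + alpha * (number of leaves)
ssr = 19243.7     # SSR of the 2-leaf tree, from the video
alpha = 10_000
leaves = 2        # the slide mistakenly used 3 here
tree_score = ssr + alpha * leaves
print(tree_score)  # 39243.7
```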
I don't know why I spend so much time googling when I always end up watching StatQuest, hahaha.
smelly stat :D :D :D :D I love it
I think at the end, instead of saying we take the average of the lowest Tree Score values, you said the lowest average of the sum of squared residuals. Can you help me with this?
What beautiful content!
I'm not a native English speaker, but this video is more helpful than the Korean lectures provided by the college I attend.
🌲
Since the average alpha value is averaged across 10 folds, it won't be exactly equal to any of the discrete candidate values, so in the end do we pick the discrete value closest to our averaged value? Or is there a distinct counter for each value, with a tally of which one gave the lowest SSR most often? And on that note, are the alpha values across which we make the pruned trees continuous or discrete?
Apologies, lots of questions thrown in there.
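As I understand the video, no averaging of alphas is needed: for each candidate alpha you compute the test-set SSR of the winning subtree in every fold, average those SSRs per alpha, and pick the candidate with the lowest average, so the final alpha is always one of the discrete candidates. A minimal sketch with entirely made-up fold SSRs:

```python
# Hypothetical test-set SSRs for each candidate alpha across 3 folds (made up).
fold_ssr = {
    0:      [410.0, 395.0, 430.0],
    10_000: [310.0, 305.0, 320.0],   # lowest on average
    15_000: [330.0, 340.0, 335.0],
    22_000: [500.0, 480.0, 510.0],
}

def pick_alpha(fold_ssr):
    """Pick the candidate alpha with the lowest average SSR across folds."""
    return min(fold_ssr, key=lambda a: sum(fold_ssr[a]) / len(fold_ssr[a]))

print(pick_alpha(fold_ssr))  # 10000
```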
Hi Josh, I hope you answer my question. I have been searching for 3 days now and have found nothing.
I have 2 problems:
1. How do we determine alpha when there is more than one leaf at the bottom of the tree? (You said to increase alpha until pruning the leaf gives a lower score.) If I have more than one leaf at the last level of the tree, which one should I cut? Or should I check all subtrees every time I increase alpha? It seems like that would have high complexity.
2. In implementation, when I give the model the ideal alpha to build the decision tree, how does the model know, at every step it takes while building, that the step will lead to the subtree associated with this alpha?
Finally, you are amazing, and I really enjoyed every lesson I took from this channel.
Hello,
First of all, thanks for the great material you have produced and shared; it is certainly among the clearest and most effective I've come across.
My questions are about the cross-validation trees to determine the right alpha values.
As a premise, if I understood correctly, we first determine candidate alpha values by :
a) create a "full" tree from the full training+testing datasets
b) produce the corresponding family of "pruned" versions (and I guess assess their SSRs in preparation for the next step) based on the morphology of the "full" tree (meaning all possible pruned trees are considered - is that correct?)
c) identify the candidate alpha values as those by which the "full" tree's score becomes higher than one of the pruned versions.
Assuming the above is correct, when we move on to cross-validate in order to ultimately determine the right alpha, I understand that we resample a training set (and a corresponding test set) a number of times.
Each time, we build a new tree from the training set, along with its associated set of pruned versions (let me call these trees a "cross-validation family of trees" (CVFT)), and assess their SSRs on the test set for the current round, in order to contribute to ultimately calculating the actual alpha to use.
First question: how come every CVFTs in your slides has a number of members that equals the number of candidate values for alpha?
Couldn't a resampled training set give rise to trees with more or even fewer leaves (and corresponding pruned versions) than the tree that was used to identify the candidate alpha values? In that case, the candidate alpha values might be larger or smaller in number than the possible number of trees in the CVFT at hand.
I imagine a possible answer is that the number of members in a CVFT can actually differ from the number of candidate alphas, and that the pruned trees in a CVFT are actually identified through their Tree Scores when each of the candidate alpha values is applied. If so, I guess the issue is that perhaps this mechanism does not stand out 100% from the presentation.
Second question: if we assess the trees in each CVFT only by their SSRs, wouldn't the tree with the most leaves (therefore alpha = 0) always win?
Thanks much
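For what it's worth, scikit-learn automates both halves of the procedure discussed above: `cost_complexity_pruning_path` returns the effective alphas at which the grown tree would lose a subtree, and cross-validating the `ccp_alpha` parameter over those candidates picks the final value. A minimal sketch on synthetic data (the data and model settings here are made up, not from the video):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the video's Dosage example.
rng = np.random.default_rng(0)
X = rng.uniform(0, 40, size=(100, 1))
y = np.sin(X[:, 0] / 8) * 50 + rng.normal(0, 5, size=100)

# Step 1: candidate alphas come from the tree grown on the data.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
candidates = path.ccp_alphas

# Step 2: cross-validate each candidate and keep the best mean score.
scores = [
    cross_val_score(DecisionTreeRegressor(random_state=0, ccp_alpha=a), X, y, cv=5).mean()
    for a in candidates
]
best_alpha = candidates[int(np.argmax(scores))]
print(best_alpha)
```

Note that each CV fold grows its own tree, so its internal pruning sequence may differ from the full tree's; `ccp_alpha` simply prunes each fold's tree to the subtree that wins at that alpha, which is why every candidate can be evaluated in every fold.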
Hi Josh, do you have a copy of the dataset used in this example?
@StatQuest with Josh Starmer, I have purchased your book, but I didn't find these concepts (pruning, random forest, AdaBoost, gradient boosting) in it. Is there a way to access these presentation slides?
Perfect!
Well explained, thank you so much
The song cracked me up, perfect reference
Very clear. Thank you very much
I like his calculation sound, biboo-biboo-boobi :D
Teachers all over the world must learn from Josh, bro!
This video gets hard to understand at the end; I mean, the presentation is not clear there. If we ultimately have to use 10-fold CV, then what were the previous steps used for?
This guy is a living legend ❤
Sorry, I still have no idea: when we want to replace the 52.8% and 100% leaves with the 73.8% leaf, does the 73.8% come from taking the average of the data included in dosage <= 29?
LMAO at the awesome song, and, as always, thank you for the video. I must say, though, I do not dread the Terminology Alerts; quite the opposite, I actually get pumped when they show up, because I'm about to learn something I can use for SWAG later.
You got me at Smelly Stat!