Review and intuition why we divide by n-1 for the unbiased sample | Khan Academy

Khan Academy

1 decade ago

349,904 views

Courses on Khan Academy are always 100% free. Start practicing—and saving your progress—now: https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/more-standard-deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance

Reviewing the population mean, sample mean, population variance, sample variance and building an intuition for why we divide by n-1 for the unbiased sample variance
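
For reference, the quantities named above are conventionally written as follows. The notation is an assumption taken from standard textbooks rather than a transcription of the video: N is the population size, n the sample size, and the subscripts on s^2 mark which divisor is used.

```latex
\begin{align*}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i
  && \text{(population mean)} \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
  && \text{(sample mean)} \\
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
  && \text{(population variance)} \\
s_n^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
  && \text{(biased sample variance)} \\
s_{n-1}^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
  && \text{(unbiased sample variance)}
\end{align*}
```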

Practice this lesson yourself on KhanAcademy.org right now:
https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/e/variance?utm_source=YT&utm_medium=Desc&utm_campaign=ProbabilityandStatistics

Watch the next lesson: https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/simulation-showing-bias-in-sample-variance?utm_source=YT&utm_medium=Desc&utm_campaign=ProbabilityandStatistics

Missed the previous lesson?
https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/sample-variance?utm_source=YT&utm_medium=Desc&utm_campaign=ProbabilityandStatistics

Probability and statistics on Khan Academy: We dare you to go through a day in which you never consider or use probability. Did you check the weather forecast? Busted! Did you decide to go through the drive-through lane vs. walk in? Busted again! We are constantly creating hypotheses, making predictions, testing, and analyzing. Our lives are full of probabilities! Statistics is related to probability because much of the data we use when determining probable outcomes comes from our understanding of statistics. In these tutorials, we will cover a range of topics, some of which include: independent events, dependent probability, combinatorics, hypothesis testing, descriptive statistics, random variables, probability distributions, regression, and inferential statistics. So buckle up and hop on for a wild ride. We bet you're going to be challenged AND love it!

About Khan Academy: Khan Academy offers practice exercises, instructional videos, and a personalized learning dashboard that empower learners to study at their own pace in and outside of the classroom. We tackle math, science, computer programming, history, art history, economics, and more. Our math missions guide learners from kindergarten to calculus using state-of-the-art, adaptive technology that identifies strengths and learning gaps. We've also partnered with institutions like NASA, The Museum of Modern Art, The California Academy of Sciences, and MIT to offer specialized content.

For free. For everyone. Forever. #YouCanLearnAnything

Subscribe to KhanAcademy’s Probability and Statistics channel:
https://www.youtube.com/channel/UCRXuOXLW3LcQLWvxbZiIZ0w?sub_confirmation=1
Subscribe to KhanAcademy: https://www.youtube.com/subscription_center?add_user=khanacademy


Comments:

five - 01.11.2023 05:33

This is essential reading. A book of similar stripe became a cornerstone in my personal growth. "Game Theory and the Pursuit of Algorithmic Fairness" by Jack Frostwell

Julio Cesar Jovelina - 04.09.2023 04:05

Why do we not use |Xi - x̄| instead of (Xi - x̄)²?

Julio Cesar Jovelina - 04.09.2023 04:00

What if you take the 3 highest values?

Dann - 09.04.2023 04:00

Who's this man? He knows so much and explains so majestically. I wonder why he does not have a statue in the main square of my city? He deserves a few.

posthocprior - 20.03.2023 20:41

That was unclear.

File - 29.12.2022 14:16

Thank you very much for your video, it was very, very good at explaining. But I have one more question: if descriptive statistics do not try to generalize to a population (since there is no uncertainty in descriptive statistics), then why does the sample standard deviation try to best estimate the population mean? Yet it is still considered a descriptive statistic.

carlneedsajob - 14.11.2022 02:21

thank you sal :4)

Ben Wearne - 31.10.2022 10:34

N-1 is "better", but it is still very flawed

Jeremy Falcon - 25.09.2022 20:25

Just gotta say, your videos are awesome. Glad they exist.

Sabreen Elzein - 01.09.2022 23:59

So instead of the sample lying somewhere much lower than the true population mean, what if it's lying much higher? Would it be correct to use n+1 instead of n-1 in order to deliberately make the sample variance smaller?

adarsh tiwari - 08.08.2022 22:07

After 3 videos, I finally understood this n-1. Basically, when we take a sample from our population and calculate its mean, it may or may not be close to the overall population mean (which is the mean that matters), so to lower the chance of a highly distinct sample mean/variance we use n-1 to get at least near the population mean...

sh di - 08.08.2022 17:59

A very interesting and important discussion. I took a break in the middle and thought about it myself. I have a rather short explanation: if the sample size n is very small, such as 3, the variance calculated for the sample has more chance of being very different from the actual variance. The smaller n is, the more effect this '-1' has on the result.
Why do we use '-1' and not some other value like '-2'? I think it is just a tradition. For the smallest sample size of 2, this unbiased variance can still be calculated. However, it is not really purely 'unbiased', just relatively 'unbiased'.
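
To put numbers on how much the '-1' matters at different sample sizes, here is a minimal Python sketch. The helper name `correction_factor` is made up for illustration; it simply returns n / (n - 1), the factor by which the n-1 version of the sample variance exceeds the n version for the same data.

```python
def correction_factor(n):
    """Ratio of the (n - 1)-divisor variance to the n-divisor variance."""
    if n < 2:
        raise ValueError("need at least 2 observations to divide by n - 1")
    return n / (n - 1)


# The factor is large for tiny samples and shrinks toward 1 as n grows.
for n in (2, 3, 10, 100, 1000):
    print(n, round(correction_factor(n), 4))
```

For n = 2 the correction doubles the estimate, while for n = 1000 it changes it by about a tenth of a percent.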

Varun Ahlawat - 27.07.2022 05:52

Doesn't explain the point, Sal. The sample could've been among the higher-than-mu values only; in that case this would be the complete opposite, and we should've divided by n+1.

Antygona - 18.05.2022 01:23

This is not explained at all.

Shivay Shakti - 15.05.2022 13:29

But couldn't the same happen at the other end, where we would overestimate it?

AJ - 04.05.2022 04:28

Awesome video! Thank you!

pyguy - 17.04.2022 07:45

Is n-1 mathematically derived?

Could we justify doing something else, e.g. using "0.85n" to build in conservativeness even for large n?

MANU PANDIT - 25.11.2021 07:59

Hi
How is this S² sample variance different from the σ²/n formula (population variance / n), which is also the sample variance?

thanks

DarkTealGlasses - 04.10.2021 08:57

Much better than what my school teacher taught me

Arthur Pletcher - 20.08.2021 17:32

Because of the upper and lower boundaries, samples are biased to be less spread, compared to the population mean, which is typically more centralized.

Swapnamay Sen - 12.06.2021 06:25

What bogus logic... Khan Academy is a jack of all trades, master of none.

scott lomagistro - 21.05.2021 14:30

I get the math.... What I don't get is how you're able to write with the drawing/annotation feature so freakin' nicely?!?!? Either you missed your calling as a steady-handed microsurgeon or there is some sort of stabilization assistance with the program you're using.

Prabhjyot SIngh - 18.05.2021 20:53

By this logic it could be n+1 too, I guess.

VGF80 - 30.12.2019 16:49

Let's say a report comes out that mentions standard deviation. How are we supposed to know which formula was used to calculate that standard deviation?
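
A report has to state which divisor it used; the number alone does not tell you. If the raw data are available, though, both versions can be computed and compared with the reported figure. The sketch below uses Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n - 1; the data values are invented for illustration.

```python
import statistics

# Hypothetical data set, chosen only so the arithmetic is easy to check.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(data)  # divides the sum of squares by n
sample_sd = statistics.stdev(data)       # divides the sum of squares by n - 1

print("divide by n:    ", population_sd)
print("divide by n - 1:", sample_sd)
```

Software defaults differ, which is one reason reports disagree: NumPy's `std` divides by n unless `ddof=1` is passed, while pandas' `std` divides by n - 1 by default.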

yeast - 19.09.2019 00:44

What if the sample mean is far greater than the population mean, then would you not divide by n+1 in order that your sample mean is not an overestimate?

liu shao min - 23.08.2019 05:14

The analogy you’re using is probably not convincing or intuitive enough, because there’s also a likelihood that the sample is over-estimating the population mean, so why don’t we divide it by n+1?

Jack - 14.07.2019 20:58

This is terrible. Still no explanation of why it is unbiased if using n-1.

john Hendrickson - 08.06.2019 22:20

I would like to know why we use the square of the difference between x and xbar, and not the absolute value of the difference?

MrVpassenheim - 16.10.2018 19:58

NOT one of Khan Academy's shining moments. Your other video (thanks Dhiraj Budhrani) is MUCH better (with the simulation & a mathematical explanation!).

Arvin Pillai - 04.10.2018 16:31

Starts at 5:00

Jayrald Basan - 02.06.2018 07:07

So this means that the n-1 of the sample variance equation was just an arbitrarily chosen value because it's empirically closer to the actual population variance? Or is there any equation or a logical path in deriving the n-1? I kinda see that it's the former but kinda feel that there might be a theory that could explain why n-1 is the most appropriate and not any other value and that it's just a natural consequence of our math. Anyone who does have one, please tell me!
Thank you for the video Khan Academy! It was very informative!

David Pastor - 18.05.2018 14:27

what if all the samples you took were greater than the mean? then you would be overestimating even more if you divide by n-1

imbolc - 05.05.2018 08:24

I can't understand why we would underestimate variance in general this way. Let's take population [0, 10, 20] and its sample [0, 20]. They have the same mean 10, and variance of the population is (100 + 100 + 0) / 3, while variance of the sample is (100 + 100) / 2, so we overestimate the variance.
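
The single sample [0, 20] does overestimate, as computed above; the claim about n-1 is about the average over all possible samples. A minimal sketch, assuming samples of size 2 are drawn with replacement from the population [0, 10, 20] (the sampling scheme and the helper name `variance` are assumptions for illustration), enumerates every sample exactly:

```python
from fractions import Fraction
from itertools import product


def variance(values, divisor):
    """Sum of squared deviations from the mean of `values`, over `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mean) ** 2 for v in values) / divisor


population = [0, 10, 20]
n = 2
population_variance = variance(population, len(population))

# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(population_variance, avg_div_n, avg_div_n_minus_1)  # 200/3 100/3 200/3
```

Some samples overestimate and some (such as [10, 10]) give zero, but averaged over all samples the divide-by-n version comes out at half the population variance here, while the divide-by-(n-1) version matches it.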

ᴠᴧᴨᴛᴧᴃᴌᴧcᴋ - 30.03.2018 14:44

So I guess the biased variance is better if your sample is still close to the entire population

GG - 14.01.2018 17:52

I had the intuition that overestimation and underestimation would compensate each other. Why is it not the case?
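
One way to see why the two directions do not cancel is the following standard identity, sketched here under the assumption of n independent draws from a population with mean \mu and variance \sigma^2 (this is the textbook argument, not something shown in the video):

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \mu)^2
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 \\
\text{so}\quad
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &\le \sum_{i=1}^{n} (x_i - \mu)^2
  \quad\text{whether } \bar{x} > \mu \text{ or } \bar{x} < \mu . \\
\text{Taking expectations:}\quad
n\sigma^2
  &= E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] + n \cdot \frac{\sigma^2}{n} \\
E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= (n-1)\,\sigma^2 .
\end{align*}
```

The sample mean can land above or below \mu, but the squared distances measured from the sample mean are never larger than those measured from \mu, so the error in the spread always points the same way. Dividing by n-1 instead of n compensates for exactly that shortfall on average.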

Lucia Breccia - 18.02.2017 23:38

Why isn't this video on the statistics playlist?

clancym1 - 15.01.2016 03:23

this does not give an explanation for why it is exactly n-1.

J S - 01.09.2015 07:05

Still don't get it. Yes, you would be underestimating it if you take the sample cluster below the mean. But if the cluster is above the mean? You would be overestimating it! Seems arbitrary to me.

f lotars - 20.12.2014 19:37

So why minus 1? Why not minus 2? Or minus 6,345%? This is still not an explanation of the n - 1 :-(

Casey - 07.12.2014 21:53

Didn't say anything about n-1, misleading title.

Upgrad3r - 12.10.2014 19:03

I love you, fuck the rest of explanations on internet, this made me understand

Matthew - 14.09.2014 14:09

If you want a more technical explanation/proof, see the Wikipedia article on Bessel's correction. This video has some good intuition though.

EntropicalNature - 04.12.2012 21:51

It has to do with the fact that on an interval with N points there are N-1 smallest subintervals. Consider for example the interval [1,4] on the natural number line, in which case N=4. You can subdivide it only into [1,2], [2,3], [3,4], which is 3, not 4, smallest subintervals.

markbro2 - 25.11.2012 01:44

The most common question seems to be why n-1 and not n-2 or n-3424342 (any other number). The way I understand it, it comes from the definition of unbiased estimators (look it up on Wikipedia). In a nutshell, an unbiased estimator is one whose expected value equals the value it is estimating. n-1 is known as Bessel's correction (also on Wikipedia). There you can see that E[S^2] = sigma^2, hence it is unbiased. This makes sense: if you take enough samples and average their variances, you get the true population value.
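
The "take enough samples and average them" idea can be checked with a small simulation. This is a rough sketch under stated assumptions: the population (the integers 1 to 10), the sample size of 3, the number of trials, and the function names are all chosen for illustration, and samples are drawn with replacement.

```python
import random


def sum_squared_deviations(sample):
    """Sum of squared deviations of `sample` from its own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def average_variance_estimates(population, n, trials, seed=0):
    """Average the divide-by-n and divide-by-(n-1) variances over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.choice(population) for _ in range(n)]
        ss = sum_squared_deviations(sample)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials


population = list(range(1, 11))
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # 8.25

avg_n, avg_n_minus_1 = average_variance_estimates(population, n=3, trials=20000)
print("population variance:      ", sigma2)
print("average of divide-by-n:   ", avg_n)
print("average of divide-by-(n-1):", avg_n_minus_1)
```

With these settings the divide-by-(n-1) average should settle near the population variance, and the divide-by-n average near (n-1)/n of it; the exact printed values depend on the random seed.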

Drop Dead Fred - 23.11.2012 08:45

I GET IT! I had to work out the proof and think about it really hard, but I get it! I have an intuition for why n-1 makes sense! Message me with your questions, because I don't think I can explain it easily in the comment boxes.
