Why the gradient is the direction of steepest ascent

Khan Academy

8 years ago

318,755 views


Comments:

Kaustubh Pandey - 08.09.2023 19:10

When Grant first told us about the gradient giving the steepest ascent, I instantly imagined a graph where you have +ve partial derivatives in the x and y directions, but a -ve slope in between them (i.e. along the vector (1, 1) etc). This would make the gradient vector not be the steepest ascent; rather, it would be the pure x or y direction (whichever has the maximum slope).

But after this I realised there must be a concept of multivariable differentiability, because in this case there would be a sharp point at that location!
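
The situation described above can be checked numerically. This is only a minimal sketch: the function `f` below is a made-up example (it is not from the video) with a crease at the origin, and `one_sided_slope` is a hypothetical helper name. Both partial derivatives at the origin are positive, yet the slope along (1, 1) is negative, so the "gradient dot v" formula fails there. That is exactly what goes wrong when the function is not differentiable at the point.

```python
import math

def f(x, y):
    # Hypothetical surface with a crease at the origin (not from the video).
    return x + y - 3.0 * math.sqrt(abs(x * y))

def one_sided_slope(vx, vy, h=1e-6):
    # Slope of f at the origin when stepping a small distance h along (vx, vy).
    return (f(h * vx, h * vy) - f(0.0, 0.0)) / h

slope_x = one_sided_slope(1.0, 0.0)          # slope along the x axis
slope_y = one_sided_slope(0.0, 1.0)          # slope along the y axis
d = 1.0 / math.sqrt(2.0)
slope_diag = one_sided_slope(d, d)           # actual slope along (1, 1)
predicted_diag = slope_x * d + slope_y * d   # what "gradient dot v" would predict

print(slope_x, slope_y, slope_diag, predicted_diag)
```

For a differentiable function the last two numbers would agree, which is why the steepest-ascent argument needs differentiability and not just the existence of the two partial derivatives.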

Pratibha S - 18.07.2023 22:26

Lucky that I found these intuitive explanations.. it truly feels great when you understand what's actually going on when we use a formula

wajid ali - 07.07.2023 10:57

Kindly tell me:

As we know, the gradient has (n-1) dimensions when the scalar function is n-dimensional.

Keeping this in mind, does the gradient at a point mean a vector that locates the global maximum, in one dimension fewer than the scalar function?

For example, if phi is a 3D function, then del phi is a 2D vector at a point, perpendicular to the level surface.
Does del phi then mean a 2D vector that locates the maximum value of phi?

daladno - 01.07.2023 23:41

That is a dumb question, because that's what a derivative does: its value tells you the direction (+ or - sign) and the amount of gain when you change the argument of the original function.

mohd zikrya - 07.05.2023 05:31

Thanks

esma - 17.04.2023 11:22

Everything was perfect until this gradient thing. I can't visualize it even after watching dozens of videos :(

Kurren Nischal - 10.04.2023 10:35

For me the easiest way is to think of a basic function
f : R -> R

The sign of the derivative of f at a point a tells you which direction to walk (left or right along the x axis) to go uphill. It is the same idea in 2 dimensions.

Leonhard Olaye-Felix - 25.03.2023 19:35

For anyone confused, This is how I see it, if it helps at all:

If you’re at a point (a, b) then the directional derivative at that point looking in the direction v (where ||v||=1), is given by ∇f(a, b) • v.
When we say, “What is the direction of steepest ascent?”, what we are really asking is, “In what direction do I move to produce the largest directional derivative?” In other words, we want to maximise ∇f(a, b) • v.
Given that we are at a single point, we can then say that ∇f(a, b) is a constant since it is evaluated by only using the particular values a and b. v is the only variable here and so the only way to maximise ∇f(a, b) • v is to alter the vector v. We know that the dot product of 2 vectors of fixed lengths is maximised when they are pointing in the same direction as each other (see proof of this at bottom).
Using this and the fact that we are not varying ∇f(a, b), we can conclude that to maximise the directional derivative (given by ∇f(a, b) • v) we must vary the vector v so that it points in the same direction as ∇f(a, b). And that’s it: we’ve shown that when the vector v is in the same direction as the gradient ∇f(a, b), the output (the directional derivative) is maximised.
Saying that v is in the same direction as ∇f(a, b) is to say that v = (1/k) × ∇f(a, b) where k is the magnitude of ∇f(a, b) (assumed nonzero). This is because v is a unit vector as previously stated. These 2 vectors (v and ∇f(a, b)) are pointing in the same direction, so the conclusion can be drawn that ∇f(a, b) also points in the direction of steepest ascent.

Proof to maximise dot product:

Consider two vectors a and b. The angle between a and b is given by cosθ = (a • b) / (|a| × |b|)
We can rearrange to say that a • b = |a||b|cosθ. To maximise the left hand side for fixed lengths |a| and |b|, which is what we want to prove, we must maximise cosθ. This achieves its maximum value of 1 when θ = 0. When θ = 0, we can visualise this by saying the vectors are parallel and pointing the same way. So we can conclude that a • b is maximised when a points in the same direction as b, and vice versa.
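
The argument above can also be checked by brute force. The sketch below assumes a made-up gradient value `grad = (1.0, 2.0)` at some point (a, b); it tries unit vectors at many angles, keeps the one with the largest dot product, and compares that angle to the direction of the gradient itself.

```python
import math

grad = (1.0, 2.0)  # hypothetical gradient at some point (a, b)

def directional_derivative(theta):
    # v = (cos(theta), sin(theta)) is a unit vector; return grad . v
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

# Sample unit vectors every 0.01 degrees and keep the best one.
angles = [math.radians(k / 100.0) for k in range(36000)]
best_theta = max(angles, key=directional_derivative)

grad_theta = math.atan2(grad[1], grad[0])   # angle of the gradient vector
grad_norm = math.hypot(grad[0], grad[1])    # magnitude of the gradient

print(math.degrees(best_theta), math.degrees(grad_theta))
print(directional_derivative(best_theta), grad_norm)
```

Up to the sampling resolution, the best angle matches the gradient's angle, and the largest directional derivative matches the gradient's magnitude.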

Ivan Luthfi - 23.01.2023 20:53

I think for steepest descent you need to put a "-" sign, which comes from cos θ with θ = π: that gives the most negative directional derivative, i.e. the fastest decrease of the objective function. cmiiw

م/ حمزة خليل - الكويت - 16.12.2022 11:05

👌👌👌👌👌👌

Robert Wilson III - 28.09.2022 11:08

The way I've always seen it is every sum of derivatives will be a combo of the partials. Therefore, the purest and least inefficient path is the least scaled up linear combination of the bases so path with least or minimized resistance and drag is the combo of just the two partial derivatives or the gradient vector.

I'm pretty sure you could prove this with the triangle inequality. Any sum of multiples of the basis vectors will have a longer hypotenuse than the sum of just the partial derivatives holding one side, like the height, constant on both. In other words, you'll waste energy traveling farther than necessary horizontally for the same movement vertically compared to the path of the pure partial derivatives. But you can't move faster than those, because you're limited by the physical shape of the surface you're on. You have no other choice: the constraints knock down other paths physically, or hypothetically in the case of imagined scenarios.

A Dalis - 07.04.2022 07:31

@steve manus, I like the way you broke this concept down almost like a Lagrange multiplier problem, where we are asked to find the optimal value of some function f(x,y) subject to the constraint of another function g(x,y) in two dimensions. Of course, as you may know already or expect, the concept of the gradient is incorporated into the solution. It is typically the scenario that involves the balance between the independent variables so as to produce the maximum output for the function of such variables. Usually, the optimal value lies in between the extreme choices for the variables. Extreme X or extreme Y choices, as you noted, didn’t produce the maximum output for f(x,y). I was hoping that the video maker would stay away from the concept of the directional derivative to explain the geometrical meaning of the gradient. In fact, I liked the mapping to a straight line explanation that he started with at the start of the video. I wish he had finished that up.

Callistus Amilo - 15.03.2022 05:38

It's like my mind just got illuminated!!
I've always underestimated the power of the Del operator.
Not only is this operator showing you the slope of a scalar field in the direction of a vector, it also points at the direction of the unit vector with the max gradient, and also tells you the size of the slope of this unit vector with a max value.

It's crazy how this new revelation changes your understanding of vector calculus.

Thanks a lot 🙏🏽🙏🏽🙏🏽🙏🏽🙏🏽

Cauchy Schwarz - 21.10.2021 19:21

I find this fact so confusing. If the gradient is the direction of steepest ascent, what is the direction of greatest net change? I always assumed the gradient points in the direction where the function changes the most.

brzgr - 30.09.2021 02:34

Thanks man. That was the best explanation ever. Simple and sweet. I was very confused but you saved me :)
Thanks

Farhan Hyder - 04.09.2021 03:55

Thank you. It's been bothering me for a long time

allyourcode - 25.07.2021 00:09

Thanks! Here is how I would very concisely explain it: The problem of finding the direction of steepest ascent is exactly the problem of maximizing the directional derivative. The directional derivative is a dot product. When you are trying to maximize a dot product, choose the direction to make it parallel to the other vector. Since in this case the other vector is the gradient of f at the point, THAT IS the direction of steepest ascent.

For me, the basic intuition comes from the dot product. The part that is not so obvious to me is that gradient(f) dot v is the "right" formula for the (definition of) the directional derivative.
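
One way to gain some confidence that gradient(f) dot v is the right formula is to compare it against the limit definition of the directional derivative. The sketch below uses a made-up smooth function `f` and a hand-computed `grad_f` (both are assumptions for illustration, not anything from the video) and estimates the limit with a small central difference.

```python
import math

def f(x, y):
    # Hypothetical smooth function chosen only for illustration.
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of the f above, worked out by hand.
    return (2.0 * x * y, x * x + math.cos(y))

def slope_along(x, y, vx, vy, h=1e-6):
    # Central-difference estimate of the rate of change of f along unit vector v.
    return (f(x + h * vx, y + h * vy) - f(x - h * vx, y - h * vy)) / (2.0 * h)

a, b = 1.0, 2.0
v = (0.6, 0.8)  # unit vector, since 0.6**2 + 0.8**2 == 1

limit_estimate = slope_along(a, b, v[0], v[1])
gx, gy = grad_f(a, b)
dot_formula = gx * v[0] + gy * v[1]

print(limit_estimate, dot_formula)
```

The two printed numbers agree to many digits. The underlying reason is the multivariable chain rule: for a differentiable f, the rate of change along v is the x-partial times the x-component of v plus the y-partial times the y-component of v.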

Aya Altayeb - 24.07.2021 12:19

I have never had this much difficulty understanding something in maths 😂

Palatus - 19.07.2021 00:01

Excellent exposition ... thanks a lot !

João Vitor Gomes - 12.06.2021 00:10

As a hiker I don't want the steepest ascent!

kevin monroe - 31.05.2021 01:17

I don't think the narrator ever really explained why the Gradient vector is ALWAYS in the direction of the steepest slope. As far as I could tell, he only explained how the directional unit vector interacts with the Gradient to reduce it or maintain it at its maximum.
If every point on a 3D-surface has an infinite number of tangent lines, all with potentially different slopes, how can taking partial derivatives from just 2 directions (x and y) and combining them into a vector always point in the direction of maximum steepness?

CREEEPYassassin - 15.04.2021 14:37

I'm 5 videos in and my brain is on fire

Arijit Das - 01.02.2021 23:38

Learning this concept was no less than a sense of accomplishment itself! Grant is Grand! Cheers!

Charu Singh - 03.01.2021 16:43

I always wonder how Grant has developed such a great understanding of maths. He does magic with maths!!!

Daniel Jąszczyszczykołęczewski - 16.12.2020 15:06

thank you very much for the video!!! :D
cheers from Ukraine

Avadhoot Hede - 01.12.2020 13:50

Great

Vasundara Krishnan - 04.11.2020 17:35

To those who are confused, the direction of the steepest ascent is the direction in which the directional derivative is maximum.
Directional derivative for any unit vector v = gradient · v (a dot product)
So we maximise (gradient · v) to maximise the directional derivative

Danilo Espinoza Pino - 30.10.2020 00:27

that last explanation kinda blew my mind a bit, nice!

Jatin Saini - 08.10.2020 15:06

The best explanation of a gradient on Internet!

Winston Peloso - 29.09.2020 15:58

I think it's hilarious how when Grant does videos for KA he repeats out loud what he's writing on the screen like Sal does. Makes me laugh every time

Ahmad Izzuddin - 26.08.2020 11:21

My takeaway from this is to try to reduce it to one dimension to understand what each element is doing to increase the "steepness" of a gradient.
Say f(x) = -x^3, then df/dx = -3x^2.
Since this example derivative is negative (for any x other than 0), x needs to move in the negative direction of the number line to increase the output of f(x).
After you know what direction of the number line to go towards for each element, the magnitude you move for each element is proportional to the size of the element compared to the whole gradient vector.
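
The one-dimensional case above is small enough to check directly. This is just a sketch, with the starting point `x0 = 1.0` and step size `h` picked arbitrarily: it steps in the direction given by the sign of the derivative and confirms that f goes up that way and down the other way.

```python
def f(x):
    return -x ** 3

def df(x):
    # Derivative of f with respect to x.
    return -3.0 * x ** 2

x0, h = 1.0, 1e-3
slope = df(x0)                   # negative at x0 = 1
step = h if slope > 0 else -h    # walk in the direction of the derivative's sign

went_up = f(x0 + step) > f(x0)        # stepping with the sign of the derivative
went_down = f(x0 - step) < f(x0)      # stepping against it
print(slope, went_up, went_down)
```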

Anyways thanks, great explanation Grant :)

Steve Manus - 07.07.2020 20:34

I was still struggling with the intuition of this and I think I have come up with another simple way to conceptualize the gradient and the characteristic of steepest ascent.

Start by remembering that the gradient is composed of the partial derivatives of the function in question. If you think about each partial derivative as just a simple rise-over-run problem, then you can see clearly that each partial derivative is going to give you the amount of change to the output (rise) as the input (run) is increased. Let's consider Grant's 3-dimensional example, so we can say that inputs for the multivariable problem are x and y and the output is z. Because the slopes vary based on location on the x-y grid, we need to pick a starting point on the grid. Let's just say (1,1). It doesn't matter. Now let's look at the x-z 2-dimensional problem first.

Let's say the partial derivative tells us that at point (1,1) for each 1 unit increase of x, z is increased by 4 units (i.e. the derivative = 4x). Since x can move in only 1 dimension, the only choice of direction we have is whether the change in x is positive or negative. Obviously, if we move x by -1, then z will decrease by 4 units. So, if we need to choose in which direction we move x to increase z, we know that it is in the positive direction. If we decrease x, z will decrease as well.

Now, do the same thing for y and z and let's say that the partial derivative for y at (1,1) is 3y. This means that a 1 unit increase in y will result in a 3 unit increase in z. Again, if you want to increase z by moving y, increase y, don't decrease it.

Now, let's put the two variables together. We now have a choice of directions. It's no longer sufficient to say that we need to increase x and we need to increase y. (Though that is half the battle). We need to also decide the relative value of increasing x versus increasing y. When we choose a direction we are making a tradeoff between the relative movements in each of the basis (x, y) directions.

Let's say that we are allowed to move in any direction by 5 units (if you haven't noticed yet, I like to keep my Pythagorean theorem problems simple!). The question is - "in what direction can I move 5 units to maximize the increase in z?" This would correspond to the direction of steepest ascent. So, let's say we use all of our 5 units in the x direction. This corresponds to the vector [5,0]. Since the increase in z is 4x + 3y, the total increase in z will be 20 (5∙4) . If, on the other hand, we use all of our 5 units in the y direction [0, 5], the total increase in z will be 15 (5∙3).

But the beauty of using vector geometry is that we can use our 5 units in a direction that will give us effectively a movement in the x direction of 4 and in the y direction of 3. We get 7 units of movement for the price of 5! Of course, that's the hypotenuse of our 4 by 3 right triangle. So, by following the vector [4, 3], z is increased by 4∙4 from the x direction movement and by 3∙3 from the y direction movement for a total increase of 25! I think you can see that any other direction will produce a smaller increase.

And, of course, the vector [4, 3] is exactly our gradient!

It's interesting to also think about when one of the components of the gradient is negative. Let's imagine that instead of 3y, our partial y derivative is -3y. This means that a positive movement in y produces a decrease in z. Remembering that we can only move y in 1 dimension, if we move y downward then of course z will increase. So you can see that the gradient vector [4, -3] will produce the same 25 unit increase in z for a 5 unit move IF we move y in the negative (downward) direction. Just follow the vector!

Of course this is calculus, so these 3, 4, and 5 unit moves are very, very small. 😊
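
The arithmetic in this comment can be reproduced in a few lines. The sketch below takes the numbers from the comment (partial derivatives 4 and 3, a movement budget of 5 units) and uses the linear approximation "increase in z = 4·dx + 3·dy"; the helper names `increase_in_z` and `move` are made up for illustration.

```python
import math

grad = (4.0, 3.0)   # partial derivatives at the chosen point, from the comment
budget = 5.0        # total distance we are allowed to move

def increase_in_z(dx, dy):
    # Linear approximation: change in z = 4*dx + 3*dy
    return grad[0] * dx + grad[1] * dy

def move(direction):
    # Scale a direction so that the step has length `budget`.
    length = math.hypot(direction[0], direction[1])
    return (budget * direction[0] / length, budget * direction[1] / length)

all_x = increase_in_z(*move((1.0, 0.0)))       # spend everything on x
all_y = increase_in_z(*move((0.0, 1.0)))       # spend everything on y
along_gradient = increase_in_z(*move(grad))    # follow the vector [4, 3]

print(all_x, all_y, along_gradient)
```

The three results are 20, 15 and 25, matching the comment: the step along [4, 3] beats both pure-axis steps.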

frank zhang - 05.07.2020 11:55

Thanks very much, but I am still not clear why moving in the gradient direction gives the function f its steepest change. How does the gradient relate to the steepest change in the output of f?

poiuwn wang - 08.05.2020 14:03

The fact that the derivative in the direction of the gradient equals the magnitude of the gradient gives a lot of intuition. Nice!

Tim Goppelsroeder - 07.05.2020 00:14

How can the gradient which is a vector dotted with the vector v equal the normalized version of the gradient vector???

Nijat Shukurov - 22.04.2020 02:29

Thank you 3blue1brown

Dabod - 21.04.2020 20:01

You just explained the Cauchy-Schwarz inequality in the end, didn't you?

CSMole - 17.04.2020 14:01

I was initially confused because I was thinking of a situation where:
along x axis and y axis the graph is kinda stable and mildly changing but in quadrant one there is a freaking big valley and ma poor little point is at the origin point😂
I was worried that no info about that valley is shown in gradient and ma point don't know where to go😂


then i realize i was outside the scope of this discussion

Dmitry Abramov - 09.04.2020 22:51

Thanks a lot Grant! It was a great pleasure to see you here. I am a big fan of 3B1B.

andrei115 - 03.02.2020 21:56

I don't understand why the directional derivative gives the slope. Firstly, if I have the slope of a function, df/dx, then I can only multiply it by an infinitely small dx if I expect the resulting change df to match the function graph. Otherwise it matches only the slope line. For directional derivatives, you mentioned the vector length should be 1 instead of infinitely small. That means that a gradient component df/dx multiplied by the corresponding x-component of the direction vector will result in a change that aligns with the slope line, but not with the graph. Can somebody shed light on this issue?
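
One way to look at this numerically: the unit vector only fixes the direction, and the directional derivative is a rate (change per unit distance), not the change from an actual step of length 1. The sketch below uses a made-up function f(x, y) = x² + y² and an arbitrary point and direction; it compares the real change along the graph with the tangent-line prediction for shrinking step sizes h.

```python
def f(x, y):
    # Hypothetical function chosen for illustration.
    return x * x + y * y

def grad_f(x, y):
    return (2.0 * x, 2.0 * y)

a, b = 1.0, 1.0
v = (0.6, 0.8)                     # unit direction
gx, gy = grad_f(a, b)
slope = gx * v[0] + gy * v[1]      # directional derivative: a rate, not a change

errors = []
for h in (1.0, 0.1, 0.01):
    actual = f(a + h * v[0], b + h * v[1]) - f(a, b)   # change along the graph
    predicted = h * slope                              # change along the tangent line
    errors.append(abs(actual - predicted))
    print(h, actual, predicted, errors[-1])
```

For a full step of length 1 the two disagree noticeably, as the comment suspects. As h shrinks, the mismatch shrinks like h², much faster than the step itself, which is the sense in which the directional derivative is the slope of the graph in that direction.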

andrei115 - 03.02.2020 21:45

Could you provide us a video explaining why a · b = |a| |b| cos(a, b)? I understand it for geometric vectors, but it's unclear to me how this can be scaled to n-dimensional vectors.
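
In n dimensions the usual approach is to turn the formula around and use it as the definition of the angle: the Cauchy-Schwarz inequality guarantees that (a · b) / (|a| |b|) lies between -1 and 1, so its arccosine exists. Any two vectors also span an ordinary 2D plane, and inside that plane the angle behaves like the familiar geometric one. The sketch below, with arbitrary random 7-dimensional vectors, checks both points against the law of cosines.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

random.seed(0)
n = 7  # arbitrary dimension, chosen only for illustration
a = [random.uniform(-1.0, 1.0) for _ in range(n)]
b = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Cauchy-Schwarz guarantees this ratio lies in [-1, 1], so acos is defined.
ratio = dot(a, b) / (norm(a) * norm(b))
theta = math.acos(ratio)

# Law of cosines for the triangle with sides a, b and a - b:
# |a - b|^2 = |a|^2 + |b|^2 - 2 |a| |b| cos(theta)
diff = [x - y for x, y in zip(a, b)]
lhs = norm(diff) ** 2
rhs = norm(a) ** 2 + norm(b) ** 2 - 2.0 * norm(a) * norm(b) * math.cos(theta)

print(theta, lhs, rhs)
```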

Dominic Ellis - 30.12.2019 22:48

Does another change w

Emir Nurmatbekov - 23.12.2019 23:51

but in case if we start from (0,0), gradient is also (0,0). Does it mean that the steepest direction is any direction or what?

Rafael - 07.12.2019 23:15

What if I want to find the least steep direction??

trendy pie - 22.10.2019 23:29

I am wondering: if a vector V is dotted with any vector A other than the gradient vector, it still gives the max value when V is parallel to A. So the video still doesn't prove that the gradient is the direction of steepest ascent. Correct me if I am wrong.

anon. doggo - 22.10.2019 09:17

Explains nothing
