Comments:
When Grant first told us about the gradient giving the steepest ascent, I instantly imagined a graph where you have +ve partial derivatives in the x and y directions, but a -ve slope in between them (i.e. along the vector (1,1) etc). This would make the gradient vector not be the steepest ascent; rather, the pure x or y direction (whichever has the maximum slope) would be.
But after this I realised there must be a concept of multivariable differentiability, because in this case there would be a sharp point at that location!
Lucky that I found these intuitive explanations.. it truly feels great when you understand what's actually going on when we use a formula
Kindly tell me this.
As we know, the gradient is (n-1)-dimensional compared to a scalar function in n dimensions.
Keeping this in mind: does the gradient at a point mean a vector that locates the global maximum, in one dimension less than the scalar function?
For example, if phi is a 3D function, then del phi is a 2D vector at a point, perpendicular to the level surface.
Then does del mean a vector in 2D that locates the maximum value of phi?
That is a dumb question, because that's what a derivative function does: its value tells you the direction (+ or - sign) and the amount of gain when you change the argument of the original function.
Thanks
Everything was perfect until this gradient thing. I can't visualize it even after watching dozens of videos :(
For me the easiest way is to think of a basic function
f : R -> R
The derivative of f at a point a tells you which direction to walk (left or right on the x axis) for the steepest ascent. It is the same thing in 2 dimensions.
For anyone confused, This is how I see it, if it helps at all:
If you’re at a point (a, b), then the directional derivative at that point in the direction v (where ||v|| = 1) is given by ∇f(a, b) • v.
When we ask, “What is the direction of steepest ascent?”, what we are really asking is, “In what direction do I move to produce the largest directional derivative?” In other words, we want to maximise ∇f(a, b) • v.
Since we are at a single point, ∇f(a, b) is a constant: it is evaluated using only the particular values a and b. The unit vector v is the only variable here, so the only way to maximise ∇f(a, b) • v is to alter v. We know that the dot product of 2 vectors of fixed lengths is maximised when they point in the same direction (see the proof at the bottom). Using this, and the fact that we are not varying ∇f(a, b), we can conclude that to maximise the directional derivative we must choose v so that it points in the same direction as ∇f(a, b).
And that’s it: we’ve shown that when v is in the same direction as the gradient ∇f(a, b), the directional derivative is maximised. Saying that v is in the same direction as ∇f(a, b) is to say that v = (1/k) × ∇f(a, b), where k is the magnitude of ∇f(a, b) (assuming k is not 0), because v is a unit vector as previously stated. Since v and ∇f(a, b) point in the same direction, ∇f(a, b) itself points in the direction of steepest ascent.
Proof to maximise dot product:
Consider two vectors a and b. The angle between a and b is given by cosθ = (a • b) / (|a| × |b|)
We can rearrange to say that a • b = |a||b|cosθ. With the lengths |a| and |b| held fixed, maximising the left-hand side means maximising cosθ. This achieves its maximum value of 1 when θ = 0, which we can visualise as the vectors pointing in the same direction, one lying along the other. So we can conclude that a • b is maximised when a points in the same direction as b, and vice versa.
I think for steepest descent you need to put a "-" sign, which comes from cos θ at θ = π; that gives the minimum of the directional derivative. cmiiw
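The argument above can be checked numerically. This is only a rough sketch: the gradient value (1, 1) and the function names are assumptions made for the illustration, not anything from the video. It sweeps unit directions v, finds the one with the largest ∇f • v and the one with the smallest, and compares them with the gradient direction and its opposite.

```python
import math

def directional_derivative(grad, angle):
    """grad . v for the unit vector v = (cos(angle), sin(angle))."""
    vx, vy = math.cos(angle), math.sin(angle)
    return grad[0] * vx + grad[1] * vy

def best_and_worst_angles(grad, steps=3600):
    """Brute-force sweep over unit directions in 0.1 degree increments.

    Returns (angle with the largest slope, angle with the smallest slope).
    """
    angles = [2 * math.pi * k / steps for k in range(steps)]
    best = max(angles, key=lambda t: directional_derivative(grad, t))
    worst = min(angles, key=lambda t: directional_derivative(grad, t))
    return best, worst

# Hypothetical gradient value at some point (a, b); chosen only for illustration.
grad = (1.0, 1.0)
best, worst = best_and_worst_angles(grad)
grad_angle = math.atan2(grad[1], grad[0])

print("steepest ascent angle :", math.degrees(best))
print("gradient angle        :", math.degrees(grad_angle))
print("steepest descent angle:", math.degrees(worst))
print("max slope vs |grad|   :", directional_derivative(grad, best), math.hypot(*grad))
```

The ascent direction lines up with the gradient, the descent direction is the opposite one (θ = π), and the maximum slope matches the gradient's magnitude up to rounding.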
👌👌👌👌👌👌
The way I've always seen it is every sum of derivatives will be a combo of the partials. Therefore, the purest and least inefficient path is the least scaled up linear combination of the bases so path with least or minimized resistance and drag is the combo of just the two partial derivatives or the gradient vector.
I'm pretty sure you could prove this with the triangle inequality. Any sum of multiples of the basis vectors will have a longer hypotenuse than the sum of just the partial derivatives holding one side, like the height, constant on both. In other words, you'll waste energy traveling farther than necessary horizontally for the same movement vertically compared to the path of the pure partial derivatives. But you can't move faster than those, because you're limited by the physical shape of the surface you're on. You have no other choice: the constraints knock down other paths physically, or hypothetically in the case of imagined scenarios.
@steve manus, I like the way you broke this concept down almost like a problem of Lagrange multiplier, where we are asked to find the optimal value of some function f(x,y) subject to the constraint of another function g(x,y) in two dimensions. Of course as you may know already or expect, the concept of the gradient is incorporated into the solution. It is typically the scenario that involves the balance between the independent variables so as to produce the maximum output for the function of such variables. Usually, the optimal value lies in between the extreme choices for the variables. Extreme X or extreme Y choices, as you noted, didn’t produce the maximum output for f(x,y). I was hoping that the video maker would stay away from the concept of directional derivative to explain the geometrical meaning of the gradient. In fact, I liked the mapping to a straight line explanation that he started with at the start of the video. I wished that he finished that up.
It's like my mind just got illuminated!!
I've always underestimated the power of the Del operator.
Not only is this operator showing you the slope of a scalar field in the direction of a vector, it also points at the direction of the unit vector with the max gradient, and also tells you the size of the slope of this unit vector with a max value.
It's crazy how this new revelation changes your understanding of vector calculus.
Thanks a lot 🙏🏽🙏🏽🙏🏽🙏🏽🙏🏽
I find this fact so confusing. If the gradient is the direction of steepest ascent, what is the direction of greatest net change? I always assumed the gradient points in the direction where the function changes the most.
Thanks man. That was the best explanation ever. Simple and sweet. I was very confused but you saved me :)
Thanks
Thank you. It's been bothering me for a long time
Thanks! Here is how I would very concisely explain it: The problem of finding the direction of steepest ascent is exactly the problem of maximizing the directional derivative. The directional derivative is a dot product. When you are trying to maximize a dot product, choose the direction to make it parallel to the other vector. Since in this case the other vector is given to be gradient(f)(v), THAT IS the direction of steepest ascent.
For me, the basic intuition comes from the dot product. The part that is not so obvious to me is that gradient(f) dot v is the "right" formula for the (definition of) the directional derivative.
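One way to see where the formula comes from, sketched under the assumption that f is differentiable at the point (a, b): restrict f to the straight line through (a, b) in the unit direction v and apply the multivariable chain rule. The helper function g below is introduced only for this illustration.

```latex
\begin{align*}
g(t)  &= f(a + t v_1,\; b + t v_2), \qquad v = (v_1, v_2),\ \|v\| = 1 \\
g'(0) &= \frac{\partial f}{\partial x}(a,b)\, v_1 + \frac{\partial f}{\partial y}(a,b)\, v_2
       = \nabla f(a,b) \cdot v
\end{align*}
```

Here g'(0) is the slope of f as you walk through (a, b) along v, which is what the directional derivative is meant to measure.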
I never faced such a difficulty to understand something in maths like this 😂
Excellent exposition ... thanks a lot !
As a hiker I don't want the steepest ascent!
I don't think the narrator ever really explained why the Gradient vector is ALWAYS in the direction of the steepest slope. As far as I could tell, he only explained how the directional unit vector interacts with the Gradient to reduce it or maintain it at its maximum.
If every point on a 3D-surface has an infinite number of tangent lines, all with potentially different slopes, how can taking partial derivatives from just 2 directions (x and y) and combining them into a vector always point in the direction of maximum steepness?
I'm 5 videos in and my brain is on fire
Learning this concept was no less than a sense of accomplishment itself! Grant is Grand! Cheers!
I always wonder how Grant has developed such a great understanding of maths. He does magic with maths!!!
Thank you very much for the video!!! :D
cheers from Ukraine
Great
To those who are confused, the direction of the steepest ascent is the direction in which the directional derivative is maximum.
Directional derivative for any unit vector v = gradient • v
So we maximise (gradient • v) to maximise the directional derivative
that last explanation kinda blew my mind a bit, nice!
The best explanation of a gradient on Internet!
I think it's hilarious how when Grant does videos for KA he repeats out loud what he's writing on the screen like Sal does. Makes me laugh every time
My takeaway from this is to try to reduce it to one dimension to understand what each element is doing to increase the "steepness" of a gradient.
Say f(x)=-x^3 then *df/dx=-3x^2*.
Since the constant of this example derivative is negative, that means x needs to move in the negative direction of the number line to increase the output of *f(x)*.
After you know what direction of the number line to go towards for each element, the magnitude you move for each element is proportional to the size of the element compared to the whole gradient vector.
Anyways thanks, great explanation Grant :)
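A small sketch of the one-dimensional case above, using the commenter's f(x) = -x^3. The starting point x = 1.0, the step size, and the helper name `uphill_direction` are assumptions chosen only for illustration.

```python
def f(x):
    return -x ** 3

def df_dx(x):
    # Derivative of f(x) = -x^3
    return -3 * x ** 2

def uphill_direction(x):
    """Return +1 or -1 for the direction along the number line that increases f,
    or 0 when the derivative is zero and gives no preferred direction."""
    slope = df_dx(x)
    if slope > 0:
        return 1
    if slope < 0:
        return -1
    return 0

x = 1.0       # hypothetical starting point
step = 0.01   # small move along the number line
direction = uphill_direction(x)

print("direction to move x:", direction)
print("f before:", f(x), "f after:", f(x + direction * step))
```

The derivative is negative at x = 1, so the sign says to move x in the negative direction, and f does go up after the small step.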
I was still struggling with the intuition of this and I think I have come up with another simple way to conceptualize the gradient and the characteristic of steepest ascent.
Start by remembering that the gradient is composed of the partial derivatives of the function in question. If you think about each partial derivative as just a simple rise-over-run problem, then you can see clearly that each partial derivative is going to give you the amount of change to the output (rise) as the input (run) is increased. Let's consider Grant's 3-dimensional example, so we can say that inputs for the multivariable problem are x and y and the output is z. Because the slopes vary based on location on the x-y grid, we need to pick a starting point on the grid. Let's just say (1,1). It doesn't matter. Now let's look at the x-z 2-dimensional problem first.
Let's say the partial derivative tells us that at point (1,1) for each 1 unit increase of x, z is increased by 4 units (i.e. the derivative = 4x). Since x can move in only 1 dimension, the only choice of direction we have is whether the change in x is positive or negative. Obviously, if we move x by -1, then z will decrease by 4 units. So, if we need to choose in which direction we move x to increase z, we know that it is in the positive direction. If we decrease x, z will decrease as well.
Now, do the same thing for y and z and let's say that the partial derivative for y at (1,1) is 3y. This means that a 1 unit increase in y will result in a 3 unit increase in z. Again, if you want to increase z by moving y, increase y, don't decrease it.
Now, let's put the two variables together. We now have a choice of directions. It's no longer sufficient to say that we need to increase x and we need to increase y. (Though that is half the battle). We need to also decide the relative value of increasing x versus increasing y. When we choose a direction we are making a tradeoff between the relative movements in each of the basis (x, y) directions.
Let's say that we are allowed to move in any direction by 5 units (if you haven't noticed yet, I like to keep my Pythagorean theorem problems simple!). The question is - "in what direction can I move 5 units to maximize the increase in z?" This would correspond to the direction of steepest ascent. So, let's say we use all of our 5 units in the x direction. This corresponds to the vector [5,0]. Since the increase in z is 4x + 3y, the total increase in z will be 20 (5∙4) . If, on the other hand, we use all of our 5 units in the y direction [0, 5], the total increase in z will be 15 (5∙3).
But the beauty of using vector geometry is that we can use our 5 units in a direction that will give us effectively a movement in the x direction of 4 and in the y direction of 3. We get 7 units of movement for the price of 5! Of course, that's the hypotenuse of our 4 by 3 right triangle. So, by following the vector [4, 3], z is increased by 4∙4 from the x direction movement and by 3∙3 from the y direction movement for a total increase of 25! I think you can see that any other direction will produce a smaller increase.
And, of course, the vector [4, 3] is exactly our gradient!
It's interesting to also think about when one of the components of the gradient is negative. Let's imagine that instead of 3y, our partial y derivative is -3y. This means that a positive movement in y produces a decrease in z. Remembering that we can only move y in 1 dimension, if we move y downward then of course z will increase. So you can see that the gradient vector [4, -3] will produce the same 25 unit increase in z for a 5 unit move IF we move y in the negative (downward) direction. Just follow the vector!
Of course this is calculus, so these 3, 4, and 5 unit moves are very, very small. 😊
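The arithmetic in this example can be checked with a few lines of Python. This is a sketch that uses the linear estimate (increase in z ≈ gradient • move), with the gradient [4, 3] and the 5-unit budget taken from the comment; the function name and the sweep resolution are assumptions for illustration.

```python
import math

def increase_in_z(grad, move):
    """First-order estimate of the change in z for a move: grad . move."""
    return grad[0] * move[0] + grad[1] * move[1]

grad = (4.0, 3.0)   # partial derivatives at (1, 1), as in the comment
budget = 5.0        # allowed length of the move

# The three moves discussed above, each of length 5.
for move in [(5.0, 0.0), (0.0, 5.0), (4.0, 3.0)]:
    print(move, "->", increase_in_z(grad, move))

# Sweep all directions in 0.5 degree steps to confirm nothing beats [4, 3].
candidates = [
    (budget * math.cos(2 * math.pi * k / 720), budget * math.sin(2 * math.pi * k / 720))
    for k in range(720)
]
best_move = max(candidates, key=lambda m: increase_in_z(grad, m))
print("best swept move:", best_move, "->", increase_in_z(grad, best_move))

# Negative component: with gradient [4, -3], moving along [4, -3] also gives 25.
print(increase_in_z((4.0, -3.0), (4.0, -3.0)))
```

The sweep never exceeds 25, and its best direction sits next to [4, 3], which is the gradient itself.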
Thanks much, but I am still not clear on why moving in the gradient direction gives the function f its steepest change. How does the gradient relate to the steepest change in the output of f? Thanks very much.
Derivation of del in direction of del that is equal to magnitude of del gives a lot of intuition. Nice!
How can the gradient which is a vector dotted with the vector v equal the normalized version of the gradient vector???
Thank you 3blue1brown
You just explained the Cauchy-Schwarz inequality in the end, didn't you?
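For reference, here is the Cauchy-Schwarz inequality applied to the directional derivative, written as a sketch that assumes v is a unit vector and the gradient is nonzero at the point in question:

```latex
\[
|\nabla f \cdot v| \;\le\; \|\nabla f\|\,\|v\| \;=\; \|\nabla f\|,
\qquad \text{with equality exactly when } v = \pm \frac{\nabla f}{\|\nabla f\|}.
\]
```

The plus sign gives the largest directional derivative (steepest ascent) and the minus sign gives the smallest (steepest descent).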
I was initially confused because I was thinking of a situation where:
along x axis and y axis the graph is kinda stable and mildly changing but in quadrant one there is a freaking big valley and ma poor little point is at the origin point😂
I was worried that no info about that valley is shown in gradient and ma point don't know where to go😂
then i realize i was outside the scope of this discussion
Thanks a lot Grant! It was a great pleasure to see you here. I am a big fan of 3B1B.
I don't understand why the directional derivative gives the slope. Firstly, if I have the slope of a function = df/dx, then I can only multiply it with dx if I expect the resulting change df to match the function graph. Otherwise it matches only the slope line. For directional derivatives, you mentioned the vector length should be 1 instead of infinitely small. That means that a gradient component df/dx multiplied with the corresponding directional vector x-component will result in a change that will align with the slope line, but not with the graph. Can somebody shed light on this issue?
Could you provide us a video explaining why a • b = |a| × |b| × cos(a, b)? I understand it for geometric vectors, but it's unclear to me how this can be scaled to n-dimensional vectors.
Does another change w
But in case we start from (0,0), the gradient is also (0,0). Does it mean that the steepest direction is any direction or what?
What if I want to find the least steep direction??
I am wondering: if a vector V is dotted with any vector A other than the gradient vector, it still gives the max value if V is parallel to A. So the video still doesn't prove that the gradient is the steepest ascent. Correct me if I am wrong.
Explains nothing