The mysteries of the gradient
Several mathematical concepts are key to understanding Machine Learning, and this is especially true for back-propagating neural nets: the chain rule, the transformations of linear algebra, the gradient, and a bit of dynamical systems.
The famous gradient (aka ∇f = ⟨∂f/∂x, ∂f/∂y⟩) seems to be, let's say, problematic. We have all heard (and in many cases have convinced ourselves) that the function grows in the direction of the gradient. Proving it is a different matter.
All that is well and good, but look at what I get when searching Google for gradient pictures:
search: function gradient pics
What these images have in common is that they are basically correct, especially the two on the right of the upper row.
Now for a slightly different search:
What this second set of pictures has in common is that they are mostly wrong or confusing about what the gradient is and how it should be depicted.
In the top row, second from the right: the gradient is *not* tangent to the curve nor to the surface. Another claim heard here and there is that the gradient does not make sense for a curve (wrong!).
Let's not go into details... My point is that there is a void to fill, so here is the tale of the gradient:
The gradient is a vector, meaning it indicates a direction. For example, for a surface (x, y, f(x, y)) it is ∇f = ⟨∂f/∂x, ∂f/∂y⟩.
Note that the gradient has the same dimension as the input (x, y), and hence cannot be tangent to the surface, which lives in three-dimensional space (well, except in the super particular case in which the surface is constant).
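A minimal numerical sketch of this point (the function and the sample point are my own choices for illustration): the gradient of f(x, y) is a two-component vector living in the same (x, y) input plane as its argument, not in the three-dimensional space of the surface.

```python
def f(x, y):
    return x**2 + y**2  # example surface z = f(x, y)

def gradient(f, x, y, h=1e-6):
    """Approximate the gradient ⟨∂f/∂x, ∂f/∂y⟩ by central differences."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

g = gradient(f, 1.0, 2.0)
print(g)  # close to (2.0, 4.0): two components, same dimension as the input
```

For this f the exact gradient at (1, 2) is (2x, 2y) = (2, 4), which the finite-difference approximation reproduces to several decimal places.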
For a function of a single variable and its corresponding curve (x, f(x)), the gradient is the first derivative regarded as a direction (a vector) attached to, and tangent to, the x-axis. One-dimensional vectors are represented by a single number: if the number is positive, the vector points to the right along the line; if negative, to the left.
(draw a picture here; no, that is not a reminder for me, it means you)
If the function is growing, the first derivative is positive, so following the gradient (moving x to the right) you will see the function grow.
If the function is descending, the first derivative is negative, and hence the gradient points to the left; and indeed, if you walk (move x) to the left, you will see the function growing.
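The same one-dimensional story in code (the two example functions are illustrative choices of mine): the sign of f′(x) tells you which way the "one-dimensional gradient" points along the x-axis.

```python
def derivative(f, x, h=1e-6):
    """Approximate f'(x) by a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

growing = lambda x: x**3 + x   # increasing everywhere
falling = lambda x: -2.0 * x   # decreasing everywhere

print(derivative(growing, 1.0) > 0)  # True: gradient points right
print(derivative(falling, 1.0) < 0)  # True: gradient points left
```

In both cases, walking in the direction the gradient points (right for the growing function, left for the falling one) increases the function's value.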
In higher dimensions, look at the intersections of the hyper-surface with planes parallel to the coordinate planes passing through the point in question, letting only one variable change at a time, just as when taking partial derivatives:
Each n-th coordinate curve will grow in the direction of ∂f(x₀, ..., xₙ, ...)/∂xₙ (regarded as a vector), and hence the function (hyper-surface) will grow in the direction of the gradient.
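The slicing argument can be checked numerically (the function, point, and step size below are my own illustrative choices): freeze all variables but one, nudge that variable in the direction of its partial derivative, and the value along that coordinate curve goes up.

```python
import math

def f(x, y):
    return math.sin(x) + y**2  # example function of two variables

x0, y0, h, step = 0.5, -1.0, 1e-6, 1e-3

# Partial derivatives at (x0, y0) by central differences:
dfdx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
dfdy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

# Step each variable alone, in the direction its partial derivative points:
print(f(x0 + step * dfdx, y0) > f(x0, y0))  # True: the x-slice grows
print(f(x0, y0 + step * dfdy) > f(x0, y0))  # True: the y-slice grows
```

Here ∂f/∂x = cos(0.5) > 0, so the x-slice grows to the right, while ∂f/∂y = 2y = −2 < 0, so the y-slice grows to the left; in both cases the function increases along the coordinate curve.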
Conversely, if we take the direction opposite to the gradient, we ensure that the function decreases. End of story!
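And that opposite direction is exactly what gradient descent exploits. A minimal sketch (the function, starting point, and learning rate are illustrative choices, not a recipe): repeatedly stepping against the gradient drives f(x, y) = x² + y² toward its minimum at the origin.

```python
def f(x, y):
    return x**2 + y**2

def grad(x, y):
    return (2 * x, 2 * y)  # exact gradient of this particular f

x, y, lr = 3.0, -4.0, 0.1  # arbitrary start point and learning rate
for _ in range(100):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy  # move opposite the gradient

print(f(x, y) < 1e-6)  # True: the function value has all but vanished
```

Each step here shrinks (x, y) by a constant factor, so the value decreases monotonically; this is the same mechanism, writ small, that back-propagation uses to decrease a neural net's loss.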