Automatic differentiation
In mathematics and computer algebra, automatic differentiation, or AD, sometimes alternatively called algorithmic differentiation, is a method to numerically evaluate the derivative of a function specified by a computer program. Two classical ways of doing this are:
- symbolically differentiate the function as an expression, and evaluate it at the point; or
- use finite differences.
The drawbacks of symbolic differentiation are low speed and the difficulty of converting a computer program into a single expression. Moreover, many functions grow quite complex as higher derivatives are calculated. Two important drawbacks of finite differences are round-off error in the discretization process and cancellation. Both classical methods have problems with calculating higher derivatives, where the complexity and errors increase. Automatic differentiation solves all of these problems.
There are two modes of AD, called forward accumulation and reverse accumulation. Forward accumulation is less useful in practice but is easy to implement; reverse accumulation allows the efficient calculation of gradients.
Forward accumulation AD
A new arithmetic is introduced that uses ordered pairs as objects. Ordinary arithmetic is applied to the first element of each pair, while a first-order differentiation arithmetic, obtained from the chain rule, is applied to the second element. The basic arithmetic operations and some standard functions are defined by
- <math>\left(u,u'\right)+\left(v,v'\right)=\left(u+v, u'+v' \right)</math>
- <math>\left(u,u'\right)-\left(v,v'\right)=\left(u-v, u'-v' \right)</math>
- <math>\left(u,u'\right)*\left(v,v'\right)=\left( u v, u'v+uv' \right)</math>
- <math>\left(u,u'\right)/\left(v,v'\right)=\left( \frac{u}{v}, \frac{u'v-uv'}{v^2} \right)\quad \left(v\ne 0\right)</math>
- <math>\sin\left(u,u'\right)=\left(\sin\left(u\right),u'\cos\left(u\right)\right)</math>
- <math>\cos\left(u,u'\right)=\left(\cos\left(u\right),-u'\sin\left(u\right)\right)</math>
- <math>e^{\left(u,u'\right)}=\left(e^u,u'e^u\right)</math>
- <math>\log\left(u,u'\right)=\left(\log\left(u\right),u'/u\right)\quad \left(u>0\right)</math>
- <math>\left(u,u'\right)^k=\left(u^k,ku^{k-1}u'\right)\quad \left(u\ne 0\right)</math>
- <math>\left|\left(u,u'\right)\right|=\left(\left|u\right|,u'\mbox{sign}\left(u\right)\right)\quad \left(u\ne 0\right).</math>
The derivative of a function <math>f\left( x \right)</math> at the point <math>x_0</math> is now calculated by substituting the ordered pair <math>\left(x_0,1\right)</math> for <math>x</math> in the expression for <math>f\left( x \right)</math>. A constant <math>c</math> is replaced by <math>\left(c,0\right)</math>. Evaluating the expression with the above arithmetic yields the pair <math>\left( f \left( x_0 \right), f' \left( x_0 \right) \right) </math>, whose second element is the desired derivative <math>f'\left( x_0 \right)</math>.
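As an illustration, the following is a minimal Python sketch of this pair arithmetic (often called dual numbers); the class name <code>Dual</code>, its methods, and the example function <code>f</code> are illustrative choices, not part of any particular library.

```python
# A minimal sketch of forward-accumulation AD using ordered pairs (value, derivative).
import math


class Dual:
    """Ordered pair (u, u') with the arithmetic defined above."""

    def __init__(self, value, deriv=0.0):
        self.value = value   # u
        self.deriv = deriv   # u'

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def sin(self):
        return Dual(math.sin(self.value), self.deriv * math.cos(self.value))


def f(x):
    # f(x) = x * sin(x) + 3
    return x * x.sin() + 3


x0 = Dual(2.0, 1.0)                # substitute x -> (x0, 1)
result = f(x0)
print(result.value, result.deriv)  # f(2) and f'(2) = sin(2) + 2*cos(2)
```

Constants such as the literal 3 above are promoted to pairs with derivative 0, exactly as described in the rule for constants.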
Higher order forward accumulation AD
The above arithmetic can be generalized, in the natural way, to second-order and higher derivatives. However, the arithmetic rules quickly grow very complicated, so truncated Taylor series arithmetic is used instead. This is possible because the summands of a function's Taylor series are products of known coefficients and derivatives of the function, so computing with truncated series carries the higher derivatives along.
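The following is a minimal sketch of truncated Taylor series arithmetic, assuming a fixed truncation order <code>D</code> and representing a quantity by its coefficients <math>c_k = f^{(k)}(x_0)/k!</math>; the function names are illustrative.

```python
# A minimal sketch of higher-order forward AD via truncated Taylor series.
import math

D = 4  # keep Taylor coefficients c_0 .. c_D


def taylor_add(a, b):
    return [x + y for x, y in zip(a, b)]


def taylor_mul(a, b):
    # truncated Cauchy product (convolution of the coefficient lists)
    c = [0.0] * (D + 1)
    for i in range(D + 1):
        for j in range(D + 1 - i):
            c[i + j] += a[i] * b[j]
    return c


# The independent variable at x0 = 2: coefficients of 2 + 1*t
x = [2.0, 1.0] + [0.0] * (D - 1)

# g(x) = x * x * x, so c_k = g^(k)(2) / k!
g = taylor_mul(x, taylor_mul(x, x))
derivatives = [math.factorial(k) * c for k, c in enumerate(g)]
print(derivatives)  # [8.0, 12.0, 12.0, 6.0, 0.0] = g(2), g'(2), g''(2), ...
```

The derivatives up to order <code>D</code> are recovered from the coefficients by multiplying with the corresponding factorials.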
Reverse accumulation AD
The data flow graph of a computation can be manipulated to calculate the gradient of its original calculation. This is done by adding an adjoint node for each primal node, connected by adjoint edges which parallel the primal edges but flow in the opposite direction. The nodes in the adjoint graph represent multiplication by the derivatives of the functions calculated by the nodes in the primal. For instance, addition in the primal causes fanout in the adjoint; fanout in the primal causes addition in the adjoint; a unary function <math>y=f(x)</math> in the primal causes <math>x'=f'(x) y'</math> in the adjoint; etc.
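The following is a minimal Python sketch of reverse accumulation over a recorded graph of primal nodes; the <code>Node</code> class, the <code>backward</code> function, and their names are illustrative rather than any standard API.

```python
# A minimal sketch of reverse-accumulation AD on a recorded data flow graph.
import math


class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent_node, local_derivative)
        self.adjoint = 0.0

    def __add__(self, other):
        # addition in the primal: each parent gets local derivative 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

    def sin(self):
        return Node(math.sin(self.value), [(self, math.cos(self.value))])


def backward(output):
    """Propagate adjoints from the output back through the graph."""
    # topological order, so each adjoint is complete before it is propagated
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.adjoint = 1.0
    for node in reversed(order):
        for parent, local_deriv in node.parents:
            # fan-out in the primal becomes addition of adjoints
            parent.adjoint += local_deriv * node.adjoint


x = Node(1.0)
y = Node(2.0)
z = x * y + x.sin()          # z = x*y + sin(x)
backward(z)
print(x.adjoint, y.adjoint)  # dz/dx = y + cos(x) = 2.54..., dz/dy = x = 1.0
```

Note how the variable <code>x</code> feeds two primal nodes (fan-out), and its adjoint is formed by adding the two contributions, as described above.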
Backpropagation of errors in multilayer perceptrons, a technique used in machine learning, is a special case of reverse mode AD.
Software
Many software libraries and tools implement automatic differentiation, in both forward and reverse modes.