ForwardDiff is an implementation of forward mode automatic
differentiation (AD) in
Julia. There are two key components of this implementation: the Dual type, and the API.
Partial derivatives are stored in the Partials type:
struct Partials{N,V} <: AbstractVector{V}
values::NTuple{N,V}
endThe Partials type is used to implement the Dual type:
struct Dual{T,V<:Real,N} <: Real
value::V
partials::Partials{N,V}
endThis type represents an N-dimensional dual number
coupled with a tag parameter T in order to prevent perturbation
confusion. This dual number
type is implemented to have the following mathematical behavior:
where the a component is stored in the value field and the b
components are stored in the partials field. This property of dual numbers is the
central feature that allows ForwardDiff to take derivatives.
In order to implement the above property, elementary numerical functions on a Dual
number are overloaded to evaluate both the original function, and evaluate the derivative
of the function, propagating the derivative via multiplication. For example, Base.sin
can be overloaded on Dual like so:
Base.sin(d::Dual{T}) where {T} = Dual{T}(sin(value(d)), cos(value(d)) * partials(d))If we assume that a general function f is composed of entirely of these elementary
functions, then the chain rule enables our derivatives to compose as well. Thus, by
overloading a plethora of elementary functions, we can differentiate generic functions
composed of them by passing in a Dual number and looking at the output.
We won't discuss higher-order differentiation in detail, but the reader is encouraged to
learn about hyper-dual numbers,
which extend dual numbers to higher orders by introducing extra \epsilon terms that can
cross-multiply. ForwardDiff's Dual number implementation naturally supports hyper-dual
numbers without additional code by allowing instances of the Dual type to nest within each
other. For example, a second-order hyper-dual number has the type Dual{T,Dual{S,V,M},N}, a
third-order hyper-dual number has the type Dual{T,Dual{S,Dual{R,V,K},M},N}, and so on.
The second component provided by this package is the API, which abstracts away the number
types and makes it easy to execute familiar calculations like gradients and Hessians. This
way, users don't have to understand Dual numbers in order to make use of the package.
The job of the API functions is to performantly seed input values with Dual numbers,
pass the seeded value into the target function, and extract the derivative information from
the result. For example, to calculate the partial derivatives for the gradient of a function
f at an input vector \vec{x}, we would do the following:
In reality, ForwardDiff does this calculation in chunks of the input vector (see
Configuring Chunk Size for details). To provide a simple example of this, let's
examine the case where the input vector size is 4 and the chunk size is 2. It then takes
two calls to f to evaluate the gradient:
1st call:
2nd call:
This seeding process is similar for Jacobians, so we won't rehash it here.