For problem 1, you were asked to check your derivatives against finite-difference approximations. This is always a good idea — derivatives of complicated functions are surprisingly easy to get wrong in hand calculations, and a quick check against random inputs will catch most bugs.
As in the finite differences notebook, we will want to measure accuracy by looking at the relative error. Let's define a function to compute that:
using LinearAlgebra # for norm, tr, Diagonal, etc.
relerr(a, b) = norm(a - b) * 2 / (norm(a) + norm(b))
relerr (generic function with 1 method)
For fun, we will also check some of our derivatives against the automatic-differentiation (AD) results from the ForwardDiff.jl package, as well as against Richardson-extrapolated "super-accurate" finite differences from the FiniteDifferences.jl package. (Students are not required to do this in their solutions.)
import ForwardDiff, FiniteDifferences
FiniteDifferences differentiates scalar functions, and can also compute the gradient or Jacobian one component at a time.
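For example (an aside just to illustrate the package; the grad call below reflects our reading of the FiniteDifferences API and is not something the solutions require), it can assemble a whole gradient from many one-dimensional finite differences:
fdm = FiniteDifferences.central_fdm(5, 1) # 5-point central rule for a 1st derivative
FiniteDifferences.grad(fdm, x -> sum(x)^2, [1.0, 2.0, 3.0])[1] # ≈ [12, 12, 12], since ∇(sum(x)²) = 2*sum(x)*ones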
Let's define a helper function to compute directional derivatives with FiniteDifferences, using the fact that $df = f'(x)dx = \left. \frac{d}{ds} f(x + s\,dx) \right|_{s=0}$:
df_FD(f, x, dx) = FiniteDifferences.extrapolate_fdm(FiniteDifferences.central_fdm(2, 1), s -> f(x + s*dx), 0, 1/norm(dx))[1]
df_FD (generic function with 1 method)
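As a quick sanity check of this helper (an aside, not part of the assigned solutions), we can try it on a linear function, whose directional derivative we know exactly:
df_FD(sum, [1.0, 2.0, 3.0], [0.1, 0.2, 0.3]) # sum is linear, so the exact answer is sum(dx) = 0.6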
fa(x) = (x'x)^4
fa′(x) = 8(x'x)^3 * x' # analytical Jacobian (= ∇faᵀ) from the solutions
fa′ (generic function with 1 method)
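(For reference, this formula follows from the chain rule: $d[(x^Tx)^4] = 4(x^Tx)^3 \, d(x^Tx) = 4(x^Tx)^3 (dx^Tx + x^Tdx) = 8(x^Tx)^3 \, x^T dx$, so $f_a'(x) = 8(x^Tx)^3 x^T$.)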
x = randn(5) # a random input
dx = randn(5) * 1e-8 # a small random perturbation
approx = fa(x+dx) - fa(x)
3.5871963177669386e-5
exact = fa′(x) * dx # directional derivative
3.587196202811543e-5
Looks pretty good! Let's check the relative error:
relerr(approx, exact)
3.2046029059297376e-8
Good, it matches to > 7 digits!
Let's try the analytical AD derivative from ForwardDiff:
ad = ForwardDiff.gradient(fa, x)' * dx
3.587196202811543e-5
relerr(exact, ad)
0.0
Yay, we are actually matching ForwardDiff exactly. (In general, we will get a slight difference in the 15th digit or so, due to the roundoff errors in computer arithmetic.)
Now, let's try the "fancy" high-order extrapolated finite difference from FiniteDifferences:
fd = df_FD(fa, x, dx)
3.587196202813507e-5
relerr(exact, fd)
5.474362354015351e-13
Yes, the "fancy" finite-difference method is actually accurate to nearly machine precision — that is, nearly as accurate as we could possibly hope for given the finite precision of computer arithmetic.
(However, it's much more computationally costly than the analytical derivative.)
In fact, using AD (or even FiniteDifferences, albeit at much greater computational cost), we can actually compute the whole gradient, not just the directional derivative:
ForwardDiff.gradient(fa, x)
5-element Vector{Float64}:
  -343.4438670412682
  -239.50385610105337
   476.10330987985355
 -1966.428777053804
    27.545862357599063
If we compare it to our gradient (= transpose of our Jacobian), we see that it matches as desired:
relerr(fa′(x)', ForwardDiff.gradient(fa, x))
0.0
As we have discussed in class, and will discuss in more detail soon, "forward-mode" AD is not very efficient for differentiating functions with few outputs (here, 1 output!) and many inputs, and in the long run you are better off with "reverse-mode" AD for such cases. ForwardDiff is a forward-mode algorithm, as the name implies, but there are other Julia packages for reverse-mode AD, such as the Zygote.jl package:
import Zygote
Zygote.gradient(fa, x)[1]
5-element Vector{Float64}:
  -343.4438670412682
  -239.50385610105337
   476.10330987985355
 -1966.428777053804
    27.545862357599063
relerr(fa′(x)', Zygote.gradient(fa, x)[1])
0.0
Hooray, it matches Zygote too! We are only doing small-scale calculations here, however, so for the rest of the notebook we will just use ForwardDiff and forward-mode AD.
fb(x, A) = cos(x'*A*x)
∇fb(x, A) = -sin(x'*A*x) * (A + A') * x
∇fb (generic function with 1 method)
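(Again for reference, this gradient follows from the chain rule: $df_b = -\sin(x^TAx)\,d(x^TAx) = -\sin(x^TAx)\,(dx^TAx + x^TA\,dx) = -\sin(x^TAx)\,x^T(A + A^T)\,dx$, so $\nabla f_b = -\sin(x^TAx)\,(A+A^T)x$.)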
A = randn(5,5)
approx = fb(x + dx, A) - fb(x, A)
exact = ∇fb(x, A)' * dx # directional derivative df = ∇fᵀdx
relerr(approx, exact)
1.959781246578459e-8
Yay, again a match to 7–9 digits!
ad = ForwardDiff.gradient(x -> fb(x,A), x)
5-element Vector{Float64}:
  2.4186139556397603
  2.068692383304104
 -2.220145661282068
  3.5768159549935965
  1.6594947068315289
relerr(∇fb(x, A), ad)
1.135229348700448e-16
Good, ForwardDiff matches our whole gradient, up to roundoff errors.
fc(A) = tr(A^4)
∇fc(A) = 4A'^3 # = 4(Aᵀ)³
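(For reference, this follows from $d\,\mathrm{tr}(A^4) = \mathrm{tr}(dA\,A^3 + A\,dA\,A^2 + A^2\,dA\,A + A^3\,dA) = 4\,\mathrm{tr}(A^3\,dA)$, using the cyclic property of the trace, so $\nabla f_c = 4(A^3)^T = 4(A^T)^3$.)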
dA = randn(5,5) * 1e-8 # a random small matrix perturbation
approx = fc(A + dA) - fc(A)
exact = tr(∇fc(A)'dA) # the elementwise matrix dot product
relerr(approx, exact)
1.8898259119934437e-8
Yay, it worked!
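For fun, we could also check the whole matrix gradient against ForwardDiff, since ForwardDiff.gradient accepts matrix (array) inputs for scalar-valued functions. (A quick sketch, not part of the check above; we expect agreement up to roundoff, as before.)
ad = ForwardDiff.gradient(fc, A) # 5×5 gradient with respect to the matrix A
relerr(∇fc(A), ad)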
f_d(A) = A^4
f_d′(A,dA) = dA*A^3 + A*dA*A^2 + A^2*dA*A + A^3*dA
approx = f_d(A + dA) - f_d(A)
exact = f_d′(A,dA)
relerr(approx, exact)
1.2607914911733005e-8
Hooray, it worked!
For fun, let's check the wrong answer $4A^3 dA$, which only works in the unlikely event that $A$ and $dA$ commute:
relerr(4A^3*dA, exact)
1.1383381060849134
Yep, relative error of order 1 (100%) — definitely a bug.
f_e(A, θ) = θ'A
f_e′(θ, dA) = θ'dA
θ = randn(5)
approx = f_e(A+dA,θ) - f_e(A,θ)
exact = f_e′(θ,dA)
relerr(approx, exact)
9.880558358444516e-9
Hooray!
Or… hooray?
This function $f_e$ is actually linear in $A$, so our finite-difference approximation should be exact. The problem is that we have lost half of our ≈15 significant digits due to cancellation roundoff error. In this case, a much bigger $dA$ would have been better! Let's try multiplying $dA$ by $10^8$:
approx = f_e(A+dA*1e8,θ) - f_e(A,θ)
exact = f_e′(θ,dA*1e8)
relerr(approx, exact)
1.7927480347228517e-16
Now it is matching to 16 digits! Finite differences and roundoff errors are tricky!
f_f(x) = sin.(x) # elementwise sine
f_f′(x) = Diagonal(cos.(x)) # the Jacobian matrix
Here, we've used a special Diagonal matrix type built into Julia's LinearAlgebra library for efficiently working with diagonal matrices. (It only actually stores the diagonal entries, and can multiply more quickly than a general matrix.)
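(As a tiny illustration, an aside not needed for the problem set: a Diagonal matrix behaves like the corresponding dense matrix, but stores and multiplies by only the diagonal entries.)
D = Diagonal([1.0, 2.0, 3.0]) # stores only the 3 diagonal entries
D * [4.0, 5.0, 6.0] # == [4.0, 10.0, 18.0]: one multiplication per entry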
f_f(x) = sin.(x)
f_f′(x) = Diagonal(cos.(x))
f_f′ (generic function with 1 method)
Let's check it:
approx = f_f(x+dx) - f_f(x)
exact = f_f′(x) * dx
relerr(approx, exact)
9.10312795714695e-9
Great, it's matching!
Let's try automatic differentiation, which can compute the Jacobian for us, though it doesn't "know" that the Jacobian is sparse (mostly zero) / diagonal:
ad = ForwardDiff.jacobian(f_f, x)
5×5 Matrix{Float64}:
  0.933231   0.0        0.0        0.0        0.0
  0.0        0.967341   0.0        0.0        0.0
  0.0        0.0        0.873017   0.0        0.0
 -0.0       -0.0       -0.0       -0.508401  -0.0
  0.0        0.0        0.0        0.0        0.999566
relerr(ad, f_f′(x))
0.0
Great, it's matching perfectly!
You were not required to do any computational work for problem 4, but for fun let's check the result using the Symbolics.jl computer-algebra (symbolic math) package:
using Symbolics
@variables a[1:2,1:2] s[1:2,1:2]
2-element Vector{Symbolics.Arr{Num, 2}}:
 a[1:2,1:2]
 s[1:2,1:2]
A = collect(a) # convert the symbolic array a into an ordinary Matrix of symbolic expressions
S = collect(s)
S[2,1] = S[1,2] # constrain S to be symmetric
S
Y = A'*S*A
s = [S[1,1],S[1,2],S[2,2]] # vector of unique inputs
y = [Y[1,1],Y[1,2],Y[2,2]] # vector of unique outputs
Symbolics.jacobian(y, s)
This matches our hand-calculated solution, but a heck of a lot more easily!