For problem 1, you were asked to check your derivatives against finite-difference approximations. This is always a good idea — derivatives of complicated functions are surprisingly easy to get wrong in hand calculations, and a quick check against random inputs will catch most bugs.
As in the finite differences notebook, we will want to measure accuracy by looking at the relative error. Let's define a function to compute that:
using LinearAlgebra # for norm, tr, Diagonal, etc.
relerr(a, b) = norm(a - b) * 2 / (norm(a) + norm(b))
relerr (generic function with 1 method)
For fun, we will also check some of our derivatives against the automatic-differentiation (AD) results from the ForwardDiff.jl package, as well as against Richardson-extrapolated "super-accurate" finite differences from the FiniteDifferences.jl package. (Students are not required to do this in their solutions.)
import ForwardDiff, FiniteDifferences
FiniteDifferences differentiates scalar functions, and can also compute the gradient or Jacobian one component at a time.
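For example (an aside just to illustrate the package; the grad call below reflects our reading of the FiniteDifferences API and is not something the solutions require), it can assemble a whole gradient from many one-dimensional finite differences:
fdm = FiniteDifferences.central_fdm(5, 1) # 5-point central rule for a 1st derivative
FiniteDifferences.grad(fdm, x -> sum(x)^2, [1.0, 2.0, 3.0])[1] # ≈ [12, 12, 12], since ∇(sum(x)²) = 2*sum(x)*ones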
Let's define a helper function to compute directional derivatives with FiniteDifferences, using the fact that $df = f'(x)dx = \left. \frac{d}{ds} f(x + s\,dx) \right|_{s=0}$:
df_FD(f, x, dx) = FiniteDifferences.extrapolate_fdm(FiniteDifferences.central_fdm(2, 1), s -> f(x + s*dx), 0, 1/norm(dx))[1]
df_FD (generic function with 1 method)
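As a quick sanity check of this helper (an aside, not part of the assigned solutions), we can try it on a linear function, whose directional derivative we know exactly:
df_FD(sum, [1.0, 2.0, 3.0], [0.1, 0.2, 0.3]) # sum is linear, so the exact answer is sum(dx) = 0.6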
fa(x) = (x'x)^4
fa′(x) = 8(x'x)^3 * x' # analytical Jacobian (= ∇faᵀ) from the solutions
fa′ (generic function with 1 method)
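(For reference, this formula follows from the chain rule: $d[(x^Tx)^4] = 4(x^Tx)^3 \, d(x^Tx) = 4(x^Tx)^3 (dx^Tx + x^Tdx) = 8(x^Tx)^3 \, x^T dx$, so $f_a'(x) = 8(x^Tx)^3 x^T$.)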
x = randn(5) # a random input
dx = randn(5) * 1e-8 # a small random perturbation
approx = fa(x+dx) - fa(x)
3.5871963177669386e-5
exact = fa′(x) * dx # directional derivative
3.587196202811543e-5
Looks pretty good! Let's check the relative error:
relerr(approx, exact)
3.2046029059297376e-8
Good, it matches to > 7 digits!
Let's try the analytical AD derivative from ForwardDiff:
ad = ForwardDiff.gradient(fa, x)' * dx
3.587196202811543e-5
relerr(exact, ad)
0.0
Yay, we are actually matching ForwardDiff exactly. (In general, we will get a slight difference in the 15th digit or so, due to the roundoff errors in computer arithmetic.)
Now, let's try the "fancy" high-order extrapolated finite difference from FiniteDifferences:
fd = df_FD(fa, x, dx)
3.587196202813507e-5
relerr(exact, fd)
5.474362354015351e-13
Yes, the "fancy" finite-difference method is actually accurate to nearly machine precision — that is, nearly as accurate as we could possibly hope for given the finite precision of computer arithmetic.
(However, it's much more computationally costly than the analytical derivative.)
In fact, using AD (or even FiniteDifferences, albeit at much greater computational cost), we can actually compute the whole gradient, not just the directional derivative:
ForwardDiff.gradient(fa, x)
5-element Vector{Float64}:
  -343.4438670412682
  -239.50385610105337
   476.10330987985355
 -1966.428777053804
    27.545862357599063
If we compare it to our gradient (= transpose of our Jacobian), we see that it matches as desired:
relerr(fa′(x)', ForwardDiff.gradient(fa, x))
0.0
As we have discussed in class, and will discuss in more detail soon, "forward-mode" AD is not very efficient for differentiating functions with few outputs (here, 1 output!) and many inputs, and in the long run you are better off with "reverse-mode" AD for such cases. ForwardDiff is a forward-mode algorithm, as the name implies, but there are other Julia packages for reverse-mode AD, such as the Zygote.jl package:
import Zygote
Zygote.gradient(fa, x)[1]
5-element Vector{Float64}:
  -343.4438670412682
  -239.50385610105337
   476.10330987985355
 -1966.428777053804
    27.545862357599063
relerr(fa′(x)', Zygote.gradient(fa, x)[1])
0.0
Hooray, it matches Zygote too! We are only doing small-scale calculations here, however, so for the rest of the notebook we will just use ForwardDiff and forward-mode AD.
fb(x, A) = cos(x'*A*x)
∇fb(x, A) = -sin(x'*A*x) * (A + A') * x
∇fb (generic function with 1 method)
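(Again for reference, this gradient follows from the chain rule: $df_b = -\sin(x^TAx)\,d(x^TAx) = -\sin(x^TAx)\,(dx^TAx + x^TA\,dx) = -\sin(x^TAx)\,x^T(A + A^T)\,dx$, so $\nabla f_b = -\sin(x^TAx)\,(A+A^T)x$.)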
A = randn(5,5)
approx = fb(x + dx, A) - fb(x, A)
exact = ∇fb(x, A)' * dx # directional derivative df = ∇fᵀdx
relerr(approx, exact)
1.959781246578459e-8
Yay, again a match to 7–9 digits!
ad = ForwardDiff.gradient(x -> fb(x,A), x)
5-element Vector{Float64}:
  2.4186139556397603
  2.068692383304104
 -2.220145661282068
  3.5768159549935965
  1.6594947068315289
relerr(∇fb(x, A), ad)
1.135229348700448e-16
Good, ForwardDiff matches our whole gradient, up to roundoff errors.
fc(A) = tr(A^4)
∇fc(A) = 4A'^3 # = 4(Aᵀ)³
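(For reference, this follows from $d\,\mathrm{tr}(A^4) = \mathrm{tr}(dA\,A^3 + A\,dA\,A^2 + A^2\,dA\,A + A^3\,dA) = 4\,\mathrm{tr}(A^3\,dA)$, using the cyclic property of the trace, so $\nabla f_c = 4(A^3)^T = 4(A^T)^3$.)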
dA = randn(5,5) * 1e-8 # a random small matrix perturbation
approx = fc(A + dA) - fc(A)
exact = tr(∇fc(A)'dA) # the elementwise matrix dot product
relerr(approx, exact)
1.8898259119934437e-8
Yay, it worked!
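For fun, we could also check the whole matrix gradient against ForwardDiff, since ForwardDiff.gradient accepts matrix (array) inputs for scalar-valued functions. (A quick sketch, not part of the check above; we expect agreement up to roundoff, as before.)
ad = ForwardDiff.gradient(fc, A) # 5×5 gradient with respect to the matrix A
relerr(∇fc(A), ad)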
f_d(A) = A^4
f_d′(A,dA) = dA*A^3 + A*dA*A^2 + A^2*dA*A + A^3*dA
approx = f_d(A + dA) - f_d(A)
exact = f_d′(A,dA)
relerr(approx, exact)
1.2607914911733005e-8
Hooray, it worked!
For fun, let's check the wrong answer $4A^3 dA$, which only works in the unlikely event that $A$ and $dA$ commute:
relerr(4A^3*dA, exact)
1.1383381060849134
Yep, relative error of order 1 (100%) — definitely a bug.
f_e(A, θ) = θ'A
f_e′(θ, dA) = θ'dA
θ = randn(5)
approx = f_e(A+dA,θ) - f_e(A,θ)
exact = f_e′(θ,dA)
relerr(approx, exact)
9.880558358444516e-9
Hooray!
Or… hooray?
This function $f_e$ is actually linear in $A$, so our finite-difference approximation should be exact. The problem is that we have lost half of our ≈15 significant digits due to cancellation roundoff error. In this case, a much bigger $dA$ would have been better! Let's try multiplying $dA$ by $10^8$:
approx = f_e(A+dA*1e8,θ) - f_e(A,θ)
exact = f_e′(θ,dA*1e8)
relerr(approx, exact)
1.7927480347228517e-16
Now it is matching to 16 digits! Finite differences and roundoff errors are tricky!
f_f(x) = sin.(x) # elementwise sine
f_f′(x) = Diagonal(cos.(x)) # the Jacobian matrix
Here, we've used a special Diagonal matrix type built into Julia's LinearAlgebra library for efficiently working with diagonal matrices. (It only actually stores the diagonal entries, and can multiply more quickly than a general matrix.)
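(As a tiny illustration, an aside not needed for the problem set: a Diagonal matrix behaves like the corresponding dense matrix, but stores and multiplies by only the diagonal entries.)
D = Diagonal([1.0, 2.0, 3.0]) # stores only the 3 diagonal entries
D * [4.0, 5.0, 6.0] # == [4.0, 10.0, 18.0]: one multiplication per entry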
f_f(x) = sin.(x)
f_f′(x) = Diagonal(cos.(x))
f_f′ (generic function with 1 method)
Let's check it:
approx = f_f(x+dx) - f_f(x)
exact = f_f′(x) * dx
relerr(approx, exact)
9.10312795714695e-9
Great, it's matching!
Let's try automatic differentiation, which can compute the Jacobian for us, though it doesn't "know" that the Jacobian is sparse (mostly zero) / diagonal:
ad = ForwardDiff.jacobian(f_f, x)
5×5 Matrix{Float64}:
  0.933231   0.0        0.0        0.0        0.0
  0.0        0.967341   0.0        0.0        0.0
  0.0        0.0        0.873017   0.0        0.0
 -0.0       -0.0       -0.0       -0.508401  -0.0
  0.0        0.0        0.0        0.0        0.999566
relerr(ad, f_f′(x))
0.0
Great, it's matching perfectly!
You were not required to do any computational work for problem 4, but for fun let's check the result using the Symbolics.jl computer-algebra (symbolic math) package:
using Symbolics
@variables a[1:2,1:2] s[1:2,1:2]
2-element Vector{Symbolics.Arr{Num, 2}}:
 a[1:2,1:2]
 s[1:2,1:2]
A = collect(a) # convert the symbolic array a into an ordinary Matrix of symbolic expressions
S = collect(s)
S[2,1] = S[1,2] # constrain S to be symmetric
S
Y = A'*S*A
s = [S[1,1],S[1,2],S[2,2]] # vector of unique inputs
y = [Y[1,1],Y[1,2],Y[2,2]] # vector of unique outputs
Symbolics.jacobian(y, s)
This matches our hand-calculated solution, but a heck of a lot more easily!