
stable pinv least-squares via rtol, atol in svd, ldiv! #1387


Open · wants to merge 13 commits into master

Conversation

@stevengj (Member) commented Jun 19, 2025

It recently came to my attention that solving a least-squares problem $\min_x \Vert Ax - b \Vert$ via the pseudo-inverse $\hat{x} = A^+ b$ is not backwards stable if you explicitly compute the pseudo-inverse matrix $A^+$. That is, pinv(A) * b is numerically unstable. If you want to use the SVD $A = U \Sigma V^*$, the right way is to compute $V (\Sigma^+ (U^* b))$, not $A^+ b = (V \Sigma^+ U^*) b$. See e.g. Liu & Barnett (2016) and references therein:

[Quoted excerpt from Liu & Barnett (2016).]
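
For concreteness, here is a minimal Julia sketch of the two parenthesizations (illustrative function names, full-rank case, no truncation):

using LinearAlgebra

# Backwards-stable: apply the SVD factors to b one at a time,
# never forming the pseudo-inverse matrix.
function lstsq_stable(A, b)
    F = svd(A)
    return F.V * (Diagonal(F.S) \ (F.U' * b))  # V * (Σ⁺ (U* b))
end

# Unstable: explicitly form A⁺ = V Σ⁺ U* and only then multiply by b.
lstsq_unstable(A, b) = pinv(A) * b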

However, pinv(A) * b is convenient in that it lets you specify explicit tolerances atol, rtol to regularize the problem by dropping small singular values. If you used svd(A) directly, there was previously no such truncation option, and moreover the drop tolerance in svd(A) \ b was inconsistent with the default drop tolerance in pinv(A) * b.

To make it easier to do the right thing, this PR makes the following changes:

  • ldiv!(svd(A), b; atol, rtol) now allows you to pass tolerances, and its default tolerances are consistent with pinv. (This gives you a backwards-stable algorithm, but is somewhat low-level.)
  • You can now pass atol, rtol keywords directly to the svd function, via svd(A; atol, rtol), in order to compute a truncated SVD. This provides a higher-level, backwards-stable API svd(A; atol, rtol) \ b to compute a regularized least-squares solution without explicitly forming the pinv matrix (see the usage sketch after this list).
  • A new method pinv(svd(A)) that lets you pass the SVD to pinv along with atol, rtol tolerances. (This still computes an explicit pinv, which is undesirable in ill-conditioned problems, but I got the extra method "for free" as the result of some refactoring.)
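
A brief usage sketch of these APIs, as I read them from the descriptions above (keyword names as stated in this PR; the exact defaults and the in-place output convention below are my assumptions):

using LinearAlgebra

A = randn(1000, 100); b = randn(1000)

# (1) Truncated SVD at factorization time, then a backwards-stable solve:
x1 = svd(A; rtol=1e-8) \ b

# (2) Tolerances at solve time via the lower-level in-place API; following
#     the existing SVD ldiv! convention, the solution presumably occupies
#     the leading n entries of the right-hand-side buffer:
y = copy(b)
ldiv!(svd(A), y; rtol=1e-8)
x2 = y[1:size(A, 2)]

# (3) Explicit pseudo-inverse from a precomputed SVD (still forms the pinv
#     matrix explicitly, so less stable for ill-conditioned problems):
x3 = pinv(svd(A); rtol=1e-8) * b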

@stevengj added the enhancement (New feature or request) label Jun 19, 2025
@stevengj commented Jun 19, 2025

Note that the advice from JuliaLang/julia#8859 to use a tolerance $\sqrt{\epsilon}$ for pinv in ill-conditioned least-squares problems seems to be misleading. The correct advice is not to use pinv at all in such cases (at least, not with a small tolerance; with a large tolerance, you regularize away the ill-conditioning, but of course that changes the answer).

For example, here is the ill-conditioned Hilbert-matrix example from that PR:

using LinearAlgebra  # for norm

hilb(n::Integer) = [1/(i+j-1) for i = 1:n, j = 1:n]

# form an ill-conditioned matrix
m, n = 1000, 100
A = randn(m, n) * hilb(n);

# create a right-hand side in the range of A
x0 = randn(n);
b = A * x0;

# find the least-squares solution
x = A \ b;

# the backwards relative error in the rhs should be near machine precision
norm(A*x - b) / norm(b)

The final backwards relative error using the default (QR) method A \ b is indeed near machine precision, e.g. 5.642891778507215e-16 for one random example on my machine. You get similar backwards error from svd(A) \ b.

However, if you use pinv(A; rtol) * b, for any value of rtol, the backwards error never goes below 1e-10:
[Plot: backwards relative error of pinv(A; rtol) * b as a function of rtol; the error plateaus around 1e-10 and never approaches machine precision.]

While it's true that an rtol near $\sqrt{\epsilon}$ gives the smallest backwards error, it is much better not to use pinv at all. (Unless you are intentionally using a large tolerance to regularize the problem, independent of precision, as noted above, which changes the answer.)
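
For reference, here is a sketch of the sweep behind the plot above (continuing from the A, b defined earlier; exact values vary with the random matrix):

# sweep rtol over many orders of magnitude and record the backwards
# relative error of the explicit-pinv solution
for rtol in 10.0 .^ (-16:-1)
    x = pinv(A; rtol=rtol) * b
    println(rtol, " => ", norm(A*x - b) / norm(b))
end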

@stevengj commented
(Note that this adds a missing checksquare to inv(::SVD), which I would argue is a bug from JuliaLang/julia#32126. If you want the pseudo-inverse, you should call pinv(::SVD) explicitly, not rely on magic undocumented behavior of inv.)
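
For illustration, a sketch of the idea (not the PR's literal diff); LinearAlgebra.checksquare throws a DimensionMismatch for non-square input:

using LinearAlgebra

# illustrative only: an SVD-based inverse that rejects non-square input
# instead of silently returning a pseudo-inverse
function inv_from_svd(F::SVD)
    LinearAlgebra.checksquare(F)            # error unless F came from a square matrix
    return F.Vt' * (Diagonal(F.S) \ F.U')   # A⁻¹ = V Σ⁻¹ U*
end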

codecov bot commented Jun 19, 2025

Codecov Report

Attention: Patch coverage is 97.14286% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.81%. Comparing base (c5d7122) to head (2048988).

Files with missing lines    Patch %    Lines
src/svd.jl                  96.96%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1387      +/-   ##
==========================================
- Coverage   93.81%   93.81%   -0.01%     
==========================================
  Files          34       34              
  Lines       15722    15732      +10     
==========================================
+ Hits        14750    14759       +9     
- Misses        972      973       +1     


@stevengj commented Jun 19, 2025

I'm not sure if there is a better reference than the offhand remarks in @ahbarnett's papers cited above for the fact that pinv(A) * b is unstable. Trefethen & Bau, Golub & Van Loan, and Björck all recommend using the SVD for least-squares problems via $V (\Sigma^+ (U^* b))$, with $U^* b$ computed first, but don't explicitly say that the other parenthesization is unstable.

@stevengj commented Jun 23, 2025

I forgot that Michael Stewart (github handle?) commented on this topic on Julia Discourse as well, where he pointed out that Higham's book shows that $A^{-1}b$ is not a backwards-stable way to solve $Ax=b$ for an invertible square matrix; since this is a special case of the pseudo-inverse, the result follows. On page 260, Higham writes:

[Quoted excerpt from p. 260 of Higham's book.]
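
A quick Julia illustration of Higham's point for the square case (a sketch; exact error magnitudes depend on the random matrix, and hilb is as defined above):

using LinearAlgebra

n = 100
A = randn(n, n) * hilb(n)  # ill-conditioned square matrix
b = A * randn(n)

norm(A * (A \ b) - b) / norm(b)       # LU-based solve: backwards error near eps
norm(A * (inv(A) * b) - b) / norm(b)  # explicit inverse: typically much larger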

@ahbarnett commented
Hi Steven - nice demo, and guilty as charged with respect to my "offhand remarks" :) To this day I do not know a good reference showing that pinv(A)*b is not backwards stable for ill-conditioned LSQ problems, and I keep needing the idea in other papers. You guys are right to point to the Higham book Sec. 14.1 for the square case - that result is much better known, and I probably should refer to it. But it's not satisfactory, since square and LSQ problems are usually treated differently. Higham Ch. 20 (on LSQ) has many references in its afternotes, but I do not see anything with the statement we need. Let me know if you find something!
Best, Alex
