How does cuDSS solve linear equations internally?

Hello,

I am fairly new to CUDA and cuDSS as a whole. I have used cuSOLVER, and I found it quite helpful that the methods already told you which solving method you were using. Now cuDSS seems to be the better and more recent alternative, but it works with a single execute method and key arguments like SOLVE or REFACTORIZATION, and it doesn't really tell me what's going on behind the scenes. I think I have done my due diligence by reading the cuDSS documentation and searching the forum, but I still haven't found any remarks about the direct methods used for solving the systems of equations. Is it still using LU? Or does it automatically choose between LU, QR, and Cholesky based on the matrix type?

I would love to have a deeper insight into the inner workings of cuDSS. If I have missed a source, which explains how it works under the hood, I would be pleased to get a link or a pdf of some sorts.

-V

Hi V,

First of all, I’d like to note that it’s not fully correct to say that cuDSS is a better alternative to cuSOLVER. These two libraries provide different functionality. Even when considering just the functionality for solving linear systems of the form Ax = b, cuDSS currently only supports solving systems with sparse matrices, while cuSOLVER only supports the dense case.

Regarding the methods used underneath: at the moment we intentionally do not specify this explicitly, but the logic is currently quite straightforward. LU is used for general matrices, LDL^T for symmetric or Hermitian matrices, and LL^T for SPD/HPD matrices (in the complex case, ^T denotes the conjugate transpose). So, by specifying the matrix type, the user can currently control the decomposition.
Note: QR is not supported right now but if it ever gets supported, there will be a way for the user to choose between QR and LU.
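To make the matrix-type-to-factorization mapping concrete, here is a small dense NumPy sketch of the LL^T (Cholesky) path used for SPD matrices. This is purely the mathematics, not the cuDSS API — cuDSS factors sparse matrices through its C interface, and the matrix here is an arbitrary example:

```python
import numpy as np

# Dense illustration of the LL^T factorization chosen for SPD matrices.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite by construction

# A = L L^T with L lower triangular
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)

# Solving A x = b then reduces to two triangular solves: L y = b, L^T x = y
b = rng.standard_normal(n)
y = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, y)
assert np.allclose(A @ x, b)
```

For general matrices the analogous steps are an LU factorization followed by forward and backward substitution, and for indefinite symmetric/Hermitian matrices an LDL^T factorization.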

However, there are extra considerations. For example, if matching is turned on, then even if the original matrix is symmetric, the system is likely to lose its symmetry after matching, and the final decomposition would be described by
P_reorder^T * A * P_match * P_reorder = LU
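The symmetry-loss point can be demonstrated with a tiny dense example. The permutation below stands in for P_match and is arbitrary (an actual matching algorithm would choose it to move large entries to the diagonal), and the LU routine is a plain textbook implementation, not what cuDSS does internally:

```python
import numpy as np

# A symmetric matrix
A = np.array([[4., 1., 0.],
              [1., 3., 2.],
              [0., 2., 5.]])
assert np.allclose(A, A.T)

# Arbitrary column permutation standing in for the matching step
P_match = np.eye(3)[:, [1, 2, 0]]
B = A @ P_match
assert not np.allclose(B, B.T)   # one-sided permutation destroys symmetry

def lu_partial_pivot(M):
    """Textbook dense LU with partial pivoting: returns P, L, U with P @ M = L @ U."""
    n = M.shape[0]
    U = M.astype(float).copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:                        # swap rows for stability
            U[[k, p], k:] = U[[p, k], k:]
            L[[k, p], :k] = L[[p, k], :k]
            P[[k, p]] = P[[p, k]]
        for i in range(k + 1, n):         # eliminate entries below the pivot
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, U

# The now-unsymmetric matrix needs a general LU, not LDL^T
P, L, U = lu_partial_pivot(B)
assert np.allclose(P @ B, L @ U)
```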

For more details about how sparse direct solvers may be implemented, you can refer to any survey of sparse direct methods, e.g. "A survey of direct methods for sparse linear systems" (Acta Numerica, Cambridge Core).

-Kirill

Thank you very much for your explanation,

About the statement

These two libraries provide different functionality. Even when considering just the functionality for solving linear systems of the form Ax = b, cuDSS currently only supports solving systems with sparse matrices, while cuSOLVER only supports the dense case.

This is only true because of the deprecation of the cuSolverSp methods and the recommendation of switching to cuDSS, right? Because I have definitely used cuSOLVER for solving sparse systems with cuSolverSp methods before.

Could you also shed some light on why the switch to cuDSS is being made? Is cuSOLVERSp not being maintained anymore or have you found ways to write a faster alternative?

Thanks in advance, again.

-V

I see now what you were asking (I was mainly thinking of cuSOLVERDn when answering your first question).

Components cuSolverSp and cuSolverRf indeed provide some sparse solver functionality but, as you correctly noticed, they are deprecated.

Is cuSOLVERSp not being maintained anymore or have you found ways to write a faster alternative?

Short answer: both (except, of course, some minimal maintenance).

The reasons for replacing parts of cuSolverSp and cuSolverRf with cuDSS are:

  1. very sub-optimal performance
  2. some crucial parts only work on CPU and basically do not make any use of the GPUs
  3. the API is not convenient to use

So, cuDSS is a true GPU-accelerated sparse direct solver (well, except the reordering, but there is a good reason for this).

By the way, there are samples (with more to appear) which should help with moving away from cuSolverSp/cuSolverRf and transitioning to cuDSS where possible; see CUDALibrarySamples/cuSOLVERSp2cuDSS at master · NVIDIA/CUDALibrarySamples · GitHub

-Kirill

Thanks again for your fast and detailed explanation.

I have two last questions and then I will close this thread. If I understand correctly, cuDSS only supports solving systems with (sparse) square matrices.

Is support for rectangular matrices planned?

Are there alternative libraries for solving sparse rectangular matrices which use CUDA (aside from the least square solver from cuSOLVERSp)?

-V

Hi V!

Is support for rectangular matrices planned?

Yes (with sparse QR as the first priority), but it would take more user requests to prioritize adding this feature. Which methods (QR, LQ, SVD, …) and which kinds of systems (over- or under-determined?) would you want to see for rectangular matrices?

Are there alternative libraries for solving sparse rectangular matrices which use CUDA (aside from the least square solver from cuSOLVERSp)?

For example, SuiteSparse has a CUDA-accelerated SuiteSparseQR (but I haven't tried it, so I can't say how well it works).

-Kirill

I have no preference regarding which method would be best for solving over-/underdetermined linear systems. I am sure that SVD would be good, but I don't know whether its parallelisation is that simple. I think that solving at least overdetermined systems - perhaps also minimizing ||x|| - would be very useful for certain applications, such as the approximation of PDEs.
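The two rectangular cases mentioned above can be sketched densely with NumPy's SVD-based lstsq. This is again just the underlying math, not a call into cuDSS or cuSOLVER, and the matrices are arbitrary random examples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined (m > n): minimize ||Ax - b||_2
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
# Normal-equations check: the residual is orthogonal to range(A)
assert np.allclose(A.T @ (A @ x - b), 0)

# Underdetermined (m < n): the SVD-based lstsq returns the minimum-norm solution
B = rng.standard_normal((3, 8))
c = rng.standard_normal(3)
y, *_ = np.linalg.lstsq(B, c, rcond=None)
assert np.allclose(B @ y, c)
# Minimum-norm means y lies in the row space of B, i.e. y = B^T z for some z
z, *_ = np.linalg.lstsq(B.T, y, rcond=None)
assert np.allclose(B.T @ z, y)
```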

I thank you very much for your responses! I will now resolve this thread and mark the solution.

-V

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.