How does cuDSS solve linear equations internally?

Hello,

I am fairly new to CUDA and cuDSS as a whole. I have used cuSOLVER, and I found it quite helpful that the methods already told you which solving method you were using. Now cuDSS seems to be the better and more recent alternative, but it works with a single execute method and key arguments like SOLVE or REFACTORIZATION, and it doesn't really tell me what's going on behind the scenes. I think I have done my due diligence by reading the cuDSS documentation and searching the forum, but I still haven't found any remarks about the direct methods used for solving the systems of equations. Is it still using LU? Or does it automatically choose between LU, QR, and Cholesky based on the matrix type?

I would love to have a deeper insight into the inner workings of cuDSS. If I have missed a source, which explains how it works under the hood, I would be pleased to get a link or a pdf of some sorts.

-V

Hi V,

First of all, I’d like to note that it’s not fully correct to say that cuDSS is a better alternative to cuSOLVER. These two libraries provide different functionality. Even when considering just the functionality for solving linear systems of the form Ax = b, cuDSS currently only supports solving systems with sparse matrices, while cuSOLVER only supports the dense case.

Regarding the methods used underneath: at the moment we intentionally do not specify this explicitly, but the logic is currently quite straightforward. LU is used for general matrices, LDL^T for symmetric or Hermitian matrices, and LL^T for SPD/HPD matrices (in the complex case, ^T denotes the conjugate transpose). So, by specifying the matrix type, the user can currently control the decomposition.
Note: QR is not supported right now but if it ever gets supported, there will be a way for the user to choose between QR and LU.
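To make the matrix-type-to-factorization mapping concrete, here is a small dense NumPy sketch of the LL^T (Cholesky) path used for SPD matrices. This is purely the mathematics, not the cuDSS API — cuDSS factors sparse matrices through its C interface, and the matrix here is an arbitrary example:

```python
import numpy as np

# Dense illustration of the LL^T factorization chosen for SPD matrices.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite by construction

# A = L L^T with L lower triangular
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)

# Solving A x = b then reduces to two triangular solves: L y = b, L^T x = y
b = rng.standard_normal(n)
y = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, y)
assert np.allclose(A @ x, b)
```

For general matrices the analogous steps are an LU factorization followed by forward and backward substitution, and for indefinite symmetric/Hermitian matrices an LDL^T factorization.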

However, there are extra considerations. For example, if matching is turned on, then even if the original matrix is symmetric, the system is likely to lose its symmetry after matching, and the final decomposition would be described by
P_reorder^T * A * P_match * P_reorder = LU
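The symmetry-loss point can be demonstrated with a tiny dense example. The permutation below stands in for P_match and is arbitrary (an actual matching algorithm would choose it to move large entries to the diagonal), and the LU routine is a plain textbook implementation, not what cuDSS does internally:

```python
import numpy as np

# A symmetric matrix
A = np.array([[4., 1., 0.],
              [1., 3., 2.],
              [0., 2., 5.]])
assert np.allclose(A, A.T)

# Arbitrary column permutation standing in for the matching step
P_match = np.eye(3)[:, [1, 2, 0]]
B = A @ P_match
assert not np.allclose(B, B.T)   # one-sided permutation destroys symmetry

def lu_partial_pivot(M):
    """Textbook dense LU with partial pivoting: returns P, L, U with P @ M = L @ U."""
    n = M.shape[0]
    U = M.astype(float).copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:                        # swap rows for stability
            U[[k, p], k:] = U[[p, k], k:]
            L[[k, p], :k] = L[[p, k], :k]
            P[[k, p]] = P[[p, k]]
        for i in range(k + 1, n):         # eliminate entries below the pivot
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return P, L, U

# The now-unsymmetric matrix needs a general LU, not LDL^T
P, L, U = lu_partial_pivot(B)
assert np.allclose(P @ B, L @ U)
```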

For more details about how sparse direct solvers may be implemented, you can refer to any survey of sparse direct methods, e.g. "A survey of direct methods for sparse linear systems" (Acta Numerica, Cambridge Core).

-Kirill

Thank you very much for your explanation,

About the statement

These two libraries provide different functionality. Even when considering just the functionality for solving linear systems of the form Ax = b, cuDSS currently only supports solving systems with sparse matrices, while cuSOLVER only supports the dense case.

This is only true because of the deprecation of the cuSolverSp methods and the recommendation of switching to cuDSS, right? Because I have definitely used cuSOLVER for solving sparse systems with cuSolverSp methods before.

Could you also shed some light on why the switch to cuDSS is being made? Is cuSOLVERSp not being maintained anymore or have you found ways to write a faster alternative?

Thanks in advance, again.

-V

I see now what you were asking (I was mainly thinking of cuSOLVERDn when answering your first question).

Components cuSolverSp and cuSolverRf indeed provide some sparse solver functionality but, as you correctly noticed, they are deprecated.

Is cuSOLVERSp not being maintained anymore or have you found ways to write a faster alternative?

Short answer: both (except, of course, some minimal maintenance).

The reasons for replacing parts of cuSolverSp and cuSolverRf with cuDSS are:

  1. very sub-optimal performance
  2. some crucial parts only work on CPU and basically do not make any use of the GPUs
  3. the API is not convenient to use

So, cuDSS is a true GPU-accelerated sparse direct solver (well, except the reordering, but there is a good reason for this).

By the way, there are samples (with more to appear) which should help with moving away from cuSolverSp/cuSolverRf and transitioning to cuDSS where possible; see CUDALibrarySamples/cuSOLVERSp2cuDSS at master · NVIDIA/CUDALibrarySamples · GitHub

-Kirill

Thanks again for your fast and detailed explanation.

I have two last questions and then I will close this thread. If I understand correctly, cuDSS only supports solving systems with (sparse) square matrices.

Is support for rectangular matrices planned?

Are there alternative libraries for solving sparse rectangular matrices which use CUDA (aside from the least square solver from cuSOLVERSp)?

-V

Hi V!

Is support for rectangular matrices planned?

Yes (with sparse QR as the first priority), but it would take more user requests to prioritize adding this feature. Which methods (QR, LQ, SVD, …) and which kinds of systems (over- or under-determined?) would you want to see for rectangular matrices?

Are there alternative libraries for solving sparse rectangular matrices which use CUDA (aside from the least square solver from cuSOLVERSp)?

For example, SuiteSparse has a CUDA-accelerated SuiteSparseQR (but I haven't tried it, so I can't say how well it works).

-Kirill

I have no preference regarding which method would be best for solving over-/underdetermined linear systems. I am sure that SVD would be good, but I don't know whether its parallelisation is that simple. I think that solving at least overdetermined systems - perhaps also minimizing ||x|| - would be very useful for certain applications, such as the approximation of PDEs.
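The two rectangular cases mentioned above can be sketched densely with NumPy's SVD-based lstsq. This is again just the underlying math, not a call into cuDSS or cuSOLVER, and the matrices are arbitrary random examples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined (m > n): minimize ||Ax - b||_2
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
# Normal-equations check: the residual is orthogonal to range(A)
assert np.allclose(A.T @ (A @ x - b), 0)

# Underdetermined (m < n): the SVD-based lstsq returns the minimum-norm solution
B = rng.standard_normal((3, 8))
c = rng.standard_normal(3)
y, *_ = np.linalg.lstsq(B, c, rcond=None)
assert np.allclose(B @ y, c)
# Minimum-norm means y lies in the row space of B, i.e. y = B^T z for some z
z, *_ = np.linalg.lstsq(B.T, y, rcond=None)
assert np.allclose(B.T @ z, y)
```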

I thank you very much for your responses! I will now resolve this thread and mark the solution.

-V

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.