I am fairly new to CUDA and cuDSS as a whole. I have used cuSOLVER, and I found it quite helpful that the methods already told you what type of solving method you were using. cuDSS seems to be the better and more recent alternative, but it works with a single execute method and phase arguments like SOLVE or REFACTORIZATION, and it doesn’t really tell me what’s going on behind the scenes. I think I have done my due diligence by reading the cuDSS documentation and searching the forum, but I still haven’t found any remarks about the direct methods used for solving the systems of equations. Is it still using LU? Or does it automatically choose between LU, QR, and Cholesky based on the matrix type?
I would love to have a deeper insight into the inner workings of cuDSS. If I have missed a source that explains how it works under the hood, I would be pleased to get a link or a PDF of some sort.
First of all, I’d like to note that it’s not fully correct to say that cuDSS is a better alternative to cuSOLVER. These two libraries provide different functionality. Even when considering just the functionality for solving linear systems of the form Ax = b, cuDSS currently only supports solving systems with sparse matrices, while cuSOLVER only supports the dense case.
Regarding the methods used underneath: at the moment we intentionally do not specify explicitly what happens, but the logic is currently quite straightforward. LU is used for general matrices, LDL^T for symmetric or Hermitian matrices, and LL^T for SPD/HPD matrices (in the complex case, ^T denotes the conjugate transpose). So, by specifying the matrix type, the user can currently control the decomposition.
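To make the mapping concrete, here is a small NumPy/SciPy sketch (not cuDSS itself, which is a CUDA C library) showing the three factorizations on dense example matrices: LU for a general matrix, LDL^T for a symmetric indefinite one, and LL^T (Cholesky) for an SPD one.

```python
import numpy as np
from scipy.linalg import lu, ldl, cholesky

# General matrix -> LU with pivoting; SciPy returns A = P @ L @ U
A_general = np.array([[0., 2., 1.],
                      [1., 1., 0.],
                      [3., 0., 1.]])
P, L, U = lu(A_general)
assert np.allclose(P @ L @ U, A_general)

# Symmetric indefinite matrix -> LDL^T
A_sym = np.array([[1., 2., 0.],
                  [2., -1., 3.],
                  [0., 3., 2.]])
L2, D, perm = ldl(A_sym)
assert np.allclose(L2 @ D @ L2.T, A_sym)

# Symmetric positive definite matrix -> Cholesky LL^T
A_spd = np.array([[4., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 2.]])
C = cholesky(A_spd, lower=True)
assert np.allclose(C @ C.T, A_spd)
```

The point is only the type-to-factorization mapping; cuDSS applies the same logic to sparse matrices on the GPU.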
Note: QR is not supported right now, but if it ever gets supported, there will be a way for the user to choose between QR and LU.
However, there are extra considerations. E.g., if matching is turned on, then even if the original matrix is symmetric, the system is likely to lose its symmetry after matching, and the final decomposition would be described by
P_reorder^T * A * P_match * P_reorder = LU
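A tiny NumPy sketch of why this happens (illustrative only, not cuDSS): matching applies a one-sided permutation, and a one-sided permutation of a symmetric matrix is generally no longer symmetric, which is why the factorization falls back to LU.

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 2.],
              [0., 2., 5.]])        # symmetric
assert np.allclose(A, A.T)

P_match = np.eye(3)[:, [1, 0, 2]]  # swap columns 0 and 1
AP = A @ P_match                   # one-sided permutation, as in matching
assert not np.allclose(AP, AP.T)   # symmetry is lost -> LU is required
```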
These two libraries provide different functionality. Even when considering just the functionality for solving linear systems of the form Ax = b, cuDSS currently only supports solving systems with sparse matrices, while cuSOLVER only supports the dense case.
This is only true because of the deprecation of the cuSolverSp methods and the recommendation of switching to cuDSS, right? Because I definitely used cuSOLVER for solving sparse systems with the cuSolverSp methods before.
Could you also shed some light on why the switch to cuDSS is being done? Is cuSolverSp not being maintained anymore, or have you found ways to write a faster alternative?
Yes (with Sparse QR being the first priority), but it would require more requests to prioritize adding this feature. What kind of methods (QR, LQ, SVD, …) and systems (over- or under-determined?) would you want to see for rectangular matrices?
Are there alternative libraries for solving sparse rectangular systems which use CUDA (aside from the least-squares solver in cuSolverSp)?
E.g., SuiteSparse has a CUDA-accelerated SuiteSparseQR (but I haven’t tried it, so I can’t say how well it works)
I have no preference as to which method would be best for solving over-/underdetermined linear systems. I am sure that SVD would be good, but I don’t know whether its parallelisation is that simple. I think solving at least overdetermined systems, perhaps also minimizing ||x||, would be very useful for certain applications such as the approximation of PDEs.
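For reference, here is what such a rectangular solver would provide, sketched with NumPy’s dense SVD-based lstsq (purely illustrative, not a CUDA library): the least-squares solution for an overdetermined system, and the minimum-norm ||x|| solution for an underdetermined one.

```python
import numpy as np

# Overdetermined: 4 equations, 2 unknowns -> minimize ||Ax - b||_2
A_over = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
b_over = np.array([1., 2., 2., 4.])
x_over, *_ = np.linalg.lstsq(A_over, b_over, rcond=None)

# Underdetermined: 2 equations, 3 unknowns -> among all exact solutions,
# lstsq returns the one with minimal ||x||_2
A_under = np.array([[1., 1., 0.], [0., 1., 1.]])
b_under = np.array([2., 3.])
x_under, *_ = np.linalg.lstsq(A_under, b_under, rcond=None)
assert np.allclose(A_under @ x_under, b_under)
```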
I thank you very much for your responses! I will now resolve this thread and mark the solution.