Description
Is your feature request related to a problem? Please describe.
Based on a Discourse discussion here https://itensor.discourse.group/t/evaluating-overlaps-of-mpss-in-parallel/451/ it seems that the tensor contraction backend, in this case called through the `inner` function, can generate a lot of "garbage", that is, perform a large number of allocations. In the user's case, this resulted either in a measurable slowdown of multithreaded performance or, when garbage collection was disabled with `GC.enable(false)`, in a spike in memory usage followed by a delay once GC was re-enabled.
It should be noted that the calculation performed by the user was itself rather demanding, with something like a thousand inner products of length N=100 MPS being evaluated all at the same time. The overall speed of this was actually quite good; the only issue here is how effectively it can be parallelized by multithreading.
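For reference, here is a minimal sketch of that kind of workload (not the user's actual code); the bond dimension `linkdims=10` and the count of 1000 states are illustrative placeholders, and newer ITensors versions may spell `randomMPS` as `random_mps` (from ITensorMPS.jl):

```julia
using ITensors

N = 100
sites = siteinds("S=1/2", N)

# A batch of random states plus a reference state to overlap them with.
states = [randomMPS(sites; linkdims=10) for _ in 1:1000]
reference = randomMPS(sites; linkdims=10)

# Each `inner` call contracts the two MPS tensor by tensor; every
# intermediate tensor is a fresh allocation that becomes GC garbage.
overlaps = Vector{Float64}(undef, length(states))
Threads.@threads for i in eachindex(states)
    overlaps[i] = inner(reference, states[i])
end
```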
Describe the solution you'd like
This is more of a "placeholder" issue to remind us to investigate allocations in the contraction engine. (Unless the allocation is in the `inner` function itself, though I doubt that given the simplicity of that function.)
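One way to start would be to compare the allocations of a full `inner` call against a single tensor-tensor contraction, e.g. with Base's `@allocated`; this is only a sketch, with placeholder bond dimensions:

```julia
using ITensors

N = 100
sites = siteinds("S=1/2", N)
psi = randomMPS(sites; linkdims=10)
phi = randomMPS(sites; linkdims=10)

inner(psi, phi)  # warm up so compilation is not counted
allocs_overlap = @allocated inner(psi, phi)

psi[50] * phi[50]  # warm up a single contraction of two site tensors
allocs_contract = @allocated psi[50] * phi[50]

@show allocs_overlap allocs_contract
```

On Julia 1.8+, `Profile.Allocs.@profile` could then localize the allocation sites more precisely.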
Describe alternatives you've considered
I considered disabling GC or other Julia-language workarounds outside of ITensor, but my current best guess is that there are just a lot of allocations happening at the contraction level.
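For completeness, the GC workaround mentioned above looks roughly like the following (reusing the names from the sketch in the problem description); it trades the multithreaded slowdown for a memory spike and a pause when collection is re-enabled:

```julia
GC.enable(false)  # defer collection during the hot parallel region
try
    Threads.@threads for i in eachindex(states)
        overlaps[i] = inner(reference, states[i])
    end
finally
    GC.enable(true)
    GC.gc()  # the deferred garbage is collected here, all at once
end
```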
Additional context
Forum discussion:
https://itensor.discourse.group/t/evaluating-overlaps-of-mpss-in-parallel/451