Can a warp scheduler send instructions to tensor cores and CUDA cores concurrently?

In an SM sub-core, suppose we have two independent tasks: one uses the tensor cores to do a matrix multiply, and the other uses the CUDA cores for general computation. Since the two tasks are independent, they could in principle run concurrently. Is it possible for a warp scheduler in an SM sub-core to execute the two tasks concurrently?

On modern GPUs, a warp scheduler can issue at most one instruction per clock. Therefore, in the narrowest definition of "concurrently", the answer is no. However, with a slightly broader definition of "concurrently", the answer is yes: once issued, instructions execute on separate pipelines, so a tensor-core instruction and a CUDA-core instruction can be in flight at the same time.
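To make that concrete, here is a minimal kernel sketch (hypothetical kernel and buffer names) that mixes tensor-core work via the `nvcuda::wmma` API with ordinary FP32 arithmetic in the same instruction stream. The scheduler still issues one instruction per clock, but a long-latency HMMA occupies the tensor-core pipeline while subsequently issued FFMA instructions run on the FP32 units:

```cuda
#include <mma.h>
using namespace nvcuda;

// Sketch only: one warp-wide stream that feeds both pipelines.
// Assumes a, b point to 16x16 half matrices and c to a 16x16 float matrix.
__global__ void mixed_kernel(const half *a, const half *b, float *c,
                             const float *x, float *y, int n)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
    wmma::fill_fragment(fc, 0.0f);

    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);      // issued to the tensor cores

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = x[i] * 2.0f + 1.0f;       // FFMA, issued to the FP32 units

    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```

Whether the two pipelines actually overlap depends on the compiler's instruction scheduling and on having enough independent work per warp, so treat this as an illustration of the principle rather than a guaranteed overlap.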

This question comes up from time to time. Here is a related thread.

If your algorithm can use both the tensor cores and the normal computation units, it is best to issue relatively wide mma instructions, which take many cycles to complete; that leaves the following issue slots free for CUDA-core instructions.
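As one example of such a wide instruction, the PTX `mma.sync.aligned.m16n8k16` shape (available on Ampere-class hardware) consumes a single issue slot per warp but keeps the tensor cores busy for many cycles. A minimal inline-PTX wrapper (hypothetical helper name and register packing assumed) might look like:

```cuda
// Sketch: issue one m16n8k16 f16->f32 mma via inline PTX.
// a[4] and b[2] hold packed half2 operands as 32-bit registers.
__device__ void mma_m16n8k16(float d[4], const unsigned a[4],
                             const unsigned b[2], const float c[4])
{
    asm("mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32 "
        "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%10,%11,%12,%13};"
        : "=f"(d[0]), "=f"(d[1]), "=f"(d[2]), "=f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]),
          "f"(c[0]), "f"(c[1]), "f"(c[2]), "f"(c[3]));
}
```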

The mma instructions mostly (but not always) map one-to-one to actual SASS instructions, whereas the wmma instructions are often compiled into several smaller mma-style SASS instructions.

For a given data type, mma instructions are offered in several matrix sizes. Look at the shape table in the PTX ISA manual.
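For instance, for `.f16` multiplicands the PTX ISA lists multiple shapes (the exact set depends on the PTX version and target architecture; check the manual's table for your toolkit):

```cuda
// Some f16 mma shapes from the PTX ISA shape table (illustrative subset):
// mma.sync.aligned.m8n8k4.*     -- legacy shape, Volta-era
// mma.sync.aligned.m16n8k8.*    -- Turing and newer
// mma.sync.aligned.m16n8k16.*   -- Ampere and newer
```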