Can a warp scheduler send instructions to tensor cores and CUDA cores concurrently?

In an SM sub-core, suppose we have two independent tasks: one uses the tensor cores to do a matrix multiply, and the other uses the CUDA cores for general computation. Since the two tasks are independent, they could in principle run concurrently. Is it possible for a warp scheduler in an SM sub-core to execute the two tasks concurrently?

On modern GPUs, a warp scheduler can issue at most one instruction per clock. Therefore, in the narrowest definition of "concurrently", the answer is no. However, with a slightly broader definition of "concurrently", the answer is yes: once issued, instructions execute on separate pipelines, so a tensor-core instruction and a CUDA-core instruction can be in flight at the same time.
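To make that concrete, here is a minimal kernel sketch (hypothetical kernel and buffer names) that mixes tensor-core work via the `nvcuda::wmma` API with ordinary FP32 arithmetic in the same instruction stream. The scheduler still issues one instruction per clock, but a long-latency HMMA occupies the tensor-core pipeline while subsequently issued FFMA instructions run on the FP32 units:

```cuda
#include <mma.h>
using namespace nvcuda;

// Sketch only: one warp-wide stream that feeds both pipelines.
// Assumes a, b point to 16x16 half matrices and c to a 16x16 float matrix.
__global__ void mixed_kernel(const half *a, const half *b, float *c,
                             const float *x, float *y, int n)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
    wmma::fill_fragment(fc, 0.0f);

    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);      // issued to the tensor cores

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = x[i] * 2.0f + 1.0f;       // FFMA, issued to the FP32 units

    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```

Whether the two pipelines actually overlap depends on the compiler's instruction scheduling and on having enough independent work per warp, so treat this as an illustration of the principle rather than a guaranteed overlap.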

This question comes up from time to time. Here is a related thread.

If your algorithm can use both the tensor cores and the normal computation units, it is best to issue relatively wide mma instructions, which take many cycles to complete; that leaves the following issue slots free for CUDA-core instructions.
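As one example of such a wide instruction, the PTX `mma.sync.aligned.m16n8k16` shape (available on Ampere-class hardware) consumes a single issue slot per warp but keeps the tensor cores busy for many cycles. A minimal inline-PTX wrapper (hypothetical helper name and register packing assumed) might look like:

```cuda
// Sketch: issue one m16n8k16 f16->f32 mma via inline PTX.
// a[4] and b[2] hold packed half2 operands as 32-bit registers.
__device__ void mma_m16n8k16(float d[4], const unsigned a[4],
                             const unsigned b[2], const float c[4])
{
    asm("mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32 "
        "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%10,%11,%12,%13};"
        : "=f"(d[0]), "=f"(d[1]), "=f"(d[2]), "=f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]),
          "f"(c[0]), "f"(c[1]), "f"(c[2]), "f"(c[3]));
}
```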

The mma instructions mostly (but not always) map one-to-one to actual SASS instructions, whereas the wmma instructions are often compiled into several smaller mma-style SASS instructions.

For a given data type, mma instructions are offered in several matrix sizes. Look at the shape table in the PTX ISA manual.
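For instance, for `.f16` multiplicands the PTX ISA lists multiple shapes (the exact set depends on the PTX version and target architecture; check the manual's table for your toolkit):

```cuda
// Some f16 mma shapes from the PTX ISA shape table (illustrative subset):
// mma.sync.aligned.m8n8k4.*     -- legacy shape, Volta-era
// mma.sync.aligned.m16n8k8.*    -- Turing and newer
// mma.sync.aligned.m16n8k16.*   -- Ampere and newer
```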