How is cache coherence handled for host pinned memory?

zhongchen530 · February 28, 2025, 7:26pm

When host pinned memory is allocated with

cudaHostAlloc (void **ptr, size_t size, cudaHostAllocMapped)

how was cache coherence handled on older architectures, before the Grace superchip? Is it handled just like unified managed memory, through data migration and page faults?

On newer architectures, especially the Grace superchip, are unified managed memory and host pinned memory both handled the same way, through a hardware MESI cache coherence protocol?

Robert_Crovella · February 28, 2025, 7:42pm

Speaking about GPUs with neither HMM nor ATS in effect, pinned memory (that thing produced by cudaHostAlloc) is not migrated. The GPU cache behavior is not well specified (that I know of), but when I ran my own tests, I observed that it was not cached in L2 but could be cached in L1. Some work with the profiler and directed tests could probably yield instructive observations.
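For anyone who wants to run the kind of directed test mentioned above, here is a minimal sketch of the zero-copy pattern under discussion. It assumes a 64-bit platform with unified virtual addressing, so that mapped pinned memory is available without setting extra device flags; the kernel name `increment` and the buffer size are made up for illustration. The kernel reads and writes the host allocation in place, and the host only looks at the data after synchronizing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: each thread increments one element of the buffer.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 10;  // illustrative size
    int *h_ptr = nullptr;

    // Pinned, mapped host allocation. The pages stay in host memory.
    if (cudaHostAlloc((void **)&h_ptr, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess) {
        std::fprintf(stderr, "cudaHostAlloc failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) h_ptr[i] = i;

    // Device-side pointer that refers to the same host allocation.
    int *d_ptr = nullptr;
    if (cudaHostGetDevicePointer((void **)&d_ptr, h_ptr, 0) != cudaSuccess) {
        std::fprintf(stderr, "cudaHostGetDevicePointer failed\n");
        cudaFreeHost(h_ptr);
        return 1;
    }

    increment<<<(n + 255) / 256, 256>>>(d_ptr, n);

    // Wait for the kernel to finish before the host reads the buffer.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFreeHost(h_ptr);
        return 1;
    }

    std::printf("h_ptr[0] = %d, h_ptr[n-1] = %d\n", h_ptr[0], h_ptr[n - 1]);
    cudaFreeHost(h_ptr);
    return 0;
}
```

Running a variant of this under the profiler, with repeated reads of the same addresses, is one way to see for yourself which cache levels are involved on a given GPU.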

On Grace Hopper (where ATS is in effect), managed memory is typically migrated on demand to the processor that touches it, similar to other environments where the concurrentManagedAccess property is true.
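The concurrentManagedAccess property can be queried at runtime. The following is a sketch only: it reads the attribute for the current device and then touches a managed allocation from both the host and the GPU. The kernel name `scale` and the sizes are illustrative, and the code does not itself show whether or when pages migrate; a profiler is needed for that.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply each element by a constant.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    int dev = 0, concurrent = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
    std::printf("concurrentManagedAccess = %d\n", concurrent);

    const int n = 1 << 20;  // illustrative size
    float *x = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    // First touch on the host.
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    // The GPU touches the same allocation.
    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        std::fprintf(stderr, "kernel failed\n");
        cudaFree(x);
        return 1;
    }

    // The host touches the data again after the kernel has finished.
    std::printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```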

Other types of memory (e.g. ordinary host memory pages allocated with host malloc when ATS is in effect) may get migrated; AFAIK this behavior is not specified. I wouldn’t depend on it, and anecdotal reports suggest it may not give the same performance as a managed allocation.
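Whether a kernel can dereference an ordinary malloc pointer at all can be checked with device attributes before trying it. This is a hedged sketch: as far as I understand the documentation, cudaDevAttrPageableMemoryAccess reports whether the GPU can access pageable host memory, and cudaDevAttrPageableMemoryAccessUsesHostPageTables reports whether it does so through the host's page tables (as in an ATS system). The kernel name `fill` is made up, and nothing here tells you whether the pages get migrated.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: write a constant into every element.
__global__ void fill(int *p, int n, int v) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = v;
}

int main() {
    int dev = 0, pageable = 0, hostTables = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, dev);
    cudaDeviceGetAttribute(&hostTables, cudaDevAttrPageableMemoryAccessUsesHostPageTables, dev);
    std::printf("pageableMemoryAccess = %d, usesHostPageTables = %d\n", pageable, hostTables);

    if (!pageable) {
        // Without this capability, passing a malloc pointer to a kernel is an error.
        std::printf("GPU cannot access pageable host memory on this system\n");
        return 0;
    }

    const int n = 1 << 10;  // illustrative size
    int *p = static_cast<int *>(std::malloc(n * sizeof(int)));
    if (!p) return 1;

    // Plain host allocation handed directly to the kernel.
    fill<<<(n + 255) / 256, 256>>>(p, n, 7);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        std::free(p);
        return 1;
    }

    std::printf("p[0] = %d\n", p[0]);
    std::free(p);
    return 0;
}
```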
