How is cache coherence handled for host pinned memory?

When host pinned memory is allocated with

cudaHostAlloc((void **)&ptr, size, cudaHostAllocMapped);

On older architectures, before the Grace superchip, how was cache coherence handled? Is it handled the same way as unified managed memory, through data migration and page faults?

On newer architectures, especially the Grace superchip, are unified managed memory and host pinned memory both handled the same way, through a hardware MESI cache coherence protocol?

For GPUs with neither HMM nor ATS in effect, pinned memory (what cudaHostAlloc produces) is not migrated: the GPU reads and writes it in place in host memory, across the host-device interconnect. As far as I know, the GPU cache behavior for such memory is not well specified, but in my own tests I observed that it was not cached in L2, though it could be cached in L1. Some work with the profiler and directed tests would probably yield instructive observations.
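A minimal sketch of what "not migrated" means in practice, assuming a single CUDA device on a 64-bit system with unified virtual addressing (UVA): the kernel dereferences the mapped pinned buffer directly, so every access goes to the host allocation in place, and the CPU sees the results after synchronizing. The names `incrementInPlace` and `zeroCopyRoundTrip` are hypothetical. The sketch does not try to show whether L1 or L2 caches this data; that question needs the profiler or directed tests.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread increments one element of the mapped pinned buffer.
// The pages stay in host memory; the GPU accesses them in place.
__global__ void incrementInPlace(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper. Returns 0 on success, 1 if no CUDA device is available,
// -1 on error or mismatch.
int zeroCopyRoundTrip(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) return 1;

    // Page-locked host memory, mapped into the device address space.
    // Assumes a UVA system; older non-UVA setups also required
    // cudaSetDeviceFlags(cudaDeviceMapHost) before context creation.
    int *hostPtr = nullptr;
    if (cudaHostAlloc((void **)&hostPtr, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    for (int i = 0; i < n; ++i) hostPtr[i] = i;

    // With UVA the device pointer equals hostPtr; querying it keeps the code portable.
    int *devPtr = nullptr;
    if (cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0) != cudaSuccess) {
        cudaFreeHost(hostPtr);
        return -1;
    }

    incrementInPlace<<<(n + 255) / 256, 256>>>(devPtr, n);
    // Wait for the kernel to finish before the CPU reads what it wrote.
    int status = (cudaDeviceSynchronize() == cudaSuccess) ? 0 : -1;

    for (int i = 0; status == 0 && i < n; ++i)
        if (hostPtr[i] != i + 1) status = -1;

    cudaFreeHost(hostPtr);
    return status;
}

int main() {
    int status = zeroCopyRoundTrip(1 << 20);
    if (status == 1) printf("no CUDA device found, skipped\n");
    else printf("zero-copy round trip %s\n", status == 0 ? "passed" : "failed");
    return status == -1 ? 1 : 0;
}
```

Because every kernel access crosses the interconnect, this pattern suits data that is touched once or sparsely. For repeated access, an explicit copy into device memory is the usual alternative.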

On Grace Hopper (where ATS is in effect), managed memory is typically migrated on demand to the processor that touches it, similar to other environments where the concurrentManagedAccess device property is true.
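A minimal sketch, assuming device 0, that prints the relevant device attributes and then touches one managed allocation from the CPU, the GPU, and the CPU again. When `concurrentManagedAccess` is 1, the GPU touch can fault and the pages are typically migrated on demand. The program cannot observe the migration directly; the profiler can. `cudaDevAttrPageableMemoryAccessUsesHostPageTables` reports whether the device accesses pageable memory through the host's page tables, which I believe corresponds to ATS being in effect. `fill` and `managedTouchDemo` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper. Returns 0 on success, 1 if no CUDA device is available,
// -1 on error or mismatch.
int managedTouchDemo(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) return 1;
    const int device = 0;

    int concurrent = 0, hostPageTables = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);
    cudaDeviceGetAttribute(&hostPageTables,
                           cudaDevAttrPageableMemoryAccessUsesHostPageTables, device);
    printf("concurrentManagedAccess=%d, pageable access via host page tables=%d\n",
           concurrent, hostPageTables);

    float *data = nullptr;
    if (cudaMallocManaged((void **)&data, n * sizeof(float)) != cudaSuccess) return -1;

    // 1) CPU first touch: pages are populated on the host side.
    for (int i = 0; i < n; ++i) data[i] = 0.0f;

    // 2) GPU touch: with concurrentManagedAccess == 1, device accesses can fault
    //    and pages are typically migrated to the GPU on demand.
    fill<<<(n + 255) / 256, 256>>>(data, n, 1.0f);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    // 3) CPU touch again: pages are typically migrated back toward the host.
    int status = 0;
    for (int i = 0; i < n; ++i)
        if (data[i] != 1.0f) { status = -1; break; }

    cudaFree(data);
    return status;
}

int main() {
    int status = managedTouchDemo(1 << 20);
    if (status == 1) printf("no CUDA device found, skipped\n");
    else printf("managed touch demo %s\n", status == 0 ? "passed" : "failed");
    return status == -1 ? 1 : 0;
}
```

The code is correct whether or not `concurrentManagedAccess` is set, because the CPU only touches the buffer before the launch and after synchronization. Only the migration behavior underneath differs.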

Other types of memory (e.g., ordinary host memory allocated with malloc when ATS is in effect) may also get migrated; AFAIK this behavior is not specified. I wouldn't depend on it, and anecdotal reports suggest it may not perform as well as a managed allocation.
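A minimal sketch, assuming device 0, of passing an ordinary malloc pointer straight to a kernel. It first checks `cudaDevAttrPageableMemoryAccess`, which is nonzero when the device can coherently access pageable host memory (via HMM or ATS). Without that, a kernel must not dereference malloc'd memory. Whether the pages migrate or are accessed remotely is left to the driver and OS, consistent with the caveat above. `scale` and `pageableDemo` are hypothetical names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(double *data, int n, double factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper. Returns 0 on success, 1 if skipped (no device, or no
// pageable-memory access), -1 on error or mismatch.
int pageableDemo(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) return 1;

    int pageable = 0;
    cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, 0);
    if (!pageable) return 1;  // neither HMM nor ATS: do not pass malloc'd memory to kernels

    double *data = static_cast<double *>(malloc(n * sizeof(double)));
    if (!data) return -1;
    for (int i = 0; i < n; ++i) data[i] = i;

    // Plain malloc pointer passed directly to the kernel. Migration versus
    // remote access is unspecified and may differ between systems.
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0);
    int status = (cudaDeviceSynchronize() == cudaSuccess) ? 0 : -1;

    for (int i = 0; status == 0 && i < n; ++i)
        if (data[i] != 2.0 * i) status = -1;

    free(data);
    return status;
}

int main() {
    int status = pageableDemo(1 << 20);
    if (status == 1) printf("skipped: no device, or pageable memory access not supported\n");
    else printf("pageable access %s\n", status == 0 ? "passed" : "failed");
    return status == -1 ? 1 : 0;
}
```

Swapping the malloc call for cudaMallocManaged and timing both kernels is one way to check the performance difference on a given system.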