[PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_*
tytso at mit.edu
Sun Jun 20 14:14:54 UTC 2021
On Thu, Jun 17, 2021 at 10:16:57AM -0500, Alex Sierra wrote:
> AMD is building a system architecture for the Frontier supercomputer with a
> coherent interconnect between CPUs and GPUs. This hardware architecture allows
> the CPUs to coherently access GPU device memory. We have hardware in our labs
> and we are working with our partner HPE on the BIOS, firmware and software
> for delivery to the DOE.
> The system BIOS advertises the GPU device memory (aka VRAM) as SPM
> (special purpose memory) in the UEFI system address map. The amdgpu driver looks
> it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
> using devm_memremap_pages.
> Now we're trying to migrate data to and from that memory using the migrate_vma_*
> helpers so we can support page-based migration in our unified memory allocations,
> while also supporting CPU access to those pages.
> This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
> correctly in the migrate_vma_* helpers. We are looking for feedback about this
> approach. If we're close, what's needed to make our patches acceptable upstream?
> If we're not close, any suggestions how else to achieve what we are trying to do
> (i.e. page migration and coherent CPU access to VRAM)?
Is there a way we can test the codepaths touched by this patchset? It
doesn't have to be via a complete qemu simulation of the GPU device
memory, but we need some way of creating MEMORY_DEVICE_GENERIC pages
subject to the migrate_vma_* helpers so we can test for regressions
going forward.