[PATCH v6 05/13] drm/amdkfd: generic type as sys mem on migration to ram

Felix Kuehling felix.kuehling at amd.com
Tue Aug 17 00:42:40 UTC 2021


On 2021-08-16 at 6:06 p.m., Zeng, Oak wrote:
> Regards,
> Oak 
>
>  
>
> On 2021-08-16, 3:53 PM, "amd-gfx on behalf of Sierra Guiza, Alejandro (Alex)" <amd-gfx-bounces at lists.freedesktop.org on behalf of alex.sierra at amd.com> wrote:
>
>
>     On 8/15/2021 10:38 AM, Christoph Hellwig wrote:
>     > On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex Sierra wrote:
>     >>   	migrate.vma = vma;
>     >>   	migrate.start = start;
>     >>   	migrate.end = end;
>     >> -	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>     >>   	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
>     >>   
>     >> +	if (adev->gmc.xgmi.connected_to_cpu)
>     >> +		migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
>     >> +	else
>     >> +		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>     > It's been a while since I touched this migrate code, but doesn't this
>     > mean that if the range already contains system memory the migration
>     > now won't do anything? for the connected_to_cpu case?
>
>     When the condition above holds (connected_to_cpu), we’re explicitly
>     migrating from device memory to system memory with the device
>     generic type.
>
> For the MEMORY_DEVICE_GENERIC memory type, why do we need to explicitly migrate it from device memory to normal system memory? I thought the design was that, for this type of memory, the CPU can access it in place without migration (just like the CPU accesses normal system memory), so there is no need to migrate this type of memory to normal system memory.
>
> With this patch, the migration behavior will be: when memory is accessed by the CPU, it will be migrated to normal system memory; when memory is accessed by the GPU, it will be migrated to device VRAM. This is basically the same behavior as when VRAM is treated as DEVICE_PRIVATE.
>
> I thought the whole goal of introducing DEVICE_GENERIC was to avoid such back-and-forth migration between device memory and normal system memory. But maybe I am missing something here.

Hi Oak,

By using MEMORY_DEVICE_GENERIC we can avoid CPU page faults triggering
migration back to system memory on every CPU access on the Frontier
system architecture, because such pages can be mapped in the CPU page
table. You're right that this is the reason for the whole patch series.

But we still need the ability to migrate from MEMORY_DEVICE_GENERIC to
system memory for reasons other than CPU page faults. Applications can
request migrations explicitly (hipMemPrefetchAsync). Or we can be forced
to migrate data due to memory pressure from other allocations (evictions
in the TTM memory allocator).
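
For those migrations to pick up MEMORY_DEVICE_GENERIC pages, which are
present in the CPU page table, the collect walk has to be told to select
present PTEs, as Alex describes in the quoted text. Below is a minimal
user-space sketch of that selection rule. It is a simplified model, not
the kernel code: struct model_pte, select_flags() and would_collect()
are hypothetical names made up for illustration, the flag values are
illustrative, and the real walk performs further checks (for example
the pgmap owner) that are omitted here.

```c
#include <stdbool.h>

/* Flag names mirror the kernel's migrate_vma selection flags.
 * The numeric values here are illustrative only. */
enum {
	MIGRATE_VMA_SELECT_SYSTEM         = 1 << 0,
	MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1,
};

/* Hypothetical model of one page table entry as seen by the walk. */
struct model_pte {
	bool present;        /* mapped in the CPU page table */
	bool device_private; /* non-present device-private entry */
};

/* Flag choice made by the patch: DEVICE_GENERIC pages (connected_to_cpu)
 * are present PTEs, so they need the SYSTEM selection. */
static unsigned int select_flags(bool connected_to_cpu)
{
	return connected_to_cpu ? MIGRATE_VMA_SELECT_SYSTEM
				: MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
}

/* Simplified collect rule: a present PTE is only considered when
 * SELECT_SYSTEM is set; a device-private entry only when
 * SELECT_DEVICE_PRIVATE is set. Anything else is skipped. */
static bool would_collect(struct model_pte pte, unsigned int flags)
{
	if (pte.present)
		return (flags & MIGRATE_VMA_SELECT_SYSTEM) != 0;
	if (pte.device_private)
		return (flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) != 0;
	return false;
}
```

In this model, a present PTE walked with only
MIGRATE_VMA_SELECT_DEVICE_PRIVATE set is skipped, which is why the
connected_to_cpu case switches to MIGRATE_VMA_SELECT_SYSTEM.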

Regards,
  Felix


>
> Regards,
> Oak
>
>     In this type, device PTEs are present in the CPU page table.
>
>     During the migrate_vma_collect_pmd walk op in the migrate_vma_setup
>     call, there’s a condition for present PTEs that requires
>     migrate->flags to be set to MIGRATE_VMA_SELECT_SYSTEM.
>     Otherwise, the migration for this entry will be ignored.
>
>     Regards,
>     Alex S.
>
>

