[PATCH] drm/amdgpu: Checkpoint and Restore VRAM BOs without VA
Felix Kuehling
felix.kuehling at amd.com
Tue Jul 25 20:23:58 UTC 2023
Am 2023-07-25 um 16:04 schrieb Errabolu, Ramesh:
> [AMD Official Use Only - General]
>
> Responses inline.
>
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Monday, July 24, 2023 2:51 PM
> To: amd-gfx at lists.freedesktop.org; Errabolu, Ramesh <Ramesh.Errabolu at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Checkpoint and Restore VRAM BOs without VA
>
>
> On 2023-07-24 11:57, Ramesh Errabolu wrote:
>> Extend checkpoint logic to allow inclusion of VRAM BOs that do not
>> have a VA attached
>>
>> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index 40ac093b5035..5cc00ff4b635 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -1845,7 +1845,8 @@ static uint32_t get_process_num_bos(struct kfd_process *p)
>> idr_for_each_entry(&pdd->alloc_idr, mem, id) {
>> struct kgd_mem *kgd_mem = (struct kgd_mem *)mem;
>>
>> - if ((uint64_t)kgd_mem->va > pdd->gpuvm_base)
>> + if (((uint64_t)kgd_mem->va > pdd->gpuvm_base) ||
>> + (kgd_mem->va == 0))
> I'm trying to remember what this condition is there to protect against, because it almost looks like we could drop the entire condition. I think it's there to avoid checkpointing the TMA/TBA BOs allocated by KFD itself.
>
> Ramesh: I am unsure as to how we can detect TMA/TBA BOs if we don't want them checkpointed. Alternatively we can checkpoint and restore TMA/TBA BOs without loss of function. If this o.k. we can drop the check that determines BO qualification.
It's OK. Currently they have a VA > 0 and < gpuvm_base. So this check
will still work if you only allow BOs with VA == 0.
There is a patch in the works to move the TMA and TBA to the upper half
of the virtual address space. Then we'll need to update this check to
exclude anything that has bit 63 of the VA set.
Regards,
Felix
>
> That said, you have some unnecessary parentheses in this expression. And just using !x to check for 0 is usually preferred in the kernel. This should work and is more readable IMO:
>
> + if ((uint64_t)kgd_mem->va > pdd->gpuvm_base || !kgd_mem->va)
>
>
>> num_of_bos++;
>> }
>> }
>> @@ -1917,7 +1918,8 @@ static int criu_checkpoint_bos(struct kfd_process *p,
>> kgd_mem = (struct kgd_mem *)mem;
>> dumper_bo = kgd_mem->bo;
>>
>> - if ((uint64_t)kgd_mem->va <= pdd->gpuvm_base)
>> + if (((uint64_t)kgd_mem->va <= pdd->gpuvm_base) &&
>> + !(kgd_mem->va == 0))
> Similar to above:
>
> + if (kgd_mem->va && (uint64_t)kgd_mem->va <= pdd->gpuvm_base)
>
> Regards,
> Felix
>
>
>> continue;
>>
>> bo_bucket = &bo_buckets[bo_index];
More information about the amd-gfx
mailing list