[PATCH v7 4/4] drm/amdgpu: track bo memory stats at runtime
Christian König
christian.koenig at amd.com
Wed Nov 13 14:19:51 UTC 2024
Am 13.11.24 um 15:09 schrieb Li, Yunxiang (Teddy):
> [Public]
>
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Wednesday, November 13, 2024 3:49
>> Am 12.11.24 um 19:16 schrieb Li, Yunxiang (Teddy):
>>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>>> Sent: Tuesday, November 12, 2024 5:54
>>>> Am 10.11.24 um 16:41 schrieb Yunxiang Li:
>>>>> @@ -2612,7 +2707,6 @@ void amdgpu_vm_fini(struct amdgpu_device
>>>>> *adev, struct amdgpu_vm *vm)
>>>>>
>>>>> root = amdgpu_bo_ref(vm->root.bo);
>>>>> amdgpu_bo_reserve(root, true);
>>>>> - amdgpu_vm_put_task_info(vm->task_info);
>>>>> amdgpu_vm_set_pasid(adev, vm, 0);
>>>>> dma_fence_wait(vm->last_unlocked, false);
>>>>> dma_fence_put(vm->last_unlocked); @@ -2660,6 +2754,15 @@ void
>>>>> amdgpu_vm_fini(struct amdgpu_device *adev,
>>>> struct amdgpu_vm *vm)
>>>>> }
>>>>> }
>>>>>
>>>>> + if (!amdgpu_vm_stats_is_zero(vm)) {
>>>>> + struct amdgpu_task_info *ti = vm->task_info;
>>>>> +
>>>>> + dev_warn(adev->dev,
>>>>> + "VM memory stats for proc %s(%d) task %s(%d) is
>>>>> + non-zero
>>>> when fini\n",
>>>>> + ti->process_name, ti->pid, ti->task_name, ti->tgid);
>>>>> + }
>>>>> +
>>>>> + amdgpu_vm_put_task_info(vm->task_info);
>>>> Please don't move the call to amdgpu_vm_put_task_info().
>>> Is keeping the task_info alive a hazard here? I could copy out the info, it just
>> seemed a bit wasteful.
>>
>> Ah, now I see why you have moved that.
>>
>> IIRC we need to free up the task info before releasing the PASID, but that info might
>> be outdated. Need to check the code.
>>
>> Does it work if you move the message further up or does the root PD then break
>> your neck because it isn't released yet?
>>
>> Thanks,
>> Christian.
> It needs to be after root BO is deleted. I think there's a way to go from pasid to task_info but not the other way around, so it should be safe? It's okay if the pasid/pid/etc gets recycled before we get here and we print outdated info since it's just so we know which application we should use to try to reproduce the bug.
Double checked that and the order is actually incorrect right now and
your patch here is fixing it!
The call to amdgpu_vm_put_task_info() needs to be after the call to
amdgpu_vm_set_pasid(adev, vm, 0) or otherwise fault handling might use a
freed up task info.
So feel free to completely ignore my comment, you're actually fixing things.
Regards,
Christian.
>
> Teddy
More information about the amd-gfx
mailing list