[PATCH v2] drm/amdgpu: change vm->task_info handling

Wed Jan 24 16:38:53 UTC 2024

On 2024-01-24 9:32, Shashank Sharma wrote:
>
> On 19/01/2024 21:23, Felix Kuehling wrote:
>>
>> On 2024-01-18 14:21, Shashank Sharma wrote:
>>> This patch changes the handling and lifecycle of vm->task_info object.
>>> The major changes are:
>>> - vm->task_info is a dynamically allocated ptr now, and its uasge is
>>>    reference counted.
>>> - introducing two new helper funcs for task_info lifecycle management
>>>      - amdgpu_vm_get_task_info: reference counts up task_info before
>>>        returning this info
>>>      - amdgpu_vm_put_task_info: reference counts down task_info
>>> - last put to task_info() frees task_info from the vm.
>>>
>>> This patch also does logistical changes required for existing usage
>>> of vm->task_info.
>>>
>>> V2: Do not block all the prints when task_info not found (Felix)
>>>
>>> Cc: Christian Koenig <christian.koenig at amd.com>
>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>> Cc: Felix Kuehling <Felix.Kuehling at amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>
>> Nit-picks inline.

[snip]

>>> +/**
>>> + * amdgpu_vm_put_task_info_pasid - reference down the vm task_info ptr
>>> + * frees the vm task_info ptr at the last put
>>> + *
>>> + * @adev: drm device pointer
>>> + * @task_info: task_info struct under discussion.
>>> + * @pasid: pasid of the VM which contains task_info
>>> + */
>>> +void amdgpu_vm_put_task_info_pasid(struct amdgpu_device *adev,
>>> +                   struct amdgpu_task_info *task_info,
>>> +                   u32 pasid)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = kref_put(&task_info->refcount, amdgpu_vm_destroy_task_info);
>>> +
>>> +    /* Clean up if object was removed in the last put */
>>> +    if (ret == 1) {
>>> +        struct amdgpu_vm *vm;
>>> +
>>> +        vm = amdgpu_vm_get_vm_from_pasid(adev, pasid);
>>> +        if (!vm) {
>>> +            WARN(1, "Invalid PASID %u to put task info\n", pasid);
>>> +            return;
>>> +        }
>>> +
>>> +        vm->task_info = NULL;
>>
>> This doesn't make sense. If there is a VM pointing to the task_info, 
>> then the ref count should not go to 0. Therefore this whole "look up 
>> the VM from PASID and clear vm->task_info" seams bogus.
>
> Actually, (ret == 1) above indicates that cleanup function has been 
> called and task_info has been freed, and the vm->task_info is a 
> dangling ptr.
>
> The current design is
>
> - first open per process/fd creates vm->task_info
>
> - last close per process/fd frees vm->task_info
>
>>
>> I'd expect the last put_task_info call to come from amdgpu_vm_fini. 
>> At that point setting task_info to NULL is probably unnecessary. But 
>> if we wanted that, we could set it to NULL in the caller.
>>
> Even this can be done, I can call the first get_vm_info() from vm_init 
> and last put from vm_fini.

Well, actually it doesn't matter where the last put comes from. The main 
point is, that vm->task_info is a counted reference. As long as that 
reference exists, the refcount should not become 0. If it does, that's a 
bug somewhere. The vm->task_info reference is only dropped in 
amdgpu_vm_fini, after which the whole vm structure is freed. So there 
should never be a need to set vm->task_info to NULL in 
amdgpu_vm_put_task_info_*.

[snip]
>>> +static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm)
>>> +{
>>> +    vm->task_info = kzalloc(sizeof(struct amdgpu_task_info), 
>>> GFP_KERNEL);
>>> +    if (!vm->task_info) {
>>> +        DRM_ERROR("OOM while creating task_info space\n");
>>
>> Printing OOM error messages is usually redundant. The allocators are 
>> already very noisy when OOM happens. I think checkpatch.pl also warns 
>> about that. Maybe it doesn't catch DRM_ERROR but only printk and 
>> friends.
>>
> Agree, I will make this debug instead of error.

Even a debug message is not needed. See https://lkml.org/lkml/2014/6/10/382.

[snip]

>>   @@ -157,18 +157,26 @@ static int gmc_v10_0_process_interrupt(struct 
>> amdgpu_device *adev,
>>>       if (!printk_ratelimit())
>>>           return 0;
>>>   -    memset(&task_info, 0, sizeof(struct amdgpu_task_info));
>>> -    amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>>> +    task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
>>> +    if (task_info) {
>>> +        dev_err(adev->dev,
>>> +            "[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, 
>>> for process %s pid %d thread %s pid %d)\n",
>>> +            entry->vmid_src ? "mmhub" : "gfxhub",
>>> +            entry->src_id, entry->ring_id, entry->vmid,
>>> +            entry->pasid, task_info->process_name, task_info->tgid,
>>> +            task_info->task_name, task_info->pid);
>>> +        amdgpu_vm_put_task_info_pasid(adev, task_info, entry->pasid);
>>> +    } else {
>>> +        dev_err(adev->dev,
>>> +            "[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, 
>>> no process info)\n",
>>> +            entry->vmid_src ? "mmhub" : "gfxhub",
>>> +            entry->src_id, entry->ring_id, entry->vmid,
>>> +            entry->pasid);
>>
>> Can we avoid the duplication here? It's too easy for them to get out 
>> of sync. I think it's OK to change the message format so that the 
>> process into is printed on a separate line. E.g.:
>>
> That's exactly I thought, but then I was afraid of breaking any 
> existing scripts grepping for this exact text. I guess I can do this 
> change and see if anyone complaints :).

I don't think there are any ABI guarantees for log messages.

Regards,
   Felix

>
> - Shashank
>
>> dev_err(adev->dev,
>> -        "[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, for 
>> process %s pid %d thread %s pid %d)\n",
>> +        "[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u)\n",
>>          entry->vmid_src ? "mmhub" : "gfxhub",
>>          entry->src_id, entry->ring_id, entry->vmid,
>> -        entry->pasid, task_info.process_name, task_info.tgid,
>> -        task_info.task_name, task_info.pid);
>> +         entry->pasid);
>> +    if (task_info) {
>> +        dev_err(adev->dev, "  in process %s pid %d thread %s pid %d\n",
>> +            task_info.process_name, task_info.tgid,
>> +            task_info.task_name, task_info.pid);
>> +    }
>>
>>
>>> +    }
>>>   -    dev_err(adev->dev,
>>> -        "[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, for 
>>> process %s pid %d thread %s pid %d)\n",
>>> -        entry->vmid_src ? "mmhub" : "gfxhub",
>>> -        entry->src_id, entry->ring_id, entry->vmid,
>>> -        entry->pasid, task_info.process_name, task_info.tgid,
>>> -        task_info.task_name, task_info.pid);
>>>       dev_err(adev->dev, "  in page starting at address 0x%016llx 
>>> from client 0x%x (%s)\n",
>>> -        addr, entry->client_id,
>>> -        soc15_ih_clientid_name[entry->client_id]);
>>> +            addr, entry->client_id,
>>> +            soc15_ih_clientid_name[entry->client_id]);
>>>         if (!amdgpu_sriov_vf(adev))
>>> hub->vmhub_funcs->print_l2_protection_fault_status(adev,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> index 23d7b548d13f..3dda6c310729 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> @@ -126,19 +126,28 @@ static int gmc_v11_0_process_interrupt(struct 
>>> amdgpu_device *adev,
>>>       }
>>>         if (printk_ratelimit()) {
>>> -        struct amdgpu_task_info task_info;
>>> -
>>> -        memset(&task_info, 0, sizeof(struct amdgpu_task_info));
>>> -        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>>> +        struct amdgpu_task_info *task_info;
>>> +
>>> +        task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
>>> +        if (task_info) {
>>> +            dev_err(adev->dev,
>>> +                "[%s] page fault (src_id:%u ring:%u vmid:%u 
>>> pasid:%u, for process %s pid %d thread %s pid %d)\n",
>>> +                entry->vmid_src ? "mmhub" : "gfxhub",
>>> +                entry->src_id, entry->ring_id, entry->vmid,
>>> +                entry->pasid, task_info->process_name, 
>>> task_info->tgid,
>>> +                task_info->task_name, task_info->pid);
>>> +            amdgpu_vm_put_task_info_pasid(adev, task_info, 
>>> entry->pasid);
>>> +        } else {
>>> +            dev_err(adev->dev,
>>> +                "[%s] page fault (src_id:%u ring:%u vmid:%u 
>>> pasid:%u, no process info)\n",
>>> +                entry->vmid_src ? "mmhub" : "gfxhub",
>>> +                entry->src_id, entry->ring_id, entry->vmid,
>>> +                entry->pasid);
>>> +        }
>>
>> Same as above.
>>
>>
>>>   -        dev_err(adev->dev,
>>> -            "[%s] page fault (src_id:%u ring:%u vmid:%u pasid:%u, 
>>> for process %s pid %d thread %s pid %d)\n",
>>> -            entry->vmid_src ? "mmhub" : "gfxhub",
>>> -            entry->src_id, entry->ring_id, entry->vmid,
>>> -            entry->pasid, task_info.process_name, task_info.tgid,
>>> -            task_info.task_name, task_info.pid);
>>>           dev_err(adev->dev, "  in page starting at address 
>>> 0x%016llx from client %d\n",
>>> -            addr, entry->client_id);
>>> +                addr, entry->client_id);
>>> +
>>>           if (!amdgpu_sriov_vf(adev))
>>> hub->vmhub_funcs->print_l2_protection_fault_status(adev, status);
>>>       }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> index ff4ae73d27ec..aa3887c3bb35 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> @@ -1444,18 +1444,24 @@ static int gmc_v8_0_process_interrupt(struct 
>>> amdgpu_device *adev,
>>>           gmc_v8_0_set_fault_enable_default(adev, false);
>>>         if (printk_ratelimit()) {
>>> -        struct amdgpu_task_info task_info;
>>> -
>>> -        memset(&task_info, 0, sizeof(struct amdgpu_task_info));
>>> -        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>>> +        struct amdgpu_task_info *task_info;
>>> +
>>> +        task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
>>> +        if (task_info) {
>>> +            dev_err(adev->dev, "GPU fault detected: %d 0x%08x for 
>>> process %s pid %d thread %s pid %d\n",
>>> +                entry->src_id, entry->src_data[0], 
>>> task_info->process_name,
>>> +                task_info->tgid, task_info->task_name, 
>>> task_info->pid);
>>> +            amdgpu_vm_put_task_info_pasid(adev, task_info, 
>>> entry->pasid);
>>> +        } else {
>>> +            dev_err(adev->dev, "GPU fault detected: %d 0x%08x (no 
>>> process info)\n",
>>> +                entry->src_id, entry->src_data[0]);
>>> +        }
>> Same as above.
>>
>>
>>>   -        dev_err(adev->dev, "GPU fault detected: %d 0x%08x for 
>>> process %s pid %d thread %s pid %d\n",
>>> -            entry->src_id, entry->src_data[0], task_info.process_name,
>>> -            task_info.tgid, task_info.task_name, task_info.pid);
>>>           dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_ADDR   
>>> 0x%08X\n",
>>> -            addr);
>>> +                addr);
>>>           dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 
>>> 0x%08X\n",
>>>               status);
>>> +
>>>           gmc_v8_0_vm_decode_fault(adev, status, addr, mc_client,
>>>                        entry->pasid);
>>>       }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 2ac5820e9c92..075d633109e3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -549,7 +549,7 @@ static int gmc_v9_0_process_interrupt(struct 
>>> amdgpu_device *adev,
>>>       bool retry_fault = !!(entry->src_data[1] & 0x80);
>>>       bool write_fault = !!(entry->src_data[1] & 0x20);
>>>       uint32_t status = 0, cid = 0, rw = 0;
>>> -    struct amdgpu_task_info task_info;
>>> +    struct amdgpu_task_info *task_info;
>>>       struct amdgpu_vmhub *hub;
>>>       const char *mmhub_cid;
>>>       const char *hub_name;
>>> @@ -626,15 +626,23 @@ static int gmc_v9_0_process_interrupt(struct 
>>> amdgpu_device *adev,
>>>       if (!printk_ratelimit())
>>>           return 0;
>>>   -    memset(&task_info, 0, sizeof(struct amdgpu_task_info));
>>> -    amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>>> +    task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
>>> +    if (task_info) {
>>> +        dev_err(adev->dev,
>>> +            "[%s] %s page fault (src_id:%u ring:%u vmid:%u 
>>> pasid:%u, for process %s pid %d thread %s pid %d)\n",
>>> +            hub_name, retry_fault ? "retry" : "no-retry",
>>> +            entry->src_id, entry->ring_id, entry->vmid,
>>> +            entry->pasid, task_info->process_name, task_info->tgid,
>>> +            task_info->task_name, task_info->pid);
>>> +        amdgpu_vm_put_task_info_pasid(adev, task_info, entry->pasid);
>>> +    } else {
>>> +        dev_err(adev->dev,
>>> +            "[%s] %s page fault (src_id:%u ring:%u vmid:%u 
>>> pasid:%u, no process info)\n",
>>> +            hub_name, retry_fault ? "retry" : "no-retry",
>>> +            entry->src_id, entry->ring_id, entry->vmid,
>>> +            entry->pasid);
>>> +    }
>> Same as above.
>>
>>
>>>   -    dev_err(adev->dev,
>>> -        "[%s] %s page fault (src_id:%u ring:%u vmid:%u pasid:%u, 
>>> for process %s pid %d thread %s pid %d)\n",
>>> -        hub_name, retry_fault ? "retry" : "no-retry",
>>> -        entry->src_id, entry->ring_id, entry->vmid,
>>> -        entry->pasid, task_info.process_name, task_info.tgid,
>>> -        task_info.task_name, task_info.pid);
>>>       dev_err(adev->dev, "  in page starting at address 0x%016llx 
>>> from IH client 0x%x (%s)\n",
>>>           addr, entry->client_id,
>>>           soc15_ih_clientid_name[entry->client_id]);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index 3d68dd5523c6..515d1a1ff141 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -2104,7 +2104,7 @@ static int sdma_v4_0_print_iv_entry(struct 
>>> amdgpu_device *adev,
>>>                             struct amdgpu_iv_entry *entry)
>>>   {
>>>       int instance;
>>> -    struct amdgpu_task_info task_info;
>>> +    struct amdgpu_task_info *task_info;
>>>       u64 addr;
>>>         instance = sdma_v4_0_irq_id_to_seq(entry->client_id);
>>> @@ -2116,15 +2116,23 @@ static int sdma_v4_0_print_iv_entry(struct 
>>> amdgpu_device *adev,
>>>       addr = (u64)entry->src_data[0] << 12;
>>>       addr |= ((u64)entry->src_data[1] & 0xf) << 44;
>>>   -    memset(&task_info, 0, sizeof(struct amdgpu_task_info));
>>> -    amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>>> +    task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
>>> +    if (task_info) {
>>> +        dev_dbg_ratelimited(adev->dev,
>>> +            "[sdma%d] address:0x%016llx src_id:%u ring:%u vmid:%u "
>>> +            "pasid:%u, for process %s pid %d thread %s pid %d\n",
>>> +            instance, addr, entry->src_id, entry->ring_id, 
>>> entry->vmid,
>>> +            entry->pasid, task_info->process_name, task_info->tgid,
>>> +            task_info->task_name, task_info->pid);
>>> +        amdgpu_vm_put_task_info_pasid(adev, task_info, entry->pasid);
>>> +    } else {
>>> +        dev_dbg_ratelimited(adev->dev,
>>> +            "[sdma%d] address:0x%016llx src_id:%u ring:%u vmid:%u "
>>> +            "pasid:%u, no process info\n",
>>> +            instance, addr, entry->src_id, entry->ring_id, 
>>> entry->vmid,
>>> +            entry->pasid);
>>> +    }
>> Same as above.
>>
>>
>>>   -    dev_dbg_ratelimited(adev->dev,
>>> -           "[sdma%d] address:0x%016llx src_id:%u ring:%u vmid:%u "
>>> -           "pasid:%u, for process %s pid %d thread %s pid %d\n",
>>> -           instance, addr, entry->src_id, entry->ring_id, entry->vmid,
>>> -           entry->pasid, task_info.process_name, task_info.tgid,
>>> -           task_info.task_name, task_info.pid);
>>>       return 0;
>>>   }
>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c 
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
>>> index 0f24af6f2810..d7e23bd90f7f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
>>> @@ -1642,7 +1642,7 @@ static int sdma_v4_4_2_print_iv_entry(struct 
>>> amdgpu_device *adev,
>>>                             struct amdgpu_iv_entry *entry)
>>>   {
>>>       int instance;
>>> -    struct amdgpu_task_info task_info;
>>> +    struct amdgpu_task_info *task_info;
>>>       u64 addr;
>>>         instance = sdma_v4_4_2_irq_id_to_seq(entry->client_id);
>>> @@ -1654,15 +1654,23 @@ static int sdma_v4_4_2_print_iv_entry(struct 
>>> amdgpu_device *adev,
>>>       addr = (u64)entry->src_data[0] << 12;
>>>       addr |= ((u64)entry->src_data[1] & 0xf) << 44;
>>>   -    memset(&task_info, 0, sizeof(struct amdgpu_task_info));
>>> -    amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>>> +    task_info = amdgpu_vm_get_task_info_pasid(adev, entry->pasid);
>>> +    if (task_info) {
>>> +        dev_dbg_ratelimited(adev->dev,
>>> +            "[sdma%d] address:0x%016llx src_id:%u ring:%u vmid:%u "
>>> +            "pasid:%u, for process %s pid %d thread %s pid %d\n",
>>> +            instance, addr, entry->src_id, entry->ring_id, 
>>> entry->vmid,
>>> +            entry->pasid, task_info->process_name, task_info->tgid,
>>> +            task_info->task_name, task_info->pid);
>>> +        amdgpu_vm_put_task_info_pasid(adev, task_info, entry->pasid);
>>> +    } else {
>>> +        dev_dbg_ratelimited(adev->dev,
>>> +            "[sdma%d] address:0x%016llx src_id:%u ring:%u vmid:%u "
>>> +            "pasid:%u (no process info)\n",
>>> +            instance, addr, entry->src_id, entry->ring_id, 
>>> entry->vmid,
>>> +            entry->pasid);
>>> +    }
>> Same as above.
>>
>> Regards,
>>   Felix
>>
>>
>>>   -    dev_dbg_ratelimited(adev->dev,
>>> -           "[sdma%d] address:0x%016llx src_id:%u ring:%u vmid:%u "
>>> -           "pasid:%u, for process %s pid %d thread %s pid %d\n",
>>> -           instance, addr, entry->src_id, entry->ring_id, entry->vmid,
>>> -           entry->pasid, task_info.process_name, task_info.tgid,
>>> -           task_info.task_name, task_info.pid);
>>>       return 0;
>>>   }
>>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
>>> index d9953c2b2661..0dfe4b3bd18a 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
>>> @@ -238,16 +238,16 @@ void 
>>> kfd_smi_event_update_thermal_throttling(struct kfd_node *dev,
>>>     void kfd_smi_event_update_vmfault(struct kfd_node *dev, uint16_t 
>>> pasid)
>>>   {
>>> -    struct amdgpu_task_info task_info;
>>> -
>>> -    memset(&task_info, 0, sizeof(struct amdgpu_task_info));
>>> -    amdgpu_vm_get_task_info(dev->adev, pasid, &task_info);
>>> -    /* Report VM faults from user applications, not retry from 
>>> kernel */
>>> -    if (!task_info.pid)
>>> -        return;
>>> -
>>> -    kfd_smi_event_add(0, dev, KFD_SMI_EVENT_VMFAULT, "%x:%s\n",
>>> -              task_info.pid, task_info.task_name);
>>> +    struct amdgpu_task_info *task_info;
>>> +
>>> +    task_info = amdgpu_vm_get_task_info_pasid(dev->adev, pasid);
>>> +    if (task_info) {
>>> +        /* Report VM faults from user applications, not retry from 
>>> kernel */
>>> +        if (task_info->pid)
>>> +            kfd_smi_event_add(0, dev, KFD_SMI_EVENT_VMFAULT, 
>>> "%x:%s\n",
>>> +                     task_info->pid, task_info->task_name);
>>> +        amdgpu_vm_put_task_info_pasid(dev->adev, task_info, pasid);
>>> +    }
>>>   }
>>>     void kfd_smi_event_page_fault_start(struct kfd_node *node, pid_t 
>>> pid,