[PATCH v3 2/2] drm/admgpu: Present amdgpu_task_info in VM_FAULTS.

Mon Jul 9 07:04:50 UTC 2018

Am 09.07.2018 um 07:13 schrieb Zhang, Jerry (Junwei):
> On 07/06/2018 03:27 AM, Andrey Grodzovsky wrote:
>> Extract and present the reposnsible process and thread when
>> VM_FAULT happens.
>>
>> v2: Use getter and setter functions.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  4 ++++
>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 10 +++++++---
>>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  9 +++++++--
>>   3 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 7a625f3..609c8f5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -187,6 +187,10 @@ static int amdgpu_cs_parser_init(struct 
>> amdgpu_cs_parser *p, void *data)
>>       if (p->uf_entry.robj)
>>           p->job->uf_addr = uf_offset;
>>       kfree(chunk_array);
>> +
>> +    /* Use this opportunity to fill in task info for the vm */
>> +    amdgpu_vm_set_task_info(vm);
>> +
>
> Shall we set the task info when vm init?

No, vm_init() is called from a completely different process which is 
later on user of the VM.

>
>
>>       return 0;
>>
>>   free_all_kdata:
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> index 08753e7..75f3ffb 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> @@ -46,7 +46,6 @@
>>
>>   #include "ivsrcid/ivsrcid_vislands30.h"
>>
>> -
>>   static void gmc_v8_0_set_gmc_funcs(struct amdgpu_device *adev);
>>   static void gmc_v8_0_set_irq_funcs(struct amdgpu_device *adev);
>>   static int gmc_v8_0_wait_for_idle(void *handle);
>> @@ -1449,8 +1448,13 @@ static int gmc_v8_0_process_interrupt(struct 
>> amdgpu_device *adev,
>>           gmc_v8_0_set_fault_enable_default(adev, false);
>>
>>       if (printk_ratelimit()) {
>> -        dev_err(adev->dev, "GPU fault detected: %d 0x%08x\n",
>> -            entry->src_id, entry->src_data[0]);
>> +        struct amdgpu_task_info task_info = { 0 };
>> +
>> +        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>
> Shall we find vm and get the vm->task_info directly instead of filling 
> local variable task_info?
> (current style is also OK for me, just for more info)

No, we can't guarantee that the task_info pointer in the VM won't be 
freed after dropping the spinlock. So returning a copy here is the 
better approach.

>
> And we may also check the return value for "NULL" case, otherwise it 
> may cause access errorin dev_err(),
> if failed to find vm (although, it's most unlikely to happen).

Since we use a copy of the task info we should never get a NULL pointer. 
The string should already be zero terminated with the "{ 0 }" 
initialization above.

Christian.

>
> Jerry
>
>> +
>> +        dev_err(adev->dev, "GPU fault detected: %d 0x%08x for 
>> process %s pid %d thread %s pid %d\n",
>> +            entry->src_id, entry->src_data[0], task_info.process_name,
>> +            task_info.tgid, task_info.task_name, task_info.pid);
>>           dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_ADDR   
>> 0x%08X\n",
>>               addr);
>>           dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 
>> 0x%08X\n",
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 691a659..9df94b4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -259,11 +259,16 @@ static int gmc_v9_0_process_interrupt(struct 
>> amdgpu_device *adev,
>>       }
>>
>>       if (printk_ratelimit()) {
>> +        struct amdgpu_task_info task_info = { 0 };
>> +
>> +        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
>> +
>>           dev_err(adev->dev,
>> -            "[%s] VMC page fault (src_id:%u ring:%u vmid:%u 
>> pasid:%u)\n",
>> +            "[%s] VMC page fault (src_id:%u ring:%u vmid:%u 
>> pasid:%u, for process %s pid %d thread %s pid %d\n)\n",
>>               entry->vmid_src ? "mmhub" : "gfxhub",
>>               entry->src_id, entry->ring_id, entry->vmid,
>> -            entry->pasid);
>> +            entry->pasid, task_info.process_name, task_info.tgid,
>> +            task_info.task_name, task_info.pid);
>>           dev_err(adev->dev, "  at page 0x%016llx from %d\n",
>>               addr, entry->client_id);
>>           if (!amdgpu_sriov_vf(adev))
>>