[PATCH] drm/amdgpu: fix a bug NULL pointer dereference

Nirmoy nirmodas at amd.com
Thu Feb 20 10:39:37 UTC 2020


On 2/20/20 9:41 AM, Li, Dennis wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Christian and Monk,
>        This issue occurs when a RAS uncorrectable error happens during an SDMA copy. The RAS uncorrectable error event triggers the driver to do a BACO reset, which sets the status of the SDMA scheduler to not ready. drm_sched_entity_get_free_sched() then returns NULL in drm_sched_entity_select_rq(), which sets entity->rq to NULL.

That can only happen in drm_sched_job_init(), which is called from
amdgpu_job_submit(), i.e. after the point where we hit this NULL pointer
exception:

         entity = p->direct ? &p->vm->direct : &p->vm->delayed;
         ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);

         WARN_ON(ib->length_dw == 0);
         amdgpu_ring_pad_ib(ring, ib);
         WARN_ON(ib->length_dw > p->num_dw_left);
         r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);  <-- Here
         if (r)
                 goto error;
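
To make the path Dennis describes concrete, here is a rough user-space
sketch, not the actual scheduler code. All model_* names are hypothetical,
and it assumes that selection picks the least-loaded ready scheduler and
yields NULL when none is ready:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler structures, not the kernel ones. */
struct model_sched {
	bool ready;        /* cleared while the engine is being reset */
	unsigned int load; /* queued jobs, used to pick the least busy one */
};

struct model_entity {
	struct model_sched **sched_list;
	size_t num_sched_list;
	struct model_sched *rq; /* models entity->rq; NULL means no run queue */
};

/* Return the least-loaded ready scheduler, or NULL if none is ready. */
static struct model_sched *model_get_free_sched(const struct model_entity *e)
{
	struct model_sched *best = NULL;
	size_t i;

	assert(e->sched_list || e->num_sched_list == 0);

	for (i = 0; i < e->num_sched_list; i++) {
		struct model_sched *s = e->sched_list[i];

		if (!s->ready)
			continue; /* skip schedulers that are not ready */
		if (!best || s->load < best->load)
			best = s;
	}
	return best;
}

/* Re-select the run queue; this is the step that can leave rq == NULL. */
static void model_select_rq(struct model_entity *e)
{
	e->rq = model_get_free_sched(e);
}
```

In this model, once every scheduler in the list has ready == false,
model_select_rq() leaves rq at NULL, so any later rq->... access would
fault unless it is checked first.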

Could it be that we are calling drm_sched_entity_init() for the direct
and delayed entities with zero scheds in amdgpu_vm_init()?
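
A minimal sketch of that scenario, again as a user-space model rather than
the driver code. The demo_* names are made up, the init function is only
assumed to accept an empty scheduler list and leave the run queue unset,
and -ENOENT is an arbitrary choice of error code for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified types; they only mirror the shape of the problem. */
struct demo_sched {
	int id;
};

struct demo_entity {
	struct demo_sched *rq; /* models entity->rq */
};

/*
 * Assumed behaviour: an empty scheduler list is accepted and simply
 * leaves the run queue unset.
 */
static int demo_entity_init(struct demo_entity *e,
			    struct demo_sched **sched_list, size_t num)
{
	if (!e || (num && (!sched_list || !sched_list[0])))
		return -EINVAL;

	e->rq = num ? sched_list[0] : NULL;
	return 0;
}

/*
 * Models the start of a commit path: look up the scheduler behind the
 * entity, but report an error instead of dereferencing a NULL rq or
 * silently returning success.
 */
static int demo_commit(const struct demo_entity *e, int *sched_id)
{
	assert(e && sched_id);

	if (!e->rq)
		return -ENOENT; /* hypothetical error code */

	*sched_id = e->rq->id;
	return 0;
}
```

With one scheduler in the list demo_commit() succeeds; after an init with
zero schedulers it fails with an error, so the caller knows the update was
not submitted instead of having it dropped silently.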


Regards,

Nirmoy

> Best Regards
> Dennis Li
> -----Original Message-----
> From: Liu, Monk <Monk.Liu at amd.com>
> Sent: Wednesday, February 19, 2020 7:30 PM
> To: Koenig, Christian <Christian.Koenig at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; amd-gfx at lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: fix a bug NULL pointer dereference
>
>> +	if (!entity->rq)
>> +		return 0;
>> +
> Yes, we should never hit the 'entity->rq == NULL' case; that looks like the true bug.
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian König
> Sent: February 19, 2020 18:50
> To: Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; amd-gfx at lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: fix a bug NULL pointer dereference
>
> Well, offhand this patch looks like a clear NAK to me.
>
> Returning without raising an error is certainly the wrong thing to do here because we just drop the necessary page table updates.
>
> How does entity->rq end up as NULL in the first place?
>
> Regards,
> Christian.
>
> On 19.02.20 at 07:26, Zhang, Hawking wrote:
>> Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
>>
>> Regards,
>> Hawking
>> -----Original Message-----
>> From: Dennis Li <Dennis.Li at amd.com>
>> Sent: Wednesday, February 19, 2020 12:05
>> To: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>> <Alexander.Deucher at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Zhang,
>> Hawking <Hawking.Zhang at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>
>> Cc: Li, Dennis <Dennis.Li at amd.com>
>> Subject: [PATCH] drm/amdgpu: fix a bug NULL pointer dereference
>>
>> Check whether the entity's run queue is NULL to avoid a NULL pointer dereference.
>>
>> Change-Id: I08d56774012cf229ba2fe7a011c1359e8d1e2781
>> Signed-off-by: Dennis Li <Dennis.Li at amd.com>
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>> index 4cc7881f438c..67cca463ddcc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>> @@ -95,6 +95,9 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>>    	int r;
>>    
>>    	entity = p->direct ? &p->vm->direct : &p->vm->delayed;
>> +	if (!entity->rq)
>> +		return 0;
>> +
>>    	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
>>    
>>    	WARN_ON(ib->length_dw == 0);
>> --
>> 2.17.1
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx