[PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference
Sharma, Deepak
Deepak.Sharma at amd.com
Tue Nov 27 23:18:55 UTC 2018
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian König
Sent: Tuesday, November 27, 2018 1:52 AM
To: Sharma, Deepak <Deepak.Sharma at amd.com>; Zhou, David(ChunMing) <David1.Zhou at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Sharma, Deepak <Deepak.Sharma at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference
Am 27.11.18 um 01:40 schrieb Deepak Sharma:
>
> On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Sent: Monday, November 26, 2018 5:23 PM
>>> To: Sharma, Deepak <Deepak.Sharma at amd.com>; Zhou, David(ChunMing)
>>> <David1.Zhou at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
>>> amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
>>> dereference
>>>
>>> Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
>>>> 在 2018/11/24 2:10, Koenig, Christian 写道:
>>>>> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
>>>>>> 在 2018/11/23 21:30, Koenig, Christian 写道:
>>>>>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
>>>>>>>> 在 2018/11/22 19:25, Christian König 写道:
>>>>>>>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
>>>>>>>>>> when returned fence is not valid mostly due to userspace ignored
>>>>>>>>>> previous error causes NULL pointer dereference.
>>>>>>>>> Again, this is clearly incorrect. The my other mails on the
>>>>>>>>> earlier patch.
>>>>>>>> Sorry for I didn't get your history, but looks from the patch
>>>>>>>> itself, it is still a valid patch, isn't it?
>>>>>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
>>>>>>> when the fence is already signaled.
>>>>>>>
>>>>>>> So this patch could totally break userspace because it changes the
>>>>>>> behavior when we try to sync to an already signaled fence.
>>>>>> Ah, I got your meaning, how about attached patch?
>>>>> Yeah something like this, but I would just give the
>>>>> DRM_SYNCOBJ_CREATE_SIGNALED instead.
>>>>>
>>>>> I mean that's what this flag is good for isn't it?
>>>> Yeah, I give a flag initally when creating patch, but as you know, there is a
>>> swich case not be able to use that flag:
>>>> case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
>>>> fd = get_unused_fd_flags(O_CLOEXEC);
>>>> if (fd < 0) {
>>>> dma_fence_put(fence);
>>>> return fd;
>>>> }
>>>>
>>>> sync_file = sync_file_create(fence);
>>>> dma_fence_put(fence);
>>>> if (!sync_file) {
>>>> put_unused_fd(fd);
>>>> return -ENOMEM;
>>>> }
>>>>
>>>> fd_install(fd, sync_file->file);
>>>> info->out.handle = fd;
>>>> return 0;
>>>>
>>>> So I change to stub fence instead.
>>> Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
>>> fence.
>>>
>>> We should then probably move the stub fence function into
>>> dma_fence_stub.c under drivers/dma-buf to keep the stuff together.
>> Yes, you wrap it to review first with your stub fence, we can do it separately first.
>>
>> -David
>>>> -David
>>>>
>>>> I have not applied this patch.
>>>> The issue was trying to address is when amdgpu_cs_ioctl() failed due to
>>> low memory (ENOMEM) but userspace chose to proceed and called
>>> amdgpu_cs_fence_to_handle_ioctl().
>>>> In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
>>>> NULL pointer dereference, this patch was to avoid that and system panic
>>> But I understand now that its a valid case retuning NULL if fence was already
>>> signaled but need to handle case so avoid kernel panic. Seems David patch
>>> should fix this, I will test it tomorrow.
>>>
>>> Mhm, but don't we bail out with an error if we ask for a failed command
>>> submission? If not that sounds like a bug as well.
>>>
>>> Christian.
>>>
> Where do we do that?
> I see error
> [drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed.
> [drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> Did some more debugging,dma_fence_is_array() is causing NULL pointer
> dereference call through sync_file_ioctl.
>
> Also I think changes in David patch cant be applied on
> amd-staging-drm-next, which all patches I should take to get it correctly?
> Mhm, what you say here actually doesn't make much sense.
Yes I did mixed different issue here at end , please ignore it.
When the sequence number is invalid because the submission failed
amdgpu_ctx_get_fence() returns an error:
> if (seq >= centity->sequence) {
> spin_unlock(&ctx->ring_lock);
> return ERR_PTR(-EINVAL);
> }
This error is then handled in amdgpu_cs_fence_to_handle_ioctl():
> fence = amdgpu_cs_get_fence(adev, filp, &info->in.fence);
> if (IS_ERR(fence))
> return PTR_ERR(fence);
So the error handling for this is actually in place.
The only thing that could go wrong is that userspace sends a sequence
number which is to small.
The correct solution is to either set the flag as I suggested and make
sync_file_create() capable of handling a NULL fence.
Or we make the stub fence global like David suggested.
Regards,
Christian.
Thanks Christian for explanation , understood. We should take any of suggest approach in for solving issue.
Will give some more try to reproduce this and try the fix.
-Deepak
>
>>>> -Deepak
>>>>> Christian.
>>>>>
>>>>>> -David
>>>>>>> If that patch was applied then please revert it immediately.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> -David
>>>>>>>>> If you have already pushed the patch then please revert.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Deepak Sharma <Deepak.Sharma at amd.com>
>>>>>>>>>> ---
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
>>>>>>>>>> 1 file changed, 2 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> index 024dfbd87f11..14166cd8a12f 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> @@ -1403,6 +1403,8 @@ static struct dma_fence
>>>>>>>>>> *amdgpu_cs_get_fence(struct amdgpu_device *adev,
>>>>>>>>>> fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
>>>>>>>>>> amdgpu_ctx_put(ctx);
>>>>>>>>>> + if(!fence)
>>>>>>>>>> + return ERR_PTR(-EINVAL);
>>>>>>>>>> return fence;
>>>>>>>>>> }
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the amd-gfx
mailing list