[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

Wed Feb 6 09:01:31 UTC 2019

Am 05.02.19 um 22:56 schrieb Yang, Philip:
> Hi Christian,
>
> I will submit new patch for review, my comments embedded inline below.
>
> Thanks,
> Philip
>
> On 2019-02-05 1:09 p.m., Koenig, Christian wrote:
>> Am 05.02.19 um 18:25 schrieb Yang, Philip:
>>> [SNIP]+
>>>>> +    if (r == -ERESTARTSYS) {
>>>>> +        if (!--tries) {
>>>>> +            DRM_ERROR("Possible deadlock? Retry too many times\n");
>>>>> +            return -EDEADLK;
>>>>> +        }
>>>>> +        goto restart;
>>>> You really need to restart the IOCTL to potentially handle signals.
>>>>
>>>> But since the calling code correctly handles this already you can just
>>>> drop this change as far as I can see.
>>>>
>>> I agree that we should return -ERESTARTSYS to upper layer to handle signals.
>>>
>>> But I do not find upper layers handle -ERESTARTSYS in the entire calling
>>> path, ksys_ioctl -> do_vfs_ioctl -> amdgpu_drm_ioctl -> drm->ioctl ->
>>> drm_ioctl_kernel -> amdgpu_cs_ioctl. The error code returns to
>>> application. I confirm it, libdrm userptr test application calling
>>> amdgpu_cs_ioctl return code is -512, which is -ERESTARTSYS.
>>>
>>> So application should handle -ERESTARTSYS to restart the ioctl, but
>>> libdrm userptr test application doesn't handle this. This causes the
>>> test failed.
>> This is a bug in the test cases then.
>>
>> -ERESTARTSYS can happen at any time during interruptible waiting and it
>> is mandatory for the upper layer to handle it correctly.
>>
> -ERESTARTSYS can be returned only when signal is pending, signal handler
> will translate ERESTARTSYS to EINTR, drmIoctl in libdrm does handle
> EINTR and restart the ioctl. The test cases are ok.

Interesting, I thought that this was handled unconditionally if a signal 
is pending or not. But that could as well be architecture specific.

> Driver fail path should not return ERESTARTSYS to user space. The new
> patch, I change amdgpu_cs_submit to return -EAGAIN if userptr is
> updated, and amdgpu_cs_ioctl redo the ioctl only if error code is
> -EAGAIN. ERESTARTSYS error code returns to user space for signal handle
> as before.

Good idea, that is probably cleaner. EGAIN and EINTR should have the 
same effect in drmIoctl IIRC.

>>> Below are details of userptr path difference. For the previous path,
>>> libdrm test always goes to step 2, step 3 never trigger. So it never
>>> return -ERESTARTSYS, but in theory, this could happen.
>>>
>>> For HMM path, the test always goes to step 3, we have to handle this
>>> case inside amdgpu_cs_ioctl. Maybe I can change amdgpu_cs_submit to
>>> return -EBUSY, then restart the ioctl inside amdgpu_cs_ioctl. I will
>>> submit new patch.
>> Clearly a NAK, this won't work correctly.
>>
> I don't understand your concern, may you explain the reason?

When some underlying code returns ERESTARTSYS we *MUST* forward that to 
the upper layer.

Handling it in a loop or replacing it manually with EINTR or EGAIN will 
break signal handling.

This is suggested quite often, but can end up in an endless loop inside 
the kernel which is quite ugly.

Regards,
Christian.