[PATCH 7/9] drm/amdgpu:block kms open during gpu_reset

Liu, Monk Monk.Liu at amd.com
Mon Oct 30 03:47:51 UTC 2017


> I can't see any difference between the handling of existing VMs and newly created ones.
I know. Existing VMs still have similar problems, and I'm not saying this patch solves the problem for existing VMs ...

My earliest patch series actually used an approach that avoids this problem completely: an RW lock shared by drm_ioctl() and gpu_recover(), where drm_ioctl() takes the
read lock and gpu_recover() takes the write lock.
But you gave NAK on this approach, so I want to hear your idea.

>Either we have correct handling and can redo the activity or we have corrupted VM page tables and crash again immediately.
The thing is, some VM activity does not go through the GPU scheduler (it is submitted directly), so if it is interrupted by gpu_recover() it will not be rescheduled ...


> So we need to handle this gracefully anyway, Christian.
Yeah I'd like to hear 


BR Monk



-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com] 
Sent: 2017-10-26 23:15
To: Liu, Monk <Monk.Liu at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH 7/9] drm/amdgpu:block kms open during gpu_reset

On 26.10.2017 at 13:08, Liu, Monk wrote:
> "Clear operation on the page table" is some kind of SDMA activity, right? What if the ASIC RESET from amdgpu_gpu_recover() interrupts this activity in flight?
I can't see any difference between the handling of existing VMs and newly created ones.

Either we have correct handling and can redo the activity or we have corrupted VM page tables and crash again immediately.

So we need to handle this gracefully anyway, Christian.

>
> BR Monk
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: 2017-10-26 18:54
> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 7/9] drm/amdgpu:block kms open during gpu_reset
>
>> If we don't block device open while the GPU is recovering, the VM init
>> (SDMA working on page table creation) would be ruined by ASIC RESET.
> That is not a problem at all. SDMA just does some clear operation on the page tables and those are either recovered from the shadow or run after the reset.
>
> Regards,
> Christian.
>
> On 26.10.2017 at 10:17, Liu, Monk wrote:
>> While the amdgpu_gpu_recover() routine is in flight, we shouldn't let the UMD open our device; otherwise the VM init would be ruined by gpu_recover().
>>
>> E.g. VM init needs to create page tables, but keep in mind that
>> gpu_recover() calls ASIC RESET.
>>
>> If we don't block device open while the GPU is recovering, the VM init
>> (SDMA working on page table creation) would be ruined by ASIC RESET.
>>
>> Do you have any good solution? The key point is to
>> avoid/delay/push back hw activities from the UMD side while we are running
>> in the gpu_recover() function.
>>
>> BR Monk
>>
>> -----Original Message-----
>> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
>> Sent: 2017-10-26 15:18
>> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH 7/9] drm/amdgpu:block kms open during gpu_reset
>>
>> NAK, why the heck should we do this? It would just block all new processes from using the device.
>>
>> Christian.
>>
>> On 25.10.2017 at 11:22, Monk Liu wrote:
>>> Change-Id: Ibdb0ea9e3769d572fbbc13bbf1ef73f1af2ab7be
>>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
>>>     1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> index 4a9f749..c155ce4 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> @@ -813,6 +813,9 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>>>     	if (r < 0)
>>>     		return r;
>>>     
>>> +	if (adev->in_gpu_reset)
>>> +		return -ENODEV;
>>> +
>>>     	fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
>>>     	if (unlikely(!fpriv)) {
>>>     		r = -ENOMEM;