[PATCH 4/4] drm/amdgpu: reset fpriv vram_lost_counter
zhoucm1
david1.zhou at amd.com
Wed May 17 08:46:51 UTC 2017
On 2017-05-17 16:40, Christian König wrote:
> On 17.05.2017 at 10:01, Michel Dänzer wrote:
>> On 17/05/17 04:13 PM, zhoucm1 wrote:
>>> On 2017-05-17 14:57, Michel Dänzer wrote:
>>>> On 17/05/17 01:28 PM, zhoucm1 wrote:
>>>>> On 2017-05-17 11:15, Michel Dänzer wrote:
>>>>>> On 17/05/17 12:04 PM, zhoucm1 wrote:
>>>>>>> On 2017-05-17 09:18, Michel Dänzer wrote:
>>>>>>>> On 16/05/17 06:25 PM, Chunming Zhou wrote:
>>>>>>>>> Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
>>>>>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> index bca1fb5..f3e7525 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> @@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev,
>>>>>>>>> void *data, struct drm_file *filp)
>>>>>>>>> case AMDGPU_VM_OP_UNRESERVE_VMID:
>>>>>>>>> amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm,
>>>>>>>>> AMDGPU_GFXHUB);
>>>>>>>>> break;
>>>>>>>>> + case AMDGPU_VM_OP_RESET:
>>>>>>>>> + fpriv->vram_lost_counter =
>>>>>>>>> atomic_read(&adev->vram_lost_counter);
>>>>>>>>> + break;
>>>>>>>> How do you envision the UMDs using this? I can mostly think of
>>>>>>>> them
>>>>>>>> calling this ioctl when a context is created or destroyed. But
>>>>>>>> that
>>>>>>>> would also allow any other remaining contexts using the same
>>>>>>>> DRM file
>>>>>>>> descriptor to use all ioctls again. So, I think there needs to
>>>>>>>> be a
>>>>>>>> vram_lost_counter in struct amdgpu_ctx instead of in struct
>>>>>>>> amdgpu_fpriv.
>>>>>>> struct amdgpu_fpriv is the proper place for vram_lost_counter,
>>>>>>> especially for the ioctl return value.
>>>>>>> If you need to reset contexts one by one, we can mark all
>>>>>>> contexts of that VM, and then reset them from userspace.
>>>>>> I'm not following. With vram_lost_counter in amdgpu_fpriv, if any
>>>>>> context calls this ioctl, all other contexts using the same file
>>>>>> descriptor will also be considered safe again, right?
>>>>> Yes, but it really depends on the userspace requirements. If you
>>>>> need to reset contexts one by one, we can mark all contexts of that
>>>>> VM as guilty, and then userspace can reset them one at a time.
>>>> Still not sure what you mean by that.
>>>>
>>>> E.g. what do you mean by "guilty"? I thought that refers to the
>>>> context
>>>> which caused a hang. But it seems like you're using it to refer to any
>>>> context which hasn't reacted yet to VRAM contents being lost.
>>> When VRAM is lost, we treat all contexts as needing a reset.
>> Essentially, your patches only track VRAM contents being lost per file
>> descriptor, not per context. I'm not sure (rather skeptical) that this
>> is suitable for OpenGL UMDs, since state is usually tracked per context.
>> Marek / Nicolai?
>
> Oh, yeah that's a good point.
>
> The problem with tracking it per context is that Vulkan also wants the
> ENODEV on amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(), which are
> context-less.
>
> But thinking more about this, blocking those two doesn't make much
> sense. The VM content can be restored, and why should we disallow
> reading GPU info?
I can re-paste the Vulkan APIs requiring ENODEV:
"
The Vulkan APIs listed below could return VK_ERROR_DEVICE_LOST according
to the spec. I tried to provide a list of u/k interfaces that could be
called for each Vulkan API.

vkCreateDevice
-amdgpu_device_initialize
-amdgpu_query_gpu_info

vkQueueSubmit
-amdgpu_cs_submit

vkWaitForFences
-amdgpu_cs_wait_fences

vkGetEventStatus
vkQueueWaitIdle
vkDeviceWaitIdle
vkGetQueryPoolResults
-amdgpu_cs_query_fence_status

vkQueueBindSparse
-amdgpu_bo_va_op
-amdgpu_bo_va_op_raw

vkCreateSwapchainKHR
vkAcquireNextImageKHR
vkQueuePresentKHR
-Not related to a u/k interface.

Besides those listed above, I think
amdgpu_cs_signal_sem/amdgpu_cs_wait_sem should respond to GPU reset as
well."
>
> Christian.
>