<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 2017-05-17 16:40, Christian König
wrote:<br>
</div>
<blockquote
cite="mid:7a302ebe-1de1-734f-fb21-aadcc7904d37@vodafone.de"
type="cite">On 17.05.2017 at 10:01, Michel Dänzer wrote:
<br>
<blockquote type="cite">On 17/05/17 04:13 PM, zhoucm1 wrote:
<br>
<blockquote type="cite">On 2017-05-17 14:57, Michel Dänzer
wrote:
<br>
<blockquote type="cite">On 17/05/17 01:28 PM, zhoucm1 wrote:
<br>
<blockquote type="cite">On 2017-05-17 11:15, Michel Dänzer
wrote:
<br>
<blockquote type="cite">On 17/05/17 12:04 PM, zhoucm1
wrote:
<br>
<blockquote type="cite">On 2017-05-17 09:18, Michel
Dänzer wrote:
<br>
<blockquote type="cite">On 16/05/17 06:25 PM, Chunming
Zhou wrote:
<br>
<blockquote type="cite">Change-Id:
I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
<br>
Signed-off-by: Chunming Zhou
<a class="moz-txt-link-rfc2396E" href="mailto:David1.Zhou@amd.com">&lt;David1.Zhou@amd.com&gt;</a>
<br>
<br>
diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
<br>
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
<br>
index bca1fb5..f3e7525 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
<br>
@@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct
drm_device *dev,
<br>
void *data, struct drm_file *filp)
<br>
case AMDGPU_VM_OP_UNRESERVE_VMID:
<br>
amdgpu_vm_free_reserved_vmid(adev,
&fpriv->vm,
<br>
AMDGPU_GFXHUB);
<br>
break;
<br>
+ case AMDGPU_VM_OP_RESET:
<br>
+ fpriv->vram_lost_counter =
<br>
atomic_read(&adev->vram_lost_counter);
<br>
+ break;
<br>
</blockquote>
How do you envision the UMDs using this? I can
mostly think of them
<br>
calling this ioctl when a context is created or
destroyed. But that
<br>
would also allow any other remaining contexts using
the same DRM file
<br>
descriptor to use all ioctls again. So, I think
there needs to be a
<br>
vram_lost_counter in struct amdgpu_ctx instead of in
struct
<br>
amdgpu_fpriv.
<br>
</blockquote>
struct amdgpu_fpriv is the proper place for
vram_lost_counter, especially
<br>
for the ioctl return value.
<br>
If you need to reset contexts one by one, we can mark all
contexts of that
<br>
VM, and userspace can then reset them.
<br>
</blockquote>
I'm not following. With vram_lost_counter in
amdgpu_fpriv, if any
<br>
context calls this ioctl, all other contexts using the
same file
<br>
descriptor will also be considered safe again, right?
<br>
</blockquote>
Yes, but it really depends on the userspace requirements. If
you need to
<br>
reset contexts one by one, we can mark all contexts of that VM
as guilty, and
<br>
userspace can then reset one context at a time.
<br>
</blockquote>
Still not sure what you mean by that.
<br>
<br>
E.g. what do you mean by "guilty"? I thought that refers to
the context
<br>
which caused a hang. But it seems like you're using it to
refer to any
<br>
context which hasn't reacted yet to VRAM contents being
lost.
<br>
</blockquote>
When VRAM is lost, we treat all contexts as needing a reset.
<br>
</blockquote>
Essentially, your patches only track VRAM contents being lost
per file
<br>
descriptor, not per context. I'm not sure (rather skeptical)
that this
<br>
is suitable for OpenGL UMDs, since state is usually tracked per
context.
<br>
Marek / Nicolai?
<br>
</blockquote>
<br>
Oh, yeah that's a good point.
<br>
<br>
The problem with tracking it per context is that Vulkan also wants
the ENODEV on amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(),
which are context-less.
<br>
<br>
But thinking more about this, blocking those two doesn't make much
sense. The VM content can be restored, and why should we disallow
reading GPU info?
<br>
</blockquote>
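To make the per-file versus per-context question above concrete, here is a minimal userspace sketch. It is only a model under stated assumptions: the model_* structs and helpers are hypothetical names, not the real amdgpu structures, and the device counter is a plain integer rather than an atomic.<br>
<br>
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT the real amdgpu structs. */
struct model_dev {
    unsigned vram_lost_counter;   /* bumped each time VRAM contents are lost */
};

struct model_file {
    unsigned vram_lost_counter;   /* one snapshot per file descriptor, as in the patch */
};

struct model_ctx {
    unsigned vram_lost_counter;   /* one snapshot per context, as suggested in review */
};

/* Simulate a GPU reset that loses VRAM contents. */
static void model_lose_vram(struct model_dev *dev)
{
    dev->vram_lost_counter++;
}

/* True while the holder of `seen` has not acknowledged the latest loss. */
static bool model_is_stale(const struct model_dev *dev, unsigned seen)
{
    return seen != dev->vram_lost_counter;
}

/* Models the AMDGPU_VM_OP_RESET case from the patch: the whole fd is refreshed. */
static void model_file_reset(const struct model_dev *dev, struct model_file *file)
{
    file->vram_lost_counter = dev->vram_lost_counter;
}

/* Per-context alternative: only the calling context is marked as recovered. */
static void model_ctx_reset(const struct model_dev *dev, struct model_ctx *ctx)
{
    ctx->vram_lost_counter = dev->vram_lost_counter;
}

/* Walk through both variants with two contexts sharing one fd. */
static void model_demo(void)
{
    struct model_dev dev = {0};
    struct model_file file = {0};
    struct model_ctx a = {0}, b = {0};

    model_lose_vram(&dev);

    /* Per-file: one reset call clears the stale state for every context on the fd. */
    model_file_reset(&dev, &file);
    assert(!model_is_stale(&dev, file.vram_lost_counter));

    /* Per-context: context a recovers, context b is still stale. */
    model_ctx_reset(&dev, &a);
    assert(!model_is_stale(&dev, a.vram_lost_counter));
    assert(model_is_stale(&dev, b.vram_lost_counter));
}
```
<br>
In this model, calling model_demo() passes its assertions: with the per-file snapshot a single reset makes every context on that fd look safe again, while the per-context snapshot leaves the second context stale until it resets itself.<br>
<br>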
I can re-paste the Vulkan APIs requiring ENODEV:<br>
"
<p class="MsoNormal">The Vulkan APIs listed below could return
VK_ERROR_DEVICE_LOST according to the spec. <o:p></o:p></p>
<p class="MsoNormal">I tried to provide a list of the u/k interfaces
that could be called for each vk API.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">vkCreateDevice<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1
level1 lfo2"><span style="mso-list:Ignore">-<span
style="font:7.0pt &quot;Times New Roman&quot;"> </span></span><span
style="color:#1F497D">amdgpu_device_initialize.<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1
level1 lfo2"><span style="mso-list:Ignore">-<span
style="font:7.0pt &quot;Times New Roman&quot;"> </span></span><span
style="color:#1F497D">amdgpu_query_gpu_info<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">vkQueueSubmit<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1
level1 lfo2"><span style="mso-list:Ignore">-<span
style="font:7.0pt &quot;Times New Roman&quot;"> </span></span>amdgpu_cs_submit<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">vkWaitForFences<o:p></o:p></p>
<p class="MsoNormal"> amdgpu_cs_wait_fences<span
style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">vkGetEventStatus<o:p></o:p></p>
<p class="MsoNormal">vkQueueWaitIdle<o:p></o:p></p>
<p class="MsoNormal">vkDeviceWaitIdle<o:p></o:p></p>
<p class="MsoNormal">vkGetQueryPoolResults<b><o:p></o:p></b></p>
<p class="MsoNormal"> amdgpu_cs_query_fence_status<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">vkQueueBindSparse<b><o:p></o:p></b></p>
<p class="MsoNormal"> amdgpu_bo_va_op<o:p></o:p></p>
<p class="MsoNormal"> amdgpu_bo_va_op_raw<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">vkCreateSwapchainKHR<b><o:p></o:p></b></p>
<p class="MsoNormal">vkAcquireNextImageKHR<b><o:p></o:p></b></p>
<p class="MsoNormal">vkQueuePresentKHR<o:p></o:p></p>
<p class="MsoNormal"> Not related to the u/k interface.<b><o:p></o:p></b></p>
<p class="MsoNormal"><b><o:p> </o:p></b></p>
Besides those listed above, I think
amdgpu_cs_signal_sem/amdgpu_cs_wait_sem should respond to a GPU reset
as well."<br>
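<br>
As a rough illustration of how a UMD might consume the ENODEV result from the u/k calls listed above, here is a small sketch. It assumes the call reports failure as a negative errno value; the sketch_* names and the enum are hypothetical stand-ins for VkResult, not taken from any driver.<br>
<br>
```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result codes standing in for VkResult values. */
enum sketch_result {
    SKETCH_SUCCESS,
    SKETCH_DEVICE_LOST,   /* a real driver would return VK_ERROR_DEVICE_LOST here */
    SKETCH_OTHER_ERROR
};

/* Assumption: the u/k call returns 0 on success and a negative errno on failure. */
static enum sketch_result sketch_translate(int ret)
{
    if (ret == 0)
        return SKETCH_SUCCESS;
    if (ret == -ENODEV)
        return SKETCH_DEVICE_LOST;   /* kernel refused the call after VRAM was lost */
    return SKETCH_OTHER_ERROR;
}

/* Small self-check of the mapping. */
static void sketch_demo(void)
{
    assert(sketch_translate(0) == SKETCH_SUCCESS);
    assert(sketch_translate(-ENODEV) == SKETCH_DEVICE_LOST);
    assert(sketch_translate(-ENOMEM) == SKETCH_OTHER_ERROR);
}
```
<br>
With a single translation point like this, every entry point in the list (submit, fence wait, VA ops) could report device loss the same way, which is why the ENODEV behaviour of each u/k interface matters.<br>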
<blockquote
cite="mid:7a302ebe-1de1-734f-fb21-aadcc7904d37@vodafone.de"
type="cite">
<br>
Christian.
<br>
<br>
</blockquote>
<br>
</body>
</html>