    <div class="moz-cite-prefix">Am 17.05.2017 um 10:46 schrieb zhoucm1:<br>
    </div>
    <blockquote cite="mid:591C0DFB.8030604@amd.com" type="cite">
      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
      <br>
      <br>
      <div class="moz-cite-prefix">On 2017年05月17日 16:40, Christian König
        wrote:<br>
      </div>
>> On 17.05.2017 at 10:01, Michel Dänzer wrote:
        <blockquote type="cite">On 17/05/17 04:13 PM, zhoucm1 wrote: <br>
          <blockquote type="cite">On 2017年05月17日 14:57, Michel Dänzer
            wrote: <br>
            <blockquote type="cite">On 17/05/17 01:28 PM, zhoucm1 wrote:
              <br>
              <blockquote type="cite">On 2017年05月17日 11:15, Michel
                Dänzer wrote: <br>
                <blockquote type="cite">On 17/05/17 12:04 PM, zhoucm1
                  wrote: <br>
                  <blockquote type="cite">On 2017年05月17日 09:18, Michel
                    Dänzer wrote: <br>
                    <blockquote type="cite">On 16/05/17 06:25 PM,
                      Chunming Zhou wrote: <br>
                      <blockquote type="cite">Change-Id:
                        I8eb6d7f558da05510e429d3bf1d48c8cec6c1977 <br>
                        Signed-off-by: Chunming Zhou <a
                          moz-do-not-send="true"
                          class="moz-txt-link-rfc2396E"
                          href="mailto:David1.Zhou@amd.com"><David1.Zhou@amd.com></a>
                        <br>
                        <br>
                        diff --git
                        a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c <br>
                        b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c <br>
                        index bca1fb5..f3e7525 100644 <br>
                        --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c <br>
                        +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c <br>
                        @@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct
                        drm_device *dev, <br>
                        void *data, struct drm_file *filp) <br>
                                 case AMDGPU_VM_OP_UNRESERVE_VMID: <br>
                                     amdgpu_vm_free_reserved_vmid(adev,
                        &fpriv->vm, <br>
                        AMDGPU_GFXHUB); <br>
                                     break; <br>
                        +    case AMDGPU_VM_OP_RESET: <br>
                        +        fpriv->vram_lost_counter = <br>
                        atomic_read(&adev->vram_lost_counter); <br>
                        +        break; <br>
                      </blockquote>
>>>>>>>>> How do you envision the UMDs using this? I can mostly think of
>>>>>>>>> them calling this ioctl when a context is created or destroyed.
>>>>>>>>> But that would also allow any other remaining contexts using
>>>>>>>>> the same DRM file descriptor to use all ioctls again. So, I
>>>>>>>>> think there needs to be a vram_lost_counter in struct
>>>>>>>>> amdgpu_ctx instead of in struct amdgpu_fpriv.
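For reference, a UMD call into this new op would presumably go through
the existing DRM_AMDGPU_VM command, roughly as sketched below. The uapi
value of AMDGPU_VM_OP_RESET isn't part of the posted hunk, so the define
here is an assumption:

#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

/* Assumed value: the hunk above doesn't include the amdgpu_drm.h change;
 * AMDGPU_VM_OP_RESERVE_VMID and AMDGPU_VM_OP_UNRESERVE_VMID are 1 and 2. */
#ifndef AMDGPU_VM_OP_RESET
#define AMDGPU_VM_OP_RESET 3
#endif

/* Tell the kernel this fd has re-validated its VRAM contents after a
 * reset, so its ioctls stop reporting the loss. */
static int umd_ack_vram_lost(int drm_fd)
{
	union drm_amdgpu_vm vm;

	memset(&vm, 0, sizeof(vm));
	vm.in.op = AMDGPU_VM_OP_RESET;

	return drmCommandWriteRead(drm_fd, DRM_AMDGPU_VM, &vm, sizeof(vm));
}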
>>>>>>>> struct amdgpu_fpriv is the proper place for vram_lost_counter,
>>>>>>>> especially for the ioctl return value.
>>>>>>>> If you need to reset contexts one by one, we can mark all
>>>>>>>> contexts of that vm, and then reset them from userspace.
>>>>>>> I'm not following. With vram_lost_counter in amdgpu_fpriv, if
>>>>>>> any context calls this ioctl, all other contexts using the same
>>>>>>> file descriptor will also be considered safe again, right?
>>>>>> Yes, but it really depends on the userspace requirement: if you
>>>>>> need to reset contexts one by one, we can mark all contexts of
>>>>>> that vm as guilty, and then reset one context at a time from
>>>>>> userspace.
>>>>> Still not sure what you mean by that.
>>>>>
>>>>> E.g. what do you mean by "guilty"? I thought that refers to the
>>>>> context which caused a hang. But it seems like you're using it to
>>>>> refer to any context which hasn't reacted yet to VRAM contents
>>>>> being lost.
>>>> When VRAM is lost, we treat all contexts as needing a reset.
>>> Essentially, your patches only track VRAM contents being lost per
>>> file descriptor, not per context. I'm not sure (rather skeptical)
>>> that this is suitable for OpenGL UMDs, since state is usually
>>> tracked per context. Marek / Nicolai?
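A rough sketch of that per-context alternative, with the field placement
and helper name invented here rather than taken from the series:

/* Sketch only: each context snapshots the device-global counter when it
 * is created, so VRAM loss invalidates contexts individually instead of
 * the whole file descriptor. */
struct amdgpu_ctx {
	/* ... existing members ... */
	uint32_t vram_lost_counter;	/* snapshot taken at ctx creation */
};

static bool amdgpu_ctx_vram_lost(struct amdgpu_device *adev,
				 struct amdgpu_ctx *ctx)
{
	/* e.g. checked in amdgpu_cs_ioctl() before accepting new work */
	return ctx->vram_lost_counter !=
	       atomic_read(&adev->vram_lost_counter);
}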
>>
>> Oh, yeah, that's a good point.
>>
>> The problem with tracking it per context is that Vulkan also wants
>> the ENODEV on amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(), which
>> are context-less.
>>
>> But thinking more about this, blocking those two doesn't make much
>> sense. The VM contents can be restored, and why should we disallow
>> reading GPU info?
>
> I can re-paste the Vulkan APIs requiring ENODEV:
>
> "The Vulkan APIs listed below could return VK_ERROR_DEVICE_LOST
> according to the spec. I tried to provide a list of u/k interfaces
> that could be called for each vk API.

Well, those are the Vulkan requirements, but that doesn't necessarily
mean we must follow them on the kernel side. Keep in mind that Vulkan
can't make any requirements towards the kernel driver.

IIRC we already have a query Vulkan can use to figure out if a GPU
reset happened or not. So they could use that instead.
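That query is presumably libdrm's amdgpu_cs_query_reset_state();
assuming that's the interface meant, the UMD side could be as small as:

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Ask the kernel whether a GPU reset has affected this context; returns
 * non-zero when the UMD should treat the device as lost. */
static int device_lost(amdgpu_context_handle ctx)
{
	uint32_t state = 0, hangs = 0;

	if (amdgpu_cs_query_reset_state(ctx, &state, &hangs))
		return 1;	/* the query itself failed, assume the worst */

	return state != AMDGPU_CTX_NO_RESET;
}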
Regards,
Christian.
    <blockquote cite="mid:591C0DFB.8030604@amd.com" type="cite">
      <p class="MsoNormal"><o:p></o:p></p>
      <p class="MsoNormal"><o:p> </o:p></p>
      <p class="MsoNormal">vkCreateDevice<o:p></o:p></p>
      <p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1
        level1 lfo2"><span style="mso-list:Ignore">-<span
            style="font:7.0pt "Times New Roman"">          </span></span><span
          style="color:#1F497D">amdgpu_device_initialize.<o:p></o:p></span></p>
      <p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1
        level1 lfo2"><span style="mso-list:Ignore">-<span
            style="font:7.0pt "Times New Roman"">          </span></span><span
          style="color:#1F497D">amdgpu_query_gpu_info<o:p></o:p></span></p>
      <p class="MsoNormal"><o:p> </o:p></p>
      <p class="MsoNormal">vkQueueSubmit<o:p></o:p></p>
      <p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l1
        level1 lfo2"><span style="mso-list:Ignore">-<span
            style="font:7.0pt "Times New Roman"">          </span></span>amdgpu_cs_submit<o:p></o:p></p>
      <p class="MsoNormal"><o:p> </o:p></p>
      <p class="MsoNormal">vkWaitForFences<o:p></o:p></p>
      <p class="MsoNormal">                amdgpu_cs_wait_fences<span
          style="color:#1F497D"><o:p></o:p></span></p>
      <p class="MsoNormal"><o:p> </o:p></p>
      <p class="MsoNormal">vkGetEventStatus<o:p></o:p></p>
      <p class="MsoNormal">vkQueueWaitIdle<o:p></o:p></p>
      <p class="MsoNormal">vkDeviceWaitIdle<o:p></o:p></p>
      <p class="MsoNormal">vkGetQueryPoolResults<b><o:p></o:p></b></p>
      <p class="MsoNormal">                amdgpu_cs_query_Fence_status<o:p></o:p></p>
      <p class="MsoNormal"><o:p> </o:p></p>
      <p class="MsoNormal">vkQueueBindSparse<b><o:p></o:p></b></p>
      <p class="MsoNormal">                amdgpu_bo_va_op<o:p></o:p></p>
      <p class="MsoNormal">                amdgpu_bo_va_op_raw<o:p></o:p></p>
      <p class="MsoNormal"><o:p> </o:p></p>
      <p class="MsoNormal">vkCreateSwapchainKHR<b><o:p></o:p></b></p>
      <p class="MsoNormal">vkAcquireNextImageKHR<b><o:p></o:p></b></p>
      <p class="MsoNormal">vkQueuePresentKHR<o:p></o:p></p>
      <p class="MsoNormal">                Not related with u/k
        interface.<b><o:p></o:p></b></p>
      <p class="MsoNormal"><b><o:p> </o:p></b></p>
      Besides those listed above, I think
      amdgpu_cs_signal_Sem/amdgpu_cs_wait_sem should respond to gpu
      reset as well."<br>
>
>> Christian.