<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 2024-03-25 14:45, Felix Kuehling
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:4fb3033b-a025-45ea-846c-bf97566d41e0@amd.com">On
      2024-03-22 15:57, Zhigang Luo wrote:
      <br>
      <blockquote type="cite">it will cause page fault after device
        recovered if there is a process running.
        <br>
        <br>
        Signed-off-by: Zhigang Luo <a class="moz-txt-link-rfc2396E" href="mailto:Zhigang.Luo@amd.com">&lt;Zhigang.Luo@amd.com&gt;</a>
        <br>
        Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd
        <br>
        ---
        <br>
          drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
        <br>
          1 file changed, 2 insertions(+)
        <br>
        <br>
        diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
        b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
        <br>
        index 70261eb9b0bb..2867e9186e44 100644
        <br>
        --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
        <br>
        +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
        <br>
        @@ -4974,6 +4974,8 @@ static int
        amdgpu_device_reset_sriov(struct amdgpu_device *adev,
        <br>
          retry:
        <br>
              amdgpu_amdkfd_pre_reset(adev);
        <br>
          +    amdgpu_amdkfd_wait_no_process_running(adev);
        <br>
        +
        <br>
      </blockquote>
      <br>
      This waits for the processes to be terminated. What would cause
      the processes to be terminated? Why do the processes need to be
      terminated? Isn't it enough if the processes are removed from the
      runlist in pre-reset, so they can no longer execute on the GPU?
      <br>
    </blockquote>
    <p>Mode 1 reset on SR-IOV is much faster than on bare metal (BM).
      kgd2kfd_pre_reset sends the GPU reset event to user space but does
      not remove the queues from the runlist, so after the mode 1 reset
      is done there are queues still running, and they generate VM
      faults because the GPU page table is gone.</p>
    <p>Regards,</p>
    <p>Philip  </p>
    <blockquote type="cite" cite="mid:4fb3033b-a025-45ea-846c-bf97566d41e0@amd.com">
      <br>
      Regards,
      <br>
        Felix
      <br>
      <br>
      <br>
      <blockquote type="cite">     
        amdgpu_device_stop_pending_resets(adev);
        <br>
                if (from_hypervisor)
        <br>
      </blockquote>
    </blockquote>
  </body>
</html>