<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2024-03-26 11:01, Felix Kuehling
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:1bc4dfcd-79ed-48b6-b5c9-9a2d9e5a393a@amd.com">On
      2024-03-26 10:53, Philip Yang wrote:
      <br>
      <blockquote type="cite">
        <br>
        <br>
        On 2024-03-25 14:45, Felix Kuehling wrote:
        <br>
        <blockquote type="cite">On 2024-03-22 15:57, Zhigang Luo wrote:
          <br>
          <blockquote type="cite">It will cause a page fault after the
            device has recovered if there is a process running.
            <br>
            <br>
            Signed-off-by: Zhigang Luo <a class="moz-txt-link-rfc2396E" href="mailto:Zhigang.Luo@amd.com">&lt;Zhigang.Luo@amd.com&gt;</a>
            <br>
            Change-Id: Ib1eddb56b69ecd41fe703abd169944154f48b0cd
            <br>
            ---
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
            <br>
              1 file changed, 2 insertions(+)
            <br>
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
            <br>
            index 70261eb9b0bb..2867e9186e44 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
            <br>
            @@ -4974,6 +4974,8 @@ static int
            amdgpu_device_reset_sriov(struct amdgpu_device *adev,
            <br>
              retry:
            <br>
                  amdgpu_amdkfd_pre_reset(adev);
            <br>
              +    amdgpu_amdkfd_wait_no_process_running(adev);
            <br>
            +
            <br>
          </blockquote>
          <br>
          This waits for the processes to be terminated. What would
          cause the processes to be terminated? Why do the processes
          need to be terminated? Isn't it enough if the processes are
          removed from the runlist in pre-reset, so they can no longer
          execute on the GPU?
          <br>
        </blockquote>
        <br>
        Mode 1 reset on SR-IOV is much faster than on bare metal (BM).
        kgd2kfd_pre_reset sends a GPU reset event to user space but
        doesn't remove the queues from the runlist, so after the mode 1
        reset is done a queue is still running and generates a VM fault
        because the GPU page table is gone.
        <br>
        <br>
      </blockquote>
      I think seeing a page fault during the reset is not a problem.
      Seeing a page fault after the reset would be a bug. The process
      should not be on the runlist after the reset is done.
      <br>
      <br>
      Waiting for the process to terminate first looks like a
      workaround, when the real bug is maybe that we're not updating the
      process state correctly in pre-reset. All currently running
      processes should be put into evicted state, so they are not put
      back on the runlist after the reset.
      <br>
    </blockquote>
    <p>I forgot to mention that a firmware hang is what triggers the
      GPU reset. There is also an error message when kgd2kfd_pre_reset
      -&gt; kgd2kfd_suspend tries to evict the queues from the runlist.
      Yes, this seems to be a workaround for the real issue, which is
      related to mode 1 reset.<br>
    </p>
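    <p>To make the "evicted state" idea discussed above concrete, here
      is a minimal toy model in plain C. It is only a sketch of the
      intended behaviour, not driver code: every name in it
      (toy_process, toy_pre_reset, toy_post_reset, toy_restore) is
      hypothetical and none of them exists in amdgpu or amdkfd. It
      assumes that pre-reset marks each process evicted, and that the
      runlist rebuilt after the reset only contains processes that were
      explicitly restored.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these names are NOT the amdkfd API. */
enum proc_state { PROC_ACTIVE, PROC_EVICTED };

struct toy_process {
    enum proc_state state;
    bool on_runlist;
};

/* Pre-reset: take every process off the runlist and mark it evicted. */
void toy_pre_reset(struct toy_process *procs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        procs[i].on_runlist = false;
        procs[i].state = PROC_EVICTED;
    }
}

/*
 * Post-reset: rebuild the runlist from processes that are not evicted.
 * Returns the number of processes placed back on the runlist.
 */
size_t toy_post_reset(struct toy_process *procs, size_t n)
{
    size_t queued = 0;

    for (size_t i = 0; i < n; i++) {
        procs[i].on_runlist = (procs[i].state == PROC_ACTIVE);
        if (procs[i].on_runlist)
            queued++;
    }
    return queued;
}

/* Explicit restore, assumed to happen once page tables are valid again. */
void toy_restore(struct toy_process *p)
{
    p->state = PROC_ACTIVE;
    p->on_runlist = true;
}
```

    <p>In this model nothing can run right after the reset, because
      toy_post_reset skips evicted processes. A process only returns to
      the runlist after toy_restore has been called for it.</p>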
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:1bc4dfcd-79ed-48b6-b5c9-9a2d9e5a393a@amd.com">
      <br>
      Regards,
      <br>
        Felix
      <br>
      <br>
      <br>
      <blockquote type="cite">Regards,
        <br>
        <br>
        Philip
        <br>
        <br>
        <blockquote type="cite">
          <br>
          Regards,
          <br>
            Felix
          <br>
          <br>
          <br>
          <blockquote type="cite">amdgpu_device_stop_pending_resets(adev);
            <br>
                    if (from_hypervisor)
            <br>
          </blockquote>
        </blockquote>
      </blockquote>
    </blockquote>
  </body>
</html>