<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    Hi guys,<br>
    <br>
    can you please explain to me why it's always you guys who come up
    with such nonsense?<br>
    <br>
    If you need to find the number of ongoing hardware submissions,
    please use the amdgpu_fence_count_emitted() function and don't mess
    with any scheduler internals.<br>
    <br>
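    A minimal userspace sketch of that suggestion, not kernel code. The
    mock_ring struct, its fields and both mock_* functions are
    hypothetical stand-ins; the sketch assumes that counting emitted
    fences amounts to the difference between the last emitted and the
    last signaled sequence number of a ring, and the real check would
    call amdgpu_fence_count_emitted() on each ring instead:<br>
    <pre>
```c
/* Hypothetical stand-in for the fence bookkeeping of one ring. */
struct mock_ring {
	int ready;             /* non-zero if the ring is initialized and usable */
	unsigned int sync_seq; /* sequence number of the last emitted fence */
	unsigned int last_seq; /* sequence number of the last signaled fence */
};

/*
 * Mock of the counting idea: fences emitted but not yet signaled.
 * Unsigned subtraction keeps the result correct across wraparound.
 */
unsigned int mock_fence_count_emitted(const struct mock_ring *ring)
{
	return ring->sync_seq - ring->last_seq;
}

/* Returns 1 if any usable ring still has submissions in flight, else 0. */
int mock_device_has_job_running(struct mock_ring *const rings[], unsigned int n)
{
	unsigned int i;

	for (i = 0; i != n; ++i) {
		const struct mock_ring *ring = rings[i];

		/* Skip unused slots and rings that are not ready. */
		if (!ring || !ring->ready)
			continue;

		if (mock_fence_count_emitted(ring) != 0)
			return 1;
	}
	return 0;
}
```
    </pre>
    The point of the sketch is that the check only looks at fence
    sequence numbers, so it neither walks the scheduler's pending list
    nor takes its lock.<br>
    <br>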
    This patch here is a clear NAK from my side.<br>
    <br>
    Regards,<br>
    Christian.<br>
    <br>
    <div class="moz-cite-prefix">On 13.11.24 at 09:46, Fan, Shikang
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:SA1PR12MB7343D24CCFC921D157E2CDCCEB5A2@SA1PR12MB7343.namprd12.prod.outlook.com">
      
      <style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
      <br>
      <div>
        <div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
          +<a href="mailto:Christian.Koenig@amd.com" id="OWAAM717321" class="tWKOu mention ms-bgc-nlr ms-fcl-b" moz-do-not-send="true">@Koenig, Christian</a><br>
          <br>
          Hi Christian,<br>
          <br>
          Could you please help review this patch? Thank you.<br>
          <br>
          Regards,</div>
        <div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
          Shikang</div>
        <hr style="display:inline-block;width:98%" tabindex="-1">
        <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b>
            Shikang Fan <a class="moz-txt-link-rfc2396E" href="mailto:shikang.fan@amd.com"><shikang.fan@amd.com></a><br>
            <b>Sent:</b> Wednesday, November 13, 2024 11:14 AM<br>
            <b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
            <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org"><amd-gfx@lists.freedesktop.org></a><br>
            <b>Cc:</b> Fan, Shikang <a class="moz-txt-link-rfc2396E" href="mailto:Shikang.Fan@amd.com"><Shikang.Fan@amd.com></a>; Liu01,
            Tong (Esther) <a class="moz-txt-link-rfc2396E" href="mailto:Tong.Liu01@amd.com"><Tong.Liu01@amd.com></a>; Deng, Emily
            <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a><br>
            <b>Subject:</b> [PATCH] drm/amdgpu: Check pending job
            finished or not to identify has bad job</font>
          <div> </div>
        </div>
        <div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
              <div class="PlainText">drm_sched_free_job_work is a queued
                work function,<br>
                so even after a job has finished in hardware, it still
                takes some time<br>
                for drm_sched_free_job_work to remove it from the
                pending list.<br>
                Iterate over the pending job list and wait for each job
                to finish<br>
                within a specified timeout (1s by default), so that jobs
                that have not<br>
                been cleaned up yet or are about to finish are not
                reported as running.<br>
                If the wait times out, return true.<br>
                <br>
                Signed-off-by: Tong Liu01 <a class="moz-txt-link-rfc2396E" href="mailto:Tong.Liu01@amd.com"><Tong.Liu01@amd.com></a><br>
                Signed-off-by: Emily Deng <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a><br>
                Signed-off-by: Shikang Fan <a class="moz-txt-link-rfc2396E" href="mailto:shikang.fan@amd.com"><shikang.fan@amd.com></a><br>
                ---<br>
                 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21
                ++++++++++++++++-----<br>
                 1 file changed, 16 insertions(+), 5 deletions(-)<br>
                <br>
                diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                index 071d3d9b345d..da2a22618f42 100644<br>
                --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                @@ -100,6 +100,7 @@
                MODULE_FIRMWARE("amdgpu/navi12_gpu_info.bin");<br>
                 #define AMDGPU_PCIE_INDEX_FALLBACK (0x38 >> 2)<br>
                 #define AMDGPU_PCIE_INDEX_HI_FALLBACK (0x44 >> 2)<br>
                 #define AMDGPU_PCIE_DATA_FALLBACK (0x3C >> 2)<br>
                +#define AMDGPU_PENDING_JOB_TIMEOUT    
                msecs_to_jiffies(1000)<br>
                 <br>
                 static const struct drm_driver amdgpu_kms_driver;<br>
                 <br>
                @@ -5224,7 +5225,8 @@ static int
                amdgpu_device_reset_sriov(struct amdgpu_device *adev,<br>
                 bool amdgpu_device_has_job_running(struct amdgpu_device
                *adev)<br>
                 {<br>
                         int i;<br>
                -       struct drm_sched_job *job;<br>
                +       struct drm_sched_job *job, *tmp;<br>
                +       long r;<br>
                 <br>
                         for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {<br>
                                 struct amdgpu_ring *ring =
                adev->rings[i];<br>
                @@ -5233,11 +5235,20 @@ bool
                amdgpu_device_has_job_running(struct amdgpu_device
                *adev)<br>
                                         continue;<br>
                 <br>
                                
                spin_lock(&ring->sched.job_list_lock);<br>
                -               job =
                list_first_entry_or_null(&ring->sched.pending_list,<br>
                -                                              struct
                drm_sched_job, list);<br>
                +<br>
                +               /* iterates over the pending job list<br>
                +                * wait for each job to finish within
                timeout (1s by default)<br>
                +                * if wait timeout, return true<br>
                +                */<br>
                +               list_for_each_entry_safe(job, tmp,
                &ring->sched.pending_list, list) {<br>
                +                       r =
                dma_fence_wait_timeout(&job->s_fence->finished,<br>
                +                                                               false,
                AMDGPU_PENDING_JOB_TIMEOUT);<br>
                +                       if (r <= 0) {<br>
                +                              
                spin_unlock(&ring->sched.job_list_lock);<br>
                +                               return true;<br>
                +                       }<br>
                +               }<br>
                                
                spin_unlock(&ring->sched.job_list_lock);<br>
                -               if (job)<br>
                -                       return true;<br>
                         }<br>
                         return false;<br>
                 }<br>
                -- <br>
                2.34.1<br>
                <br>
              </div>
            </span></font></div>
      </div>
    </blockquote>
    <br>
  </body>
</html>