Yes, I think so as well. Andrey, can you push this?

Christian.

On 15.09.21 at 00:59, Grodzovsky, Andrey wrote:
> AFAIK this one is independent.
>
> Christian, can you confirm?
>
> Andrey
>
> ________________________________
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Alex Deucher <alexdeucher@gmail.com>
> Sent: 14 September 2021 15:33
> To: Christian König <ckoenig.leichtzumerken@gmail.com>
> Cc: Liu, Monk <Monk.Liu@amd.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>; Mailing list - DRI developers <dri-devel@lists.freedesktop.org>
> Subject: Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)
>
> Was this fix independent of the other discussions? Should this be
> applied to drm-misc?
>
> Alex
>
> On Wed, Sep 1, 2021 at 4:42 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Wed, Sep 1, 2021 at 2:50 AM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> > >
> > > On 01.09.21 at 02:46, Monk Liu wrote:
> > > > issue:
> > > > in cleanup_job, cancel_delayed_work will cancel a TO (timeout)
> > > > timer even when its corresponding job is still running.
> > > >
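> > > > (The "TO timer" here is the scheduler's timeout/TDR delayed
> > > > work. As a minimal sketch of that pattern -- work_tdr and
> > > > timeout are the real fields; the queueing call shown is
> > > > illustrative:)
> > > >
> > > >         /* armed per scheduler; fires when the head job runs
> > > >          * past its timeout */
> > > >         INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> > > >         ...
> > > >         /* (re)arm the TO timer, as drm_sched_start_timeout() does */
> > > >         schedule_delayed_work(&sched->work_tdr, sched->timeout);
> > > >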
> > > > fix:
> > > > do not cancel the timer in cleanup_job; instead, cancel it only
> > > > when the head job is signaled, and if there is a "next" job,
> > > > start the timeout again.
> > > >
> > > > v2:
> > > > further clean up the logic, and cancel the TDR timer only if the
> > > > signaled job is the last one in its scheduler.
> > > >
> > > > v3:
> > > > change the issue description;
> > > > remove the cancel_delayed_work at the beginning of cleanup_job;
> > > > restore the implementation of drm_sched_job_begin.
> > > >
> > > > v4:
> > > > remove the kthread_should_park() check in the cleanup_job
> > > > routine; we should clean up the signaled job asap.
> > > >
> > > > TODO:
> > > > 1) introduce pause/resume of the scheduler in job_timeout to
> > > > serialize the handling of the scheduler and job_timeout (a rough
> > > > sketch follows below).
> > > > 2) drop the bad job's removal and re-insertion in the scheduler;
> > > > with the above serialization there is no race anymore.
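> > > >
> > > > (Rough sketch of TODO 1 only: it assumes the scheduler still
> > > > runs on a kthread stored in sched->thread, mirroring what
> > > > drm_sched_stop()/drm_sched_start() already do; the handler
> > > > name and body are hypothetical, not part of this patch:)
> > > >
> > > >         static void example_job_timedout(struct drm_sched_job *job)
> > > >         {
> > > >                 struct drm_gpu_scheduler *sched = job->sched;
> > > >
> > > >                 /* pause: park the scheduler thread so it cannot
> > > >                  * touch pending_list while the timeout is handled */
> > > >                 kthread_park(sched->thread);
> > > >
> > > >                 /* ... inspect and recover the bad job here, without
> > > >                  * racing against drm_sched_get_cleanup_job() ... */
> > > >
> > > >                 /* resume normal scheduling */
> > > >                 kthread_unpark(sched->thread);
> > > >         }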
> > > >
> > > > Tested-by: Jingwen Chen <jingwen.chen@amd.com>
> > > > Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> > >
> > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > >
> >
> > Are you planning to push this to drm-misc?
> >
> > Alex
> >
> > > > ---
> > > >  drivers/gpu/drm/scheduler/sched_main.c | 26 +++++++++-----------------
> > > >  1 file changed, 9 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > index a2a9536..3e0bbc7 100644
> > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > @@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > > >  {
> > > >          struct drm_sched_job *job, *next;
> > > >
> > > > -        /*
> > > > -         * Don't destroy jobs while the timeout worker is running  OR thread
> > > > -         * is being parked and hence assumed to not touch pending_list
> > > > -         */
> > > > -        if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > > > -            !cancel_delayed_work(&sched->work_tdr)) ||
> > > > -            kthread_should_park())
> > > > -                return NULL;
> > > > -
> > > >          spin_lock(&sched->job_list_lock);
> > > >
> > > >          job = list_first_entry_or_null(&sched->pending_list,
> > > > @@ -693,17 +684,21 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > > >          if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> > > >                  /* remove job from pending_list */
> > > >                  list_del_init(&job->list);
> > > > +
> > > > +                /* cancel this job's TO timer */
> > > > +                cancel_delayed_work(&sched->work_tdr);
> > > >                  /* make the scheduled timestamp more accurate */
> > > >                  next = list_first_entry_or_null(&sched->pending_list,
> > > >                                                  typeof(*next), list);
> > > > -                if (next)
> > > > +
> > > > +                if (next) {
> > > >                          next->s_fence->scheduled.timestamp =
> > > >                                  job->s_fence->finished.timestamp;
> > > > -
> > > > +                        /* start TO timer for next job */
> > > > +                        drm_sched_start_timeout(sched);
> > > > +                }
> > > >          } else {
> > > >                  job = NULL;
> > > > -                /* queue timeout for next job */
> > > > -                drm_sched_start_timeout(sched);
> > > >          }
> > > >
> > > >          spin_unlock(&sched->job_list_lock);
> > > > @@ -791,11 +786,8 @@ static int drm_sched_main(void *param)
> > > >                                           (entity = drm_sched_select_entity(sched))) ||
> > > >                                           kthread_should_stop());
> > > >
> > > > -                if (cleanup_job) {
> > > > +                if (cleanup_job)
> > > >                          sched->ops->free_job(cleanup_job);
> > > > -                        /* queue timeout for next job */
> > > > -                        drm_sched_start_timeout(sched);
> > > > -                }
> > > >
> > > >                  if (!entity)
> > > >                          continue;
> > >