I don't think I can apply this patch as-is, since it depends on a patch by Steven which also wasn't applied yet:
588b982 Steven Price        6 weeks ago    drm: Don't free jobs in wait_event_interruptible()

Andrey

On 12/3/19 2:44 PM, Deucher, Alexander wrote:
    <blockquote type="cite" cite="mid:MWHPR12MB13589D76D1F7D518FE7D6F08F7420@MWHPR12MB1358.namprd12.prod.outlook.com">
      
      <style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
      <p style="font-family:Arial;font-size:10pt;color:#0078D7;margin:15pt;" align="Left">
        [AMD Official Use Only - Internal Distribution Only]<br>
      </p>
      <br>
      <div>
        <div style="font-family: Calibri, Arial, Helvetica, sans-serif;
          font-size: 12pt; color: rgb(0, 0, 0);">
          Please go ahead an apply whatever version is necessary for
          amd-staging-drm-next.</div>
        <div style="font-family: Calibri, Arial, Helvetica, sans-serif;
          font-size: 12pt; color: rgb(0, 0, 0);">
          <br>
        </div>
        <div style="font-family: Calibri, Arial, Helvetica, sans-serif;
          font-size: 12pt; color: rgb(0, 0, 0);">
          Alex</div>
        <div style="font-family: Calibri, Arial, Helvetica, sans-serif;
          font-size: 12pt; color: rgb(0, 0, 0);">
          <br>
        </div>
        <hr style="display:inline-block;width:98%" tabindex="-1">
        <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b>
            Grodzovsky, Andrey <a class="moz-txt-link-rfc2396E" href="mailto:Andrey.Grodzovsky@amd.com"><Andrey.Grodzovsky@amd.com></a><br>
            <b>Sent:</b> Tuesday, December 3, 2019 2:10 PM<br>
            <b>To:</b> Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; Deucher,
            Alexander <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com"><Alexander.Deucher@amd.com></a><br>
            <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:dri-devel@lists.freedesktop.org">dri-devel@lists.freedesktop.org</a>
            <a class="moz-txt-link-rfc2396E" href="mailto:dri-devel@lists.freedesktop.org"><dri-devel@lists.freedesktop.org></a>;
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
            <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org"><amd-gfx@lists.freedesktop.org></a>; Koenig, Christian
            <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:steven.price@arm.com">steven.price@arm.com</a>
            <a class="moz-txt-link-rfc2396E" href="mailto:steven.price@arm.com"><steven.price@arm.com></a><br>
            <b>Subject:</b> Re: [PATCH v4] drm/scheduler: Avoid
            accessing freed bad job.</font>
          <div> </div>
        </div>
        <div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
              <div class="PlainText">Yes - Christian just pushed it to
                drm-next-misc - I guess Alex/Christian
                <br>
                didn't pull to amd-staging-drm-next yet.<br>
                <br>
                Andrey<br>
                <br>
On 12/2/19 2:24 PM, Deng, Emily wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Andrey,
>      Seems this patch is still not in amd-staging-drm-next?
>
> Best wishes
> Emily Deng
>
>> -----Original Message-----
>> From: Deng, Emily
>> Sent: Tuesday, November 26, 2019 4:41 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig,
>> Christian <Christian.Koenig@amd.com>; steven.price@arm.com
>> Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Reviewed-by: Emily Deng <Emily.Deng@amd.com>
>>
>>> -----Original Message-----
>>> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>> Sent: Tuesday, November 26, 2019 7:37 AM
>>> Cc: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
>>> Koenig, Christian <Christian.Koenig@amd.com>; Deng, Emily
>>> <Emily.Deng@amd.com>; steven.price@arm.com
>>> Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>>
>>> Ping
>>>
>>> Andrey
>>>
>>> On 11/25/19 3:51 PM, Andrey Grodzovsky wrote:
>>>> Problem:
>>>> Due to a race between drm_sched_cleanup_jobs in sched thread and
>>>> drm_sched_job_timedout in timeout work there is a possibility that the
>>>> bad job was already freed while still being accessed from the timeout
>>>> thread.
>>>>
>>>> Fix:
>>>> Instead of just peeking at the bad job in the mirror list, remove it
>>>> from the list under lock and then put it back later when we are
>>>> guaranteed that no race with the main sched thread is possible, which
>>>> is after the thread is parked.
>>>>
>>>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>>>>
>>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>>>> drm_sched_get_cleanup_job already has a lock there.
>>>>
>>>> v4: Fix comments to reflect latest code in drm-misc.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>> Tested-by: Emily Deng <Emily.Deng@amd.com>
>>>> ---
>>>>  drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
>>>>  1 file changed, 27 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 6774955..1bf9c40 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>          unsigned long flags;
>>>>
>>>>          sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>>> +
>>>> +        /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>>> +        spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>          job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>                                         struct drm_sched_job, node);
>>>>
>>>>          if (job) {
>>>> +                /*
>>>> +                 * Remove the bad job so it cannot be freed by concurrent
>>>> +                 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>>>> +                 * is parked at which point it's safe.
>>>> +                 */
>>>> +                list_del_init(&job->node);
>>>> +                spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>> +
>>>>                  job->sched->ops->timedout_job(job);
>>>>
>>>>                  /*
>>>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>                          job->sched->ops->free_job(job);
>>>>                          sched->free_guilty = false;
>>>>                  }
>>>> +        } else {
>>>> +                spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>          }
>>>>
>>>>          spin_lock_irqsave(&sched->job_list_lock, flags);
>>>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>>          kthread_park(sched->thread);
>>>>
>>>>          /*
>>>> +         * Reinsert back the bad job here - now it's safe as
>>>> +         * drm_sched_get_cleanup_job cannot race against us and release the
>>>> +         * bad job at this point - we parked (waited for) any in progress
>>>> +         * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>>>> +         * now until the scheduler thread is unparked.
>>>> +         */
>>>> +        if (bad && bad->sched == sched)
>>>> +                /*
>>>> +                 * Add at the head of the queue to reflect it was the earliest
>>>> +                 * job extracted.
>>>> +                 */
>>>> +                list_add(&bad->node, &sched->ring_mirror_list);
>>>> +
>>>> +        /*
>>>>          * Iterate the job list from later to  earlier one and either deactive
>>>>          * their HW callbacks or remove them from mirror list if they already
>>>>          * signaled.
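
The race the patch closes is the classic peek-then-free hazard on a shared
list: one thread peeks at the head entry while another pops and frees it.
Below is a minimal self-contained C sketch of that hazard and of the
detach-under-lock approach the fix takes. All types and names here are
simplified placeholders (a plain singly linked list and a pthread mutex),
not the scheduler's actual code.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct job {
	struct job *next;
	int id;
};

static struct job *mirror_head;  /* stand-in for sched->ring_mirror_list */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* job_list_lock */

/* Cleanup-side role (drm_sched_get_cleanup_job + free_job in the real
 * code): pop the first job under the lock, then free it. */
static void cleanup_one(void)
{
	pthread_mutex_lock(&list_lock);
	struct job *j = mirror_head;
	if (j)
		mirror_head = j->next;
	pthread_mutex_unlock(&list_lock);
	free(j);  /* after this, no one may touch the job again */
}

/* Timeout-side role (drm_sched_job_timedout) with the fix applied: detach
 * the bad job under the lock *before* dereferencing it, so a concurrent
 * cleanup_one() can no longer free it underneath us.  The buggy variant
 * merely peeked at mirror_head and then used the pointer unlocked. */
static void handle_timeout(void)
{
	pthread_mutex_lock(&list_lock);
	struct job *bad = mirror_head;
	if (bad)
		mirror_head = bad->next;  /* the list_del_init() step */
	pthread_mutex_unlock(&list_lock);

	if (!bad)
		return;

	printf("handling timeout of job %d\n", bad->id);  /* safe: we own it */

	/* Reinsert at the head once the cleanup side is known to be idle
	 * (the parked scheduler kthread in the real code). */
	pthread_mutex_lock(&list_lock);
	bad->next = mirror_head;
	mirror_head = bad;
	pthread_mutex_unlock(&list_lock);
}

int main(void)
{
	struct job *j = calloc(1, sizeof(*j));
	j->id = 1;
	mirror_head = j;

	handle_timeout();  /* detaches, handles, reinserts job 1 */
	cleanup_one();     /* frees it afterwards without racing */
	return 0;
}

The key point mirrors the patch: once the timeout path owns the detached
entry, the cleanup path's free cannot reach it, and reinsertion happens
only after the cleanup side has been quiesced.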