<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Attached. </p>
    <p>If the general idea in the patch is OK, I can think of a test (and
      maybe add it to the libdrm amdgpu tests) that actually simulates
      this scenario with 2 forked concurrent processes working on the
      same entity's job queue, where one is dying while the other keeps
      pushing to the same queue. For now I have only tested it with a
      normal boot and by running multiple glxgears instances
      concurrently, which doesn't really exercise this code path, since
      I think each of them works on its own FD.<br>
    </p>
    <p>Andrey<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 08/10/2018 09:27 AM, Christian König
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:5bf40a54-18f9-98fd-a3df-dd0b8da0a424@gmail.com">
      <div class="moz-cite-prefix">Crap, yeah indeed that needs to be
        protected by some lock.<br>
        <br>
        Going to prepare a patch for that,<br>
        Christian.<br>
        <br>
        On 09.08.2018 at 21:49, Andrey Grodzovsky wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:54621fc1-7246-f1bf-26bb-a16c4daf249f@amd.com">
        <p>Reviewed-by: Andrey Grodzovsky <a
            class="moz-txt-link-rfc2396E"
            href="mailto:andrey.grodzovsky@amd.com"
            moz-do-not-send="true"><andrey.grodzovsky@amd.com></a></p>
        <p><br>
        </p>
        <p>But I still have a question about entity->last_user
          (I didn't notice this before):<br>
        </p>
        <p>It looks to me like there is a race condition in its current
          usage. Say process A is preempted right after doing the
          cmpxchg(...) in drm_sched_entity_flush. Now process B, a forked
          process working on the same entity, is inside
          drm_sched_entity_push_job: it writes its PID to
          entity->last_user and also executes drm_sched_rq_add_entity.
          Then process A runs again and executes
          drm_sched_rq_remove_entity, inadvertently removing the entity
          from its scheduler rq even though process B has just queued
          work on it.</p>
        <p>It looks to me like we should instead protect the
          entity->last_user accesses and the addition/removal of the
          entity to/from the rq together under one lock.</p>
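        <p>To make the interleaving concrete, here is a minimal user-space
          sketch in C. It is a hypothetical model, not the actual drm_sched
          code: struct model_entity, its fields and all model_*, racy_* and
          locked_* helpers are made-up names, a pthread mutex stands in for
          entity->rq_lock, and the cmpxchg is modeled by a plain
          compare-and-clear because the race is replayed step by step in a
          single thread. The racy_* helpers split the flush into its
          cmpxchg step and its rq-removal step so the sequence above can be
          replayed; the locked_* helpers show one possible shape of the
          proposed fix, where the last_user check/update and the rq
          add/remove share one critical section.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a scheduler entity; only the
 * state involved in the race is kept. The mutex stands in for the kernel
 * spinlock entity->rq_lock. */
struct model_entity {
	pthread_mutex_t lock;
	int last_user; /* PID of the last process that pushed a job, 0 = none */
	bool on_rq;    /* is the entity currently on the scheduler run queue? */
};

static void model_entity_init(struct model_entity *e)
{
	int ret = pthread_mutex_init(&e->lock, NULL);

	assert(ret == 0);
	(void)ret; /* avoid an unused-variable warning when NDEBUG is set */
	e->last_user = 0;
	e->on_rq = false;
}

/* Racy flush, step 1: models cmpxchg(last_user, pid, 0). A plain
 * compare-and-clear suffices because the race is replayed in one thread. */
static bool racy_flush_claim(struct model_entity *e, int pid)
{
	if (e->last_user != pid)
		return false;
	e->last_user = 0;
	return true;
}

/* Racy flush, step 2: models drm_sched_rq_remove_entity(). Nothing
 * re-checks last_user between step 1 and this step. */
static void racy_flush_remove(struct model_entity *e)
{
	e->on_rq = false;
}

/* Racy push: models the last_user write plus drm_sched_rq_add_entity(). */
static void racy_push(struct model_entity *e, int pid)
{
	e->last_user = pid;
	e->on_rq = true;
}

/* Locked flush: the last_user check and the removal form one critical
 * section, so a concurrent push cannot slip in between them. */
static bool locked_flush(struct model_entity *e, int pid)
{
	bool removed = false;

	pthread_mutex_lock(&e->lock);
	if (e->last_user == pid) {
		e->last_user = 0;
		e->on_rq = false;
		removed = true;
	}
	pthread_mutex_unlock(&e->lock);
	return removed;
}

/* Locked push: updates last_user and adds the entity under the same lock. */
static void locked_push(struct model_entity *e, int pid)
{
	pthread_mutex_lock(&e->lock);
	e->last_user = pid;
	e->on_rq = true;
	pthread_mutex_unlock(&e->lock);
}
```

        <p>In the racy replay, A's cmpxchg succeeds, B then sets last_user
          and re-adds the entity, and A's delayed removal takes the entity
          off the rq even though last_user now points at B. With the locked
          helpers, A's flush either completes before B's push (and B
          re-adds the entity) or runs after it and sees last_user == B, so
          the entity stays queued in both orders.</p>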
        <p>Andrey<br>
        </p>
        <br>
        <div class="moz-cite-prefix">On 08/06/2018 10:18 AM, Nayan
          Deshmukh wrote:<br>
        </div>
        <blockquote type="cite"
cite="mid:CAFd4ddzyvHPHepAgs=mjyWVj0WDV_pQbE9x7aHwNZ_zcME6fqQ@mail.gmail.com">
          <div dir="ltr">
            <div>
              <div>I forgot about this since we started discussing
                possible scenarios of processes and threads.<br>
                <br>
              </div>
              In any case, this check is redundant. Acked-by: Nayan
              Deshmukh <<a href="mailto:nayan26deshmukh@gmail.com"
                moz-do-not-send="true">nayan26deshmukh@gmail.com</a>><br>
              <br>
            </div>
            Nayan<br>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr">On Mon, Aug 6, 2018 at 7:43 PM Christian
              König <<a
                href="mailto:ckoenig.leichtzumerken@gmail.com"
                moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>>
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">Ping.
              Any objections to that?<br>
              <br>
              Christian.<br>
              <br>
              On 03.08.2018 at 13:08, Christian König wrote:<br>
              > That is superfluous now.<br>
              ><br>
              > Signed-off-by: Christian König <<a
                href="mailto:christian.koenig@amd.com" target="_blank"
                moz-do-not-send="true">christian.koenig@amd.com</a>><br>
              > ---<br>
              >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----<br>
              >   1 file changed, 5 deletions(-)<br>
              ><br>
              > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
              > index 85908c7f913e..65078dd3c82c 100644<br>
              > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
              > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
              > @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,<br>
              >       if (first) {<br>
              >               /* Add the entity to the run queue */<br>
              >               spin_lock(&entity->rq_lock);<br>
              > -             if (!entity->rq) {<br>
              > -                     DRM_ERROR("Trying to push to a killed entity\n");<br>
              > -                     spin_unlock(&entity->rq_lock);<br>
              > -                     return;<br>
              > -             }<br>
              >               drm_sched_rq_add_entity(entity->rq, entity);<br>
              >               spin_unlock(&entity->rq_lock);<br>
              >               drm_sched_wakeup(entity->rq->sched);<br>
              <br>
            </blockquote>
          </div>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>