<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>I assume that this is the only code change and no locks are taken
      in drm_sched_entity_push_job - <br>
    </p>
    <p>What happens if process A runs drm_sched_entity_push_job after
      this code was executed by the (dying) process B while there
      are still jobs in the queue (the wait_event terminated
      prematurely)? The entity has already been removed from the rq, but
      bool 'first' in drm_sched_entity_push_job will be false, so the
      entity will not be reinserted into the rq entity list and no
      wakeup will be triggered for process A pushing a new job.</p>
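    <p>To make the sequence concrete, here is a minimal user-space sketch
      of it. It is only a toy model under my assumptions: the struct and
      the lw_* functions are hypothetical stand-ins, not the scheduler
      code, and 'first' is modeled simply as "the queue was empty before
      this push".</p>
    <pre>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the entity; NOT the kernel's struct drm_sched_entity. */
struct lost_wakeup_entity {
    int queued_jobs; /* jobs still waiting in the entity's job queue */
    bool on_rq;      /* is the entity currently linked into the run queue? */
    int wakeups;     /* how many scheduler wakeups were issued */
};

/* Dying process B: removes the entity from the rq although jobs remain. */
static void lw_flush_killed(struct lost_wakeup_entity *e)
{
    assert(e != NULL);
    e->on_rq = false;
}

/*
 * Process A: pushes a job. The entity is (re)added to the rq and the
 * scheduler is woken only when the queue was empty before the push.
 * Returns the value of 'first'.
 */
static bool lw_push_job(struct lost_wakeup_entity *e)
{
    assert(e != NULL);
    bool first = (e->queued_jobs == 0);

    e->queued_jobs++;
    if (first) {
        e->on_rq = true;
        e->wakeups++;
    }
    return first;
}
```
    </pre>
    <p>Starting with one queued job and the entity on the rq, calling
      lw_flush_killed() and then lw_push_job() leaves the entity off the
      rq with two queued jobs and no wakeup issued.</p>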
    <p><br>
    </p>
    <p>Another issue below - <br>
    </p>
    <p>Andrey<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 08/14/2018 03:05 AM, Christian König
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:0fa473f5-155a-223e-fbb6-37147fd47a17@gmail.com">
      <div class="moz-cite-prefix">I would rather like to avoid taking
        the lock in the hot path.<br>
        <br>
        How about this:<br>
        <br>
             /* For killed process disable any more IBs enqueue right
        now */<br>
            last_user = cmpxchg(&entity->last_user,
        current->group_leader, NULL);<br>
             if ((!last_user || last_user == current->group_leader)
        &&<br>
                 (current->flags & PF_EXITING) &&
        (current->exit_code == SIGKILL)) {<br>
                grab_lock();<br>
                 drm_sched_rq_remove_entity(entity->rq, entity);<br>
                if (READ_ONCE(&entity->last_user) != NULL)<br>
      </div>
    </blockquote>
    <br>
    This condition is true because process A has just executed
    drm_sched_entity_push_job->WRITE_ONCE(entity->last_user,
    current->group_leader);<br>
    so the line below is executed and the entity is reinserted into the
    rq. Let's also say that the entity job queue is empty at this point.
    For process A, bool 'first' will be true,<br>
    and hence
    drm_sched_entity_push_job->drm_sched_rq_add_entity(entity->rq,
    entity) will also run, causing a double insertion of the entity
    into the rq list.<br>
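    <br>
    The interleaving can be replayed step by step with a small user-space
    sketch. Everything here is a hypothetical model (the di_* names, the
    link counter standing in for the rq list), and it assumes the add
    path does no "already linked" check of its own.<br>
    <pre>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the entity; NOT the kernel's struct drm_sched_entity. */
struct double_insert_entity {
    const void *last_user; /* models entity->last_user */
    int queued_jobs;       /* jobs in the entity's queue */
    int rq_links;          /* times the entity is linked into the rq */
};

/* Assumption: adding does not check whether the entity is already linked. */
static void di_rq_add(struct double_insert_entity *e) { e->rq_links++; }
static void di_rq_remove(struct double_insert_entity *e) { e->rq_links = 0; }

/* Dying process B, step 1: clear last_user if it is ours, drop the entity. */
static void di_flush_remove(struct double_insert_entity *e, const void *task)
{
    assert(e != NULL);
    if (e->last_user == task)
        e->last_user = NULL;
    di_rq_remove(e);
}

/* Dying process B, step 2: somebody else pushed meanwhile, so re-add. */
static void di_flush_recheck(struct double_insert_entity *e)
{
    if (e->last_user != NULL)
        di_rq_add(e);
}

/* Process A, step 1: record itself as last user and queue the job. */
static bool di_push_begin(struct double_insert_entity *e, const void *task)
{
    bool first = (e->queued_jobs == 0);

    e->last_user = task;
    e->queued_jobs++;
    return first;
}

/* Process A, step 2: add the entity to the rq if its job was the first. */
static void di_push_finish(struct double_insert_entity *e, bool first)
{
    if (first)
        di_rq_add(e);
}
```
    </pre>
    Running di_flush_remove() for B, di_push_begin() for A,
    di_flush_recheck() for B and finally di_push_finish() for A on an
    empty queue ends with rq_links == 2, which is the double insertion
    described above.<br>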
    <br>
    Andrey<br>
    <br>
    <blockquote type="cite"
      cite="mid:0fa473f5-155a-223e-fbb6-37147fd47a17@gmail.com">
      <div class="moz-cite-prefix">            
        drm_sched_rq_add_entity(entity->rq, entity);<br>
                drop_lock();<br>
            }<br>
         <br>
        Christian.<br>
        <br>
        Am 13.08.2018 um 18:43 schrieb Andrey Grodzovsky:<br>
      </div>
      <blockquote type="cite"
        cite="mid:82109a00-aebf-1e5f-5346-eef541a361df@amd.com">
        <p>Attached. </p>
        <p>If the general idea in the patch is OK I can think of a test
          (and maybe add it to the libdrm amdgpu tests) to actually simulate
          this scenario with 2 forked concurrent processes working on the
          same entity's job queue, where one is dying while the other keeps
          pushing to the same queue. For now I have only tested it with a
          normal boot and by running multiple glxgears instances
          concurrently, which doesn't really test this code path since I
          think each of them works on its own FD.<br>
        </p>
        <p>Andrey<br>
        </p>
        <br>
        <div class="moz-cite-prefix">On 08/10/2018 09:27 AM, Christian
          König wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:5bf40a54-18f9-98fd-a3df-dd0b8da0a424@gmail.com">
          <div class="moz-cite-prefix">Crap, yeah indeed that needs to
            be protected by some lock.<br>
            <br>
            Going to prepare a patch for that,<br>
            Christian.<br>
            <br>
            Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:<br>
          </div>
          <blockquote type="cite"
            cite="mid:54621fc1-7246-f1bf-26bb-a16c4daf249f@amd.com">
            <p>Reviewed-by: Andrey Grodzovsky <a
                class="moz-txt-link-rfc2396E"
                href="mailto:andrey.grodzovsky@amd.com"
                moz-do-not-send="true"><andrey.grodzovsky@amd.com></a></p>
            <p><br>
            </p>
            <p>But I still  have questions about entity->last_user
              (didn't notice this before) - <br>
            </p>
            <p>It looks to me like there is a race condition in its current
              usage. Let's say process A was preempted after doing
              drm_sched_entity_flush->cmpxchg(...). Now process B, working
              on the same entity (forked), is inside
              drm_sched_entity_push_job; it writes its PID to
              entity->last_user and also executes
              drm_sched_rq_add_entity. Now process A runs again and
              executes drm_sched_rq_remove_entity, inadvertently removing
              the entity that process B just added from its scheduler rq.</p>
            <p>It looks to me like we should instead take one lock around
              both the entity->last_user accesses and the additions/removals
              of the entity to/from the rq.</p>
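            <p>As a rough illustration of what locking them together could
              look like, here is a user-space sketch with a pthread mutex.
              It is not a proposed patch: the struct, the locked_*
              functions and the choice of mutex are all assumptions made
              for the example.</p>
            <pre>
```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the entity; NOT the kernel's struct drm_sched_entity. */
struct locked_entity {
    pthread_mutex_t lock;  /* guards last_user and on_rq together */
    const void *last_user; /* models entity->last_user */
    bool on_rq;            /* models membership in the rq entity list */
};

/* Push path: update last_user and add to the rq in one critical section. */
static void locked_push(struct locked_entity *e, const void *task)
{
    assert(e != NULL);
    pthread_mutex_lock(&e->lock);
    e->last_user = task;
    e->on_rq = true;
    pthread_mutex_unlock(&e->lock);
}

/*
 * Flush path of a killed process: the last_user check and the removal
 * happen under the same lock, so a push cannot slip in between them.
 * Returns true if the entity was removed from the rq.
 */
static bool locked_flush_killed(struct locked_entity *e, const void *task)
{
    bool removed = false;

    assert(e != NULL);
    pthread_mutex_lock(&e->lock);
    if (e->last_user == NULL || e->last_user == task) {
        e->last_user = NULL;
        e->on_rq = false;
        removed = true;
    }
    pthread_mutex_unlock(&e->lock);
    return removed;
}
```
            </pre>
            <p>With this shape, if process B pushes before process A
              flushes, A sees that it is no longer the last user and
              leaves the entity on the rq. If A flushes first, B's later
              push simply adds the entity back.</p>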
            <p>Andrey<br>
            </p>
            <br>
            <div class="moz-cite-prefix">On 08/06/2018 10:18 AM, Nayan
              Deshmukh wrote:<br>
            </div>
            <blockquote type="cite"
cite="mid:CAFd4ddzyvHPHepAgs=mjyWVj0WDV_pQbE9x7aHwNZ_zcME6fqQ@mail.gmail.com">
              <div dir="ltr">
                <div>
                  <div>I forgot about this since we started discussing
                    possible scenarios of processes and threads.<br>
                    <br>
                  </div>
                  In any case, this check is redundant. Acked-by: Nayan
                  Deshmukh <<a
                    href="mailto:nayan26deshmukh@gmail.com"
                    moz-do-not-send="true">nayan26deshmukh@gmail.com</a>><br>
                  <br>
                </div>
                Nayan<br>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr">On Mon, Aug 6, 2018 at 7:43 PM Christian
                  König <<a
                    href="mailto:ckoenig.leichtzumerken@gmail.com"
                    moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">Ping.
                  Any objections to that?<br>
                  <br>
                  Christian.<br>
                  <br>
                  Am 03.08.2018 um 13:08 schrieb Christian König:<br>
                  > That is superfluous now.<br>
                  ><br>
                  > Signed-off-by: Christian König <<a
                    href="mailto:christian.koenig@amd.com"
                    target="_blank" moz-do-not-send="true">christian.koenig@amd.com</a>><br>
                  > ---<br>
                  >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5
                  -----<br>
                  >   1 file changed, 5 deletions(-)<br>
                  ><br>
                  > diff --git
                  a/drivers/gpu/drm/scheduler/gpu_scheduler.c
                  b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                  > index 85908c7f913e..65078dd3c82c 100644<br>
                  > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                  > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                  > @@ -590,11 +590,6 @@ void
                  drm_sched_entity_push_job(struct drm_sched_job
                  *sched_job,<br>
                  >       if (first) {<br>
                  >               /* Add the entity to the run queue
                  */<br>
                  >               spin_lock(&entity->rq_lock);<br>
                  > -             if (!entity->rq) {<br>
                  > -                     DRM_ERROR("Trying to push
                  to a killed entity\n");<br>
                  > -                   
                   spin_unlock(&entity->rq_lock);<br>
                  > -                     return;<br>
                  > -             }<br>
                  >             
                   drm_sched_rq_add_entity(entity->rq, entity);<br>
                  >             
                   spin_unlock(&entity->rq_lock);<br>
                  >             
                   drm_sched_wakeup(entity->rq->sched);<br>
                  <br>
                </blockquote>
              </div>
            </blockquote>
            <br>
          </blockquote>
          <br>
        </blockquote>
        <br>
        <br>
        <pre wrap="">_______________________________________________
dri-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:dri-devel@lists.freedesktop.org" moz-do-not-send="true">dri-devel@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/dri-devel" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/dri-devel</a>
</pre>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>