<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">I would rather avoid taking the
      lock in the hot path.<br>
      <br>
      How about this:<br>
      <br>
           /* For killed process disable any more IBs enqueue right now
      */<br>
          last_user = cmpxchg(&entity->last_user,
      current->group_leader, NULL);<br>
           if ((!last_user || last_user == current->group_leader)
      &&<br>
               (current->flags & PF_EXITING) &&
      (current->exit_code == SIGKILL)) {<br>
              grab_lock();<br>
               drm_sched_rq_remove_entity(entity->rq, entity);<br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (READ_ONCE(entity->last_user) != NULL)<br>
                  drm_sched_rq_add_entity(entity->rq, entity);<br>
              drop_lock();<br>
          }<br>
       <br>
      Christian.<br>
      <br>
      On 13.08.2018 at 18:43, Andrey Grodzovsky wrote:<br>
    </div>
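To make the re-check under the lock concrete, here is a minimal userspace sketch of the proposal above. It is a model under stated assumptions, not the kernel code: every `sketch_` name is hypothetical, `grab_lock()`/`drop_lock()` are assumed to be the same lock that the push path holds around `drm_sched_rq_add_entity`, the run queue is reduced to a single flag, and the `PF_EXITING`/`SIGKILL` check is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct drm_sched_entity. */
struct sketch_entity {
    _Atomic(const void *) last_user; /* models entity->last_user */
    atomic_flag rq_lock;             /* assumed lock behind grab_lock()/drop_lock() */
    bool on_rq;                      /* true while the entity is on its run queue */
};

static void sketch_lock(struct sketch_entity *e)
{
    while (atomic_flag_test_and_set_explicit(&e->rq_lock, memory_order_acquire))
        ; /* spin until the flag was clear */
}

static void sketch_unlock(struct sketch_entity *e)
{
    atomic_flag_clear_explicit(&e->rq_lock, memory_order_release);
}

/* Push path: publish the new last user, then add the entity under the lock. */
static void sketch_push_job(struct sketch_entity *e, const void *self)
{
    atomic_store(&e->last_user, self);
    sketch_lock(e);
    e->on_rq = true;
    sketch_unlock(e);
}

/* Flush path, lock-free part: returns the previous last_user like cmpxchg(). */
static const void *sketch_flush_begin(struct sketch_entity *e, const void *self)
{
    const void *expected = self;

    /* On failure, expected is overwritten with the value actually found. */
    atomic_compare_exchange_strong(&e->last_user, &expected, NULL);
    return expected;
}

/* Flush path, locked part: remove, then re-add if somebody pushed meanwhile. */
static void sketch_flush_finish(struct sketch_entity *e, const void *old,
                                const void *self)
{
    if (old != NULL && old != self)
        return;
    sketch_lock(e);
    e->on_rq = false;
    if (atomic_load(&e->last_user) != NULL)
        e->on_rq = true;
    sketch_unlock(e);
}

/* Replays one flush, optionally with a push from a second task in between. */
static bool sketch_replay(bool racing_push)
{
    int task_a, task_b; /* addresses stand in for the two processes */
    const void *a = &task_a, *b = &task_b;
    struct sketch_entity e;
    const void *old;

    atomic_init(&e.last_user, a);
    atomic_flag_clear(&e.rq_lock);
    e.on_rq = true;

    old = sketch_flush_begin(&e, a);
    if (racing_push)
        sketch_push_job(&e, b);
    sketch_flush_finish(&e, old, a);
    return e.on_rq;
}
```

In this model, `sketch_replay(true)` leaves the entity queued because the flush sees the non-NULL `last_user` written by the second task, while `sketch_replay(false)` removes it. The push path pays for the lock only where it already adds the entity to the run queue.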
    <blockquote type="cite"
      cite="mid:82109a00-aebf-1e5f-5346-eef541a361df@amd.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <p>Attached. </p>
      <p>If the general idea in the patch is OK, I can think of a test
        (and maybe add it to the libdrm amdgpu tests) that actually
        simulates this scenario: two forked, concurrent processes
        working on the same entity's job queue, where one is dying while
        the other keeps pushing to the same queue. For now I have only
        tested it with a normal boot and by running multiple glxgears
        instances concurrently, which doesn't really exercise this code
        path since I think each of them works on its own FD.<br>
      </p>
      <p>Andrey<br>
      </p>
      <br>
      <div class="moz-cite-prefix">On 08/10/2018 09:27 AM, Christian
        König wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:5bf40a54-18f9-98fd-a3df-dd0b8da0a424@gmail.com">
        <meta http-equiv="Content-Type" content="text/html;
          charset=utf-8">
        <div class="moz-cite-prefix">Crap, yeah indeed that needs to be
          protected by some lock.<br>
          <br>
          Going to prepare a patch for that,<br>
          Christian.<br>
          <br>
          On 09.08.2018 at 21:49, Andrey Grodzovsky wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:54621fc1-7246-f1bf-26bb-a16c4daf249f@amd.com">
          <p>Reviewed-by: Andrey Grodzovsky <a
              class="moz-txt-link-rfc2396E"
              href="mailto:andrey.grodzovsky@amd.com"
              moz-do-not-send="true">&lt;andrey.grodzovsky@amd.com&gt;</a></p>
          <p><br>
          </p>
          <p>But I still have questions about entity->last_user
            (I didn't notice this before):<br>
          </p>
          <p>It looks to me like there is a race condition in its current
            usage. Say process A is preempted right after doing the
            cmpxchg(...) in drm_sched_entity_flush. Process B, working
            on the same entity (forked), is now inside
            drm_sched_entity_push_job: it writes its PID to
            entity->last_user and also executes
            drm_sched_rq_add_entity. When process A runs again, it
            executes drm_sched_rq_remove_entity and inadvertently
            removes the entity that process B just added from its
            scheduler rq.</p>
          <p>It looks to me like we should instead take a lock around
            both the entity->last_user accesses and the adds/removals
            of the entity to/from the rq.</p>
          <p>Andrey<br>
          </p>
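The interleaving described above can be replayed deterministically in a small userspace model. This is only a sketch under assumptions: the `model_` names are hypothetical, the run queue is reduced to one flag, and the flush path is split into two steps so the preemption point can be placed between them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct drm_sched_entity. */
struct model_entity {
    _Atomic(const void *) last_user; /* models entity->last_user */
    bool on_rq;                      /* true while the entity is on its run queue */
};

/* Flush, step 1: clear last_user if the caller was the last user.
 * Returns the previous value, like the kernel's cmpxchg(). */
static const void *model_flush_cmpxchg(struct model_entity *e, const void *self)
{
    const void *expected = self;

    /* On failure, expected is overwritten with the value actually found. */
    atomic_compare_exchange_strong(&e->last_user, &expected, NULL);
    return expected;
}

/* Flush, step 2: drop the entity from the run queue without re-checking. */
static void model_flush_remove(struct model_entity *e)
{
    e->on_rq = false;
}

/* Push: record the caller as last user and put the entity on the run queue. */
static void model_push_job(struct model_entity *e, const void *self)
{
    atomic_store(&e->last_user, self);
    e->on_rq = true;
}

/* Replays the racy order; returns whether the entity is still queued. */
static bool model_replay_race(void)
{
    int task_a, task_b; /* addresses stand in for processes A and B */
    const void *a = &task_a, *b = &task_b;
    struct model_entity e;
    const void *old;

    atomic_init(&e.last_user, a);
    e.on_rq = true;

    old = model_flush_cmpxchg(&e, a);  /* A runs, then is preempted */
    model_push_job(&e, b);             /* B pushes a job meanwhile */
    if (old == NULL || old == a)
        model_flush_remove(&e);        /* A resumes and removes the entity */
    return e.on_rq;
}
```

In this model `model_replay_race()` returns `false`: the entity ends up off the run queue even though process B has just pushed a job to it, which is the outcome the lock is meant to prevent.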
          <br>
          <div class="moz-cite-prefix">On 08/06/2018 10:18 AM, Nayan
            Deshmukh wrote:<br>
          </div>
          <blockquote type="cite"
cite="mid:CAFd4ddzyvHPHepAgs=mjyWVj0WDV_pQbE9x7aHwNZ_zcME6fqQ@mail.gmail.com">
            <div dir="ltr">
              <div>
                <div>I forgot about this since we started discussing
                  possible scenarios of processes and threads.<br>
                  <br>
                </div>
                In any case, this check is redundant. Acked-by: Nayan
                Deshmukh <<a href="mailto:nayan26deshmukh@gmail.com"
                  moz-do-not-send="true">nayan26deshmukh@gmail.com</a>><br>
                <br>
              </div>
              Nayan<br>
            </div>
            <br>
            <div class="gmail_quote">
              <div dir="ltr">On Mon, Aug 6, 2018 at 7:43 PM Christian
                König <<a
                  href="mailto:ckoenig.leichtzumerken@gmail.com"
                  moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>>
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">Ping.
                Any objections to that?<br>
                <br>
                Christian.<br>
                <br>
                On 03.08.2018 at 13:08, Christian König wrote:<br>
                > That is superfluous now.<br>
                ><br>
                > Signed-off-by: Christian König <<a
                  href="mailto:christian.koenig@amd.com" target="_blank"
                  moz-do-not-send="true">christian.koenig@amd.com</a>><br>
                > ---<br>
                >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5
                -----<br>
                >   1 file changed, 5 deletions(-)<br>
                ><br>
                > diff --git
                a/drivers/gpu/drm/scheduler/gpu_scheduler.c
                b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                > index 85908c7f913e..65078dd3c82c 100644<br>
                > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                > @@ -590,11 +590,6 @@ void
                drm_sched_entity_push_job(struct drm_sched_job
                *sched_job,<br>
                >       if (first) {<br>
                >               /* Add the entity to the run queue */<br>
                >               spin_lock(&entity->rq_lock);<br>
                > -             if (!entity->rq) {<br>
                > -                     DRM_ERROR("Trying to push to
                a killed entity\n");<br>
                > -                   
                 spin_unlock(&entity->rq_lock);<br>
                > -                     return;<br>
                > -             }<br>
                >             
                 drm_sched_rq_add_entity(entity->rq, entity);<br>
                >               spin_unlock(&entity->rq_lock);<br>
                >             
                 drm_sched_wakeup(entity->rq->sched);<br>
                <br>
              </blockquote>
            </div>
          </blockquote>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <br>
      <br>
    </blockquote>
    <br>
  </body>
</html>