<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 08/03/2018 10:06 AM, Christian König
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:0a7c28f9-e60e-74de-1c48-0089935bc4f7@amd.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">Am 02.08.2018 um 16:11 schrieb Andrey
        Grodzovsky:<br>
      </div>
      <blockquote type="cite"
        cite="mid:5d3ec8da-db2e-0aa2-0a7c-a55f27b27a04@amd.com">
        <p><br>
        </p>
        <br>
        <div class="moz-cite-prefix">On 08/02/2018 02:47 AM, Nayan
          Deshmukh wrote:<br>
        </div>
        <blockquote type="cite"
cite="mid:CAFd4ddwKJvfrTu5uAub1K517DhdcOZbZ8k=gyzPy6GVqx_PhtA@mail.gmail.com">
          <div dir="ltr"><br>
            <br>
            <div class="gmail_quote">
              <div dir="ltr">On Thu, Aug 2, 2018 at 12:12 PM Christian
                König <<a
                  href="mailto:ckoenig.leichtzumerken@gmail.com"
                  moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>>
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">Am
                02.08.2018 um 00:25 schrieb Andrey Grodzovsky:<br>
                > Thinking again about this change and 53d5f1e
                drm/scheduler: move idle <br>
                > entities to scheduler with less load v2<br>
                ><br>
                > Looks to me like use case for which fc9a539
                drm/scheduler: add NULL <br>
                > pointer check for run queue (v2) was done<br>
                ><br>
                > will not work anymore.<br>
                ><br>
                > First of all in drm_sched_entity_push_job, 'if
                (!entity->rq)' will <br>
                > never be true any more since we stopped
                entity->rq to NULL in<br>
                ><br>
                > drm_sched_entity_flush.<br>
                <br>
                Good point, going to remove that.<br>
              </blockquote>
            </div>
          </div>
        </blockquote>
        <br>
        So basically we don't need that if (...){ return; } in
        drm_sched_entity_push_job any more ?<br>
      </blockquote>
      <br>
      Yes, exactly.<br>
      <br>
      <blockquote type="cite"
        cite="mid:5d3ec8da-db2e-0aa2-0a7c-a55f27b27a04@amd.com"> <br>
        <blockquote type="cite"
cite="mid:CAFd4ddwKJvfrTu5uAub1K517DhdcOZbZ8k=gyzPy6GVqx_PhtA@mail.gmail.com">
          <div dir="ltr">
            <div class="gmail_quote">
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
                ><br>
                > Second of all, even after we removed the entity
                from rq in <br>
                > drm_sched_entity_flush to terminate any subsequent
                submissions<br>
                ><br>
                > to the queue the other thread doing push job can
                just acquire again a <br>
                > run queue<br>
                ><br>
                > from drm_sched_entity_get_free_sched and continue
                submission<br>
                <br>
              </blockquote>
              <div>Hi Christian <br>
              </div>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex"> That
                is actually desired.<br>
                <br>
                When another process is now using the entity to submit
                jobs adding it <br>
                back to the rq is actually the right thing to do cause
                the entity is <br>
                still in use.<br>
              </blockquote>
            </div>
          </div>
        </blockquote>
        <br>
        Yes, no problem if it's another process. But what about another
        thread from same process ? Is it a possible use case that 2
        threads from same process submit to same entity concurrently ?
        If so and we specifically kill one, the other will not stop
        event if we want him to because current code makes him just
        require a rq for him self.<br>
      </blockquote>
      <br>
      Well you can't kill a single thread of a process (you can only
      interrupt it), a SIGKILL will always kill the whole process.<br>
    </blockquote>
    <br>
    Is the following scenario possible and acceptable ? <br>
    2 threads from same process working on same queue where thread A
    currently in drm_sched_entity_flush->wait_event_timeout (the
    process getting shut down because of SIGKILL sent by user)<br>
    while thread B still inside drm_sched_entity_push_job before 'if
    (reschedule)'. 'A' stopped waiting because queue became empty and
    then removes the entity queue from scheduler's run queue while<br>
    B goes inside 'reschedule' because it evaluates to true ('first' is
    true and all the rest of the conditions), acquires new rq, and later
    adds it back to scheduler (different one maybe) and keeps submitting
    jobs as much as he likes and then can be stack for up to 'timeout'
    time  in his drm_sched_entity_flush waiting for them.<br>
    <br>
    My understanding was that introduction of entity->last is to
    force immediate termination job submissions by any thread from the
    terminating process.<br>
    <br>
    Andrey<br>
    <br>
    <blockquote type="cite"
      cite="mid:0a7c28f9-e60e-74de-1c48-0089935bc4f7@amd.com"> <br>
      <blockquote type="cite"
        cite="mid:5d3ec8da-db2e-0aa2-0a7c-a55f27b27a04@amd.com"> <br>
        <blockquote type="cite"
cite="mid:CAFd4ddwKJvfrTu5uAub1K517DhdcOZbZ8k=gyzPy6GVqx_PhtA@mail.gmail.com">
          <div dir="ltr">
            <div class="gmail_quote">
              <div>I am not aware of the usecase so I might just be
                rambling. but if the thread/process that created the
                entity has called drm_sched_entity_flush then we
                shouldn't allow other processes to push jobs to that
                entity. <br>
                <br>
              </div>
              <div>Nayan  <br>
              </div>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
                Christian.<br>
              </blockquote>
            </div>
          </div>
        </blockquote>
        We don't really know who created the entity, the creation could
        be by X itself and then it's passed to other process for use.
        Check 'drm/scheduler: only kill entity if last user is killed
        v2', the idea is that if by the time you want to<br>
        kill this entity another process (not another thread from your
        process - i.e. current->group_leader != last_user in
        drm_sched_entity_flush) already started to use this entity just
        let it be.<br>
      </blockquote>
      <br>
      Yes, exactly that's the idea.<br>
      <br>
      Christian.<br>
      <br>
      <blockquote type="cite"
        cite="mid:5d3ec8da-db2e-0aa2-0a7c-a55f27b27a04@amd.com"> <br>
        Andrey<br>
        <blockquote type="cite"
cite="mid:CAFd4ddwKJvfrTu5uAub1K517DhdcOZbZ8k=gyzPy6GVqx_PhtA@mail.gmail.com">
          <div dir="ltr">
            <div class="gmail_quote">
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
                > so you can't substitute ' if (!entity->rq)' to
                'if <br>
                > (list_empty(&entity->list))'.<br>
                ><br>
                > What about adding a 'stopped' flag to
                drm_sched_entity to be set in <br>
                > drm_sched_entity_flush and in<br>
                ><br>
                > drm_sched_entity_push_job check for  'if
                (!entity->stopped)' instead <br>
                > of  ' if (!entity->rq)' ?<br>
                ><br>
                > Andrey<br>
                ><br>
                ><br>
                > On 07/30/2018 07:03 AM, Christian König wrote:<br>
                >> We removed the redundancy of having an extra
                scheduler field, so we<br>
                >> can't set the rq to NULL any more or otherwise
                won't know which<br>
                >> scheduler to use for the cleanup.<br>
                >><br>
                >> Just remove the entity from the scheduling list
                instead.<br>
                >><br>
                >> Signed-off-by: Christian König <<a
                  href="mailto:christian.koenig@amd.com" target="_blank"
                  moz-do-not-send="true">christian.koenig@amd.com</a>><br>
                >> ---<br>
                >>   drivers/gpu/drm/scheduler/gpu_scheduler.c |
                35 <br>
                >> +++++++------------------------<br>
                >>   1 file changed, 8 insertions(+), 27
                deletions(-)<br>
                >><br>
                >> diff --git
                a/drivers/gpu/drm/scheduler/gpu_scheduler.c <br>
                >> b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                >> index f563e4fbb4b6..1b733229201e 100644<br>
                >> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                >> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c<br>
                >> @@ -198,21 +198,6 @@ int
                drm_sched_entity_init(struct <br>
                >> drm_sched_entity *entity,<br>
                >>   }<br>
                >>   EXPORT_SYMBOL(drm_sched_entity_init);<br>
                >>   -/**<br>
                >> - * drm_sched_entity_is_initialized - Query if
                entity is initialized<br>
                >> - *<br>
                >> - * @sched: Pointer to scheduler instance<br>
                >> - * @entity: The pointer to a valid scheduler
                entity<br>
                >> - *<br>
                >> - * return true if entity is initialized, false
                otherwise<br>
                >> -*/<br>
                >> -static bool
                drm_sched_entity_is_initialized(struct drm_gpu_scheduler
                <br>
                >> *sched,<br>
                >> -                        struct
                drm_sched_entity *entity)<br>
                >> -{<br>
                >> -    return entity->rq != NULL &&<br>
                >> -        entity->rq->sched == sched;<br>
                >> -}<br>
                >> -<br>
                >>   /**<br>
                >>    * drm_sched_entity_is_idle - Check if entity
                is idle<br>
                >>    *<br>
                >> @@ -224,7 +209,8 @@ static bool
                drm_sched_entity_is_idle(struct <br>
                >> drm_sched_entity *entity)<br>
                >>   {<br>
                >>       rmb();<br>
                >>   -    if (!entity->rq ||
                spsc_queue_peek(&entity->job_queue) == NULL)<br>
                >> +    if (list_empty(&entity->list) ||<br>
                >> +       
                spsc_queue_peek(&entity->job_queue) == NULL)<br>
                >>           return true;<br>
                >>         return false;<br>
                >> @@ -279,8 +265,6 @@ long
                drm_sched_entity_flush(struct <br>
                >> drm_sched_entity *entity, long timeout)<br>
                >>       long ret = timeout;<br>
                >>         sched = entity->rq->sched;<br>
                >> -    if
                (!drm_sched_entity_is_initialized(sched, entity))<br>
                >> -        return ret;<br>
                >>       /**<br>
                >>        * The client will not queue more IBs
                during this fini, consume <br>
                >> existing<br>
                >>        * queued IBs or discard them on SIGKILL<br>
                >> @@ -299,7 +283,7 @@ long
                drm_sched_entity_flush(struct <br>
                >> drm_sched_entity *entity, long timeout)<br>
                >>       last_user =
                cmpxchg(&entity->last_user,
                current->group_leader, <br>
                >> NULL);<br>
                >>       if ((!last_user || last_user ==
                current->group_leader) &&<br>
                >>           (current->flags & PF_EXITING)
                && (current->exit_code == <br>
                >> SIGKILL))<br>
                >> -        drm_sched_entity_set_rq(entity, NULL);<br>
                >> +       
                drm_sched_rq_remove_entity(entity->rq, entity);<br>
                >>         return ret;<br>
                >>   }<br>
                >> @@ -320,7 +304,7 @@ void
                drm_sched_entity_fini(struct <br>
                >> drm_sched_entity *entity)<br>
                >>       struct drm_gpu_scheduler *sched;<br>
                >>         sched = entity->rq->sched;<br>
                >> -    drm_sched_entity_set_rq(entity, NULL);<br>
                >> +    drm_sched_rq_remove_entity(entity->rq,
                entity);<br>
                >>         /* Consumption of existing IBs wasn't
                completed. Forcefully<br>
                >>        * remove them here.<br>
                >> @@ -416,15 +400,12 @@ void
                drm_sched_entity_set_rq(struct <br>
                >> drm_sched_entity *entity,<br>
                >>       if (entity->rq == rq)<br>
                >>           return;<br>
                >>   -    spin_lock(&entity->rq_lock);<br>
                >> -<br>
                >> -    if (entity->rq)<br>
                >> -       
                drm_sched_rq_remove_entity(entity->rq, entity);<br>
                >> +    BUG_ON(!rq);<br>
                >>   +    spin_lock(&entity->rq_lock);<br>
                >> +    drm_sched_rq_remove_entity(entity->rq,
                entity);<br>
                >>       entity->rq = rq;<br>
                >> -    if (rq)<br>
                >> -        drm_sched_rq_add_entity(rq, entity);<br>
                >> -<br>
                >> +    drm_sched_rq_add_entity(rq, entity);<br>
                >>       spin_unlock(&entity->rq_lock);<br>
                >>   }<br>
                >>   EXPORT_SYMBOL(drm_sched_entity_set_rq);<br>
                ><br>
                <br>
              </blockquote>
            </div>
          </div>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>