[PATCH 2/2] drm/scheduler: stop setting rq to NULL

Christian König christian.koenig at amd.com
Fri Aug 3 14:06:40 UTC 2018


On 02.08.2018 at 16:11, Andrey Grodzovsky wrote:
>
>
>
> On 08/02/2018 02:47 AM, Nayan Deshmukh wrote:
>>
>>
>> On Thu, Aug 2, 2018 at 12:12 PM Christian König 
>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>
>>     On 02.08.2018 at 00:25, Andrey Grodzovsky wrote:
>>     > Thinking again about this change and 53d5f1e "drm/scheduler:
>>     > move idle entities to scheduler with less load v2",
>>     >
>>     > it looks to me like the use case for which fc9a539
>>     > "drm/scheduler: add NULL pointer check for run queue (v2)" was
>>     > done will not work anymore.
>>     >
>>     > First of all, in drm_sched_entity_push_job 'if (!entity->rq)'
>>     > will never be true any more, since we stopped setting
>>     > entity->rq to NULL in drm_sched_entity_flush.
>>
>>     Good point, going to remove that.
>>
>
> So basically we don't need that 'if (...) { return; }' in 
> drm_sched_entity_push_job any more?

Yes, exactly.
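
For reference, the check we are talking about looks roughly like this 
(a simplified sketch of drm_sched_entity_push_job from memory, not the 
complete function):

    void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
                                   struct drm_sched_entity *entity)
    {
            /* This early return was meant to catch pushes to a killed
             * entity. Since drm_sched_entity_flush no longer sets
             * entity->rq to NULL it can never trigger, so it will be
             * removed.
             */
            if (!entity->rq) {
                    DRM_ERROR("Trying to push to a killed entity\n");
                    return;
            }

            /* ... normal submission path continues here ... */
    }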

>
>>
>>     >
>>     > Second of all, even after we removed the entity from the rq in
>>     > drm_sched_entity_flush to terminate any subsequent submissions
>>     > to the queue, the other thread doing push_job can just acquire
>>     > a run queue again from drm_sched_entity_get_free_sched and
>>     > continue submission
>>
>> Hi Christian
>>
>>     That is actually desired.
>>
>>     When another process is now using the entity to submit jobs,
>>     adding it back to the rq is actually the right thing to do,
>>     because the entity is still in use.
>>
>
> Yes, no problem if it's another process. But what about another thread 
> from the same process? Is it a possible use case that 2 threads from 
> the same process submit to the same entity concurrently? If so, and we 
> specifically kill one, the other will not stop even if we want it to, 
> because the current code just makes it acquire an rq for itself.

Well you can't kill a single thread of a process (you can only interrupt 
it); a SIGKILL always kills the whole process.
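
A quick user-space illustration of that point (a standalone sketch, 
nothing scheduler specific; build with -lpthread):

    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
            for (;;) {
                    puts("worker still alive");
                    sleep(1);
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t thread;

            pthread_create(&thread, NULL, worker, NULL);
            sleep(2);

            /* There is no way to SIGKILL a single thread from user
             * space; this terminates the whole thread group, the
             * worker included.
             */
            kill(getpid(), SIGKILL);

            pause();        /* never reached */
            return 0;
    }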

>
>> I am not aware of the use case, so I might just be rambling. But if 
>> the thread/process that created the entity has called 
>> drm_sched_entity_flush, then we shouldn't allow other processes to 
>> push jobs to that entity.
>>
>> Nayan
>>
>>
>>     Christian.
>>
> We don't really know who created the entity; the creation could be by 
> X itself, with the entity then passed to another process for use. 
> Check 'drm/scheduler: only kill entity if last user is killed v2': the 
> idea is that if, by the time you want to kill this entity, another 
> process (not another thread from your process, i.e. 
> current->group_leader != last_user in drm_sched_entity_flush) has 
> already started to use this entity, just let it be.

Yes, exactly that's the idea.
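
For the archives, the last_user idea can be sketched in plain 
user-space C like this (illustrative names only, not the kernel code; 
the real check is in the drm_sched_entity_flush hunk quoted below):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Mimic the kernel's cmpxchg(): atomically replace *p with new if
     * it currently holds old, and return the value actually found.
     */
    static void *cmpxchg_ptr(_Atomic(void *) *p, void *old, void *new)
    {
            void *expected = old;

            atomic_compare_exchange_strong(p, &expected, new);
            return expected;
    }

    struct entity {
            _Atomic(void *) last_user; /* group leader of last submitter */
    };

    /* Flush-time decision: tear the entity down only if nobody, or we
     * ourselves, submitted last. Another process that started using
     * the entity in the meantime keeps it alive.
     */
    static bool should_kill_entity(struct entity *e, void *group_leader)
    {
            void *last = cmpxchg_ptr(&e->last_user, group_leader, NULL);

            return !last || last == group_leader;
    }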

Christian.

>
> Andrey
>>
>>
>>     > so you can't substitute 'if (!entity->rq)' with
>>     > 'if (list_empty(&entity->list))'.
>>     >
>>     > What about adding a 'stopped' flag to drm_sched_entity, to be
>>     > set in drm_sched_entity_flush, and checking for
>>     > 'if (!entity->stopped)' in drm_sched_entity_push_job instead
>>     > of 'if (!entity->rq)'?
>>     >
>>     > Andrey
>>     >
>>     >
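
(For readers following along: Andrey's 'stopped' flag proposal above 
could be modeled in user space roughly like this. This is only an 
illustration with made-up names, not a patch that was applied here.)

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct entity {
            atomic_bool stopped;
    };

    /* Once flush marks the entity stopped, push_job refuses to
     * submit, regardless of whether a run queue could be re-acquired.
     */
    static void entity_flush(struct entity *e)
    {
            atomic_store(&e->stopped, true);
            /* ... remove from run queue, drain the job queue ... */
    }

    static bool entity_push_job(struct entity *e)
    {
            if (atomic_load(&e->stopped)) {
                    fprintf(stderr, "Trying to push to a stopped entity\n");
                    return false;
            }
            /* ... queue the job, (re)acquiring a run queue as needed ... */
            return true;
    }

    int main(void)
    {
            struct entity e = { .stopped = false };

            entity_push_job(&e);    /* succeeds */
            entity_flush(&e);
            entity_push_job(&e);    /* now refused */
            return 0;
    }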
>>     > On 07/30/2018 07:03 AM, Christian König wrote:
>>     >> We removed the redundancy of having an extra scheduler field, so we
>>     >> can't set the rq to NULL any more or otherwise won't know which
>>     >> scheduler to use for the cleanup.
>>     >>
>>     >> Just remove the entity from the scheduling list instead.
>>     >>
>>     >> Signed-off-by: Christian König <christian.koenig at amd.com>
>>     >> ---
>>     >>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 35 +++++++------------------------
>>     >>   1 file changed, 8 insertions(+), 27 deletions(-)
>>     >>
>>     >> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     >> index f563e4fbb4b6..1b733229201e 100644
>>     >> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     >> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     >> @@ -198,21 +198,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>     >>   }
>>     >>   EXPORT_SYMBOL(drm_sched_entity_init);
>>     >>
>>     >> -/**
>>     >> - * drm_sched_entity_is_initialized - Query if entity is initialized
>>     >> - *
>>     >> - * @sched: Pointer to scheduler instance
>>     >> - * @entity: The pointer to a valid scheduler entity
>>     >> - *
>>     >> - * return true if entity is initialized, false otherwise
>>     >> -*/
>>     >> -static bool drm_sched_entity_is_initialized(struct drm_gpu_scheduler *sched,
>>     >> -                        struct drm_sched_entity *entity)
>>     >> -{
>>     >> -    return entity->rq != NULL &&
>>     >> -        entity->rq->sched == sched;
>>     >> -}
>>     >> -
>>     >>   /**
>>     >>    * drm_sched_entity_is_idle - Check if entity is idle
>>     >>    *
>>     >> @@ -224,7 +209,8 @@ static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
>>     >>   {
>>     >>       rmb();
>>     >>
>>     >> -    if (!entity->rq || spsc_queue_peek(&entity->job_queue) == NULL)
>>     >> +    if (list_empty(&entity->list) ||
>>     >> +        spsc_queue_peek(&entity->job_queue) == NULL)
>>     >>           return true;
>>     >>
>>     >>       return false;
>>     >> @@ -279,8 +265,6 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>     >>       long ret = timeout;
>>     >>
>>     >>       sched = entity->rq->sched;
>>     >> -    if (!drm_sched_entity_is_initialized(sched, entity))
>>     >> -        return ret;
>>     >>       /**
>>     >>        * The client will not queue more IBs during this fini, consume existing
>>     >>        * queued IBs or discard them on SIGKILL
>>     >> @@ -299,7 +283,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>     >>       last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>>     >>       if ((!last_user || last_user == current->group_leader) &&
>>     >>           (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>>     >> -        drm_sched_entity_set_rq(entity, NULL);
>>     >> +        drm_sched_rq_remove_entity(entity->rq, entity);
>>     >>
>>     >>       return ret;
>>     >>   }
>>     >> @@ -320,7 +304,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
>>     >>       struct drm_gpu_scheduler *sched;
>>     >>
>>     >>       sched = entity->rq->sched;
>>     >> -    drm_sched_entity_set_rq(entity, NULL);
>>     >> +    drm_sched_rq_remove_entity(entity->rq, entity);
>>     >>
>>     >>       /* Consumption of existing IBs wasn't completed. Forcefully
>>     >>        * remove them here.
>>     >> @@ -416,15 +400,12 @@ void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
>>     >>       if (entity->rq == rq)
>>     >>           return;
>>     >>
>>     >> -    spin_lock(&entity->rq_lock);
>>     >> -
>>     >> -    if (entity->rq)
>>     >> -        drm_sched_rq_remove_entity(entity->rq, entity);
>>     >> +    BUG_ON(!rq);
>>     >>
>>     >> +    spin_lock(&entity->rq_lock);
>>     >> +    drm_sched_rq_remove_entity(entity->rq, entity);
>>     >>       entity->rq = rq;
>>     >> -    if (rq)
>>     >> -        drm_sched_rq_add_entity(rq, entity);
>>     >> -
>>     >> +    drm_sched_rq_add_entity(rq, entity);
>>     >>       spin_unlock(&entity->rq_lock);
>>     >>   }
>>     >>   EXPORT_SYMBOL(drm_sched_entity_set_rq);
>>     >
>>
>
