[PATCH] drm/scheduler: Remove entity->rq NULL check

Christian König ckoenig.leichtzumerken at gmail.com
Tue Aug 14 07:05:19 UTC 2018


I would rather avoid taking the lock in the hot path.

How about this:

      /* For a killed process, disable any further IB enqueues right now */
      last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
      if ((!last_user || last_user == current->group_leader) &&
          (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
          grab_lock();
          drm_sched_rq_remove_entity(entity->rq, entity);
          /* a concurrent push_job may have set last_user again;
           * if so it raced with us, put the entity back on the rq */
          if (READ_ONCE(entity->last_user) != NULL)
              drm_sched_rq_add_entity(entity->rq, entity);
          drop_lock();
      }
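
For this to close the race, drm_sched_entity_push_job() needs to set
entity->last_user before the job becomes visible, so that any push
racing with the remove above is caught by the READ_ONCE re-check under
the lock. grab_lock()/drop_lock() are placeholders here - presumably
entity->rq_lock. A rough sketch of the matching push side, assuming
the current structure of drm_sched_entity_push_job():

      /* push side: last_user is written lock-free, before queuing */
      WRITE_ONCE(entity->last_user, current->group_leader);
      first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
      if (first) {
          /* Add the entity to the run queue */
          spin_lock(&entity->rq_lock);
          drm_sched_rq_add_entity(entity->rq, entity);
          spin_unlock(&entity->rq_lock);
          drm_sched_wakeup(entity->rq->sched);
      }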

Christian.

On 13.08.2018 at 18:43, Andrey Grodzovsky wrote:
>
> Attached.
>
> If the general idea in the patch is OK I can think of a test (and
> maybe add it to the libdrm amdgpu tests) to actually simulate this
> scenario with 2 forked concurrent processes working on the same
> entity's job queue, one dying while the other keeps pushing to the
> same queue. For now I only tested it with a normal boot and by
> running multiple glxgears concurrently - which doesn't really test
> this code path since I think each of them works on its own FD.
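>
> A rough skeleton of what such a test could look like (hypothetical;
> submit_jobs() stands in for pushing IBs to the shared entity through
> the usual libdrm amdgpu helpers):
>
>     #include <signal.h>
>     #include <sys/wait.h>
>     #include <unistd.h>
>
>     /* placeholder for the actual IB submission on a shared context */
>     static void submit_jobs(void) { /* ... */ }
>
>     int main(void)
>     {
>         pid_t child = fork();
>
>         if (child == 0) {
>             /* child: queue a few jobs, then die abruptly so the
>              * scheduler hits the PF_EXITING/SIGKILL path */
>             submit_jobs();
>             raise(SIGKILL);
>         }
>
>         /* parent: keep pushing to the same entity's job queue
>          * while the child is being torn down */
>         for (int i = 0; i < 1000; i++)
>             submit_jobs();
>
>         waitpid(child, NULL, 0);
>         return 0;
>     }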
>
> Andrey
>
>
> On 08/10/2018 09:27 AM, Christian König wrote:
>> Crap, yeah indeed that needs to be protected by some lock.
>>
>> Going to prepare a patch for that,
>> Christian.
>>
>> On 09.08.2018 at 21:49, Andrey Grodzovsky wrote:
>>>
>>> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>
>>>
>>> But I still have questions about entity->last_user (didn't notice
>>> this before) -
>>>
>>> Looks to me like there is a race condition with its current usage.
>>> Say process A was preempted right after doing
>>> drm_sched_entity_flush->cmpxchg(...). Now process B, working on the
>>> same (forked) entity, is inside drm_sched_entity_push_job: it writes
>>> its group_leader to entity->last_user and also executes
>>> drm_sched_rq_add_entity. Now process A runs again and executes
>>> drm_sched_rq_remove_entity, inadvertently removing process B's
>>> entity from its scheduler rq.
>>>
>>> Looks to me like instead we should lock together the
>>> entity->last_user accesses and the adds/removals of the entity
>>> to/from the rq.
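>>>
>>> To make the interleaving explicit (function names as in
>>> gpu_scheduler.c, the rest is pseudo-code):
>>>
>>>     /* A (dying), in drm_sched_entity_flush() */
>>>     cmpxchg(&entity->last_user, A->group_leader, NULL); /* succeeds */
>>>     /* ... A is preempted before drm_sched_rq_remove_entity() ... */
>>>
>>>     /* B (forked), in drm_sched_entity_push_job() */
>>>     WRITE_ONCE(entity->last_user, B->group_leader);
>>>     drm_sched_rq_add_entity(entity->rq, entity);
>>>
>>>     /* A resumes and removes the entity B just re-added */
>>>     drm_sched_rq_remove_entity(entity->rq, entity);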
>>>
>>> Andrey
>>>
>>>
>>> On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
>>>> I forgot about this since we started discussing possible scenarios 
>>>> of processes and threads.
>>>>
>>>> In any case, this check is redundant. Acked-by: Nayan Deshmukh 
>>>> <nayan26deshmukh at gmail.com>
>>>>
>>>> Nayan
>>>>
>>>> On Mon, Aug 6, 2018 at 7:43 PM Christian König 
>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>
>>>>     Ping. Any objections to that?
>>>>
>>>>     Christian.
>>>>
>>>>     On 03.08.2018 at 13:08, Christian König wrote:
>>>>     > That is superfluous now.
>>>>     >
>>>>     > Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>     > ---
>>>>     >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
>>>>     >   1 file changed, 5 deletions(-)
>>>>     >
>>>>     > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>     > index 85908c7f913e..65078dd3c82c 100644
>>>>     > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>     > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>     > @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>>>>     >  	if (first) {
>>>>     >  		/* Add the entity to the run queue */
>>>>     >  		spin_lock(&entity->rq_lock);
>>>>     > -		if (!entity->rq) {
>>>>     > -			DRM_ERROR("Trying to push to a killed entity\n");
>>>>     > -			spin_unlock(&entity->rq_lock);
>>>>     > -			return;
>>>>     > -		}
>>>>     >  		drm_sched_rq_add_entity(entity->rq, entity);
>>>>     >  		spin_unlock(&entity->rq_lock);
>>>>     >  		drm_sched_wakeup(entity->rq->sched);
>>>>
>>>
>>
>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
