[PATCH] drm/scheduler: Remove entity->rq NULL check

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Mon Aug 13 16:43:25 UTC 2018


Attached.

If the general idea in the patch is OK, I can think of a test (and maybe
add it to the libdrm amdgpu tests) that actually simulates this scenario: two
forked concurrent processes working on the same entity's job queue, where one
is dying while the other keeps pushing to the same queue. For now I have only
tested it with a normal boot and by running multiple glxgears instances
concurrently, which doesn't really exercise this code path, since I think
each of them works on its own FD.
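
For illustration, the locking idea from the quoted review below can be
modelled in userspace. This is only a sketch under stated assumptions, not
the attached patch: struct model_entity, model_push_job() and model_flush()
are made-up names, integer ids stand in for whatever the kernel stores in
entity->last_user, and a C11 atomic_flag spinlock stands in for
entity->rq_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the scheduler entity. */
struct model_entity {
    atomic_flag lock;   /* plays the role of entity->rq_lock */
    int last_user;      /* id of the last pusher, 0 means "nobody" */
    bool on_rq;         /* true while the entity sits on the run queue */
};

void model_entity_init(struct model_entity *e)
{
    atomic_flag_clear(&e->lock);
    e->last_user = 0;
    e->on_rq = false;
}

static void model_lock(struct model_entity *e)
{
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void model_unlock(struct model_entity *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Push path: record the pusher and add to the rq in ONE critical section. */
void model_push_job(struct model_entity *e, int pid)
{
    assert(pid != 0);   /* 0 is reserved for "nobody" in this model */
    model_lock(e);
    e->last_user = pid;
    e->on_rq = true;
    model_unlock(e);
}

/*
 * Flush path: the last_user check and the rq removal happen under the same
 * lock, so a push cannot slip in between them. Returns true if the entity
 * was removed from the run queue.
 */
bool model_flush(struct model_entity *e, int pid)
{
    bool removed = false;

    model_lock(e);
    if (e->last_user == pid) {
        e->last_user = 0;
        e->on_rq = false;
        removed = true;
    }
    model_unlock(e);
    return removed;
}
```

In this model, if process 1 pushes, then process 2 pushes, a later flush by
process 1 leaves the entity queued, because the comparison and the removal
can no longer be separated by a concurrent push.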

Andrey


On 08/10/2018 09:27 AM, Christian König wrote:
> Crap, yeah indeed that needs to be protected by some lock.
>
> Going to prepare a patch for that,
> Christian.
>
> Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:
>>
>> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>
>>
>> But I still have questions about entity->last_user (I didn't notice
>> this before).
>>
>> It looks to me like there is a race condition in its current usage. Let's
>> say process A is preempted after doing
>> drm_sched_entity_flush->cmpxchg(...).
>>
>> Now process B, working on the same entity (forked), is inside
>> drm_sched_entity_push_job; it writes its PID to entity->last_user and
>> also executes drm_sched_rq_add_entity. Then process A runs again and
>> executes drm_sched_rq_remove_entity, inadvertently removing the entity
>> that process B just added from its scheduler rq.
>>
>> It looks to me like we should instead take one lock around both the
>> entity->last_user accesses and the adds/removals of the entity to/from
>> the rq.
>>
>> Andrey
>>
>>
>> On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
>>> I forgot about this since we started discussing possible scenarios 
>>> of processes and threads.
>>>
>>> In any case, this check is redundant. Acked-by: Nayan Deshmukh
>>> <nayan26deshmukh at gmail.com>
>>>
>>> Nayan
>>>
>>> On Mon, Aug 6, 2018 at 7:43 PM Christian König
>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>
>>>     Ping. Any objections to that?
>>>
>>>     Christian.
>>>
>>>     Am 03.08.2018 um 13:08 schrieb Christian König:
>>>     > That is superfluous now.
>>>     >
>>>     > Signed-off-by: Christian König <christian.koenig at amd.com>
>>>     > ---
>>>     >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
>>>     >   1 file changed, 5 deletions(-)
>>>     >
>>>     > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>     > index 85908c7f913e..65078dd3c82c 100644
>>>     > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>     > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>     > @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>>>     >       if (first) {
>>>     >               /* Add the entity to the run queue */
>>>     >               spin_lock(&entity->rq_lock);
>>>     > -             if (!entity->rq) {
>>>     > -                     DRM_ERROR("Trying to push to a killed entity\n");
>>>     > -                     spin_unlock(&entity->rq_lock);
>>>     > -                     return;
>>>     > -             }
>>>     >               drm_sched_rq_add_entity(entity->rq, entity);
>>>     >               spin_unlock(&entity->rq_lock);
>>>     >               drm_sched_wakeup(entity->rq->sched);
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-scheduler-Fix-possible-race-condition.patch
Type: text/x-patch
Size: 2701 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20180813/48cdd878/attachment-0001.bin>

