[PATCH] drm/scheduler: fix race condition in load balancer
Nirmoy
nirmodas at amd.com
Tue Jan 14 16:20:47 UTC 2020
On 1/14/20 5:01 PM, Christian König wrote:
> Am 14.01.20 um 16:43 schrieb Nirmoy Das:
>> Jobs submitted to an entity should execute in the order in which they
>> are submitted. We ensure that by checking entity->job_queue in
>> drm_sched_entity_select_rq() so that we don't load-balance jobs within
>> an entity.
>>
>> But because we update entity->job_queue later, in
>> drm_sched_entity_push_job(), there remains an open window in which
>> entity->rq might still get updated by drm_sched_entity_select_rq(),
>> which should not be allowed.
>
> NAK, concurrent calls to
> drm_sched_job_init()/drm_sched_entity_push_job() are not allowed in
> the first place or otherwise we mess up the fence sequence order and
> risk memory corruption.

If I am not missing something, I don't see any lock protecting the
drm_sched_job_init()/drm_sched_entity_push_job() calls in
amdgpu_cs_submit().

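To make the window from the commit message concrete, here is a minimal
user-space sketch. It is a toy model only: struct toy_entity and all toy_*
functions are hypothetical stand-ins, not the real scheduler code, and it
assumes the run queue is selected at job-init time while the queue count is
only updated at push time. It replays one unserialized interleaving of two
submitters and one serialized ordering.

```c
#include <assert.h>

/* Toy model only: hypothetical stand-ins, not the real drm/scheduler types. */
struct toy_entity {
    int rq;           /* index of the run queue currently selected */
    int queued_jobs;  /* stand-in for the entity->job_queue count */
};

/* Models drm_sched_entity_select_rq(): move to the least loaded run queue
 * only while no job is queued on the entity. */
static void toy_select_rq(struct toy_entity *e, int least_loaded_rq)
{
    if (e->queued_jobs > 0)
        return;       /* jobs pending: keep rq to preserve ordering */
    e->rq = least_loaded_rq;
}

/* Models job init (assumption: rq selection happens here). Returns the
 * run queue the new job is bound to. */
static int toy_job_init(struct toy_entity *e, int least_loaded_rq)
{
    toy_select_rq(e, least_loaded_rq);
    return e->rq;
}

/* Models drm_sched_entity_push_job(): only now does the count change. */
static void toy_push_job(struct toy_entity *e)
{
    e->queued_jobs++;
}

/* Two submitters A and B interleave init and push without serialization.
 * Returns 1 if the two jobs of one entity landed on different run queues. */
static int toy_replay_race(void)
{
    struct toy_entity e = { .rq = 0, .queued_jobs = 0 };
    int rq_a = toy_job_init(&e, 0);  /* A: init, queue still empty */
    int rq_b = toy_job_init(&e, 1);  /* B: init before A pushed, rq moves */
    toy_push_job(&e);                /* A: push */
    toy_push_job(&e);                /* B: push */
    return rq_a != rq_b;
}

/* Same two submissions, but each init+push pair runs to completion. */
static int toy_replay_serialized(void)
{
    struct toy_entity e = { .rq = 0, .queued_jobs = 0 };
    int rq_a = toy_job_init(&e, 0);
    toy_push_job(&e);                /* queue is non-empty from here on */
    int rq_b = toy_job_init(&e, 1);  /* select_rq now keeps the old rq */
    toy_push_job(&e);
    return rq_a != rq_b;
}
```

In this model the unserialized replay binds the two jobs to different run
queues, while the serialized replay keeps both on the same one. Whether the
real callers can actually interleave like this is exactly the open question.
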
Regards,
Nirmoy