[PATCH v3 1/5] drm/scheduler: rework job destruction
Grodzovsky, Andrey
Andrey.Grodzovsky at amd.com
Tue Apr 16 14:36:20 UTC 2019
On 4/16/19 5:47 AM, Christian König wrote:
> Am 15.04.19 um 23:17 schrieb Eric Anholt:
>> Andrey Grodzovsky <andrey.grodzovsky at amd.com> writes:
>>
>>> From: Christian König <christian.koenig at amd.com>
>>>
>>> We now destroy finished jobs from the worker thread to make sure that
>>> we never destroy a job currently in timeout processing.
>>> By this we avoid holding lock around ring mirror list in drm_sched_stop
>>> which should solve a deadlock reported by a user.
>>>
>>> v2: Remove unused variable.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>>
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++--
>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 -
>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 9 +-
>>> drivers/gpu/drm/scheduler/sched_main.c | 138
>>> +++++++++++++++++------------
>>> drivers/gpu/drm/v3d/v3d_sched.c | 9 +-
>> Missing corresponding panfrost and lima updates. You should probably
>> pull in drm-misc for hacking on the scheduler.
>>
>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>> index ce7c737b..8efb091 100644
>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d,
>>> struct drm_sched_job *sched_job)
>>> /* block scheduler */
>>> for (q = 0; q < V3D_MAX_QUEUES; q++)
>>> - drm_sched_stop(&v3d->queue[q].sched);
>>> + drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>> if(sched_job)
>>> drm_sched_increase_karma(sched_job);
>>> + /*
>>> + * Guilty job did complete and hence needs to be manually removed
>>> + * See drm_sched_stop doc.
>>> + */
>>> + if (list_empty(&sched_job->node))
>>> + sched_job->sched->ops->free_job(sched_job);
>> If the if (sched_job) is necessary up above, then this should clearly be
>> under it.
>>
>> But, can we please have a core scheduler thing we call here instead of
>> drivers all replicating it?
>
> Yeah that's also something I noted before.
>
> Essential problem is that we remove finished jobs from the mirror list
> and so need to destruct them because we otherwise leak them.
>
> Alternative approach here would be to keep the jobs on the ring mirror
> list, but not submit them again.
>
> Regards,
> Christian.
I really prefer to avoid this, it means adding extra flag to sched_job
to check in each iteration of the ring mirror list. What about changing
signature of drm_sched_backend_ops.timedout_job to return drm_sched_job*
instead of void, this way we can return the guilty job back from the
driver specific handler to the generic drm_sched_job_timedout and
release it there.
Andrey
>
>>
>>> +
>>> /* get the GPU back into the init state */
>>> v3d_reset(v3d);
>
More information about the amd-gfx
mailing list