[PATCH v3 1/5] drm/scheduler: rework job destruction
Koenig, Christian
Christian.Koenig at amd.com
Tue Apr 16 16:00:00 UTC 2019
On 16.04.19 at 17:42, Grodzovsky, Andrey wrote:
> On 4/16/19 10:58 AM, Grodzovsky, Andrey wrote:
>> On 4/16/19 10:43 AM, Koenig, Christian wrote:
>>> On 16.04.19 at 16:36, Grodzovsky, Andrey wrote:
>>>> On 4/16/19 5:47 AM, Christian König wrote:
>>>>> On 15.04.19 at 23:17, Eric Anholt wrote:
>>>>>> Andrey Grodzovsky <andrey.grodzovsky at amd.com> writes:
>>>>>>
>>>>>>> From: Christian König <christian.koenig at amd.com>
>>>>>>>
>>>>>>> We now destroy finished jobs from the worker thread to make sure that
>>>>>>> we never destroy a job currently in timeout processing.
>>>>>>> This way we avoid holding the lock around the ring mirror list in
>>>>>>> drm_sched_stop, which should solve a deadlock reported by a user.
>>>>>>>
>>>>>>> v2: Remove unused variable.
>>>>>>>
>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  17 ++--
>>>>>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>>>>>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   9 +-
>>>>>>> drivers/gpu/drm/scheduler/sched_main.c     | 138 +++++++++++++++++------------
>>>>>>> drivers/gpu/drm/v3d/v3d_sched.c            |   9 +-
>>>>>> Missing corresponding panfrost and lima updates. You should probably
>>>>>> pull in drm-misc for hacking on the scheduler.
>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> index ce7c737b..8efb091 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
>>>>>>>  	/* block scheduler */
>>>>>>>  	for (q = 0; q < V3D_MAX_QUEUES; q++)
>>>>>>> -		drm_sched_stop(&v3d->queue[q].sched);
>>>>>>> +		drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>>>>>>  
>>>>>>>  	if (sched_job)
>>>>>>>  		drm_sched_increase_karma(sched_job);
>>>>>>>  
>>>>>>> +	/*
>>>>>>> +	 * Guilty job did complete and hence needs to be manually removed.
>>>>>>> +	 * See drm_sched_stop() doc.
>>>>>>> +	 */
>>>>>>> +	if (list_empty(&sched_job->node))
>>>>>>> +		sched_job->sched->ops->free_job(sched_job);
>>>>>> If the if (sched_job) is necessary up above, then this should clearly be
>>>>>> under it.
>>>>>>
>>>>>> But, can we please have a core scheduler thing we call here instead of
>>>>>> drivers all replicating it?
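>>>>>>
>>>>>> Something like a small core helper, maybe (hypothetical name, just
>>>>>> to sketch the idea):
>>>>>>
>>>>>>     /*
>>>>>>      * Called by drivers after drm_sched_stop() in their timeout
>>>>>>      * handler: frees the guilty job if it already completed and
>>>>>>      * was therefore removed from the ring mirror list.
>>>>>>      */
>>>>>>     void drm_sched_free_finished_guilty(struct drm_sched_job *sched_job)
>>>>>>     {
>>>>>>         if (list_empty(&sched_job->node))
>>>>>>             sched_job->sched->ops->free_job(sched_job);
>>>>>>     }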
>>>>> Yeah that's also something I noted before.
>>>>>
>>>>> The essential problem is that we remove finished jobs from the mirror
>>>>> list and so need to destroy them, because otherwise we leak them.
>>>>>
>>>>> Alternative approach here would be to keep the jobs on the ring mirror
>>>>> list, but not submit them again.
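>>>>>
>>>>> Untested sketch of what I mean: the resubmit loop would just skip
>>>>> jobs whose scheduler fence has already signaled, instead of
>>>>> requiring them to be removed from the list first:
>>>>>
>>>>>     list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
>>>>>         /* Job already finished, don't run it on the hardware again. */
>>>>>         if (dma_fence_is_signaled(&s_job->s_fence->finished))
>>>>>             continue;
>>>>>         ...
>>>>>     }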
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>> I'd really prefer to avoid this; it means adding an extra flag to
>>>> sched_job to check in each iteration over the ring mirror list.
>>> Mhm, why actually? We just need to check if the scheduler fence is signaled.
>> OK, I see it's equivalent, but this is still an extra check for all the
>> iterations.
>>
>>>> What about changing the
>>>> signature of drm_sched_backend_ops.timedout_job to return drm_sched_job *
>>>> instead of void? This way we can return the guilty job from the
>>>> driver-specific handler to the generic drm_sched_job_timedout and
>>>> release it there.
>>> Well the timeout handler already has the job, so returning it doesn't
>>> make much sense.
>>>
>>> The problem is rather that the timeout handler doesn't know if it should
>>> destroy the job or not.
>> But the driver-specific handler does, and returning either the pointer
>> to the job or NULL would give an indication of that. We could even
>> return a bool.
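>>
>> I.e. something along these lines (sketch only, not tested):
>>
>>     struct drm_sched_backend_ops {
>>         ...
>>         /*
>>          * Return the guilty job if the generic timeout handler should
>>          * free it, or NULL if the driver already took care of it.
>>          */
>>         struct drm_sched_job *(*timedout_job)(struct drm_sched_job *sched_job);
>>         ...
>>     };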
>>
>> Andrey
>
> Thinking a bit more about this: the way this check is done now, "if
> (list_empty(&sched_job->node)) then free the sched_job", actually makes
> it possible to just move it as-is from the driver-specific callbacks into
> drm_sched_job_timedout without any other changes.
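>
> I.e. at the end of drm_sched_job_timedout, roughly (untested):
>
>     job->sched->ops->timedout_job(job);
>
>     /*
>      * Guilty job did complete and hence needs to be manually removed.
>      * See drm_sched_stop() doc.
>      */
>     if (list_empty(&job->node))
>         job->sched->ops->free_job(job);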
Oh, well, that sounds like a good idea offhand.
I'd need to see the final code, but it's at least the best idea so far.
Christian.
>
> Andrey
>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>>>> +
>>>>>>>  	/* get the GPU back into the init state */
>>>>>>>  	v3d_reset(v3d);