[PATCH 2/2] drm/sched: fix timeout handling

Christian König ckoenig.leichtzumerken at gmail.com
Mon Oct 8 17:55:55 UTC 2018


Am 08.10.2018 um 19:40 schrieb Nayan Deshmukh:
> Hi Christian,
>
> I am a bit confused by this patch. It would be better if you could
> explain what these 3 functions are supposed to do. From my
> understanding, this is what they are supposed to do:
>
> 1. drm_sched_job_timedout: This function is called when a job has been
> executing for more than "timeout" seconds. It needs to identify the
> offending job and call the timedout_job function, which notifies the
> driver of that job so that the GPU can be recovered to its original
> state.
>
> 2. drm_sched_hw_job_reset: stop the scheduler if it contains the bad
> job. ATM this function first removes all the callbacks and goes through
> the ring_mirror_list afterwards to find the bad job. I think it should
> do it in the opposite order, or my understanding of this function is
> not fully correct.

That function actually has a couple of bugs itself, but they are 
irrelevant to the current patch.

> 3. drm_sched_job_recovery: recover jobs after a reset. It resubmits
> the jobs to the hardware queue, skipping the guilty job.

That function is completely unrelated and is only called after 
recovering from a hard reset.

>
> Some other questions and suggestions:
>
> 1. We can avoid the race condition altogether if we shift the
> canceling and rescheduling of the timedout work item into
> drm_sched_process_job().

That won't work. drm_sched_process_job() is called from interrupt 
context and therefore can't synchronize with the timeout worker; 
canceling the work item synchronously may sleep, which is forbidden 
there.
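To illustrate: the cancel would have to look something like this 
(a sketch only, not the actual drm_sched_process_job() body):

static void drm_sched_process_job(struct dma_fence *f,
				  struct dma_fence_cb *cb)
{
	struct drm_sched_fence *s_fence =
		container_of(cb, struct drm_sched_fence, cb);
	struct drm_gpu_scheduler *sched = s_fence->sched;

	/*
	 * cancel_delayed_work_sync() waits for a running timeout
	 * handler to finish and may sleep while doing so. Fence
	 * callbacks, however, can run in hard IRQ context where
	 * sleeping is not allowed.
	 */
	cancel_delayed_work_sync(&sched->work_tdr);
	...
}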

> 2. How does removing and adding the callback help with the race
> condition? Moreover, the hardware might execute some jobs while we are
> in this function, leading to more races.

We need to make sure that the underlying hardware fence doesn't signal 
and trigger new processing while we are about to call the driver's 
timeout function to reset the hardware.

This is done by removing the callbacks in reverse order (i.e. newest to 
oldest).

If we find that we can't remove a callback because the hardware has 
actually continued and the fence has already signaled, we add the 
callbacks back in normal order (i.e. oldest to newest), starting from 
the job which already signaled.
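In pseudo code the idea looks like this (a simplified sketch of the 
patch below, with the locking and the call to timedout_job() left out):

	/* Newest to oldest: try to detach the hw fence callbacks. */
	list_for_each_entry_reverse(job, &sched->ring_mirror_list, node) {
		/*
		 * dma_fence_remove_callback() returns false when the
		 * fence has already signaled, i.e. the hardware made
		 * progress after all.
		 */
		if (!dma_fence_remove_callback(job->s_fence->parent,
					       &job->s_fence->cb))
			break;
	}

	/* ... call the driver's timeout handling here ... */

	/* Oldest to newest: add the callbacks back again. */
	list_for_each_entry(job, &sched->ring_mirror_list, node) {
		if (dma_fence_add_callback(job->s_fence->parent,
					   &job->s_fence->cb,
					   drm_sched_process_job))
			/* Already signaled, process the job directly. */
			drm_sched_process_job(job->s_fence->parent,
					      &job->s_fence->cb);
	}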

>
> 3. What is the order in which the above 3 functions should be
> executed? I think the answer to this question might clear a lot of my
> doubts.

If we can quickly recover from a problem, only drm_sched_job_timedout() 
should be executed.

The other two are for hard resets where we need to stop multiple 
scheduler instances and get them running again.
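The driver flow for a hard reset then looks roughly like this (a sketch 
only; my_hw_reset(), rings and num_rings are hypothetical driver-side 
names, the two drm_sched_* functions are the existing scheduler API):

	/* Stop all schedulers and detach the hardware fences. */
	for (i = 0; i < num_rings; ++i)
		drm_sched_hw_job_reset(&rings[i]->sched, bad_job);

	/* Reset the hardware itself. */
	my_hw_reset(dev);

	/* Resubmit the pending jobs, skipping the guilty one. */
	for (i = 0; i < num_rings; ++i)
		drm_sched_job_recovery(&rings[i]->sched);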

That the karma handling is mixed into all of this is rather unfortunate 
and actually quite buggy as well.

I should probably also clean that up.

Regards,
Christian.

>
> Regards,
> Nayan
>
> On Mon, Oct 8, 2018 at 8:36 PM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> We need to make sure that we don't race between job completion and
>> timeout.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 28 +++++++++++++++++++++++++++-
>>   1 file changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index bd7d11c47202..ad3c57c9fd21 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -248,14 +248,40 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>   static void drm_sched_job_timedout(struct work_struct *work)
>>   {
>>          struct drm_gpu_scheduler *sched;
>> +       struct drm_sched_fence *fence;
>>          struct drm_sched_job *job;
>> +       int r;
>>
>>          sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>> +
>> +       spin_lock(&sched->job_list_lock);
>> +       list_for_each_entry_reverse(job, &sched->ring_mirror_list, node) {
>> +               fence = job->s_fence;
>> +               if (!dma_fence_remove_callback(fence->parent, &fence->cb))
>> +                       goto already_signaled;
>> +       }
>> +
>>          job = list_first_entry_or_null(&sched->ring_mirror_list,
>>                                         struct drm_sched_job, node);
>> +       spin_unlock(&sched->job_list_lock);
>>
>>          if (job)
>> -               job->sched->ops->timedout_job(job);
>> +               sched->ops->timedout_job(job);
>> +
>> +       spin_lock(&sched->job_list_lock);
>> +       list_for_each_entry(job, &sched->ring_mirror_list, node) {
>> +               fence = job->s_fence;
>> +               if (!fence->parent || !list_empty(&fence->cb.node))
>> +                       continue;
>> +
>> +               r = dma_fence_add_callback(fence->parent, &fence->cb,
>> +                                          drm_sched_process_job);
>> +               if (r)
>> +already_signaled:
>> +                       drm_sched_process_job(fence->parent, &fence->cb);
>> +
>> +       }
>> +       spin_unlock(&sched->job_list_lock);
>>   }
>>
>>   /**
>> --
>> 2.14.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
