[PATCH v6 1/2] drm/sched: Refactor ring mirror list handling.

Grodzovsky, Andrey Andrey.Grodzovsky at amd.com
Tue Mar 12 14:56:54 UTC 2019


On 3/12/19 3:43 AM, Tomeu Vizoso wrote:
> On Thu, 27 Dec 2018 at 20:28, Andrey Grodzovsky
> <andrey.grodzovsky at amd.com> wrote:
>> Decouple the stopping/starting of the sched threads and the
>> ring mirror list handling from the policy of what to do about
>> the guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from non-signaled HW fences, wait for all signaled HW fences
>> to complete before rerunning the jobs.
>>
>> v2: Fix resubmission of the guilty job into HW after refactoring.
>>
>> v4:
>> Full restart of all the jobs, not only those from the guilty ring.
>> Extract the karma increase into a standalone function.
>>
>> v5:
>> Rework waiting for signaled jobs without relying on the job
>> struct itself, as those might already have been freed on the
>> schedulers of non-'guilty' jobs.
>> Expose karma increase to drivers.
>>
>> v6:
>> Use list_for_each_entry_safe_continue and drm_sched_process_job
>> in case the fence has already signaled.
>> Call drm_sched_increase_karma only once for amdgpu and add documentation.
>>
>> Suggested-by: Christian Koenig <Christian.Koenig at amd.com>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  20 ++-
>>   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |  11 +-
>>   drivers/gpu/drm/scheduler/sched_main.c     | 195 +++++++++++++++++++----------
>>   drivers/gpu/drm/v3d/v3d_sched.c            |  12 +-
>>   include/drm/gpu_scheduler.h                |   8 +-
>>   5 files changed, 157 insertions(+), 89 deletions(-)
>>
> [snip]
>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>> index 445b2ef..f76d9ed 100644
>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> @@ -178,18 +178,22 @@ v3d_job_timedout(struct drm_sched_job *sched_job)
>>          for (q = 0; q < V3D_MAX_QUEUES; q++) {
>>                  struct drm_gpu_scheduler *sched = &v3d->queue[q].sched;
>>
>> -               kthread_park(sched->thread);
>> -               drm_sched_hw_job_reset(sched, (sched_job->sched == sched ?
>> +               drm_sched_stop(sched, (sched_job->sched == sched ?
>>                                                 sched_job : NULL));
>> +
>> +               if(sched_job)
>> +                       drm_sched_increase_karma(sched_job);
>>          }
>>
>>          /* get the GPU back into the init state */
>>          v3d_reset(v3d);
>>
>> +       for (q = 0; q < V3D_MAX_QUEUES; q++)
>> +               drm_sched_resubmit_jobs(sched_job->sched);
> Hi Andrey,
>
> I'm not sure what the original intent was, but I guess it wasn't to
> repeatedly call resubmit_jobs on that specific job's queue?
>
> Regards,
>
> Tomeu

My bad. There is also another mistake here: karma is increased for the
guilty job's entity multiple times, once per queue iteration. I will fix
both. Thanks for pointing that out.
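
Roughly, the fix should look like this (untested sketch against the hunk
above; resubmit each queue's scheduler and increase karma only once for
the guilty job):

	for (q = 0; q < V3D_MAX_QUEUES; q++) {
		struct drm_gpu_scheduler *sched = &v3d->queue[q].sched;

		/* Stop the scheduler, detaching the guilty job only
		 * from its own scheduler. */
		drm_sched_stop(sched, (sched_job->sched == sched ?
				       sched_job : NULL));
	}

	/* Increase karma once, outside the per-queue loop. */
	if (sched_job)
		drm_sched_increase_karma(sched_job);

	/* get the GPU back into the init state */
	v3d_reset(v3d);

	/* Resubmit each queue's jobs, not the guilty queue's repeatedly. */
	for (q = 0; q < V3D_MAX_QUEUES; q++)
		drm_sched_resubmit_jobs(&v3d->queue[q].sched);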

Andrey
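
P.S. The quoted hunk is snipped before the restart; for completeness, the
handler would presumably finish by unparking each scheduler, along these
lines (assuming drm_sched_start()'s second argument requests full
recovery, as in the sched_main.c part of this patch):

	/* Unblock the schedulers and restart their jobs. */
	for (q = 0; q < V3D_MAX_QUEUES; q++)
		drm_sched_start(&v3d->queue[q].sched, true);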



