[PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)

Fri Dec 11 20:44:44 UTC 2020

On 2020-12-10 4:41 a.m., Christian König wrote:
> Am 10.12.20 um 10:31 schrieb Lucas Stach:
>> Hi Luben,
>>
>> Am Mittwoch, den 09.12.2020, 21:14 -0500 schrieb Luben Tuikov:
>>> [SNIP]
>>> -static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
>>> +static enum drm_task_status etnaviv_sched_timedout_job(struct drm_sched_job
>>> +						       *sched_job)
>>>   {
>>>   	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
>>>   	struct etnaviv_gpu *gpu = submit->gpu;
>>> @@ -120,9 +121,16 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
>>>   
>>>   	drm_sched_resubmit_jobs(&gpu->sched);
>>>   
>>> +	/* Tell the DRM scheduler that this task needs
>>> +	 * more time.
>>> +	 */
>> This comment doesn't match the kernel coding style, but it's also moot
>> as the whole added code block isn't needed. The code just below is
>> identical, so letting execution continue here instead of returning
>> would be the right thing to do, but maybe you mean to return
>> DRM_TASK_STATUS_COMPLETE? It's a bit confusing that aborted and job
>> successfully finished should be signaled with the same return code.
> 
> Yes and no. As I tried to describe in my previous mail the naming of the 
> enum values is also not very good.

I tried to make the naming as minimal as possible:
COMPLETE: the task is out of the hardware.
ALIVE: the task is in the hardware.

> 
> See even when the job has completed we need to restart the timer for the 
> potential next job.

Sure, yes. But this is something which the DRM decides--why should
drivers know of the internals of the DRM? (i.e. that it restarts
the timer or that there is a timer, etc.) Return minimal
value and let the DRM decide what to do next.

> 
> Only when the device is completely gone and unrecoverable we should not 
> restart the timer.

Yes, agreed.

> 
> I suggest to either make this an int and return -ENODEV when that 
> happens or rename the enum to something like DRM_SCHED_NODEV.

It was an int, but you suggested that it'd be a macro, so I made
it an enum so that the complier can check the values against the macros
returned.

I suggested, DRM_TASK_SCHED_ENODEV, but let Andrey add it
when he adds his patches on top of my patch here, because his
work adds hotplug/unplug support.

Also, note that if the pending list is freed, while the DRM
had been blocked, i.e. notified that the device is gone,
then returning DRM_TASK_SCHED_ENODEV would be moot, as there
are no tasks in the pending list.

This patch here is good as it is, since it is minimal and doesn't make
assumptions on DRM behaviour.

Regards,
Luben

> 
> Regards,
> Christian.
> 
>>
>>> +	drm_sched_start(&gpu->sched, true);
>>> +	return DRM_TASK_STATUS_ALIVE;
>>> +
>>>   out_no_timeout:
>>>   	/* restart scheduler after GPU is usable again */
>>>   	drm_sched_start(&gpu->sched, true);
>>> +	return DRM_TASK_STATUS_ALIVE;
>>>   }
>> Regards,
>> Lucas
>>
>