[PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)

Christian König christian.koenig at amd.com
Thu Dec 10 09:41:59 UTC 2020


Am 10.12.20 um 10:31 schrieb Lucas Stach:
> Hi Luben,
>
> Am Mittwoch, den 09.12.2020, 21:14 -0500 schrieb Luben Tuikov:
>> [SNIP]
>> -static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
>> +static enum drm_task_status etnaviv_sched_timedout_job(struct drm_sched_job
>> +						       *sched_job)
>>   {
>>   	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
>>   	struct etnaviv_gpu *gpu = submit->gpu;
>> @@ -120,9 +121,16 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
>>   
>>   	drm_sched_resubmit_jobs(&gpu->sched);
>>   
>> +	/* Tell the DRM scheduler that this task needs
>> +	 * more time.
>> +	 */
> This comment doesn't match the kernel coding style, but it's also moot
> as the whole added code block isn't needed. The code just below is
> identical, so letting execution continue here instead of returning
> would be the right thing to do, but maybe you meant to return
> DRM_TASK_STATUS_COMPLETE? It's a bit confusing that an aborted job and a
> successfully finished job should be signaled with the same return code.

Yes and no. As I tried to describe in my previous mail, the naming of the
enum values is also not very good.

See, even when the job has completed, we need to restart the timer for the
next job that may already be queued.

Only when the device is completely gone and unrecoverable should we not
restart the timer.
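A minimal userspace sketch of this rule, assuming hypothetical outcome names (timeout_outcome, OUTCOME_* and should_rearm_timer() are made up for illustration and are not the enum drm_task_status values from the patch). A completed job and an aborted one lead to the same action; only a vanished device stops the timer.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical outcomes a timeout handler could report. Illustrative
 * names only; not the enum drm_task_status values from the patch.
 */
enum timeout_outcome {
	OUTCOME_JOB_ABORTED,	/* hung job was killed and the GPU reset */
	OUTCOME_JOB_COMPLETED,	/* job had actually finished in the meantime */
	OUTCOME_DEVICE_GONE,	/* device unplugged or unrecoverable */
};

/*
 * Should the scheduler rearm its timeout timer once the handler returns?
 * Aborted or completed makes no difference: another job may already be
 * queued behind this one, so the timer has to run again. Only a device
 * that is gone for good stops it.
 */
static bool should_rearm_timer(enum timeout_outcome outcome)
{
	switch (outcome) {
	case OUTCOME_JOB_ABORTED:
	case OUTCOME_JOB_COMPLETED:
		return true;
	case OUTCOME_DEVICE_GONE:
		return false;
	}
	assert(!"unknown timeout outcome");
	return true;
}
```

Because two of the three outcomes lead to the same action, the scheduler only really needs to tell "device gone" apart from everything else.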

I suggest either making this an int and returning -ENODEV when that
happens, or renaming the enum value to something like DRM_SCHED_NODEV.
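A minimal userspace sketch of the two options, assuming hypothetical names: example_timedout_job_int(), example_timedout_job_enum(), rearm_after_int(), rearm_after_enum(), the enum tag sched_timeout_stat and DRM_SCHED_NOMINAL are made up for illustration; only -ENODEV and DRM_SCHED_NODEV come from the suggestion above. In both shapes the scheduler rearms the timer unless the handler reports that the device is gone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Option 1: the handler returns an int, 0 normally, -ENODEV if the device is gone. */
static int example_timedout_job_int(bool device_present)
{
	if (!device_present)
		return -ENODEV;
	/* ... reset the GPU, resubmit jobs, restart the scheduler ... */
	return 0;
}

static bool rearm_after_int(int ret)
{
	/* This sketch only expects success or -ENODEV. */
	assert(ret == 0 || ret == -ENODEV);
	return ret != -ENODEV;
}

/*
 * Option 2: keep an enum, but name the special value for what it means.
 * DRM_SCHED_NODEV is the name suggested in the mail; the enum tag and
 * DRM_SCHED_NOMINAL are hypothetical placeholders.
 */
enum sched_timeout_stat {
	DRM_SCHED_NOMINAL,	/* handled; the timer gets rearmed */
	DRM_SCHED_NODEV,	/* device gone; do not rearm the timer */
};

static enum sched_timeout_stat example_timedout_job_enum(bool device_present)
{
	if (!device_present)
		return DRM_SCHED_NODEV;
	/* ... same recovery work as in option 1 ... */
	return DRM_SCHED_NOMINAL;
}

static bool rearm_after_enum(enum sched_timeout_stat stat)
{
	return stat != DRM_SCHED_NODEV;
}
```

With either shape, a completed job and an aborted one share the normal return value, since both need the timer restarted.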

Regards,
Christian.

>
>> +	drm_sched_start(&gpu->sched, true);
>> +	return DRM_TASK_STATUS_ALIVE;
>> +
>>   out_no_timeout:
>>   	/* restart scheduler after GPU is usable again */
>>   	drm_sched_start(&gpu->sched, true);
>> +	return DRM_TASK_STATUS_ALIVE;
>>   }
> Regards,
> Lucas
>


