[PATCH] drm/scheduler: don't update last scheduled fence in TDR

Christian König ckoenig.leichtzumerken at gmail.com
Thu May 3 12:09:59 UTC 2018


That change is indeed fixing a problem: when drm_sched_job_recovery() is 
called, s_job->entity should already be NULL.

And even when the process is killed we should still either re-submit the 
jobs or set the error.

BTW: there is another bug in that function: guilty_context needs to be 
kept alive over the course of the loop.

Going to submit a patch for that.
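
The lifetime pattern at issue here -- holding a reference on a refcounted
object for as long as a loop still uses it -- can be sketched in a small
userspace simulation. All names below (struct ctx, ctx_get, ctx_put,
mark_guilty_safely) are hypothetical stand-ins for illustration, not the
kernel's kref/dma_fence API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted context; the kernel would use kref here. */
struct ctx {
    int refcount;
    bool guilty;
};

static struct ctx *ctx_get(struct ctx *c)
{
    if (c)
        c->refcount++;
    return c;
}

static void ctx_put(struct ctx *c)
{
    if (c && --c->refcount == 0)
        free(c);
}

/*
 * Safe pattern: take an extra reference before the loop, because the
 * loop body (or work it triggers) may drop the references it owns.
 * The extra reference is released only after the last use.
 */
static bool mark_guilty_safely(struct ctx *c, int iterations)
{
    struct ctx *held = ctx_get(c);  /* keep it alive across the loop */
    bool alive;

    for (int i = 0; i < iterations; i++)
        held->guilty = true;        /* still safe: we hold a reference */

    alive = (held->refcount > 0);
    ctx_put(held);                  /* drop our extra reference */
    return alive;
}
```

The buggy variant would simply use the raw pointer across iterations and
crash (or corrupt memory) once some other path dropped the last reference
mid-loop.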

Regards,
Christian.
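
For readers following the thread, the last_scheduled bookkeeping being
debated can be sketched roughly like this. This is a hedged userspace
simulation: struct fence, struct entity, submit() and recover() are
simplified stand-ins for the kernel's dma_fence/drm_sched structures,
not the real API:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins to illustrate ordering only. */
struct fence { int refcount; };
struct entity { struct fence *last_scheduled; };
struct job { struct entity *entity; struct fence finished; };

static struct fence *fence_get(struct fence *f)
{
    if (f)
        f->refcount++;
    return f;
}

static void fence_put(struct fence *f)
{
    if (f)
        f->refcount--;
}

/*
 * Normal submission path: last_scheduled is advanced to this job's
 * finished fence before the job joins the mirror list.
 */
static void submit(struct job *j)
{
    fence_put(j->entity->last_scheduled);
    j->entity->last_scheduled = fence_get(&j->finished);
}

/*
 * TDR/recovery path with the patch applied: jobs from the mirror list
 * are simply re-run and last_scheduled is left alone, so it keeps
 * pointing at the newest fence that was ever submitted.
 */
static void recover(struct job *j)
{
    (void)j; /* re-run the job; no last_scheduled update */
}
```

With two jobs A and B submitted on one entity, last_scheduled ends up at
B's finished fence; re-running A through recover() leaves it there, which
is exactly the invariant the patch below preserves and the NAK disputes.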

Am 26.04.2018 um 06:23 schrieb zhoucm1:
> NAK on it.
>
> First of all, without this patch, does it cause any issue?
>
> Second,
>
> entity->last_scheduled represents the last submitted job.
>
> This change breaks that meaning and doesn't work, e.g.:
>
> 1. The mirror list has jobA and jobB; assuming they belong to the same 
> entity, entity->last_scheduled is jobB->finished.
>
> 2. When you do recovery and re-submit jobA first, if last_scheduled is 
> not updated, entity->last_scheduled is still jobB->finished.
>
> 3. If the process is then killed, drm_sched_entity_cleanup() is called, 
> which registers drm_sched_entity_kill_jobs_cb on jobB->finished, but 
> jobB hasn't been re-submitted at all.
>
>
> So the change isn't necessary at all.
>
>
> Regards,
>
> David Zhou
>
>
> On 2018/04/26 11:47, Ding, Pixel wrote:
>> Hi Monk,
>>
>> Please review it. Thanks.
>>
>> Sincerely Yours,
>> Pixel
>>
>>
>> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding at amd.com> wrote:
>>
>>      The current sequence in the scheduler thread is:
>>      1. update last sched fence
>>      2. job begin (adding to mirror list)
>>      3. job finish (remove from mirror list)
>>      4. back to 1
>>
>>      Since we update the last sched fence prior to joining the mirror
>>      list, the jobs in the mirror list have already passed the last
>>      sched fence. TDR just re-runs the jobs in the mirror list, so we
>>      should not update the last sched fences in TDR.
>>
>>      Signed-off-by: Pixel Ding <Pixel.Ding at amd.com>
>>      ---
>>       drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>>       1 file changed, 3 deletions(-)
>>      diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>      index 088ff2b..1f1dd70 100644
>>      --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>      +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>      @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>>               fence = sched->ops->run_job(s_job);
>>               atomic_inc(&sched->hw_rq_count);
>>
>>      -        dma_fence_put(s_job->entity->last_scheduled);
>>      -        s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
>>      -
>>               if (fence) {
>>                   s_fence->parent = dma_fence_get(fence);
>>                   r = dma_fence_add_callback(fence, &s_fence->cb,
>>      --
>>      2.7.4
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
