regression with d6c650c0a8f6f671e49553725e1db541376d95f2

Liu, Monk Monk.Liu at
Fri Oct 13 08:49:29 UTC 2017

No that’s not true

The free_job() is called in sched_job_finish() which is queued on a WORK and scheduled from that “amd_sched_fence_finished()”
So the finishing timing of free_job() is asynchronized with sched_process_job()

How can you sure free_job() must before “trace_amd_sched_process_job” ?

From: Koenig, Christian
Sent: 2017年10月13日 16:43
To: Liu, Monk <Monk.Liu at>; Nicolai Hähnle <nhaehnle at>; amd-gfx at
Subject: Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2

The free_job() callback is called only way after the job has finished.

That is one change actually made by you to the code :)


Am 13.10.2017 um 10:39 schrieb Liu, Monk:
I doubt it would always work fine…

First, we have FENCE_TRACE reference s_fence->finished after “fence_signal(&fence->finished)”
Second, we have trace_amd_sched_proess_job(s_fence) after “amd_sched_fence_finished()”,

If you put the finished before free_job() and by coincidence the job_finish() get very soon executed you’ll have odds to hit wild pointer on above two cases

BR Monk

From: Koenig, Christian
Sent: 2017年10月13日 16:17
To: Liu, Monk <Monk.Liu at><mailto:Monk.Liu at>; Nicolai Hähnle <nhaehnle at><mailto:nhaehnle at>; amd-gfx at<mailto:amd-gfx at>
Subject: Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2

Yeah, that change is actually incorrect and should be reverted.

What we really need to do is remove dropping sched_job->s_fence from amd_sched_process_job() into amd_sched_job_finish() directly before the call to free_job().


Am 13.10.2017 um 09:24 schrieb Liu, Monk:
commit d6c650c0a8f6f671e49553725e1db541376d95f2
Author: Nicolai Hähnle <nicolai.haehnle at><mailto:nicolai.haehnle at>
@@ -611,6 +611,10 @@ static int amd_sched_main(void *param)

                fence = sched->ops->run_job(sched_job);
+               /* amd_sched_process_job drops the job's reference of the fence. */
+               sched_job->s_fence = NULL;
                if (fence) {
                        s_fence->parent = dma_fence_get(fence);
                        r = dma_fence_add_callback(fence, &s_fence->cb,

Hi Nicolai

with this patch, you will break "amdgpu_sched_hw_job_reset()"routine:

amd_sched_hw_job_reset(struct amd_gpu_scheduler


    struct amd_sched_job


&sched->ring_mirror_list, node) {

        if (s_job->s_fence->parent










see that without sched_job->s_fence, you cannot remove the callback from its hw fence,

any idea??

BR Monk

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the amd-gfx mailing list