regression with d6c650c0a8f6f671e49553725e1db541376d95f2

No, that’s not true.

free_job() is called from sched_job_finish(), which is queued as a work item and scheduled from “amd_sched_fence_finished()”.
So free_job() completes asynchronously with respect to sched_process_job().

How can you be sure free_job() does not run before “trace_amd_sched_process_job”?

The free_job() callback is only called well after the job has finished.

That is one change actually made by you to the code :)


On 13.10.2017 at 10:39, Liu, Monk wrote:
I doubt it would always work fine…

First, FENCE_TRACE references s_fence->finished after “fence_signal(&fence->finished)”.
Second, trace_amd_sched_process_job(s_fence) is called after “amd_sched_fence_finished()”.

If you put the finished fence before free_job() and by coincidence job_finish() gets executed very soon, you have a chance of hitting a wild pointer in the two cases above.
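
The concern can be modeled in userspace. This is only a rough sketch: every race_* name is hypothetical and none of it is kernel API. It assumes the job holds the last reference to the fence and that the finish work may or may not run before the caller continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct race_fence {
    int refcount;    /* simplified, non-atomic reference count */
    bool released;   /* set when the last reference is dropped */
};

static void race_fence_put(struct race_fence *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        f->released = true;
}

/*
 * Models signaling the finished fence, which queues the finish work.
 * When work_runs_immediately is true, the work (job_finish -> free_job)
 * drops the last reference before the caller continues.
 */
static void race_fence_finished(struct race_fence *f, bool work_runs_immediately)
{
    if (work_runs_immediately)
        race_fence_put(f);
}

/* Returns true when the trace step after signaling would read a released fence. */
static bool race_trace_hits_released_fence(bool work_runs_immediately)
{
    struct race_fence f = { .refcount = 1, .released = false };

    race_fence_finished(&f, work_runs_immediately);
    return f.released;   /* the trace would dereference s_fence at this point */
}
```

In this model the trace is only safe when the work item happens to run later, which is exactly the timing that cannot be relied on.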

Yeah, that change is actually incorrect and should be reverted.

What we really need to do is move dropping sched_job->s_fence from amd_sched_process_job() into amd_sched_job_finish(), directly before the call to free_job().
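
A minimal userspace sketch of that ordering, assuming a simplified non-atomic reference count. The model_* names are hypothetical stand-ins, not the kernel's structs or fence API, and the real functions do considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler fence and job; not kernel structs. */
struct model_fence {
    int refcount;    /* simplified, non-atomic reference count */
    bool signaled;
};

struct model_job {
    struct model_fence *s_fence;
    bool freed;      /* set by the modeled free_job() */
};

static void model_fence_put(struct model_fence *f)
{
    assert(f->refcount > 0);
    f->refcount--;
}

/* Models amd_sched_process_job(): signal, but keep the job's fence reference. */
static void model_process_job(struct model_job *job)
{
    job->s_fence->signaled = true;
    /* Tracing after the signal can still read job->s_fence here. */
}

/* Models the driver's free_job() callback. */
static void model_free_job(struct model_job *job)
{
    job->freed = true;
}

/* Models amd_sched_job_finish(): drop the reference directly before free_job(). */
static void model_job_finish(struct model_job *job)
{
    struct model_fence *f = job->s_fence;

    job->s_fence = NULL;
    model_fence_put(f);
    model_free_job(job);
}
```

With this ordering the job keeps a valid s_fence pointer for as long as the job itself exists, so code that walks pending jobs can still reach the fence.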


commit d6c650c0a8f6f671e49553725e1db541376d95f2
Author: Nicolai Hähnle
@@ -611,6 +611,10 @@ static int amd_sched_main(void *param)

                fence = sched->ops->run_job(sched_job);
+               /* amd_sched_process_job drops the job's reference of the fence. */
+               sched_job->s_fence = NULL;
                if (fence) {
                        s_fence->parent = dma_fence_get(fence);
                        r = dma_fence_add_callback(fence, &s_fence->cb,

Hi Nicolai

With this patch, you will break the "amd_sched_hw_job_reset()" routine:

amd_sched_hw_job_reset(struct amd_gpu_scheduler ...
    struct amd_sched_job ...
    ... &sched->ring_mirror_list, node) {
        if (s_job->s_fence->parent ...

Note that without sched_job->s_fence, you cannot remove the callback from its hw fence.

Any ideas?
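
To make the problem concrete, here is a rough userspace model of the reset loop. All reset_* names are hypothetical and are not the kernel's types or fence API. The NULL check on s_fence is an addition for the model so that it can run; the loop quoted above dereferences s_job->s_fence directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's structs or fence API. */
struct reset_hw_fence {
    bool callback_installed;
};

struct reset_sched_fence {
    struct reset_hw_fence *parent;   /* hardware fence the callback sits on */
};

struct reset_job {
    struct reset_sched_fence *s_fence;
};

/*
 * Models the reset loop: walk the jobs and remove each callback from the
 * parent hardware fence. Returns how many callbacks were removed.
 */
static size_t reset_hw_jobs(struct reset_job *jobs, size_t count)
{
    size_t removed = 0;

    assert(jobs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        struct reset_sched_fence *sf = jobs[i].s_fence;

        if (sf == NULL)
            continue;   /* parent fence is unreachable from this job */
        if (sf->parent && sf->parent->callback_installed) {
            sf->parent->callback_installed = false;
            sf->parent = NULL;
            removed++;
        }
    }
    return removed;
}
```

A job whose s_fence was cleared right after run_job() is skipped in this model, so its callback stays installed on the hardware fence.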

