regression with d6c650c0a8f6f671e49553725e1db541376d95f2

Christian König christian.koenig at amd.com
Fri Oct 13 08:42:40 UTC 2017


The free_job() callback is called only way after the job has finished.

That is one change actually made by you to the code :)

Christian.

Am 13.10.2017 um 10:39 schrieb Liu, Monk:
>
> I doubt it would always work fine…
>
> First, we have FENCE_TRACE reference s_fence->finished after 
> “fence_signal(&fence->finished)”
>
> Second, we have trace_amd_sched_proess_job(s_fence) after 
> “amd_sched_fence_finished()”,
>
> If you put the finished before free_job() and by coincidence the 
> job_finish() get very soon executed you’ll have odds to hit wild 
> pointer on above two cases
>
> BR Monk
>
> *From:*Koenig, Christian
> *Sent:* 2017年10月13日16:17
> *To:* Liu, Monk <Monk.Liu at amd.com>; Nicolai Hähnle 
> <nhaehnle at gmail.com>; amd-gfx at lists.freedesktop.org
> *Subject:* Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2
>
> Yeah, that change is actually incorrect and should be reverted.
>
> What we really need to do is remove dropping sched_job->s_fence from 
> amd_sched_process_job() into amd_sched_job_finish() directly before 
> the call to free_job().
>
> Regards,
> Christian.
>
> Am 13.10.2017 um 09:24 schrieb Liu, Monk:
>
>     commit d6c650c0a8f6f671e49553725e1db541376d95f2
>     Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
>     <mailto:nicolai.haehnle at amd.com>
>     @@ -611,6 +611,10 @@ static int amd_sched_main(void *param)
>
>                     fence = sched->ops->run_job(sched_job);
>                     amd_sched_fence_scheduled(s_fence);
>     +
>     +               /* amd_sched_process_job drops the job's reference
>     of the fence. */
>     +               sched_job->s_fence = NULL;
>     +
>                     if (fence) {
>                             s_fence->parent = dma_fence_get(fence);
>                             r = dma_fence_add_callback(fence,
>     &s_fence->cb,
>
>     Hi Nicolai
>
>     with this patch, you will break "amdgpu_sched_hw_job_reset()"routine:
>
>     void
>
>     amd_sched_hw_job_reset(structamd_gpu_scheduler
>
>     *sched)
>
>     {
>
>     structamd_sched_job
>
>     *s_job;
>
>     spin_lock(&sched->job_list_lock);
>
>     list_for_each_entry_reverse(s_job,
>
>     &sched->ring_mirror_list, node) {
>
>     if(s_job->s_fence->parent
>
>     &&
>
>     fence_remove_callback(s_job->s_fence->parent,
>
>     &s_job->s_fence->cb))
>
>     {
>
>     fence_put(s_job->s_fence->parent);
>
>                 s_job->s_fence->parent
>
>     =
>
>     NULL;
>
>     atomic_dec(&sched->hw_rq_count);
>
>             }
>
>         }
>
>     spin_unlock(&sched->job_list_lock);
>
>     }
>
>     see that without sched_job->s_fence, you cannot remove the
>     callback from its hw fence,
>
>     any idea??
>
>     BR Monk
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20171013/0c11208a/attachment-0001.html>


More information about the amd-gfx mailing list