regression with d6c650c0a8f6f671e49553725e1db541376d95f2

Nicolai Hähnle nhaehnle at gmail.com
Tue Oct 17 08:57:55 UTC 2017


On 13.10.2017 10:46, Liu, Monk wrote:
> Just revert Nicolai’s patch,if other routine want to reference s_fence, 
> it should get the finished fence in the first place/time,
> 
> For gpu_reset routine, it refers to s_fence only on those unfinished job 
> in sched_hw_job_reset, so totally safe to refer to s_fence pointer
> 
> I wonder what issue Nicolai met with and submitted this patch

The original motivation of my patch was to catch accidental use of 
job->s_fence after the fence was destroyed in amd_sched_process_job. 
Basically, prevent a dangling pointer. I still think that's a reasonable 
idea, though clearly my first attempt at it was just wrong.

In Christian's v2 patch, it might make sense to add

	spin_unlock(&sched->job_list_lock);
+	dma_fence_put(&s_job->s_fence->finished);
+	s_job->s_fence = NULL;
	sched->ops->free_job(s_job);

... though I'm not 100% certain about how the fence lifetimes work.

Cheers,
Nicolai


> 
> BR Monk
> 
> *From:*amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] *On Behalf 
> Of *Liu, Monk
> *Sent:* 2017年10月13日16:40
> *To:* Koenig, Christian <Christian.Koenig at amd.com>; Nicolai Hähnle 
> <nhaehnle at gmail.com>; amd-gfx at lists.freedesktop.org
> *Subject:* RE: regression with d6c650c0a8f6f671e49553725e1db541376d95f2
> 
> I doubt it would always work fine…
> 
> First, we have FENCE_TRACE reference s_fence->finished after 
> “fence_signal(&fence->finished)”
> 
> Second, we have trace_amd_sched_proess_job(s_fence) after 
> “amd_sched_fence_finished()”,
> 
> If you put the finished before free_job() and by coincidence the 
> job_finish() get very soon executed you’ll have odds to hit wild pointer 
> on above two cases
> 
> BR Monk
> 
> *From:*Koenig, Christian
> *Sent:* 2017年10月13日16:17
> *To:* Liu, Monk <Monk.Liu at amd.com <mailto:Monk.Liu at amd.com>>; Nicolai 
> Hähnle <nhaehnle at gmail.com <mailto:nhaehnle at gmail.com>>; 
> amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
> *Subject:* Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2
> 
> Yeah, that change is actually incorrect and should be reverted.
> 
> What we really need to do is remove dropping sched_job->s_fence from 
> amd_sched_process_job() into amd_sched_job_finish() directly before the 
> call to free_job().
> 
> Regards,
> Christian.
> 
> Am 13.10.2017 um 09:24 schrieb Liu, Monk:
> 
>     commit d6c650c0a8f6f671e49553725e1db541376d95f2
>     Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
>     <mailto:nicolai.haehnle at amd.com>
>     @@ -611,6 +611,10 @@ static int amd_sched_main(void *param)
> 
>                      fence = sched->ops->run_job(sched_job);
>                      amd_sched_fence_scheduled(s_fence);
>     +
>     +               /* amd_sched_process_job drops the job's reference
>     of the fence. */
>     +               sched_job->s_fence = NULL;
>     +
>                      if (fence) {
>                              s_fence->parent = dma_fence_get(fence);
>                              r = dma_fence_add_callback(fence, &s_fence->cb,
> 
>     Hi Nicolai
> 
>     with this patch, you will break "amdgpu_sched_hw_job_reset()"routine:
> 
>     void
> 
>     amd_sched_hw_job_reset(structamd_gpu_scheduler
> 
>     *sched)
> 
>     {
> 
>     structamd_sched_job
> 
>     *s_job;
> 
>     spin_lock(&sched->job_list_lock);
> 
>     list_for_each_entry_reverse(s_job,
> 
>     &sched->ring_mirror_list, node) {
> 
>     if(s_job->s_fence->parent
> 
>     &&
> 
>     fence_remove_callback(s_job->s_fence->parent,
> 
>                           &s_job->s_fence->cb))
> 
>     {
> 
>     fence_put(s_job->s_fence->parent);
> 
>                  s_job->s_fence->parent
> 
>     =
> 
>     NULL;
> 
>     atomic_dec(&sched->hw_rq_count);
> 
>              }
> 
>          }
> 
>     spin_unlock(&sched->job_list_lock);
> 
>     }
> 
>     see that without sched_job->s_fence, you cannot remove the callback
>     from its hw fence,
> 
>     any idea??
> 
>     BR Monk
> 


-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


More information about the amd-gfx mailing list