regression with d6c650c0a8f6f671e49553725e1db541376d95f2
Christian König
christian.koenig at amd.com
Fri Oct 13 08:42:40 UTC 2017
The free_job() callback is called only way after the job has finished.
That is one change actually made by you to the code :)
Christian.
Am 13.10.2017 um 10:39 schrieb Liu, Monk:
>
> I doubt it would always work fine…
>
> First, we have FENCE_TRACE reference s_fence->finished after
> “fence_signal(&fence->finished)”
>
> Second, we have trace_amd_sched_proess_job(s_fence) after
> “amd_sched_fence_finished()”,
>
> If you put the finished before free_job() and by coincidence the
> job_finish() get very soon executed you’ll have odds to hit wild
> pointer on above two cases
>
> BR Monk
>
> *From:*Koenig, Christian
> *Sent:* 2017年10月13日16:17
> *To:* Liu, Monk <Monk.Liu at amd.com>; Nicolai Hähnle
> <nhaehnle at gmail.com>; amd-gfx at lists.freedesktop.org
> *Subject:* Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2
>
> Yeah, that change is actually incorrect and should be reverted.
>
> What we really need to do is remove dropping sched_job->s_fence from
> amd_sched_process_job() into amd_sched_job_finish() directly before
> the call to free_job().
>
> Regards,
> Christian.
>
> Am 13.10.2017 um 09:24 schrieb Liu, Monk:
>
> commit d6c650c0a8f6f671e49553725e1db541376d95f2
> Author: Nicolai Hähnle <nicolai.haehnle at amd.com>
> <mailto:nicolai.haehnle at amd.com>
> @@ -611,6 +611,10 @@ static int amd_sched_main(void *param)
>
> fence = sched->ops->run_job(sched_job);
> amd_sched_fence_scheduled(s_fence);
> +
> + /* amd_sched_process_job drops the job's reference
> of the fence. */
> + sched_job->s_fence = NULL;
> +
> if (fence) {
> s_fence->parent = dma_fence_get(fence);
> r = dma_fence_add_callback(fence,
> &s_fence->cb,
>
> Hi Nicolai
>
> with this patch, you will break "amdgpu_sched_hw_job_reset()"routine:
>
> void
>
> amd_sched_hw_job_reset(structamd_gpu_scheduler
>
> *sched)
>
> {
>
> structamd_sched_job
>
> *s_job;
>
> spin_lock(&sched->job_list_lock);
>
> list_for_each_entry_reverse(s_job,
>
> &sched->ring_mirror_list, node) {
>
> if(s_job->s_fence->parent
>
> &&
>
> fence_remove_callback(s_job->s_fence->parent,
>
> &s_job->s_fence->cb))
>
> {
>
> fence_put(s_job->s_fence->parent);
>
> s_job->s_fence->parent
>
> =
>
> NULL;
>
> atomic_dec(&sched->hw_rq_count);
>
> }
>
> }
>
> spin_unlock(&sched->job_list_lock);
>
> }
>
> see that without sched_job->s_fence, you cannot remove the
> callback from its hw fence,
>
> any idea??
>
> BR Monk
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20171013/0c11208a/attachment-0001.html>
More information about the amd-gfx
mailing list