[PATCH] drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
Christian König
deathsimple at vodafone.de
Wed Sep 6 09:33:07 UTC 2017
> What’s you plan ?
Not 100% sure yet. I need to move the fencing around to fix userptrs anyway.
When I'm done with that and when the UVD/VCE stuff is fixed then I'm
going to tackle this next.
Regards,
Christian.
Am 06.09.2017 um 11:25 schrieb Liu, Monk:
>
> Yeah, you are right, although it has 32 slots (compared with
> entit_push_job which only waits for two slots in gpu scheduler) but
> still have chance to wait and meanwhile one job could under processing
> by gpu reset
>
> What’s you plan ?
>
> Revert this patch is correct since it have potential dirty reference,
> but how we need another patch to walk around this PD reservation dead
> lock
>
> BR Monk
>
> *From:*Christian König [mailto:deathsimple at vodafone.de]
> *Sent:* 2017年9月6日16:20
> *To:* Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org;
> Zhou, David(ChunMing) <David1.Zhou at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: revert "fix deadlock of reservation
> between cs and gpu reset v2"
>
> but how to understand 1)
>
> what do you mean "The CS can still be blocked because of
> amdgpu_ctx_add_fence()."
>
> See amdgpu_ctx_add_fence(), it can block for previous command
> submissions just like entity_push_job(). So only moving
> entity_push_job() out of locking the PD doesn't help at all.
>
>
> for 2)The order of submission isn't correct any more.
>
> I think since the pointer "job" is already a dirty pointer,
> meaningless that we talking about it if the order is correct ...
>
> The problem isn't parser->job, but rather that the job is referencing
> the entity which is part of the context and we already called
> amdgpu_ctx_put() on that one.
>
> Regards,
> Christian.
>
> Am 06.09.2017 um 10:04 schrieb Liu, Monk:
>
> >The patch doesn't work at all:
> 1. The CS can still be blocked because of amdgpu_ctx_add_fence().
> 2. The order of submission isn't correct any more.
> 3. We could end up using freed up memory because we now drop the
> ctx reference to early.
>
> I suddenly found that the parser->job is really a wild pointer:
>
> amdgpu_cs_parser_fini(p,
>
> 0, true);
>
> trace_amdgpu_cs_ioctl(job);
>
> amd_sched_entity_push_job(&job->base);
>
> so "cs_parser_fini" cannot be called before "entity_push_job",
> that part is correct
>
> but how to understand 1)
>
> what do you mean "The CS can still be blocked because of
> amdgpu_ctx_add_fence()."
>
> for 2)The order of submission isn't correct any more.
>
> I think since the pointer "job" is already a dirty pointer,
> meaningless that we talking about it if the order is correct ...
>
> BR Monk
>
> ------------------------------------------------------------------------
>
> *From:*amd-gfx <amd-gfx-bounces at lists.freedesktop.org>
> <mailto:amd-gfx-bounces at lists.freedesktop.org> on behalf of
> Christian König <deathsimple at vodafone.de>
> <mailto:deathsimple at vodafone.de>
> *Sent:* Tuesday, September 5, 2017 9:14:23 PM
> *To:* amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Zhou, David(ChunMing)
> *Subject:* [PATCH] drm/amdgpu: revert "fix deadlock of reservation
> between cs and gpu reset v2"
>
> From: Christian König <christian.koenig at amd.com>
> <mailto:christian.koenig at amd.com>
>
> This reverts commit 10e709cb296c98424c03408d23e3addeddcd4088.
>
> The patch doesn't work at all:
> 1. The CS can still be blocked because of amdgpu_ctx_add_fence().
> 2. The order of submission isn't correct any more.
> 3. We could end up using freed up memory because we now drop the
> ctx reference to early.
>
> This needs to be fixed cleanly by doing the context handling after
> the BO
> handling, but this is a larger task just avoid the obvious crashes
> for now.
>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> <mailto:christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index b96776c..2db4010 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1061,7 +1061,6 @@ static int amdgpu_cs_submit(struct
> amdgpu_cs_parser *p,
> cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring,
> p->fence);
> job->uf_sequence = cs->out.handle;
> amdgpu_job_free_resources(job);
> - amdgpu_cs_parser_fini(p, 0, true);
>
> trace_amdgpu_cs_ioctl(job);
> amd_sched_entity_push_job(&job->base);
> @@ -1120,10 +1119,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev,
> void *data, struct drm_file *filp)
> goto out;
>
> r = amdgpu_cs_submit(&parser, cs);
> - if (r)
> - goto out;
>
> - return 0;
> out:
> amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
> return r;
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>
>
> _______________________________________________
>
> amd-gfx mailing list
>
> amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
>
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20170906/bcf25756/attachment-0001.html>
More information about the amd-gfx
mailing list