[PATCH] drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"

Christian König deathsimple at vodafone.de
Wed Sep 6 09:33:07 UTC 2017


> What’s your plan?
Not 100% sure yet. I need to move the fencing around to fix userptrs anyway.

When I'm done with that and when the UVD/VCE stuff is fixed then I'm 
going to tackle this next.

Regards,
Christian.

On 06.09.2017 at 11:25, Liu, Monk wrote:
>
> Yeah, you are right. Although it has 32 slots (compared with
> entity_push_job, which only waits for two slots in the GPU scheduler),
> there is still a chance that it has to wait while a job is being
> processed by the GPU reset.
>
> What’s your plan?
>
> Reverting this patch is correct since it has a potential dirty
> reference, but we need another patch to work around this PD
> reservation deadlock.
>
> BR Monk
>
> *From:* Christian König [mailto:deathsimple at vodafone.de]
> *Sent:* September 6, 2017 16:20
> *To:* Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; 
> Zhou, David(ChunMing) <David1.Zhou at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: revert "fix deadlock of reservation 
> between cs and gpu reset v2"
>
>     But how should I understand 1)?
>
>     What do you mean by "The CS can still be blocked because of
>     amdgpu_ctx_add_fence()."?
>
> See amdgpu_ctx_add_fence(), it can block for previous command 
> submissions just like entity_push_job(). So only moving 
> entity_push_job() out of locking the PD doesn't help at all.
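
To make the blocking in amdgpu_ctx_add_fence() concrete, here is a minimal userspace sketch, not the driver code. It assumes each context keeps a fixed ring of fence slots indexed by sequence number (32 here, the figure mentioned in this thread), and that reusing a slot requires the fence already stored there to have signaled. All toy_* names are hypothetical, and the "would block" check stands in for an actual wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 32u /* assumed power of two; the thread mentions 32 slots */

/* Hypothetical stand-in for a fence: only tracks whether it has signaled. */
struct toy_fence {
	bool signaled;
};

/* Hypothetical per-context ring of the most recent fences. */
struct toy_ctx_ring {
	uint64_t next_seq;
	struct toy_fence *slots[SLOT_COUNT];
};

/*
 * True if storing a fence for the next sequence number would have to wait
 * for the previous occupant of that slot, i.e. an older submission that has
 * not signaled yet.
 */
bool toy_add_fence_would_block(const struct toy_ctx_ring *ring)
{
	const struct toy_fence *old =
		ring->slots[ring->next_seq & (SLOT_COUNT - 1)];

	return old != NULL && !old->signaled;
}

/* Stores the fence and returns its sequence number. */
uint64_t toy_add_fence(struct toy_ctx_ring *ring, struct toy_fence *fence)
{
	uint64_t seq;

	/* A real implementation would wait here instead of asserting. */
	assert(!toy_add_fence_would_block(ring));

	seq = ring->next_seq++;
	ring->slots[seq & (SLOT_COUNT - 1)] = fence;
	return seq;
}
```

In this model the 33rd submission has to wait until the first one signals. If that wait happens while the PD is still reserved, moving only the push out of the reservation leaves the same kind of blocking in place.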
>
>
>     For 2) The order of submission isn't correct any more.
>
>     Since the pointer "job" is already a dirty pointer, I think it is
>     meaningless to discuss whether the order is correct ...
>
> The problem isn't parser->job, but rather that the job is referencing 
> the entity which is part of the context and we already called 
> amdgpu_ctx_put() on that one.
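
The lifetime problem can be sketched in a few lines of plain C. This is only an illustration under stated assumptions: the toy_* types and functions are hypothetical, the entity is assumed to be embedded in the refcounted context, and an "alive" flag stands in for the memory actually being freed so that the example stays safe to run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scheduler entity; it lives inside the context. */
struct toy_entity {
	int queued;
};

/* Hypothetical refcounted context that owns the entity. */
struct toy_ctx {
	int refcount;
	bool alive; /* stands in for "memory not yet freed" */
	struct toy_entity entity;
};

/* Hypothetical job; it only borrows the entity, it does not own it. */
struct toy_job {
	struct toy_ctx *ctx;
	struct toy_entity *entity;
};

/* Drops one reference; the last put "frees" the context. */
void toy_ctx_put(struct toy_ctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0)
		ctx->alive = false; /* a real driver would free the memory here */
}

/*
 * Queues the job on its entity. Returns false when the owning context is
 * already gone; in real code this would be a use-after-free, not a clean
 * error return.
 */
bool toy_push_job(struct toy_job *job)
{
	if (!job->ctx->alive)
		return false;
	job->entity->queued++;
	return true;
}
```

Calling toy_push_job() before toy_ctx_put() succeeds. Swapping the two calls makes the push touch a context whose last reference is already gone, which is the ordering the reverted patch introduced.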
>
> Regards,
> Christian.
>
> On 06.09.2017 at 10:04, Liu, Monk wrote:
>
>     >The patch doesn't work at all:
>     1. The CS can still be blocked because of amdgpu_ctx_add_fence().
>     2. The order of submission isn't correct any more.
>     3. We could end up using freed up memory because we now drop the
>        ctx reference too early.
>
>     I suddenly found that the parser->job is really a wild pointer:
>
>     amdgpu_cs_parser_fini(p, 0, true);
>     trace_amdgpu_cs_ioctl(job);
>     amd_sched_entity_push_job(&job->base);
>
>     so "cs_parser_fini" cannot be called before "entity_push_job",
>     that part is correct
>
>     But how should I understand 1)?
>
>     What do you mean by "The CS can still be blocked because of
>     amdgpu_ctx_add_fence()."?
>
>     For 2) The order of submission isn't correct any more.
>
>     Since the pointer "job" is already a dirty pointer, I think it is
>     meaningless to discuss whether the order is correct ...
>
>     BR Monk
>
>     ------------------------------------------------------------------------
>
>     *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
>     Christian König <deathsimple at vodafone.de>
>     *Sent:* Tuesday, September 5, 2017 9:14:23 PM
>     *To:* amd-gfx at lists.freedesktop.org; Zhou, David(ChunMing)
>     *Subject:* [PATCH] drm/amdgpu: revert "fix deadlock of reservation
>     between cs and gpu reset v2"
>
>     From: Christian König <christian.koenig at amd.com>
>
>     This reverts commit 10e709cb296c98424c03408d23e3addeddcd4088.
>
>     The patch doesn't work at all:
>     1. The CS can still be blocked because of amdgpu_ctx_add_fence().
>     2. The order of submission isn't correct any more.
>     3. We could end up using freed up memory because we now drop the
>        ctx reference too early.
>
>     This needs to be fixed cleanly by doing the context handling after
>     the BO handling, but this is a larger task. Just avoid the obvious
>     crashes for now.
>
>     Signed-off-by: Christian König <christian.koenig at amd.com>
>     ---
>      drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ----
>      1 file changed, 4 deletions(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>     index b96776c..2db4010 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>     @@ -1061,7 +1061,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>              cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
>              job->uf_sequence = cs->out.handle;
>              amdgpu_job_free_resources(job);
>     -       amdgpu_cs_parser_fini(p, 0, true);
>
>              trace_amdgpu_cs_ioctl(job);
>              amd_sched_entity_push_job(&job->base);
>     @@ -1120,10 +1119,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>                      goto out;
>
>              r = amdgpu_cs_submit(&parser, cs);
>     -       if (r)
>     -               goto out;
>
>     -       return 0;
>      out:
>              amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
>              return r;
>     -- 
>     2.7.4
>
>     _______________________________________________
>     amd-gfx mailing list
>     amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org>
>     https://lists.freedesktop.org/mailman/listinfo/amd-gfx

