[PATCH] drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
Liu, Monk
Monk.Liu at amd.com
Wed Sep 6 09:46:56 UTC 2017
Reviewed-by: Monk Liu <monk.liu at amd.com>
I’m thinking of calling “ttm_eu_fence_buffer_objects” directly, before ctx_add_fence() and right after “fence_get(&job->base.s_fence->finished);”,
since at that point we already have parser->ticket/validated/fence.
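The ordering proposed above can be sketched with a small toy model. This is only an illustration under stated assumptions: every `toy_` name is hypothetical, the "current" order is the one described in this thread (context fence added while the BOs are still reserved), and whether the proposed order is sufficient is not settled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for one command submission; all names are hypothetical. */
struct toy_cs {
    bool bos_reserved;          /* reservation (including the PD) still held */
    bool bos_fenced;            /* finished fence attached to the BO list */
    bool waited_while_reserved; /* a possibly blocking step ran under the reservation */
};

/* Stand-in for fencing the validated BO list, which also unreserves it. */
static void toy_fence_buffer_objects(struct toy_cs *cs)
{
    cs->bos_fenced = true;
    cs->bos_reserved = false;
}

/* Stand-in for the step that may wait for earlier submissions. */
static void toy_ctx_add_fence(struct toy_cs *cs)
{
    if (cs->bos_reserved)
        cs->waited_while_reserved = true;
}

/* Order described in the thread: context fence first, BOs unreserved later. */
static struct toy_cs toy_submit_current(void)
{
    struct toy_cs cs = { .bos_reserved = true };

    toy_ctx_add_fence(&cs);
    toy_fence_buffer_objects(&cs);
    return cs;
}

/* Proposed order: fence and unreserve the BOs, then add the context fence. */
static struct toy_cs toy_submit_proposed(void)
{
    struct toy_cs cs = { .bos_reserved = true };

    toy_fence_buffer_objects(&cs);
    toy_ctx_add_fence(&cs);
    return cs;
}
```

In this model only the current order ever reaches the possibly blocking step with the reservation held, which is the window the GPU reset can collide with.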
BR Monk
From: Christian König [mailto:deathsimple at vodafone.de]
Sent: September 6, 2017 17:33
To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; Zhou, David(ChunMing) <David1.Zhou at amd.com>
Subject: Re: [PATCH] drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
> What’s your plan?
Not 100% sure yet. I need to move the fencing around to fix userptrs anyway.
When I'm done with that and when the UVD/VCE stuff is fixed then I'm going to tackle this next.
Regards,
Christian.
On 06.09.2017 at 11:25, Liu, Monk wrote:
Yeah, you are right. Although it has 32 slots (compared with entity_push_job(), which only waits for two slots in the GPU scheduler), there is still a chance that it has to wait, and meanwhile one job could be under processing by the GPU reset.
What’s your plan?
Reverting this patch is correct since it has a potential dirty reference, but we now need another patch to work around this PD reservation deadlock.
BR Monk
From: Christian König [mailto:deathsimple at vodafone.de]
Sent: September 6, 2017 16:20
To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; Zhou, David(ChunMing) <David1.Zhou at amd.com>
Subject: Re: [PATCH] drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
> But how should I understand 1)?
> What do you mean by "The CS can still be blocked because of amdgpu_ctx_add_fence()"?
See amdgpu_ctx_add_fence(), it can block for previous command submissions just like entity_push_job(). So only moving entity_push_job() out of locking the PD doesn't help at all.
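A toy model may help show why a fixed number of per-context fence slots still leads to waiting. This is a sketch, not the driver code: the `toy_` names are hypothetical, the slot count of 32 is taken from the figure mentioned in this thread, and the blocking wait is modeled as a refusal (return value 0) so the example stays single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_JOBS 32 /* assumption: the "32 slots" mentioned in the thread */

struct toy_fence {
    bool signaled;
};

/* Per-ring bookkeeping inside a context: the last TOY_MAX_JOBS fences. */
struct toy_ctx_ring {
    unsigned long long sequence; /* next sequence number, starts at 1 */
    struct toy_fence *fences[TOY_MAX_JOBS];
};

/* True if the slot for the next sequence number is still owned by an
 * unsignaled fence, i.e. a real implementation would have to wait here. */
static bool toy_add_fence_would_block(const struct toy_ctx_ring *r)
{
    const struct toy_fence *old = r->fences[r->sequence % TOY_MAX_JOBS];

    return old != NULL && !old->signaled;
}

/* Returns the sequence number on success, or 0 where a real
 * implementation would block on the previous submission. */
static unsigned long long toy_add_fence(struct toy_ctx_ring *r,
                                        struct toy_fence *f)
{
    unsigned long long seq;

    if (toy_add_fence_would_block(r))
        return 0;

    seq = r->sequence++;
    r->fences[seq % TOY_MAX_JOBS] = f;
    return seq;
}
```

With `sequence` initialized to 1, the first 32 calls succeed; the 33rd is refused until the fence from the first submission has signaled. If that wait happens while the PD is reserved, moving only the push out of the locked region does not remove the blocking point.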
> As for 2), "The order of submission isn't correct any more":
> I think that since the pointer "job" is already a dirty pointer, it is meaningless to discuss whether the order is correct ...
The problem isn't parser->job, but rather that the job is referencing the entity which is part of the context and we already called amdgpu_ctx_put() on that one.
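The lifetime problem described here can be sketched with a minimal reference-counting model. All `toy_` names are hypothetical and this is not the amdgpu code; the model starts with a single reference to stand in for the case where every other reference to the context has already gone away, and it marks the context as dead instead of freeing it so the hazard can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_entity {
    int queued; /* number of jobs pushed to this entity */
};

struct toy_ctx {
    int refcount;
    bool alive; /* false once the last reference is dropped */
    struct toy_entity entity; /* the entity lives inside the context */
};

struct toy_job {
    struct toy_ctx *owner; /* context whose entity the job is pushed to */
};

static void toy_ctx_put(struct toy_ctx *ctx)
{
    if (--ctx->refcount == 0)
        ctx->alive = false; /* stand-in for freeing the context */
}

/* Returns false where real code would touch freed memory. */
static bool toy_push_job(struct toy_job *job)
{
    if (!job->owner->alive)
        return false;
    job->owner->entity.queued++;
    return true;
}

/* Order of the reverted patch: drop the context reference, then push. */
static bool toy_submit_put_first(struct toy_ctx *ctx, struct toy_job *job)
{
    toy_ctx_put(ctx);
    return toy_push_job(job);
}

/* Order after the revert: push while the reference is still held. */
static bool toy_submit_push_first(struct toy_ctx *ctx, struct toy_job *job)
{
    bool ok = toy_push_job(job);

    toy_ctx_put(ctx);
    return ok;
}
```

Starting from `refcount = 1`, `toy_submit_put_first()` fails because the entity's owner is already gone when the job is pushed, while `toy_submit_push_first()` succeeds and only then releases the context.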
Regards,
Christian.
On 06.09.2017 at 10:04, Liu, Monk wrote:
> The patch doesn't work at all:
> 1. The CS can still be blocked because of amdgpu_ctx_add_fence().
> 2. The order of submission isn't correct any more.
> 3. We could end up using freed up memory because we now drop the
> ctx reference too early.
I just noticed that parser->job really is a wild pointer at this point:
    amdgpu_cs_parser_fini(p, 0, true);
    trace_amdgpu_cs_ioctl(job);
    amd_sched_entity_push_job(&job->base);
So "cs_parser_fini" cannot be called before "entity_push_job"; that part is correct.
But how should I understand 1)?
What do you mean by "The CS can still be blocked because of amdgpu_ctx_add_fence()"?
As for 2), "The order of submission isn't correct any more":
I think that since the pointer "job" is already a dirty pointer, it is meaningless to discuss whether the order is correct ...
BR Monk
________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Christian König <deathsimple at vodafone.de>
Sent: Tuesday, September 5, 2017 9:14:23 PM
To: amd-gfx at lists.freedesktop.org; Zhou, David(ChunMing)
Subject: [PATCH] drm/amdgpu: revert "fix deadlock of reservation between cs and gpu reset v2"
From: Christian König <christian.koenig at amd.com>
This reverts commit 10e709cb296c98424c03408d23e3addeddcd4088.
The patch doesn't work at all:
1. The CS can still be blocked because of amdgpu_ctx_add_fence().
2. The order of submission isn't correct any more.
3. We could end up using freed up memory because we now drop the
ctx reference too early.
This needs to be fixed cleanly by doing the context handling after the BO
handling, but this is a larger task; just avoid the obvious crashes for now.
Signed-off-by: Christian König <christian.koenig at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b96776c..2db4010 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1061,7 +1061,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
 	job->uf_sequence = cs->out.handle;
 	amdgpu_job_free_resources(job);
-	amdgpu_cs_parser_fini(p, 0, true);
 
 	trace_amdgpu_cs_ioctl(job);
 	amd_sched_entity_push_job(&job->base);
@@ -1120,10 +1119,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		goto out;
 
 	r = amdgpu_cs_submit(&parser, cs);
-	if (r)
-		goto out;
 
-	return 0;
 out:
 	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
 	return r;
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx