<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 2017-05-03 14:02, Liu, Monk wrote:<br>
</div>
<blockquote
cite="mid:DM5PR12MB161082763FA0163FF22E1C1F84160@DM5PR12MB1610.namprd12.prod.outlook.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text -->
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">You can add ctx as a field of job
without taking a reference to it; when you try to use ctx,
just check whether ctx == NULL.<br>
> that doesn't work at all... job->ctx will always be
non-NULL after it is initialized; you would just be
dereferencing a dangling pointer after the ctx is released <br>
</div>
</span></font></blockquote>
<font size="2">job->ctx is a **ctx (a pointer to the ctx
pointer), which could resolve this concern.<br>
<br>
</font>
<blockquote
cite="mid:DM5PR12MB161082763FA0163FF22E1C1F84160@DM5PR12MB1610.namprd12.prod.outlook.com"
type="cite"><font size="2"><span style="font-size:10pt;">
<div class="PlainText">
<br>
Another, cruder method:<br>
Use idr_for_each_entry(..job->vm->ctx_mgr...) and
compare job->fence->fence_context with
ctx->rings[].entity.fence_context. If no match is found, the
ctx has been freed; otherwise you can do your work on this ctx.<br>
> 1) fence_context has a chance to incorrectly represent
the context behind it, because the numbers can be used up and
will wrap around from the beginning <br>
</div>
</span></font></blockquote>
<font size="2">No, fence_context is globally unique in the kernel.<br>
<br>
Regards,<br>
David Zhou<br>
</font>
<blockquote
cite="mid:DM5PR12MB161082763FA0163FF22E1C1F84160@DM5PR12MB1610.namprd12.prod.outlook.com"
type="cite"><font size="2"><span style="font-size:10pt;">
<div class="PlainText">
2) why not just keep the CTX alive, which is much simpler
than this method <br>
<br>
BR<br>
<br>
-----Original Message-----<br>
From: Zhou, David(ChunMing) <br>
Sent: Wednesday, May 03, 2017 12:54 PM<br>
To: Liu, Monk <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>; Liu, Monk
<a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>; Christian König
<a class="moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
job finished<br>
<br>
You can add ctx as a field of job without taking a reference
to it; when you try to use ctx, just check whether ctx == NULL. <br>
<br>
Another, cruder method:<br>
Use idr_for_each_entry(..job->vm->ctx_mgr...) and
compare job->fence->fence_context with
ctx->rings[].entity.fence_context. If no match is found, the
ctx has been freed; otherwise you can do your work on this ctx.<br>
<br>
Regards,<br>
David Zhou<br>
<br>
-----Original Message-----<br>
From: amd-gfx [<a moz-do-not-send="true"
href="mailto:amd-gfx-bounces@lists.freedesktop.org">mailto:amd-gfx-bounces@lists.freedesktop.org</a>]
On Behalf Of Liu, Monk<br>
Sent: Wednesday, May 03, 2017 11:57 AM<br>
To: Liu, Monk <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>; Christian König
<a class="moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
job finished<br>
<br>
@Christian,<br>
<br>
The thing is I need to mark all entities behind this
timed-out job as guilty, so I use a member in the entity named
"ptr_guilty" that points to the member in amdgpu_ctx named
"guilty". That way, reading *entity->ptr_guilty tells you
whether this entity is "invalid", and setting *entity->ptr_guilty
lets you mark all entities belonging to the context as
"invalid".<br>
<br>
The above logic guarantees we can kick out all guilty
entities in the entity kfifo and also block every IOCTL whose
ctx handle points to this guilty context, and we only
recover other jobs/entities/contexts after the scheduler is
unparked <br>
<br>
If you reject the patch that keeps the ctx alive until all its
jobs have signaled, please give me a solution that satisfies the above logic <br>
<br>
BR Monk<br>
<br>
-----Original Message-----<br>
From: amd-gfx [<a moz-do-not-send="true"
href="mailto:amd-gfx-bounces@lists.freedesktop.org">mailto:amd-gfx-bounces@lists.freedesktop.org</a>]
On Behalf Of Liu, Monk<br>
Sent: Wednesday, May 03, 2017 11:31 AM<br>
To: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
job finished<br>
<br>
1) This is necessary, otherwise how can I access the entity
pointer after a job times out? And why is it dangerous?<br>
2) What is the status field in the fences you were referring
to? I need to judge whether it can satisfy my requirement <br>
<br>
<br>
<br>
-----Original Message-----<br>
From: Christian König [<a moz-do-not-send="true"
href="mailto:deathsimple@vodafone.de">mailto:deathsimple@vodafone.de</a>]<br>
Sent: Monday, May 01, 2017 10:48 PM<br>
To: Liu, Monk <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
Subject: Re: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
job finished<br>
<br>
On 01.05.2017 at 09:22, Monk Liu wrote:<br>
> for the TDR guilty context feature, we need to access the
ctx/s_entity fields <br>
> through the sched job pointer, so the ctx must be kept
alive till all jobs <br>
> from it have signaled.<br>
<br>
NAK, that is unnecessary and quite dangerous.<br>
<br>
Instead we have the status field designed into the fences,
which should be checked for that.<br>
<br>
Regards,<br>
Christian.<br>
<br>
><br>
> Change-Id: Ib87e9502f7a5c8c054c7e56956d7f7ad75998e43<br>
> Signed-off-by: Monk Liu <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a><br>
> ---<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6
+++++-<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 9
+++++++++<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 9
+++++++--<br>
> drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 6
------<br>
> drivers/gpu/drm/amd/scheduler/gpu_scheduler.h | 1 +<br>
> 6 files changed, 23 insertions(+), 10 deletions(-)<br>
><br>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
> index e330009..8e031d6 100644<br>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
> @@ -760,10 +760,12 @@ struct amdgpu_ib {<br>
> uint32_t flags;<br>
> };<br>
> <br>
> +struct amdgpu_ctx;<br>
> +<br>
> extern const struct amd_sched_backend_ops
amdgpu_sched_ops;<br>
> <br>
> int amdgpu_job_alloc(struct amdgpu_device *adev,
unsigned num_ibs,<br>
> - struct amdgpu_job **job, struct
amdgpu_vm *vm);<br>
> + struct amdgpu_job **job, struct
amdgpu_vm *vm, struct <br>
> +amdgpu_ctx *ctx);<br>
> int amdgpu_job_alloc_with_ib(struct amdgpu_device
*adev, unsigned size,<br>
> struct amdgpu_job **job);<br>
> <br>
> @@ -802,6 +804,7 @@ struct amdgpu_ctx_mgr {<br>
> <br>
> struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv
*fpriv, uint32_t id);<br>
> int amdgpu_ctx_put(struct amdgpu_ctx *ctx);<br>
> +struct amdgpu_ctx *amdgpu_ctx_kref_get(struct
amdgpu_ctx *ctx);<br>
> <br>
> uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx,
struct amdgpu_ring *ring,<br>
> struct fence *fence);<br>
> @@ -1129,6 +1132,7 @@ struct amdgpu_job {<br>
> struct amdgpu_sync sync;<br>
> struct amdgpu_ib *ibs;<br>
> struct fence *fence; /* the hw fence
*/<br>
> + struct amdgpu_ctx *ctx;<br>
> uint32_t preamble_status;<br>
> uint32_t num_ibs;<br>
> void *owner;<br>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
> index 699f5fe..267fb65 100644<br>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
> @@ -234,7 +234,7 @@ int amdgpu_cs_parser_init(struct
amdgpu_cs_parser *p, void *data)<br>
> }<br>
> }<br>
> <br>
> - ret = amdgpu_job_alloc(p->adev, num_ibs,
&p->job, vm);<br>
> + ret = amdgpu_job_alloc(p->adev, num_ibs,
&p->job, vm, p->ctx);<br>
> if (ret)<br>
> goto free_all_kdata;<br>
> <br>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
> index b4bbbb3..81438af 100644<br>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
> @@ -25,6 +25,13 @@<br>
> #include <drm/drmP.h><br>
> #include "amdgpu.h"<br>
> <br>
> +struct amdgpu_ctx *amdgpu_ctx_kref_get(struct
amdgpu_ctx *ctx) {<br>
> + if (ctx)<br>
> + kref_get(&ctx->refcount);<br>
> + return ctx;<br>
> +}<br>
> +<br>
> static int amdgpu_ctx_init(struct amdgpu_device
*adev, struct amdgpu_ctx *ctx)<br>
> {<br>
> unsigned i, j;<br>
> @@ -56,6 +63,8 @@ static int amdgpu_ctx_init(struct
amdgpu_device *adev, struct amdgpu_ctx *ctx)<br>
> rq,
amdgpu_sched_jobs);<br>
> if (r)<br>
> goto failed;<br>
> +<br>
> + ctx->rings[i].entity.ptr_guilty =
&ctx->guilty; /* kernel entity <br>
> +doesn't have ptr_guilty */<br>
> }<br>
> <br>
> return 0;<br>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<br>
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<br>
> index 690ef3d..208da11 100644<br>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<br>
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<br>
> @@ -40,7 +40,7 @@ static void
amdgpu_job_timedout(struct amd_sched_job *s_job)<br>
> }<br>
> <br>
> int amdgpu_job_alloc(struct amdgpu_device *adev,
unsigned num_ibs,<br>
> - struct amdgpu_job **job, struct
amdgpu_vm *vm)<br>
> + struct amdgpu_job **job, struct
amdgpu_vm *vm, struct <br>
> +amdgpu_ctx *ctx)<br>
> {<br>
> size_t size = sizeof(struct amdgpu_job);<br>
> <br>
> @@ -57,6 +57,7 @@ int amdgpu_job_alloc(struct
amdgpu_device *adev, unsigned num_ibs,<br>
> (*job)->vm = vm;<br>
> (*job)->ibs = (void *)&(*job)[1];<br>
> (*job)->num_ibs = num_ibs;<br>
> + (*job)->ctx = amdgpu_ctx_kref_get(ctx);<br>
> <br>
> amdgpu_sync_create(&(*job)->sync);<br>
> <br>
> @@ -68,7 +69,7 @@ int amdgpu_job_alloc_with_ib(struct
amdgpu_device *adev, unsigned size,<br>
> {<br>
> int r;<br>
> <br>
> - r = amdgpu_job_alloc(adev, 1, job, NULL);<br>
> + r = amdgpu_job_alloc(adev, 1, job, NULL, NULL);<br>
> if (r)<br>
> return r;<br>
> <br>
> @@ -94,6 +95,10 @@ void
amdgpu_job_free_resources(struct amdgpu_job *job)<br>
> static void amdgpu_job_free_cb(struct amd_sched_job
*s_job)<br>
> {<br>
> struct amdgpu_job *job = container_of(s_job,
struct amdgpu_job, <br>
> base);<br>
> + struct amdgpu_ctx *ctx = job->ctx;<br>
> +<br>
> + if (ctx)<br>
> + amdgpu_ctx_put(ctx);<br>
> <br>
> fence_put(job->fence);<br>
> amdgpu_sync_free(&job->sync);<br>
> diff --git
a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
> index 6f4e31f..9100ca8 100644<br>
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
> @@ -208,12 +208,6 @@ void amd_sched_entity_fini(struct
amd_gpu_scheduler *sched,<br>
> if (!amd_sched_entity_is_initialized(sched,
entity))<br>
> return;<br>
> <br>
> - /**<br>
> - * The client will not queue more IBs during this
fini, consume existing<br>
> - * queued IBs<br>
> - */<br>
> - wait_event(sched->job_scheduled,
amd_sched_entity_is_idle(entity));<br>
> -<br>
> amd_sched_rq_remove_entity(rq, entity);<br>
> kfifo_free(&entity->job_queue);<br>
> }<br>
> diff --git
a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h<br>
> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h<br>
> index 8cb41d3..ccbbcb0 100644<br>
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h<br>
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h<br>
> @@ -49,6 +49,7 @@ struct amd_sched_entity {<br>
> <br>
> struct fence *dependency;<br>
> struct fence_cb cb;<br>
> + bool *ptr_guilty;<br>
> };<br>
> <br>
> /**<br>
<br>
<br>
_______________________________________________<br>
amd-gfx mailing list<br>
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
<a moz-do-not-send="true"
href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><br>
</div>
</span></font>
</blockquote>
<br>
</body>
</html>