<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 2017年05月03日 14:02, Liu, Monk wrote:<br>
    </div>
    <blockquote
cite="mid:DM5PR12MB161082763FA0163FF22E1C1F84160@DM5PR12MB1610.namprd12.prod.outlook.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <meta name="Generator" content="Microsoft Exchange Server">
      <!-- converted from text -->
      <style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
      <font size="2"><span style="font-size:10pt;">
          <div class="PlainText">You can add ctx as filed of job, but
            not get reference of it, when you try to use ctx, just check
            if ctx == NULL.<br>
            > that doesn't work at all... job->ctx will always be
            non-NULL after it is initialized, you just refer to a wild
            pointer after CTX released <br>
          </div>
        </span></font></blockquote>
    <font size="2">job->ctx is a **ctx, which could resolve your this
      concern.<br>
      <br>
    </font>
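    <font size="2">A minimal userspace sketch of that double-pointer
      idea (toy code; names are illustrative, not the actual amdgpu
      structures):<br>
    </font>
    <pre>
/* The context owns the slot each job points at, so releasing the context
 * NULLs the slot instead of leaving jobs with a dangling pointer. */
#include <stdio.h>
#include <stdlib.h>

struct ctx { int id; struct ctx **slot; };
struct job { struct ctx **ctx; };           /* **ctx, as suggested above */

static void ctx_release(struct ctx *c)
{
    *c->slot = NULL;                /* invalidate every job's view of it */
    free(c);
}

int main(void)
{
    static struct ctx *slot;                /* the shared slot */
    struct ctx *c = malloc(sizeof(*c));

    c->id = 1;
    c->slot = &slot;
    slot = c;

    struct job j = { .ctx = &slot };
    if (*j.ctx)
        printf("ctx %d alive\n", (*j.ctx)->id);

    ctx_release(c);
    if (!*j.ctx)                    /* reads NULL now, not a wild pointer */
        printf("ctx released, skip\n");
    return 0;
}
</pre>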
    <blockquote
cite="mid:DM5PR12MB161082763FA0163FF22E1C1F84160@DM5PR12MB1610.namprd12.prod.outlook.com"
      type="cite"><font size="2"><span style="font-size:10pt;">
          <div class="PlainText">
            <br>
            Another stupid method:<br>
            Use idr_for_each_entry(..job->vm->ctx_mgr...) and
            compare job->fence->fence_context with
            ctx->ring[].entity->fence_context. If no match is found,
            the ctx has been freed; otherwise you can do what you need
            for this ctx.<br>
            > 1) fence_context has a chance to incorrectly represent
            the context behind it, because the numbers can be used up and
            will wrap around from the beginning <br>
          </div>
        </span></font></blockquote>
    <font size="2">No, fence_context is unique in global in kernel.<br>
      <br>
      Regards,<br>
      David Zhou<br>
    </font>
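    <font size="2">A toy sketch of that lookup, assuming fence context
      numbers come from a global 64-bit counter in the spirit of the
      kernel's fence_context_alloc() (illustrative names, not the real
      amdgpu structures):<br>
    </font>
    <pre>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static atomic_uint_fast64_t fence_context_counter = 1;

/* hand out 'num' consecutive context numbers; a 64-bit counter will not
 * wrap in practice, so a number is never reused */
static uint64_t toy_fence_context_alloc(unsigned num)
{
    return atomic_fetch_add(&fence_context_counter, num);
}

struct toy_entity { uint64_t fence_context; };
struct toy_ctx    { struct toy_entity rings[4]; };

/* returns the owning ctx, or NULL if it has already been freed */
static struct toy_ctx *find_ctx(struct toy_ctx **live, unsigned n,
                                uint64_t job_fence_context)
{
    for (unsigned i = 0; i < n; i++) {
        if (!live[i])
            continue;
        for (unsigned r = 0; r < 4; r++)
            if (live[i]->rings[r].fence_context == job_fence_context)
                return live[i];
    }
    return NULL;
}

int main(void)
{
    struct toy_ctx c;
    struct toy_ctx *live[1] = { &c };

    for (unsigned r = 0; r < 4; r++)
        c.rings[r].fence_context = toy_fence_context_alloc(1);

    printf("%p\n", (void *)find_ctx(live, 1, c.rings[2].fence_context));
    live[0] = NULL;                                      /* ctx freed */
    printf("%p\n", (void *)find_ctx(live, 1, c.rings[2].fence_context));
    return 0;
}
</pre>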
    <blockquote
cite="mid:DM5PR12MB161082763FA0163FF22E1C1F84160@DM5PR12MB1610.namprd12.prod.outlook.com"
      type="cite"><font size="2"><span style="font-size:10pt;">
          <div class="PlainText">
               2) why not just keep the CTX alive, which is much simpler
            than this method <br>
            <br>
            BR<br>
            <br>
            -----Original Message-----<br>
            From: Zhou, David(ChunMing) <br>
            Sent: Wednesday, May 03, 2017 12:54 PM<br>
            To: Liu, Monk <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>; Liu, Monk
            <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>; Christian König
            <a class="moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a>;
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
            Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
            job finished<br>
            <br>
            You can add ctx as a field of job, but don't take a reference
            to it; when you try to use ctx, just check if ctx == NULL. <br>
            <br>
            Another stupid method:<br>
            Use idr_for_each_entry(..job->vm->ctx_mgr...) and
            compare job->fence->fence_context with
            ctx->ring[].entity->fence_context. If no match is found,
            the ctx has been freed; otherwise you can do what you need
            for this ctx.<br>
            <br>
            Regards,<br>
            David Zhou<br>
            <br>
            -----Original Message-----<br>
            From: amd-gfx [<a moz-do-not-send="true"
              href="mailto:amd-gfx-bounces@lists.freedesktop.org">mailto:amd-gfx-bounces@lists.freedesktop.org</a>]
            On Behalf Of Liu, Monk<br>
            Sent: Wednesday, May 03, 2017 11:57 AM<br>
            To: Liu, Monk <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>; Christian König
            <a class="moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a>;
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
            Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
            job finished<br>
            <br>
            @Christian,<br>
            <br>
            The thing is I need to mark all entities behind this
            timed-out job as guilty, so I use a member in the entity
            named "ptr_guilty" that points to the member in amdgpu_ctx
            named "guilty". That way, reading *entity->ptr_guilty tells
            you whether this entity is "invalid", and setting
            *entity->ptr_guilty lets you mark all entities belonging to
            the context as "invalid".<br>
            <br>
            The above logic guarantees we can kick out all guilty
            entities in the entity kfifo, and also block every IOCTL
            whose ctx handle points to this guilty context; we only
            recover other jobs/entities/contexts after the scheduler is
            unparked. <br>
            <br>
            If you reject the patch that keeps the ctx alive till all
            jobs have signaled, please give me a solution that satisfies
            the above logic <br>
            <br>
            BR Monk<br>
            <br>
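            A toy of the ptr_guilty scheme described above: every entity
            of a context points at the context's single "guilty" flag,
            so setting it through any one entity marks them all
            (illustrative names only):<br>
<pre>
#include <stdbool.h>
#include <stdio.h>

struct toy_entity { bool *ptr_guilty; };
struct toy_ctx    { bool guilty; struct toy_entity rings[4]; };

static void toy_ctx_init(struct toy_ctx *ctx)
{
    ctx->guilty = false;
    for (int i = 0; i < 4; i++)
        ctx->rings[i].ptr_guilty = &ctx->guilty;   /* shared flag */
}

int main(void)
{
    struct toy_ctx ctx;
    toy_ctx_init(&ctx);

    /* TDR decides ring 2's entity is behind the hang ... */
    *ctx.rings[2].ptr_guilty = true;

    /* ... and every entity of the context now reads as invalid */
    for (int i = 0; i < 4; i++)
        printf("ring %d guilty: %d\n", i, *ctx.rings[i].ptr_guilty);
    return 0;
}
</pre>
            <br>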
            -----Original Message-----<br>
            From: amd-gfx [<a moz-do-not-send="true"
              href="mailto:amd-gfx-bounces@lists.freedesktop.org">mailto:amd-gfx-bounces@lists.freedesktop.org</a>]
            On Behalf Of Liu, Monk<br>
            Sent: Wednesday, May 03, 2017 11:31 AM<br>
            To: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:deathsimple@vodafone.de"><deathsimple@vodafone.de></a>;
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
            Subject: RE: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
            job finished<br>
            <br>
            1, This is necessary; otherwise how can I access the entity
            pointer after a job has timed out, and why is it dangerous?<br>
            2, What is the status field in the fences you were referring
            to? I need to judge whether it could satisfy my requirement <br>
            <br>
            <br>
            <br>
            -----Original Message-----<br>
            From: Christian König [<a moz-do-not-send="true"
              href="mailto:deathsimple@vodafone.de">mailto:deathsimple@vodafone.de</a>]<br>
            Sent: Monday, May 01, 2017 10:48 PM<br>
            To: Liu, Monk <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a>;
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
            Subject: Re: [PATCH 1/5] drm/amdgpu:keep ctx alive till all
            job finished<br>
            <br>
            On 01.05.2017 at 09:22, Monk Liu wrote:<br>
            > for the TDR guilty context feature, we need to access
            ctx/s_entity fields <br>
            > through the sched job pointer, so the ctx must be kept
            alive till all jobs <br>
            > from it have signaled.<br>
            <br>
            NAK, that is unnecessary and quite dangerous.<br>
            <br>
            Instead we have the status field designed into the fences,
            which should be checked for that.<br>
            <br>
            Regards,<br>
            Christian.<br>
            <br>
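            Presumably that refers to an error/status value carried on
            the fence itself; a hedged toy of the idea (illustrative
            names, not a confirmed kernel interface): after a hang you
            check the fence, rather than chasing a possibly-freed ctx.<br>
<pre>
#include <stdio.h>

struct toy_fence {
    int signaled;
    int status;        /* 0 = ok, negative errno-style value on error */
};

static void toy_tdr_mark_guilty(struct toy_fence *f)
{
    f->status = -125;  /* e.g. canceled after a timeout */
    f->signaled = 1;
}

int main(void)
{
    struct toy_fence f = { 0, 0 };

    toy_tdr_mark_guilty(&f);
    if (f.signaled && f.status < 0)
        printf("job finished with error %d, treat as guilty\n", f.status);
    return 0;
}
</pre>
            <br>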
            ><br>
            > Change-Id: Ib87e9502f7a5c8c054c7e56956d7f7ad75998e43<br>
            > Signed-off-by: Monk Liu <a class="moz-txt-link-rfc2396E" href="mailto:Monk.Liu@amd.com"><Monk.Liu@amd.com></a><br>
            > ---<br>
            >   drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 6 +++++-<br>
            >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        | 2 +-<br>
            >   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 9 +++++++++<br>
            >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       | 9 +++++++--<br>
            >   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 6 ------<br>
            >   drivers/gpu/drm/amd/scheduler/gpu_scheduler.h | 1 +<br>
            >   6 files changed, 23 insertions(+), 10 deletions(-)<br>
            ><br>
            > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
            > index e330009..8e031d6 100644<br>
            > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
            > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<br>
            > @@ -760,10 +760,12 @@ struct amdgpu_ib {<br>
            >        uint32_t                        flags;<br>
            >   };<br>
            >   <br>
            > +struct amdgpu_ctx;<br>
            > +<br>
            >   extern const struct amd_sched_backend_ops amdgpu_sched_ops;<br>
            >   <br>
            >   int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,<br>
            > -                  struct amdgpu_job **job, struct amdgpu_vm *vm);<br>
            > +                  struct amdgpu_job **job, struct amdgpu_vm *vm, struct amdgpu_ctx *ctx);<br>
            >   int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,<br>
            >                             struct amdgpu_job **job);<br>
            >   <br>
            > @@ -802,6 +804,7 @@ struct amdgpu_ctx_mgr {<br>
            >   <br>
            >   struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id);<br>
            >   int amdgpu_ctx_put(struct amdgpu_ctx *ctx);<br>
            > +struct amdgpu_ctx *amdgpu_ctx_kref_get(struct amdgpu_ctx *ctx);<br>
            >   <br>
            >   uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,<br>
            >                              struct fence *fence);<br>
            > @@ -1129,6 +1132,7 @@ struct amdgpu_job {<br>
            >        struct amdgpu_sync      sync;<br>
            >        struct amdgpu_ib        *ibs;<br>
            >        struct fence            *fence; /* the hw fence */<br>
            > +     struct amdgpu_ctx *ctx;<br>
            >        uint32_t                preamble_status;<br>
            >        uint32_t                num_ibs;<br>
            >        void                    *owner;<br>
            > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
            > index 699f5fe..267fb65 100644<br>
            > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
            > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
            > @@ -234,7 +234,7 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)<br>
            >                }<br>
            >        }<br>
            >   <br>
            > -     ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);<br>
            > +     ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm, p->ctx);<br>
            >        if (ret)<br>
            >                goto free_all_kdata;<br>
            >   <br>
            > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
            > index b4bbbb3..81438af 100644<br>
            > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
            > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
            > @@ -25,6 +25,13 @@<br>
            >   #include <drm/drmP.h><br>
            >   #include "amdgpu.h"<br>
            >   <br>
            > +struct amdgpu_ctx *amdgpu_ctx_kref_get(struct amdgpu_ctx *ctx) {<br>
            > +     if (ctx)<br>
            > +             kref_get(&ctx->refcount);<br>
            > +     return ctx;<br>
            > +}<br>
            > +<br>
            >   static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)<br>
            >   {<br>
            >        unsigned i, j;<br>
            > @@ -56,6 +63,8 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)<br>
            >                                          rq, amdgpu_sched_jobs);<br>
            >                if (r)<br>
            >                        goto failed;<br>
            > +<br>
            > +             ctx->rings[i].entity.ptr_guilty = &ctx->guilty; /* kernel entity doesn't have ptr_guilty */<br>
            >        }<br>
            >   <br>
            >        return 0;<br>
            > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<br>
            > index 690ef3d..208da11 100644<br>
            > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<br>
            > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<br>
            > @@ -40,7 +40,7 @@ static void amdgpu_job_timedout(struct amd_sched_job *s_job)<br>
            >   }<br>
            >   <br>
            >   int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,<br>
            > -                  struct amdgpu_job **job, struct amdgpu_vm *vm)<br>
            > +                  struct amdgpu_job **job, struct amdgpu_vm *vm, struct amdgpu_ctx *ctx)<br>
            >   {<br>
            >        size_t size = sizeof(struct amdgpu_job);<br>
            >   <br>
            > @@ -57,6 +57,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,<br>
            >        (*job)->vm = vm;<br>
            >        (*job)->ibs = (void *)&(*job)[1];<br>
            >        (*job)->num_ibs = num_ibs;<br>
            > +     (*job)->ctx = amdgpu_ctx_kref_get(ctx);<br>
            >   <br>
            >        amdgpu_sync_create(&(*job)->sync);<br>
            >   <br>
            > @@ -68,7 +69,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,<br>
            >   {<br>
            >        int r;<br>
            >   <br>
            > -     r = amdgpu_job_alloc(adev, 1, job, NULL);<br>
            > +     r = amdgpu_job_alloc(adev, 1, job, NULL, NULL);<br>
            >        if (r)<br>
            >                return r;<br>
            >   <br>
            > @@ -94,6 +95,10 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)<br>
            >   static void amdgpu_job_free_cb(struct amd_sched_job *s_job)<br>
            >   {<br>
            >        struct amdgpu_job *job = container_of(s_job, struct amdgpu_job, base);<br>
            > +     struct amdgpu_ctx *ctx = job->ctx;<br>
            > +<br>
            > +     if (ctx)<br>
            > +             amdgpu_ctx_put(ctx);<br>
            >   <br>
            >        fence_put(job->fence);<br>
            >        amdgpu_sync_free(&job->sync);<br>
            > diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
            > index 6f4e31f..9100ca8 100644<br>
            > --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
            > +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c<br>
            > @@ -208,12 +208,6 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler *sched,<br>
            >        if (!amd_sched_entity_is_initialized(sched, entity))<br>
            >                return;<br>
            >   <br>
            > -     /**<br>
            > -      * The client will not queue more IBs during this fini, consume existing<br>
            > -      * queued IBs<br>
            > -     */<br>
            > -     wait_event(sched->job_scheduled, amd_sched_entity_is_idle(entity));<br>
            > -<br>
            >        amd_sched_rq_remove_entity(rq, entity);<br>
            >        kfifo_free(&entity->job_queue);<br>
            >   }<br>
            > diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h<br>
            > index 8cb41d3..ccbbcb0 100644<br>
            > --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h<br>
            > +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h<br>
            > @@ -49,6 +49,7 @@ struct amd_sched_entity {<br>
            >   <br>
            >        struct fence                    *dependency;<br>
            >        struct fence_cb                 cb;<br>
            > +     bool *ptr_guilty;<br>
            >   };<br>
            >   <br>
            >   /**<br>
            <br>
            <br>
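            For reference, the lifetime rule the patch above encodes, as
            a userspace toy with a kref-like counter (illustrative
            names): each job takes a reference at alloc time and drops it
            in its free callback, so the ctx outlives every job submitted
            against it.<br>
<pre>
#include <stdio.h>
#include <stdlib.h>

struct toy_ctx { int refcount; };

static struct toy_ctx *toy_ctx_kref_get(struct toy_ctx *ctx)
{
    if (ctx)
        ctx->refcount++;       /* mirrors amdgpu_ctx_kref_get() above */
    return ctx;
}

static void toy_ctx_put(struct toy_ctx *ctx)
{
    if (ctx && --ctx->refcount == 0) {
        printf("last reference gone, freeing ctx\n");
        free(ctx);
    }
}

int main(void)
{
    struct toy_ctx *ctx = malloc(sizeof(*ctx));

    ctx->refcount = 1;                               /* the fd's reference */
    struct toy_ctx *job_ctx = toy_ctx_kref_get(ctx); /* job alloc */

    toy_ctx_put(ctx);       /* userspace closes the context ... */
    toy_ctx_put(job_ctx);   /* ... it lives until the job is freed */
    return 0;
}
</pre>
            <br>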
            _______________________________________________<br>
            amd-gfx mailing list<br>
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
            <a moz-do-not-send="true"
              href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><br>
          </div>
        </span></font>
    </blockquote>
    <br>
  </body>
</html>