[Intel-gfx] [PATCH 3/3] drm/i915: Store a pointer to intel_context in i915_request
Chris Wilson
chris at chris-wilson.co.uk
Thu Apr 19 11:10:22 UTC 2018
Quoting Tvrtko Ursulin (2018-04-19 11:40:06)
>
> On 19/04/2018 08:17, Chris Wilson wrote:
> > To ease the frequent and ugly pointer dance of
> > &request->gem_context->engine[request->engine->id] during request
> > submission, store that pointer as request->hw_context. One major
> > advantage that we will exploit later is that this decouples the logical
> > context state from the engine itself.
>
> I can see merit in making all the places which work on engines or
> requests more logically and elegantly grab the engine specific context.
>
> Authors of current unmerged work will probably not be so thrilled, though.
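For readers following along, the shape of the change is roughly this (a
minimal sketch: to_intel_context() is the helper the patch introduces;
the struct layouts are abbreviated):

	/* Look up the per-engine context state within a gem_context. */
	static inline struct intel_context *
	to_intel_context(struct i915_gem_context *ctx,
			 const struct intel_engine_cs *engine)
	{
		return &ctx->__engine[engine->id];
	}

	struct i915_request {
		struct i915_gem_context *gem_context; /* who submitted us */
		struct intel_context *hw_context; /* pinned logical state */
		/* ... */
	};

so submission paths can use rq->hw_context directly instead of chasing
rq->gem_context->engine[rq->engine->id] each time.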
> > @@ -353,60 +346,56 @@ int intel_gvt_scan_and_shadow_workload(struct intel_vgpu_workload *workload)
> > struct intel_vgpu_submission *s = &vgpu->submission;
> > struct i915_gem_context *shadow_ctx = s->shadow_ctx;
> > struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
> > - int ring_id = workload->ring_id;
> > - struct intel_engine_cs *engine = dev_priv->engine[ring_id];
> > - struct intel_ring *ring;
> > + struct intel_engine_cs *engine = dev_priv->engine[workload->ring_id];
> > + struct intel_context *ce;
> > int ret;
> >
> > lockdep_assert_held(&dev_priv->drm.struct_mutex);
> >
> > - if (workload->shadowed)
> > + if (workload->req)
> > return 0;
>
> GVT team will need to look at this hunk/function.
...
> At which point I skip over all GVT and continue further down. :)
That code doesn't merit your attention (or ours, really, until it is
improved; it really is staging/ quality).
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 2fe779cab298..ed1c591ee866 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -125,16 +125,10 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
> > i915_ppgtt_put(ctx->ppgtt);
> >
> > for (i = 0; i < I915_NUM_ENGINES; i++) {
> > - struct intel_context *ce = &ctx->engine[i];
> > + struct intel_context *ce = &ctx->__engine[i];
> >
> > - if (!ce->state)
> > - continue;
> > -
> > - WARN_ON(ce->pin_count);
> > - if (ce->ring)
> > - intel_ring_free(ce->ring);
> > -
> > - __i915_gem_object_release_unless_active(ce->state->obj);
> > + if (ce->ops)
>
> Is ce->ops now the gate which was "if (!ce->state) continue" before, to
> avoid GEM_BUG_ON in execlists_context_destroy? I am wondering if this
> loop should not just be for_each_engine. Places which walk until
> I915_NUM_ENGINES should be very special and this one doesn't feel like
> one.
s/I915_NUM_ENGINES/ARRAY_SIZE(ctx->__engine)/ That's the semantic intent
here: we aren't just thinking about engines, but about the contents of
the context struct. We tear down the struct.
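That is, roughly (a sketch of the teardown loop, with ce->ops as the
gate as in the hunk above):

	int i;

	for (i = 0; i < ARRAY_SIZE(ctx->__engine); i++) {
		struct intel_context *ce = &ctx->__engine[i];

		/* Only contexts that were ever pinned have ops set. */
		if (ce->ops)
			ce->ops->destroy(ce);
	}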
> > @@ -1010,9 +1018,12 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
> > */
> > rq = __i915_gem_active_peek(&engine->timeline->last_request);
> > if (rq)
> > - return rq->gem_context == kernel_context;
> > - else
> > - return engine->last_retired_context == kernel_context;
> > + return rq->gem_context == kctx;
> > +
> > + if (!engine->last_retired_context)
> > + return false;
> > +
> > + return engine->last_retired_context->gem_context == kctx;
>
> You thought this would be a bit more readable/obvious?
Hmm, I wrote this before I wrote to_intel_context(). If we use that, it
looks like
	const struct intel_context *kernel_context =
		to_intel_context(engine->i915->kernel_context, engine);
	struct i915_request *rq;

	rq = __i915_gem_active_peek(&engine->timeline->last_request);
	if (rq)
		return rq->hw_context == kernel_context;
	else
		return engine->last_retired_context == kernel_context;
which isn't as bad as the first pass was...
> > @@ -511,9 +511,7 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > {
> > struct intel_guc_client *client = guc->execbuf_client;
> > struct intel_engine_cs *engine = rq->engine;
> > - u32 ctx_desc =
> > - lower_32_bits(intel_lr_context_descriptor(rq->gem_context,
> > - engine));
> > + u32 ctx_desc = lower_32_bits(rq->hw_context->lrc_desc);
>
> Ha...
>
> > u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);
> >
> > spin_lock(&client->wq_lock);
> > @@ -551,8 +549,8 @@ static void inject_preempt_context(struct work_struct *work)
> > preempt_work[engine->id]);
> > struct intel_guc_client *client = guc->preempt_client;
> > struct guc_stage_desc *stage_desc = __get_stage_desc(client);
> > - u32 ctx_desc = lower_32_bits(intel_lr_context_descriptor(client->owner,
> > - engine));
> > + u32 ctx_desc = lower_32_bits(to_intel_context(client->owner,
> > + engine)->lrc_desc);
>
> ..hum.
Just pointing out that this code is horrible ;)
One of the goals of having rq->hw_context was to avoid the pointer
dance, as exemplified by intel_lr_context_descriptor(). But here we
should just stick the ctx_desc in the intel_guc_client.
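Something along these lines as a follow-up (a sketch only; the cached
ctx_desc field is hypothetical, not part of this patch):

	struct intel_guc_client {
		struct i915_gem_context *owner;
		/* hypothetical: lower 32 bits of the lrc descriptor,
		 * computed once when the client is created */
		u32 ctx_desc;
	};

	static void guc_add_request(struct intel_guc *guc,
				    struct i915_request *rq)
	{
		struct intel_guc_client *client = guc->execbuf_client;
		u32 ctx_desc = client->ctx_desc; /* no pointer dance */
		u32 ring_tail =
			intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);

		/* append the wq item and ring the doorbell as before */
	}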
> > @@ -708,7 +706,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
> > struct i915_request *rq, *rn;
> >
> > list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
> > - if (last && rq->gem_context != last->gem_context) {
> > + if (last && rq->hw_context != last->hw_context) {
>
> Same thing really since this is already per engine.
Not always...
> > if (port == last_port) {
> > __list_del_many(&p->requests,
> > &rq->sched.link);
> > @@ -991,7 +989,8 @@ static void guc_fill_preempt_context(struct intel_guc *guc)
> > enum intel_engine_id id;
> >
> > for_each_engine(engine, dev_priv, id) {
> > - struct intel_context *ce = &client->owner->engine[id];
> > + struct intel_context *ce =
> > + to_intel_context(client->owner, engine);
> > u32 addr = intel_hws_preempt_done_address(engine);
> > u32 *cs;
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 0777226e65a6..e6725690b5b6 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -164,7 +164,8 @@
> > #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
> >
> > static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> > - struct intel_engine_cs *engine);
> > + struct intel_engine_cs *engine,
> > + struct intel_context *ce);
>
> Feels a bit redundant to pass in everything but OK.
You do remember which patch this was split out of ;)
> > {
> > return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
> > - i915_gem_context_force_single_submission(ctx));
> > + i915_gem_context_force_single_submission(ctx->gem_context));
> > }
>
> But the whole function change is again not needed I think.
>
> >
> > -static bool can_merge_ctx(const struct i915_gem_context *prev,
> > - const struct i915_gem_context *next)
> > +static bool can_merge_ctx(const struct intel_context *prev,
> > + const struct intel_context *next)
>
> This one neither.
Consider what it means. The rule is that we are not allowed to submit
the same lrc descriptor to subsequent ports; that corresponds to
the hw context, not gem_context. The payoff, aside from the better
semantics now, is in subsequent patches.
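With hw_context the merge rule reads naturally (roughly, as in this
patch):

	static bool can_merge_ctx(const struct intel_context *prev,
				  const struct intel_context *next)
	{
		/* Coalescing requests into one port requires the same
		 * lrc descriptor, i.e. the same hw context. */
		if (prev != next)
			return false;

		/* And never coalesce single-port (GVT) contexts. */
		if (ctx_single_port_submission(prev))
			return false;

		return true;
	}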
> > @@ -686,14 +687,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> > * the same context (even though a different
> > * request) to the second port.
> > */
> > - if (ctx_single_port_submission(last->gem_context) ||
> > - ctx_single_port_submission(rq->gem_context)) {
> > + if (ctx_single_port_submission(last->hw_context) ||
> > + ctx_single_port_submission(rq->hw_context)) {
> > __list_del_many(&p->requests,
> > &rq->sched.link);
> > goto done;
> > }
> >
> > - GEM_BUG_ON(last->gem_context == rq->gem_context);
> > + GEM_BUG_ON(last->hw_context == rq->hw_context);
>
> And if you agree with the above then these two hunks do not have to be
> in the patch.
Respectfully disagree, as this is groundwork for subsequent logical
contexts that float between engines.
> > +static struct intel_context *
> > +execlists_context_pin(struct intel_engine_cs *engine,
> > + struct i915_gem_context *ctx)
> > {
> > - struct intel_context *ce = &ctx->engine[engine->id];
> > + struct intel_context *ce = to_intel_context(ctx, engine);
> >
> > lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> > - GEM_BUG_ON(ce->pin_count == 0);
> > -
> > - if (--ce->pin_count)
> > - return;
> >
> > - intel_ring_unpin(ce->ring);
> > + if (likely(ce->pin_count++))
> > + return ce;
> > + GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> >
> > - ce->state->obj->pin_global--;
> > - i915_gem_object_unpin_map(ce->state->obj);
> > - i915_vma_unpin(ce->state);
> > + ce->ops = &execlists_context_ops;
>
> If ops are set on pin it would be good to unset them on destroy.
No point on destroy, since it's being destroyed.
> Hm, or even set them not on pin but on creation.
engine->context_pin() is our intel_context factory; pin is the creation
function.
Definitely a bit asymmetrical, but the key bit was having ce->unpin() so
that we don't try to release the intel_context via a stale engine
reference.
What I have been considering is passing intel_context to
i915_request_alloc() instead and trying to avoid that asymmetry by
managing the creation in the caller and separating it from pin().
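For reference, the asymmetry in question looks roughly like this (a
sketch; layout abbreviated):

	struct intel_context_ops {
		void (*unpin)(struct intel_context *ce);
		void (*destroy)(struct intel_context *ce);
	};

	/* Creation happens implicitly on the first engine->context_pin();
	 * release always goes through ce->ops, so the unpin path never
	 * chases a stale engine pointer. */
	static inline void intel_context_unpin(struct intel_context *ce)
	{
		GEM_BUG_ON(!ce->ops);
		ce->ops->unpin(ce);
	}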
> Not a huge deal but it is a
> bit unbalanced, and also if ce->ops is also a guard to distinguish
> possible from impossible engines then it is a bit more strange.
Not impossible engines in this regard, unused contexts.
> >
> > - i915_gem_context_put(ctx);
> > + return __execlists_context_pin(engine, ctx, ce);
> > }
> >
> > static int execlists_request_alloc(struct i915_request *request)
> > {
> > - struct intel_engine_cs *engine = request->engine;
> > - struct intel_context *ce = &request->gem_context->engine[engine->id];
> > - int ret;
> > + int err;
> >
> > - GEM_BUG_ON(!ce->pin_count);
> > + GEM_BUG_ON(!request->hw_context->pin_count);
> >
> > /* Flush enough space to reduce the likelihood of waiting after
> > * we start building the request - in which case we will just
> > @@ -1388,9 +1413,9 @@ static int execlists_request_alloc(struct i915_request *request)
> > */
> > request->reserved_space += EXECLISTS_REQUEST_SIZE;
> >
> > - ret = intel_ring_wait_for_space(request->ring, request->reserved_space);
> > - if (ret)
> > - return ret;
> > + err = intel_ring_wait_for_space(request->ring, request->reserved_space);
> > + if (err)
> > + return err;
>
> Rename of ret to err for a smaller diff is avoidable.
A couple of lines for trying to harmonise between ret and err.
> > @@ -1288,32 +1305,37 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
> >
> > i915_gem_context_get(ctx);
> >
> > -out:
> > /* One ringbuffer to rule them all */
> > - return engine->buffer;
> > + GEM_BUG_ON(!engine->buffer);
> > + ce->ring = engine->buffer;
> > +
> > + return ce;
> >
> > err:
> > ce->pin_count = 0;
>
> Strictly speaking this should be done in the caller.
Sssh. Tail call, think of it as a single contiguous function ;)
> > @@ -1323,10 +1345,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
> >
> > intel_engine_setup_common(engine);
> >
> > - err = intel_engine_init_common(engine);
> > - if (err)
> > - goto err;
> > -
> > ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
> > if (IS_ERR(ring)) {
> > err = PTR_ERR(ring);
> > @@ -1341,8 +1359,14 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
> > GEM_BUG_ON(engine->buffer);
> > engine->buffer = ring;
> >
> > + err = intel_engine_init_common(engine);
> > + if (err)
> > + goto err_unpin;
> > +
>
> This ordering change doesn't look like it has to be done in this patch.
It does. Do you want to take another look ;)
The problem here is that the legacy ringbuffer submission uses a common
engine->buffer, and the kernel contexts are pinned in init_common; so we
have to create that buffer first.
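That is, the resulting order is (a sketch; ring pinning and the full
error unwind elided):

	static int intel_init_ring_buffer(struct intel_engine_cs *engine)
	{
		struct intel_ring *ring;
		int err;

		intel_engine_setup_common(engine);

		ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
		if (IS_ERR(ring))
			return PTR_ERR(ring);

		/* The shared buffer must exist before init_common, as
		 * init_common pins the kernel context and for legacy
		 * submission that pin resolves onto engine->buffer. */
		GEM_BUG_ON(engine->buffer);
		engine->buffer = ring;

		err = intel_engine_init_common(engine);
		if (err)
			goto err_ring;

		return 0;

	err_ring:
		engine->buffer = NULL;
		intel_ring_free(ring);
		return err;
	}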
> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index e6d4b882599a..47a9de741c5f 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -243,6 +243,11 @@ struct drm_i915_private *mock_gem_device(void)
> > if (!i915->kernel_context)
> > goto err_engine;
> >
> > + /* Fake the initial load for the powersaving context */
> > + i915->engine[RCS]->last_retired_context =
> > + intel_context_pin(i915->kernel_context, i915->engine[RCS]);
> > + GEM_BUG_ON(IS_ERR_OR_NULL(i915->engine[RCS]->last_retired_context));
>
> What is this?
Keeping selftests happy. I was asserting engine->last_retired_context was
set while chasing ce->ops and found an instance where it wasn't, which
caused this patch to blow up. And had I written
intel_engine_has_kernel_context() differently, this chunk wouldn't have
been required.
-Chris