[Intel-gfx] [PATCH 20/27] drm/i915: Remove logical HW ID
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Sep 25 14:38:59 UTC 2019
On 25/09/2019 13:51, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-09-25 13:41:10)
>> [+ Daniele, I think he might want to have a look at this.]
>>
>> On 25/09/2019 11:01, Chris Wilson wrote:
>>> With the introduction of ctx->engines[] we allow multiple logical
>>> contexts to be used on the same engine (e.g. with virtual engines). Each
>>> logical context requires a unique tag in order for context-switching to
>>> occur correctly between them.
>>>
>>> We only need to keep a unique tag for the active lifetime of the
>>> context, and for as long as we need to identify that context. The HW
>>> uses the tag to determine if it should use a lite-restore (why not the
>>> LRCA?) and passes the tag back for various status identifies. The only
>>> status we need to track is for OA, so when using perf, we assign the
>>> specific context a unique tag.
>>>
>>> Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")
>>
>> Fixes? Why?
>
> The above paragraph explains that as we give two distinct contexts the
> same tag then it will only perform (according to bspec) a lite-restore.
Right I see. But not with virtual engines per se, but even if someone
created an engine map with two same engines in it. Virtual engines are
then a super set of this problem.
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> index da4b45aa84b1..7bce176e696c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
>>> @@ -296,10 +296,12 @@ struct intel_engine_cs {
>>> u8 uabi_class;
>>> u8 uabi_instance;
>>>
>>> + u32 uabi_capabilities;
>>> u32 context_size;
>>> u32 mmio_base;
>>>
>>> - u32 uabi_capabilities;
>>> + unsigned int context_tag;
>>> +#define NUM_CONTEXT_TAG (256)
>>
>> AFAICT this define now defines how deep ELSP we can have before we start
>> getting hard to debug problems. GuC angle needs considering here.
>
> The guc is immaterial here, since they use their own mechanism. It just
> needs to be a value larger than 2 * EXECLIST_MAX_PORTS and less then
> BIT(10).
I don't know how GuC (will) work so hopefully Daniele can confirm or
deny this won't be a problem there.
>
>>> @@ -978,6 +965,15 @@ __execlists_schedule_in(struct i915_request *rq)
>>>
>>> intel_context_get(ce);
>>>
>>> + if (ce->tag) {
>>> + ce->lrc_desc |= (u64)ce->tag << 32;
>>> + } else {
>>> + ce->lrc_desc &= ~GENMASK_ULL(47, 37);
>>> + ce->lrc_desc |=
>>> + (u64)(engine->context_tag++ % NUM_CONTEXT_TAG) <<
>>> + GEN11_SW_CTX_ID_SHIFT;
>>> + }
>>
>> So hw id is valid only while context is in ELSP and it changes with
>> every submission except in the OA case?
>
> Aye. OA being a pita.
>
>> Shifts and masks look dodgy. For >= gen11 the current code has the hw_id
>> in [37, 42], otherwise [32, 52]. This patch does not seem to handle the
>> differences between gens.
>
> Because the values we supply do not matter (they are cookies), just the
> lifetime.
Okay so you are choosing to not use some of the available bits since we
don't need all 20 (pre gen11), or even 10 (gen11+). This definitely
deserves a comment. And another assert with NUM_CONTEXT_TAG relating to
available bits.
>
>>> +
>>> intel_gt_pm_get(engine->gt);
>>> execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
>>> intel_engine_context_in(engine);
>>> @@ -2224,7 +2220,6 @@ static void execlists_context_unpin(struct intel_context *ce)
>>> check_redzone((void *)ce->lrc_reg_state - LRC_STATE_PN * PAGE_SIZE,
>>> ce->engine);
>>>
>>> - i915_gem_context_unpin_hw_id(ce->gem_context);
>>> i915_gem_object_unpin_map(ce->state->obj);
>>> intel_ring_reset(ce->ring, ce->ring->tail);
>>> }
>>> @@ -2274,18 +2269,12 @@ __execlists_context_pin(struct intel_context *ce,
>>> goto unpin_active;
>>> }
>>>
>>> - ret = i915_gem_context_pin_hw_id(ce->gem_context);
>>> - if (ret)
>>> - goto unpin_map;
>>> -
>>> ce->lrc_desc = lrc_descriptor(ce, engine);
>>> ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>>> __execlists_update_reg_state(ce, engine);
>>>
>>> return 0;
>>>
>>> -unpin_map:
>>> - i915_gem_object_unpin_map(ce->state->obj);
>>> unpin_active:
>>> intel_context_active_release(ce);
>>> err:
>>> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> index 343d79c1cb7e..04a5a0d90823 100644
>>> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> @@ -1564,27 +1564,10 @@ vgpu_id_show(struct device *dev, struct device_attribute *attr,
>>> return sprintf(buf, "\n");
>>> }
>>>
>>> -static ssize_t
>>> -hw_id_show(struct device *dev, struct device_attribute *attr,
>>> - char *buf)
>>> -{
>>> - struct mdev_device *mdev = mdev_from_dev(dev);
>>> -
>>> - if (mdev) {
>>> - struct intel_vgpu *vgpu = (struct intel_vgpu *)
>>> - mdev_get_drvdata(mdev);
>>> - return sprintf(buf, "%u\n",
>>> - vgpu->submission.shadow[0]->gem_context->hw_id);
>>> - }
>>> - return sprintf(buf, "\n");
>>> -}
>>> -
>>> static DEVICE_ATTR_RO(vgpu_id);
>>> -static DEVICE_ATTR_RO(hw_id);
>>>
>>> static struct attribute *intel_vgpu_attrs[] = {
>>> &dev_attr_vgpu_id.attr,
>>> - &dev_attr_hw_id.attr,
>>> NULL
>>> };
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index fd41505d52ec..8caaa446490f 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -1504,9 +1504,6 @@ static int i915_context_status(struct seq_file *m, void *unused)
>>> struct intel_context *ce;
>>>
>>> seq_puts(m, "HW context ");
>>> - if (!list_empty(&ctx->hw_id_link))
>>> - seq_printf(m, "%x [pin %u]", ctx->hw_id,
>>> - atomic_read(&ctx->hw_id_pin_count));
>>> if (ctx->pid) {
>>> struct task_struct *task;
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index a86d28bda6dd..47239df653f2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -471,9 +471,9 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
>>> const char *header,
>>> const struct drm_i915_error_context *ctx)
>>> {
>>> - err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
>>> - header, ctx->comm, ctx->pid, ctx->hw_id,
>>> - ctx->sched_attr.priority, ctx->guilty, ctx->active);
>>> + err_printf(m, "%s%s[%d] prio %d, guilty %d active %d\n",
>>> + header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
>>> + ctx->guilty, ctx->active);
>>> }
>>>
>>> static void error_print_engine(struct drm_i915_error_state_buf *m,
>>> @@ -1262,7 +1262,6 @@ static bool record_context(struct drm_i915_error_context *e,
>>> rcu_read_unlock();
>>> }
>>>
>>> - e->hw_id = ctx->hw_id;
>>> e->sched_attr = ctx->sched;
>>> e->guilty = atomic_read(&ctx->guilty_count);
>>> e->active = atomic_read(&ctx->active_count);
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
>>> index caaed5093d95..4dc36d6ee3a2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
>>> @@ -117,7 +117,6 @@ struct i915_gpu_state {
>>> struct drm_i915_error_context {
>>> char comm[TASK_COMM_LEN];
>>> pid_t pid;
>>> - u32 hw_id;
>>> int active;
>>> int guilty;
>>> struct i915_sched_attr sched_attr;
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>> index 80055501eccb..d36ba248d438 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -1283,22 +1283,15 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>> } else {
>>> stream->specific_ctx_id_mask =
>>> (1U << GEN8_CTX_ID_WIDTH) - 1;
>>> - stream->specific_ctx_id =
>>> - upper_32_bits(ce->lrc_desc);
>>> - stream->specific_ctx_id &=
>>> - stream->specific_ctx_id_mask;
>>> + stream->specific_ctx_id = stream->specific_ctx_id_mask;
>>> }
>>> break;
>>>
>>> case 11:
>>> case 12: {
>>> stream->specific_ctx_id_mask =
>>> - ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32) |
>>> - ((1U << GEN11_ENGINE_INSTANCE_WIDTH) - 1) << (GEN11_ENGINE_INSTANCE_SHIFT - 32) |
>>> - ((1 << GEN11_ENGINE_CLASS_WIDTH) - 1) << (GEN11_ENGINE_CLASS_SHIFT - 32);
>>> - stream->specific_ctx_id = upper_32_bits(ce->lrc_desc);
>>> - stream->specific_ctx_id &=
>>> - stream->specific_ctx_id_mask;
>>> + ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32);
>>> + stream->specific_ctx_id = stream->specific_ctx_id_mask;
>>> break;
>>> }
>>>
>>> @@ -1306,6 +1299,8 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>>> MISSING_CASE(INTEL_GEN(i915));
>>> }
>>>
>>> + ce->tag = stream->specific_ctx_id_mask;
>>> +
>>> DRM_DEBUG_DRIVER("filtering on ctx_id=0x%x ctx_id_mask=0x%x\n",
>>> stream->specific_ctx_id,
>>> stream->specific_ctx_id_mask);
>>> @@ -1324,12 +1319,14 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>>> {
>>> struct intel_context *ce;
>>>
>>> - stream->specific_ctx_id = INVALID_CTX_ID;
>>> - stream->specific_ctx_id_mask = 0;
>>> -
>>> ce = fetch_and_zero(&stream->pinned_ctx);
>>> - if (ce)
>>> + if (ce) {
>>> + ce->tag = 0;
>>> intel_context_unpin(ce);
>>> + }
>>> +
>>> + stream->specific_ctx_id = INVALID_CTX_ID;
>>> + stream->specific_ctx_id_mask = 0;
>>
>> OA hunks looks dodgy as well. ce->tag is set to
>> stream->specific_ctx_id_mask while I think it should be id. Furthermore
>> id is assigned from mask which is fixed and not unique for different
>> contexts.
>
> It is singular in the entire system. The mask is a guaranteed unique
> value.
What is singular? The OA context and it's id? I assumed what needs to be
done is fix hw_id for _all_ contexts while OA is sampling. But on a more
detailed look this assumption does not seem to hold in code. Okay, I
declare to not know how OA works well enough.
>
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> index 1c9db08f7c28..ac1ff558eb90 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
>>> @@ -170,7 +170,7 @@ static int igt_vma_create(void *arg)
>>> }
>>>
>>> nc = 0;
>>> - for_each_prime_number(num_ctx, MAX_CONTEXT_HW_ID) {
>>> + for_each_prime_number(num_ctx, 2 * NUM_CONTEXT_TAG) {
>>> for (; nc < num_ctx; nc++) {
>>> ctx = mock_context(i915, "mock");
>>> if (!ctx)
>>>
>>
>> Unless I am missing something I see this patch as simplification and not
>> a fix. If we can get away with it in the GuC world then it's quite a big
>> simplification so it's fine by me.
>
> Bspec says that for lrc_desc with matching tags (at least for gen8-gen11
> I believe), it performs a lite restore (regardless of lrca). Ergo it is
> quite a major fix.
Yes I see that now, my bad. This one will not lend itself well to
backporting.. But I don't think there is a lighter-weight solution.
Moving hw_id to intel_context would also end up large-ish I think.
Because of the locking requirement etc.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list