[Intel-gfx] [RFC] drm/i915/guc: New GuC stage descriptors
Daniele Ceraolo Spurio
daniele.ceraolospurio at intel.com
Thu Oct 18 21:07:32 UTC 2018
On 17/10/18 11:42, Lis, Tomasz wrote:
>
>
> On 2018-10-12 20:25, Daniele Ceraolo Spurio wrote:
>> With the new interface, GuC now requires every lrc to be registered in
>> one of the stage descriptors, which have been re-designed so that each
>> descriptor can store up to 64 lrc per class (i.e. equal to the possible
>> SW counter values).
>> As with the previous legacy design, it is possible to have a single
>> "proxy" descriptor that owns the workqueue and the doorbell and use it
>> for all submissions. To distinguish the proxy descriptors from the ones
>> used for lrc storage, the latter are called "principal". A descriptor
>> can't be both a proxy and a principal at the same time; to enforce
>> this, since we only use 1 proxy descriptor per client, we reserve
>> enough descriptors at the end of the pool to be used as proxies and
>> leave the others as principals. For simplicity, we currently map
>> context IDs 1:1 to principal descriptors, but we could have more
>> contexts in flight if needed by using the SW counter.
> Does this have any influence on the concept of "default context" used by
> UMDs?
> Or is this completely separate?
No relationship at the moment: we only map gem_context to stage_desc, we
don't care who those contexts belong to or whether they're the default
or not. We could potentially use an approach where each file descriptor
gets a stage descriptor and all the contexts opened by that FD use the
same stage_desc with different SW counters, but that would limit both
the number of open FDs and the number of contexts per FD, so it doesn't
seem like a good idea IMO.
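
To make the reserved-ID scheme concrete, the pool ends up partitioned
like this (illustrative sketch using the defines from this patch;
guc_id_is_proxy() is a made-up helper, not part of the patch):

    /*
     * Descriptor pool layout with the new scheme:
     *
     *   GUC_MAX_STAGE_DESCRIPTORS      = 1024 (total pool entries)
     *   GUC_MAX_CLIENT_IDS             = 2    (proxy entries at the end)
     *   GUC_MAX_PPAL_STAGE_DESCRIPTORS = 1024 - 2 = 1022
     *
     *   IDs [0, 1022)    -> principal descriptors, mapped 1:1 to ctx->hw_id
     *   IDs [1022, 1024) -> proxy descriptors, one per intel_guc_client
     */
    static bool guc_id_is_proxy(u32 stage_id)
    {
            return stage_id >= GUC_MAX_PPAL_STAGE_DESCRIPTORS;
    }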
>> Note that the lrcs need to be mapped in the principal descriptor until
>> guc is done with them. This means that we can't release the HW id when
>> the user app closes the ctx because it might still be in flight with GuC
>> and that we need to be careful when unpinning because the fact that the
> s/the//
>> a request on the next context has completed doesn't mean that GuC is
>> done processing the first one. See in-code comments for details.
>>
>> NOTE: GuC is not going to look at lrcs that are not in flight, so we
>> could potentially skip the unpinning steps. However, the unpinning
>> steps perform extra correctness checks, so it's better to keep them
>> until we're sure that the flow is solid.
>>
>> Based on an initial patch by Oscar Mateo.
>>
>> RFC: this will be sent as part of the updated series once we have
>> the gen9 FW with the new interface, but since the flow is
>> significantly different compared to the previous version I'd like
>> to gather some early feedback to make sure there aren't any major
>> issues.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Michel Thierry <michel.thierry at intel.com>
>> Cc: Michal Winiarski <michal.winiarski at intel.com>
>> Cc: Tomasz Lis <tomasz.lis at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_debugfs.c         |  30 +-
>> drivers/gpu/drm/i915/i915_drv.h             |   5 +-
>> drivers/gpu/drm/i915/i915_gem_context.c     |   9 +-
>> drivers/gpu/drm/i915/intel_guc.h            |  14 +-
>> drivers/gpu/drm/i915/intel_guc_fwif.h       |  73 +++--
>> drivers/gpu/drm/i915/intel_guc_submission.c | 346 +++++++++++++++-----
>> drivers/gpu/drm/i915/intel_lrc.c            |  18 +-
>> drivers/gpu/drm/i915/intel_lrc.h            |   7 +
>> drivers/gpu/drm/i915/selftests/intel_guc.c  |   4 +-
>> 9 files changed, 360 insertions(+), 146 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 00c551d3e409..04bbde4a38a6 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -2474,11 +2474,10 @@ static int i915_guc_stage_pool(struct seq_file *m, void *data)
>> seq_printf(m, "GuC stage descriptor %u:\n", index);
>> seq_printf(m, "\tIndex: %u\n", desc->stage_id);
>> + seq_printf(m, "\tProxy Index: %u\n", desc->proxy_id);
>> seq_printf(m, "\tAttribute: 0x%x\n", desc->attribute);
>> seq_printf(m, "\tPriority: %d\n", desc->priority);
>> seq_printf(m, "\tDoorbell id: %d\n", desc->db_id);
>> - seq_printf(m, "\tEngines used: 0x%x\n",
>> - desc->engines_used);
>> seq_printf(m, "\tDoorbell trigger phy: 0x%llx, cpu: 0x%llx,
>> uK: 0x%x\n",
>> desc->db_trigger_phy,
>> desc->db_trigger_cpu,
>> @@ -2490,18 +2489,21 @@ static int i915_guc_stage_pool(struct seq_file *m, void *data)
>> seq_putc(m, '\n');
>> for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
>> - u32 guc_engine_id = engine->guc_id;
>> - struct guc_execlist_context *lrc =
>> - &desc->lrc[guc_engine_id];
>> -
>> - seq_printf(m, "\t%s LRC:\n", engine->name);
>> - seq_printf(m, "\t\tContext desc: 0x%x\n",
>> - lrc->context_desc);
>> - seq_printf(m, "\t\tContext id: 0x%x\n", lrc->context_id);
>> - seq_printf(m, "\t\tLRCA: 0x%x\n", lrc->ring_lrca);
>> - seq_printf(m, "\t\tRing begin: 0x%x\n", lrc->ring_begin);
>> - seq_printf(m, "\t\tRing end: 0x%x\n", lrc->ring_end);
>> - seq_putc(m, '\n');
>> + u8 class = engine->class;
>> + u8 inst = engine->instance;
>> +
>> + if (desc->lrc_alloc_map[class].bitmap & BIT(inst)) {
>> + struct guc_execlist_context *lrc =
>> + &desc->lrc[class][inst];
>> + seq_printf(m, "\t%s LRC:\n", engine->name);
>> + seq_printf(m, "\t\tHW context desc: 0x%x:0x%x\n",
>> + lower_32_bits(lrc->hw_context_desc),
>> + upper_32_bits(lrc->hw_context_desc));
>> + seq_printf(m, "\t\tLRC: 0x%x\n", lrc->ring_lrc);
>> + seq_printf(m, "\t\tRing begin: 0x%x\n",
>> lrc->ring_begin);
>> + seq_printf(m, "\t\tRing end: 0x%x\n", lrc->ring_end);
>> + seq_putc(m, '\n');
>> + }
>> }
>> }
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index bd76931987ef..ce095d57e050 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1878,13 +1878,14 @@ struct drm_i915_private {
>> * (the SW Context ID field) but GuC limits it further so
>> * without taking advantage of part of the SW counter field the
>> * firmware only supports a max number of contexts equal to the
>> - * number of entries in the GuC stage descriptor pool.
>> + * number of entries in the GuC stage descriptor pool, minus
>> + * the descriptors reserved for proxy usage
>> */
>> struct ida hw_ida;
>> #define MAX_CONTEXT_HW_ID (1<<21) /* exclusive */
>> #define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */
>> #define GEN11_MAX_CONTEXT_HW_ID (1<<11) /* exclusive */
>> -#define GEN11_MAX_CONTEXT_HW_ID_WITH_GUC GUC_MAX_STAGE_DESCRIPTORS
>> +#define GEN11_MAX_CONTEXT_HW_ID_WITH_GUC GUC_MAX_PPAL_STAGE_DESCRIPTORS
>> struct list_head hw_id_list;
>> } contexts;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 552d2e108de4..c239d9b9307c 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -284,10 +284,15 @@ static void context_close(struct i915_gem_context *ctx)
>> i915_gem_context_set_closed(ctx);
>> /*
>> - * This context will never again be assinged to HW, so we can
>> + * This context will never again be assigned to HW, so we can
>> * reuse its ID for the next context.
>> + *
>> + * If GuC is in use, we need to keep the ID until GuC has finished
>> + * processing all submitted requests because the ID is used by the
>> + * firmware to index the guc stage_desc pool.
>> */
>> - release_hw_id(ctx);
>> + if (!USES_GUC_SUBMISSION(ctx->i915))
>> + release_hw_id(ctx);
>> /*
>> * The LUT uses the VMA as a backpointer to unref the object,
>> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
>> index 11b3882482f4..05ee44fb66af 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>> @@ -58,10 +58,14 @@ struct intel_guc {
>> bool interrupts_enabled;
>> unsigned int msg_enabled_mask;
>> + struct ida client_ids;
>> +#define GUC_MAX_CLIENT_IDS 2
>> +
>> struct i915_vma *ads_vma;
>> struct i915_vma *stage_desc_pool;
>> void *stage_desc_pool_vaddr;
>> - struct ida stage_ids;
>> +#define GUC_MAX_PPAL_STAGE_DESCRIPTORS (GUC_MAX_STAGE_DESCRIPTORS - GUC_MAX_CLIENT_IDS)
>> +
>> struct i915_vma *shared_data;
>> void *shared_data_vaddr;
>> @@ -94,6 +98,14 @@ struct intel_guc {
>> /* GuC's FW specific notify function */
>> void (*notify)(struct intel_guc *guc);
>> +
>> + /*
>> + * Override the first stage_desc to be used as proxy
>> + * (Default: GUC_MAX_PPAL_STAGE_DESCRIPTORS). The max number of ppal
>> + * descriptors is not updated accordingly since the test using this does
>> + * not allocate any context.
>> + */
>> + I915_SELFTEST_DECLARE(u32 starting_proxy_id);
>> };
>> static inline bool intel_guc_is_alive(struct intel_guc *guc)
>> diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
>> index ce3ab6ed21d5..1a0f41a26173 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_fwif.h
>> +++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
>> @@ -32,6 +32,8 @@
>> #define GUC_MAX_STAGE_DESCRIPTORS 1024
>> #define GUC_INVALID_STAGE_ID GUC_MAX_STAGE_DESCRIPTORS
>> +#define GUC_MAX_LRC_PER_CLASS 64
>> +
>> #define GUC_RENDER_ENGINE 0
>> #define GUC_VIDEO_ENGINE 1
>> #define GUC_BLITTER_ENGINE 2
>> @@ -66,9 +68,12 @@
>> #define GUC_DOORBELL_DISABLED 0
>> #define GUC_STAGE_DESC_ATTR_ACTIVE BIT(0)
>> -#define GUC_STAGE_DESC_ATTR_PENDING_DB BIT(1)
>> -#define GUC_STAGE_DESC_ATTR_KERNEL BIT(2)
>> -#define GUC_STAGE_DESC_ATTR_PREEMPT BIT(3)
>> +#define GUC_STAGE_DESC_ATTR_TYPE_SHIFT 1
>> +#define GUC_STAGE_DESC_ATTR_PRINCIPAL (0x0 << GUC_STAGE_DESC_ATTR_TYPE_SHIFT)
>> +#define GUC_STAGE_DESC_ATTR_PROXY (0x1 << GUC_STAGE_DESC_ATTR_TYPE_SHIFT)
>> +#define GUC_STAGE_DESC_ATTR_REAL (0x2 << GUC_STAGE_DESC_ATTR_TYPE_SHIFT)
>> +#define GUC_STAGE_DESC_ATTR_TYPE_MASK (0x3 << GUC_STAGE_DESC_ATTR_TYPE_SHIFT)
>> +#define GUC_STAGE_DESC_ATTR_KERNEL (1 << 3)
>> #define GUC_STAGE_DESC_ATTR_RESET BIT(4)
>> #define GUC_STAGE_DESC_ATTR_WQLOCKED BIT(5)
>> #define GUC_STAGE_DESC_ATTR_PCH BIT(6)
>> @@ -277,9 +282,10 @@ struct guc_process_desc {
>> u64 wq_base_addr;
>> u32 wq_size_bytes;
>> u32 wq_status;
>> - u32 engine_presence;
>> u32 priority;
>> - u32 reserved[30];
>> + u32 token;
>> + u32 queue_engine_error;
>> + u32 reserved[23];
>> } __packed;
>> /* engine id and context id is packed into guc_execlist_context.context_id*/
>> @@ -288,18 +294,20 @@ struct guc_process_desc {
>> /* The execlist context including software and HW information */
>> struct guc_execlist_context {
>> - u32 context_desc;
>> - u32 context_id;
>> - u32 ring_status;
>> - u32 ring_lrca;
>> + u64 hw_context_desc;
>> + u32 reserved0;
>> + u32 ring_lrc;
>> u32 ring_begin;
>> u32 ring_end;
>> u32 ring_next_free_location;
>> u32 ring_current_tail_pointer_value;
>> - u8 engine_state_submit_value;
>> - u8 engine_state_wait_value;
>> - u16 pagefault_count;
>> - u16 engine_submit_queue_count;
>> + u32 engine_state_wait_value;
>> + u32 state_reserved;
>> + u32 is_present_in_sq;
>> + u32 sync_value;
>> + u32 sync_addr;
>> + u32 slpc_hints;
>> + u32 reserved1[4];
>> } __packed;
>> /*
>> @@ -312,36 +320,33 @@ struct guc_execlist_context {
>> * with the GuC, being allocated before the GuC is loaded with its firmware.
>> */
>> struct guc_stage_desc {
>> - u32 sched_common_area;
>> + u64 desc_private;
>> u32 stage_id;
>> - u32 pas_id;
>> - u8 engines_used;
>> + u32 proxy_id;
>> u64 db_trigger_cpu;
>> u32 db_trigger_uk;
>> u64 db_trigger_phy;
>> - u16 db_id;
>> -
>> - struct guc_execlist_context lrc[GUC_MAX_ENGINES_NUM];
>> -
>> - u8 attribute;
>> -
>> - u32 priority;
>> -
>> + u32 db_id;
>> + struct guc_execlist_context lrc[GUC_MAX_ENGINE_CLASSES][GUC_MAX_LRC_PER_CLASS];
>> + struct {
>> + u64 bitmap;
>> + u32 reserved0;
>> + } __packed lrc_alloc_map[GUC_MAX_ENGINE_CLASSES];
>> + u32 lrc_count;
>> + u32 max_lrc_per_class;
>> + u32 attribute; /* GUC_STAGE_DESC_ATTR_xxx */
>> + u32 priority; /* GUC_CLIENT_PRIORITY_xxx */
>> u32 wq_sampled_tail_offset;
>> u32 wq_total_submit_enqueues;
>> -
>> u32 process_desc;
>> u32 wq_addr;
>> u32 wq_size;
>> -
>> - u32 engine_presence;
>> -
>> - u8 engine_suspended;
>> -
>> - u8 reserved0[3];
>> - u64 reserved1[1];
>> -
>> - u64 desc_private;
>> + u32 feature0;
>> + u32 feature1;
>> + u32 feature2;
>> + u32 queue_engine_error;
>> + u32 reserved[2];
>> + u64 reserved3[12];
>> } __packed;
>> /**
>> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
>> index eae668442ebe..9bf8ebbc4de1 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
>> @@ -46,17 +46,22 @@
>> * that contains all required pages for these elements).
>> *
>> * GuC stage descriptor:
>> - * During initialization, the driver allocates a static pool of 1024 such
>> - * descriptors, and shares them with the GuC.
>> - * Currently, there exists a 1:1 mapping between a intel_guc_client and a
>> - * guc_stage_desc (via the client's stage_id), so effectively only one
>> - * gets used. This stage descriptor lets the GuC know about the doorbell,
>> - * workqueue and process descriptor. Theoretically, it also lets the GuC
>> - * know about our HW contexts (context ID, etc...), but we actually
>> - * employ a kind of submission where the GuC uses the LRCA sent via the work
>> - * item instead (the single guc_stage_desc associated to execbuf client
>> - * contains information about the default kernel context only, but this is
>> - * essentially unused). This is called a "proxy" submission.
>> + * During initialization, the driver allocates a static pool of descriptors
>> + * and shares them with the GuC. This stage descriptor lets the GuC know about
> Sentence missing definition of "this stage descriptor", ie. "Each entry
> at the beginning of the pool represents one guc_stage_desc. These stage
> descriptors..."
Will add
>> + * the doorbell, workqueue and process descriptor; additionally, it stores
>> + * information about all possible HW contexts that use it (64 x number of
>> + * engine classes of guc_execlist_context structs).
>> + *
>> + * The idea is that every direct-submission GuC client gets one SW Context ID
>> + * and every HW context created by that client gets one SW Counter. The "SW
>> + * Context ID" and "SW Counter" to use now get passed on every work queue item.
>> + *
>> + * But we don't have direct submission yet: does that mean we are limited to 64
>> + * contexts in total (one client)? Not really: we can use extra GuC context
>> + * descriptors to store more HW contexts. They are special in that they don't
>> + * have their own work queue, doorbell or process descriptor. Instead, these
>> + * "principal" GuC context descriptors use the one that belongs to the client
>> + * as a "proxy" for submission (a generalization of the old proxy submission).
>> *
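
To make the proxy/principal relationship concrete, this is roughly how a
work queue item resolves to an lrc (illustrative sketch only, built from
the structs in this patch; guc_lookup_lrc() is a made-up name, not
actual firmware code):

    /*
     * Illustrative only: a WQ item carries (SW Context ID, SW Counter);
     * the principal descriptor holds the lrc state, while the doorbell
     * and workqueue live in the proxy descriptor it points to via
     * proxy_id.
     */
    static struct guc_execlist_context *
    guc_lookup_lrc(struct guc_stage_desc *pool, u32 sw_context_id,
                   u8 engine_class, u8 sw_counter)
    {
            struct guc_stage_desc *ppal = &pool[sw_context_id];

            /* the lrc must have been registered by the driver */
            if (!(ppal->lrc_alloc_map[engine_class].bitmap & BIT(sw_counter)))
                    return NULL;

            /* submission itself flows through &pool[ppal->proxy_id] */
            return &ppal->lrc[engine_class][sw_counter];
    }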
>> * The Scratch registers:
>> * There are 16 MMIO-based registers start from 0xC180. The kernel driver writes
>> @@ -164,11 +169,28 @@ static int __guc_deallocate_doorbell(struct intel_guc *guc, u32 stage_id)
>> return intel_guc_send(guc, action, ARRAY_SIZE(action));
>> }
>> -static struct guc_stage_desc *__get_stage_desc(struct intel_guc_client *client)
>> +static struct guc_stage_desc *__get_stage_desc(struct intel_guc *guc, u32 index)
>> +{
>> + struct guc_stage_desc *base = guc->stage_desc_pool_vaddr;
>> +
>> + GEM_BUG_ON(!USES_GUC_SUBMISSION(guc_to_i915(guc)));
>> + GEM_BUG_ON(index >= GUC_MAX_STAGE_DESCRIPTORS);
>> +
>> + return &base[index];
>> +}
>> +
>> +static struct guc_stage_desc *__get_proxy_stage_desc(struct intel_guc_client *client)
>> {
>> - struct guc_stage_desc *base = client->guc->stage_desc_pool_vaddr;
>> + GEM_BUG_ON(!I915_SELFTEST_ONLY(client->guc->starting_proxy_id) &&
>> + client->stage_id < GUC_MAX_PPAL_STAGE_DESCRIPTORS);
>> + return __get_stage_desc(client->guc, client->stage_id);
>> +}
>> - return &base[client->stage_id];
>> +static struct guc_stage_desc *__get_ppal_stage_desc(struct intel_guc *guc,
>> + u32 index)
>> +{
>> + GEM_BUG_ON(index >= GUC_MAX_PPAL_STAGE_DESCRIPTORS);
>> + return __get_stage_desc(guc, index);
>> }
>> /*
>> @@ -183,7 +205,7 @@ static void __update_doorbell_desc(struct intel_guc_client *client, u16 new_id)
>> struct guc_stage_desc *desc;
>> /* Update the GuC's idea of the doorbell ID */
>> - desc = __get_stage_desc(client);
>> + desc = __get_proxy_stage_desc(client);
>> desc->db_id = new_id;
>> }
>> @@ -329,14 +351,12 @@ static int guc_stage_desc_pool_create(struct intel_guc *guc)
>> guc->stage_desc_pool = vma;
>> guc->stage_desc_pool_vaddr = vaddr;
>> - ida_init(&guc->stage_ids);
>> return 0;
>> }
>> static void guc_stage_desc_pool_destroy(struct intel_guc *guc)
>> {
>> - ida_destroy(&guc->stage_ids);
>> i915_vma_unpin_and_release(&guc->stage_desc_pool, I915_VMA_RELEASE_MAP);
>> }
>> @@ -347,78 +367,26 @@ static void guc_stage_desc_pool_destroy(struct intel_guc *guc)
>> * data structures relating to this client (doorbell, process descriptor,
>> * write queue, etc).
>> */
>> -static void guc_stage_desc_init(struct intel_guc_client *client)
>> +static void guc_proxy_stage_desc_init(struct intel_guc_client *client)
>> {
>> - struct intel_guc *guc = client->guc;
>> - struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> - struct intel_engine_cs *engine;
>> - struct i915_gem_context *ctx = client->owner;
>> struct guc_stage_desc *desc;
>> - unsigned int tmp;
>> u32 gfx_addr;
>> - desc = __get_stage_desc(client);
>> + desc = __get_proxy_stage_desc(client);
>> memset(desc, 0, sizeof(*desc));
>> desc->attribute = GUC_STAGE_DESC_ATTR_ACTIVE |
>> + GUC_STAGE_DESC_ATTR_PROXY |
>> GUC_STAGE_DESC_ATTR_KERNEL;
>> - if (is_high_priority(client))
>> - desc->attribute |= GUC_STAGE_DESC_ATTR_PREEMPT;
>> desc->stage_id = client->stage_id;
>> desc->priority = client->priority;
>> desc->db_id = client->doorbell_id;
>> - for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
>> - struct intel_context *ce = to_intel_context(ctx, engine);
>> - u32 guc_engine_id = engine->guc_id;
>> - struct guc_execlist_context *lrc = &desc->lrc[guc_engine_id];
>> -
>> - /* TODO: We have a design issue to be solved here. Only when we
>> - * receive the first batch, we know which engine is used by the
>> - * user. But here GuC expects the lrc and ring to be pinned. It
>> - * is not an issue for default context, which is the only one
>> - * for now who owns a GuC client. But for future owner of GuC
>> - * client, need to make sure lrc is pinned prior to enter here.
>> - */
>> - if (!ce->state)
>> - break; /* XXX: continue? */
>> -
>> - /*
>> - * XXX: When this is a GUC_STAGE_DESC_ATTR_KERNEL client (proxy
>> - * submission or, in other words, not using a direct submission
>> - * model) the KMD's LRCA is not used for any work submission.
>> - * Instead, the GuC uses the LRCA of the user mode context (see
>> - * guc_add_request below).
>> - */
>> - lrc->context_desc = lower_32_bits(ce->lrc_desc);
>> -
>> - /* The state page is after PPHWSP */
>> - lrc->ring_lrca = intel_guc_ggtt_offset(guc, ce->state) +
>> - LRC_STATE_PN * PAGE_SIZE;
>> -
>> - /* XXX: In direct submission, the GuC wants the HW context id
>> - * here. In proxy submission, it wants the stage id
>> - */
>> - lrc->context_id = (client->stage_id << GUC_ELC_CTXID_OFFSET) |
>> - (guc_engine_id << GUC_ELC_ENGINE_OFFSET);
>> -
>> - lrc->ring_begin = intel_guc_ggtt_offset(guc, ce->ring->vma);
>> - lrc->ring_end = lrc->ring_begin + ce->ring->size - 1;
>> - lrc->ring_next_free_location = lrc->ring_begin;
>> - lrc->ring_current_tail_pointer_value = 0;
>> -
>> - desc->engines_used |= (1 << guc_engine_id);
>> - }
>> -
>> - DRM_DEBUG_DRIVER("Host engines 0x%x => GuC engines used 0x%x\n",
>> - client->engines, desc->engines_used);
>> - WARN_ON(desc->engines_used == 0);
>> -
>> /*
>> * The doorbell, process descriptor, and workqueue are all parts
>> * of the client object, which the GuC will reference via the GGTT
>> */
>> - gfx_addr = intel_guc_ggtt_offset(guc, client->vma);
>> + gfx_addr = intel_guc_ggtt_offset(client->guc, client->vma);
>> desc->db_trigger_phy = sg_dma_address(client->vma->pages->sgl) +
>> client->doorbell_offset;
>> desc->db_trigger_cpu = ptr_to_u64(__get_doorbell(client));
>> @@ -430,11 +398,11 @@ static void guc_stage_desc_init(struct intel_guc_client *client)
>> desc->desc_private = ptr_to_u64(client);
>> }
>> -static void guc_stage_desc_fini(struct intel_guc_client *client)
>> +static void guc_proxy_stage_desc_fini(struct intel_guc_client *client)
>> {
>> struct guc_stage_desc *desc;
>> - desc = __get_stage_desc(client);
>> + desc = __get_proxy_stage_desc(client);
>> memset(desc, 0, sizeof(*desc));
>> }
>> @@ -553,7 +521,7 @@ static void inject_preempt_context(struct work_struct *work)
>> struct intel_guc *guc = container_of(preempt_work, typeof(*guc),
>> preempt_work[engine->id]);
>> struct intel_guc_client *client = guc->preempt_client;
>> - struct guc_stage_desc *stage_desc = __get_stage_desc(client);
>> + struct guc_stage_desc *stage_desc = __get_proxy_stage_desc(client);
>> struct intel_context *ce = to_intel_context(client->owner, engine);
>> u32 data[7];
>> @@ -919,6 +887,7 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
>> struct i915_vma *vma;
>> void *vaddr;
>> int ret;
>> + u32 starting_id;
>> client = kzalloc(sizeof(*client), GFP_KERNEL);
>> if (!client)
>> @@ -931,8 +900,11 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
>> client->doorbell_id = GUC_DOORBELL_INVALID;
>> spin_lock_init(&client->wq_lock);
>> - ret = ida_simple_get(&guc->stage_ids, 0, GUC_MAX_STAGE_DESCRIPTORS,
>> - GFP_KERNEL);
>> + if (!I915_SELFTEST_ONLY(starting_id = guc->starting_proxy_id))
>> + starting_id = GUC_MAX_PPAL_STAGE_DESCRIPTORS;
>> +
>> + ret = ida_simple_get(&guc->client_ids, starting_id,
>> + GUC_MAX_STAGE_DESCRIPTORS, GFP_KERNEL);
>> if (ret < 0)
>> goto err_client;
>> @@ -983,7 +955,7 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
>> err_vma:
>> i915_vma_unpin_and_release(&client->vma, 0);
>> err_id:
>> - ida_simple_remove(&guc->stage_ids, client->stage_id);
>> + ida_simple_remove(&guc->client_ids, client->stage_id);
>> err_client:
>> kfree(client);
>> return ERR_PTR(ret);
>> @@ -993,7 +965,7 @@ static void guc_client_free(struct intel_guc_client *client)
>> {
>> unreserve_doorbell(client);
>> i915_vma_unpin_and_release(&client->vma, I915_VMA_RELEASE_MAP);
>> - ida_simple_remove(&client->guc->stage_ids, client->stage_id);
>> + ida_simple_remove(&client->guc->client_ids, client->stage_id);
>> kfree(client);
>> }
>> @@ -1063,7 +1035,7 @@ static int __guc_client_enable(struct intel_guc_client *client)
>> int ret;
>> guc_proc_desc_init(client);
>> - guc_stage_desc_init(client);
>> + guc_proxy_stage_desc_init(client);
>> ret = create_doorbell(client);
>> if (ret)
>> @@ -1072,7 +1044,7 @@ static int __guc_client_enable(struct intel_guc_client *client)
>> return 0;
>> fail:
>> - guc_stage_desc_fini(client);
>> + guc_proxy_stage_desc_fini(client);
>> guc_proc_desc_fini(client);
>> return ret;
>> }
>> @@ -1089,7 +1061,7 @@ static void __guc_client_disable(struct intel_guc_client *client)
>> else
>> __destroy_doorbell(client);
>> - guc_stage_desc_fini(client);
>> + guc_proxy_stage_desc_fini(client);
>> guc_proc_desc_fini(client);
>> }
>> @@ -1145,6 +1117,9 @@ int intel_guc_submission_init(struct intel_guc *guc)
>> GEM_BUG_ON(!guc->stage_desc_pool);
>> WARN_ON(!guc_verify_doorbells(guc));
>> +
>> + ida_init(&guc->client_ids);
>> +
>> ret = guc_clients_create(guc);
>> if (ret)
>> goto err_pool;
>> @@ -1157,6 +1132,7 @@ int intel_guc_submission_init(struct intel_guc *guc)
>> return 0;
>> err_pool:
>> + ida_destroy(&guc->client_ids);
>> guc_stage_desc_pool_destroy(guc);
>> return ret;
>> }
>> @@ -1173,6 +1149,8 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>> guc_clients_destroy(guc);
>> WARN_ON(!guc_verify_doorbells(guc));
>> + ida_destroy(&guc->client_ids);
>> +
>> if (guc->stage_desc_pool)
>> guc_stage_desc_pool_destroy(guc);
>> }
>> @@ -1257,6 +1235,203 @@ static void guc_submission_unpark(struct intel_engine_cs *engine)
>> intel_engine_pin_breadcrumbs_irq(engine);
>> }
>> +static void guc_map_gem_ctx_to_ppal_stage(struct intel_guc *guc,
>> + struct guc_stage_desc *desc,
>> + u32 id)
>> +{
>> + GEM_BUG_ON(desc->attribute & GUC_STAGE_DESC_ATTR_ACTIVE);
>> +
>> + desc->attribute = GUC_STAGE_DESC_ATTR_ACTIVE |
>> + GUC_STAGE_DESC_ATTR_PRINCIPAL |
>> + GUC_STAGE_DESC_ATTR_KERNEL;
>> + desc->stage_id = id;
>> +
>> + /* all ppal contexts will be submitted through the execbuf client */
>> + desc->proxy_id = guc->execbuf_client->stage_id;
>> +
>> + /*
>> + * max_lrc_per_class is used in GuC to cut short loops over the
>> + * lrc_bitmap when only a small number of lrcs are used. We could
>> + * recalculate this value every time an lrc is added or removed, but
>> + * given the fact that we only have a max number of lrcs per stage_desc
>> + * equal to the max number of instances of a class (because we map
>> + * gem_context 1:1 with stage_desc) and that the GuC loops only in
>> + * specific cases, redoing the calculation each time doesn't give us a
>> + * big benefit for the cost so we can just use a static value.
>> + */
>> +}
>> +
>> +static void guc_unmap_gem_ctx_from_ppal_stage(struct intel_guc *guc,
>> + struct guc_stage_desc *desc)
>> +{
>> + GEM_BUG_ON(!(desc->attribute & GUC_STAGE_DESC_ATTR_ACTIVE));
>> + GEM_BUG_ON(desc->lrc_count > 0);
>> +
>> + memset(desc, 0, sizeof(*desc));
>> +}
>> +
>> +static inline void guc_ppal_stage_lrc_pin(struct intel_engine_cs *engine,
>> + struct i915_gem_context *ctx,
>> + struct intel_context *ce)
>> +{
>> + struct intel_guc *guc = &ctx->i915->guc;
>> + struct guc_stage_desc *desc;
>> + struct guc_execlist_context *lrc;
>> + u8 guc_class = engine->class;
>> +
>> + /* 1:1 gem_context to ppal mapping */
>> + GEM_BUG_ON(ce->sw_counter > MAX_ENGINE_INSTANCE);
>> +
>> + desc = __get_ppal_stage_desc(guc, ce->sw_context_id);
>> + GEM_BUG_ON(desc->lrc_alloc_map[guc_class].bitmap & BIT(ce->sw_counter));
>> +
>> + if (!desc->lrc_count++)
>> + guc_map_gem_ctx_to_ppal_stage(guc, desc, ce->sw_context_id);
>> +
>> + lrc = &desc->lrc[guc_class][ce->sw_counter];
>> + lrc->hw_context_desc = ce->lrc_desc;
>> + lrc->ring_lrc = intel_guc_ggtt_offset(guc, ce->state) +
>> + LRC_STATE_PN * PAGE_SIZE;
>> + lrc->ring_begin = intel_guc_ggtt_offset(guc, ce->ring->vma);
>> + lrc->ring_end = lrc->ring_begin + ce->ring->size - 1;
>> +
>> + desc->lrc_alloc_map[guc_class].bitmap |= BIT(ce->sw_counter);
>> +}
>> +
>> +static inline void guc_ppal_stage_lrc_unpin(struct intel_context *ce)
>> +{
>> + struct i915_gem_context *ctx = ce->gem_context;
>> + struct intel_guc *guc = &ctx->i915->guc;
>> + struct intel_engine_cs *engine = ctx->i915->engine[ce - ctx->__engine];
>> + struct guc_stage_desc *desc;
>> + struct guc_execlist_context *lrc;
>> + u8 guc_class = engine->class;
>> +
>> + desc = __get_ppal_stage_desc(guc, ce->sw_context_id);
>> + GEM_BUG_ON(!(desc->lrc_alloc_map[guc_class].bitmap & BIT(ce->sw_counter)));
>> +
>> + lrc = &desc->lrc[guc_class][ce->sw_counter];
>> +
>> + /*
>> + * GuC needs us to keep the lrc mapped until it has finished processing
>> + * the ctx switch interrupt. When executing nop or very small workloads
>> + * it is possible (but quite unlikely) that 2 contexts on different
>> + * ELSPs of the same engine complete before the GuC manages to process
>> + * the interrupt for the first completion. Experiments show this happens
>> + * for ~0.2% of contexts when executing nop workloads on different
>> + * contexts back to back on the same engine. When submitting nop
>> + * workloads on all engines at the same time the hit-rate goes up to
>> + * ~0.7%. In all the observed cases GuC required < 100us to catch up,
>> + * with the single engine case being always below 20us.
>> + *
>> + * The completion of the request on the second lrc will reduce our
>> + * pin_count on the first lrc to zero, thus triggering a call to this
>> + * function potentially before GuC has had time to process the
>> + * interrupt. To avoid this, we could get an extra pin on the context or
>> + * delay the unpin when guc is in use, but given that the issue is
>> + * limited to pathological scenarios and has very low hit rate even
>> + * there, we can just introduce a small delay when it happens to give
>> + * time to GuC to catch up. Also to be noted that since the requests
>> + * have completed on the HW we've most likely already sent GuC the next
>> + * contexts to be executed, so it is unlikely that by waiting we'll add
>> + * bubbles in the HW execution.
>> + */
>> + WARN_ON(wait_for_us(lrc->is_present_in_sq == 0, 1000));
>> +
>> + desc->lrc_alloc_map[guc_class].bitmap &= ~BIT(ce->sw_counter);
>> + memset(lrc, 0, sizeof(*lrc));
>> +
>> + if (!--desc->lrc_count)
>> + guc_unmap_gem_ctx_from_ppal_stage(guc, desc);
>> +}
>> +
>> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>> +{
>> + struct drm_i915_private *i915 = guc_to_i915(guc);
>> + struct intel_engine_cs *engine;
>> + struct i915_gem_context *ctx;
>> + struct intel_context *ce;
>> + enum intel_engine_id id;
>> +
>> + /*
>> + * Some contexts (e.g. kernel_context) might have been pinned before we
>> + * enabled GuC submission, so we need to add them to the GuC bookkeeping.
>> + * Also, after a GuC reset we want to make sure that the information
>> + * shared with GuC is properly reset.
>> + *
>> + * NOTE: the code below assumes 1:1 mapping between ppal descriptors and
>> + * gem contexts for simplicity.
>> + */
>> + list_for_each_entry(ctx, &i915->contexts.list, link) {
>> + if (atomic_read(&ctx->hw_id_pin_count)) {
>> + struct guc_stage_desc *desc;
>> +
>> + /* make sure the descriptor is clean... */
>> + GEM_BUG_ON(ctx->hw_id >= GUC_MAX_PPAL_STAGE_DESCRIPTORS);
>> + desc = __get_ppal_stage_desc(guc, ctx->hw_id);
>> + memset(desc, 0, sizeof(*desc));
>> +
>> + /* ...and then (re-)pin all the lrcs */
>> + for_each_engine(engine, i915, id) {
>> + ce = to_intel_context(ctx, engine);
>> + if (ce->pin_count > 0)
>> + guc_ppal_stage_lrc_pin(engine, ctx, ce);
>> + }
>> + }
>> + }
>> +}
>> +
>> +static inline void guc_fini_lrc_mapping(struct intel_guc *guc)
>> +{
>> + struct drm_i915_private *i915 = guc_to_i915(guc);
>> + struct intel_engine_cs *engine;
>> + struct i915_gem_context *ctx;
>> + struct intel_context *ce;
>> + enum intel_engine_id id;
>> +
>> + list_for_each_entry(ctx, &i915->contexts.list, link) {
>> + if (atomic_read(&ctx->hw_id_pin_count)) {
>> + for_each_engine(engine, i915, id) {
>> + ce = to_intel_context(ctx, engine);
>> + if (ce->pin_count > 0)
>> + guc_ppal_stage_lrc_unpin(ce);
>> + }
>> + }
>> + }
>> +}
>> +
>> +static void guc_context_unpin(struct intel_context *ce)
>> +{
>> + guc_ppal_stage_lrc_unpin(ce);
>> + intel_execlists_context_unpin(ce);
>> +}
>> +
>> +static const struct intel_context_ops guc_context_ops = {
>> + .unpin = guc_context_unpin,
>> + .destroy = intel_execlists_context_destroy,
>> +};
>> +
>> +static struct intel_context *guc_context_pin(struct intel_engine_cs *engine,
>> + struct i915_gem_context *ctx)
>> +{
>> + struct intel_context *ce = to_intel_context(ctx, engine);
>> +
>> + lockdep_assert_held(&ctx->i915->drm.struct_mutex);
>> +
>> + if (likely(ce->pin_count++))
>> + return ce;
>> + GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
>> +
>> + ce->ops = &guc_context_ops;
>> +
>> + ce = intel_execlists_context_pin(engine, ctx, ce);
>> + if (!IS_ERR(ce))
>> + guc_ppal_stage_lrc_pin(engine, ctx, ce);
>> +
>> + return ce;
>> +}
>> +
>> static void guc_set_default_submission(struct intel_engine_cs *engine)
>> {
>> /*
>> @@ -1274,6 +1449,8 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>> engine->execlists.tasklet.func = guc_submission_tasklet;
>> + engine->context_pin = guc_context_pin;
>> +
>> engine->park = guc_submission_park;
>> engine->unpark = guc_submission_unpark;
>> @@ -1320,6 +1497,8 @@ int intel_guc_submission_enable(struct intel_guc *guc)
>> engine->set_default_submission(engine);
>> }
>> + guc_init_lrc_mapping(guc);
>> +
>> return 0;
>> }
>> @@ -1329,6 +1508,7 @@ void intel_guc_submission_disable(struct intel_guc *guc)
>> GEM_BUG_ON(dev_priv->gt.awake); /* GT should be parked first */
>> + guc_fini_lrc_mapping(guc);
>> guc_interrupts_release(dev_priv);
>> guc_clients_disable(guc);
>> }
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 48e0cdf42221..444bc83554c5 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1071,7 +1071,7 @@ static void execlists_submit_request(struct i915_request *request)
>> spin_unlock_irqrestore(&engine->timeline.lock, flags);
>> }
>> -static void execlists_context_destroy(struct intel_context *ce)
>> +void intel_execlists_context_destroy(struct intel_context *ce)
>> {
>> GEM_BUG_ON(ce->pin_count);
>> @@ -1084,7 +1084,7 @@ static void execlists_context_destroy(struct intel_context *ce)
>> i915_gem_object_put(ce->state->obj);
>> }
>> -static void execlists_context_unpin(struct intel_context *ce)
>> +void intel_execlists_context_unpin(struct intel_context *ce)
>> {
>> struct intel_engine_cs *engine;
>> @@ -1141,10 +1141,10 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
>> return i915_vma_pin(vma, 0, 0, flags);
>> }
>> -static struct intel_context *
>> -__execlists_context_pin(struct intel_engine_cs *engine,
>> - struct i915_gem_context *ctx,
>> - struct intel_context *ce)
>> +struct intel_context *
>> +intel_execlists_context_pin(struct intel_engine_cs *engine,
>> + struct i915_gem_context *ctx,
>> + struct intel_context *ce)
>> {
>> void *vaddr;
>> int ret;
>> @@ -1205,8 +1205,8 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>> }
>> static const struct intel_context_ops execlists_context_ops = {
>> - .unpin = execlists_context_unpin,
>> - .destroy = execlists_context_destroy,
>> + .unpin = intel_execlists_context_unpin,
>> + .destroy = intel_execlists_context_destroy,
>> };
>> static struct intel_context *
>> @@ -1224,7 +1224,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>> ce->ops = &execlists_context_ops;
>> - return __execlists_context_pin(engine, ctx, ce);
>> + return intel_execlists_context_pin(engine, ctx, ce);
>> }
>> static int execlists_request_alloc(struct i915_request *request)
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
>> index f5a5502ecf70..178b181ea651 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>> @@ -104,4 +104,11 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>> void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
>> +struct intel_context *
>> +intel_execlists_context_pin(struct intel_engine_cs *engine,
>> + struct i915_gem_context *ctx,
>> + struct intel_context *ce);
>> +void intel_execlists_context_unpin(struct intel_context *ce);
>> +void intel_execlists_context_destroy(struct intel_context *ce);
>> +
>> #endif /* _INTEL_LRC_H_ */
>> diff --git a/drivers/gpu/drm/i915/selftests/intel_guc.c b/drivers/gpu/drm/i915/selftests/intel_guc.c
>> index bf27162fb327..eb4e8bbe8c82 100644
>> --- a/drivers/gpu/drm/i915/selftests/intel_guc.c
>> +++ b/drivers/gpu/drm/i915/selftests/intel_guc.c
>> @@ -301,6 +301,7 @@ static int igt_guc_doorbells(void *arg)
>> if (err)
>> goto unlock;
>> + guc->starting_proxy_id = GUC_MAX_PPAL_STAGE_DESCRIPTORS - ATTEMPTS;
> maybe GEM_BUG_ON(GUC_MAX_PPAL_STAGE_DESCRIPTORS < ATTEMPTS) ?
ack, but BUILD_BUG_ON
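i.e. something like the following, so the constraint is caught at
compile time (sketch, exact placement in the test TBD):

    BUILD_BUG_ON(GUC_MAX_PPAL_STAGE_DESCRIPTORS < ATTEMPTS);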
>> for (i = 0; i < ATTEMPTS; i++) {
>> clients[i] = guc_client_alloc(dev_priv,
>> INTEL_INFO(dev_priv)->ring_mask,
>> @@ -334,7 +335,8 @@ static int igt_guc_doorbells(void *arg)
>> * The check below is only valid because we keep a doorbell
>> * assigned during the whole life of the client.
>> */
>> - if (clients[i]->stage_id >= GUC_NUM_DOORBELLS) {
>> + if ((clients[i]->stage_id - guc->starting_proxy_id) >=
>> + GUC_NUM_DOORBELLS) {
> I always get a bit nervous when I see unsigned numbers subtracted..
The check you asked for above should ensure we don't underflow here.
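Spelled out, with the BUILD_BUG_ON in place:

    /*
     * Illustrative note: ida_simple_get() allocates stage_ids in
     * [guc->starting_proxy_id, GUC_MAX_STAGE_DESCRIPTORS), so
     * clients[i]->stage_id >= guc->starting_proxy_id by construction
     * and the unsigned subtraction below cannot underflow.
     */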
>> pr_err("[%d] more clients than doorbells (%d >= %d)\n",
> %u ?
ack
Daniele
>> i, clients[i]->stage_id, GUC_NUM_DOORBELLS);
>> err = -EINVAL;
>