[Intel-gfx] [PATCH v2] drm/i915: Sanitize GuC client initialization

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Tue Feb 21 15:38:24 UTC 2017


On ke, 2017-02-15 at 18:28 -0800, Daniele Ceraolo Spurio wrote:
> 
> On 14/02/17 05:53, Joonas Lahtinen wrote:
> > 
> > Started adding proper teardown to guc_client_alloc, ended up removing
> > quite a few dead ends where errors communicating with the GuC were
> > silently ignored. There also seemed to be quite a few erronous
> > teardown actions performed in case of an error (ordering wrong).
> > 
> > v2:
> > 	- Increase function symmetry/proximity (Michal/Daniele)
> > 	- Fix __reserve_doorbell accounting for high priority (Daniele)
> > 	- Call __update_doorbell_desc! (Daniele)
> > 	- Isolate __guc_{,de}allocate_doorbell (Michal/Daniele)
> > 
> > Signed-off-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
> > Cc: Oscar Mateo <oscar.mateo at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>

<SNIP>

> > +static bool has_doorbell(struct i915_guc_client *client)
> > +{
> > +	if (client->doorbell_id == GUC_DOORBELL_INVALID)
> > +		return false;
> > 
> > -	/* Activate the new doorbell */
> > -	__set_bit(new_id, doorbell_bitmap);
> > +	return test_bit(client->doorbell_id, client->guc->doorbell_bitmap);
> 
> Should we warn/gem_bug if client->doorbell_id != GUC_DOORBELL_INVALID 
> and the bit in guc->doorbell_bitmap is not set? Not sure if you plan to 
> decouple them more in the future, but with the current code it should 
> always be impossible to have a valid doorbell without the bit in the 
> bitmask being set. Maybe we can just return client->doorbell_id != 
> GUC_DOORBELL_INVALID here if we have no plan for a case where they can 
> be out of sync.

Kinda a leftover from restructuring, the selection and reservation were
a different thing at some point. Changed into GEM_BUG_ON.

> > +static int __destroy_doorbell(struct i915_guc_client *client)
> >  {
> > -	(void)guc_update_doorbell_id(guc, client, GUC_INVALID_DOORBELL_ID);
> > +	struct guc_doorbell_info *doorbell;
> > 
> > -	/* XXX: wait for any interrupts */
> > -	/* XXX: wait for workqueue to drain */
> > +	doorbell = __get_doorbell(client);
> > +	doorbell->db_status = GUC_DOORBELL_DISABLED;
> > +	doorbell->cookie = 0;
> > +
> 
> Not from this patch (but since we're at it...), I did a bit of digging 
> and I've found that doorbell release flow requires SW to poll the 
> GEN8_DRBREGL(db_id) register after updating doorbell->db_status to wait 
> for the GEN8_DRB_VALID bit to go to zero. This ensures that any pending 
> events are processed before we call into GuC to release the doorbell. 
> Note that GuC will fail the DEALLOCATE_DOORBELL action if the bit has 
> not gone to zero yet. This is currently not an issue, probably because 
> we use a single doorbell and we don't ring it near release time, but the 
> situation could change in the future so I believe it is worth to fix it 
> now. I can follow up with an additional patch if you want to keep this 
> one as refactoring only.

Ack, add a follow-up on top of your series.

> > -static uint32_t select_doorbell_cacheline(struct intel_guc *guc)
> > +	return 0;
> > +}
> > +
> > +static unsigned long __reserve_cacheline(struct intel_guc* guc)
> 
> "reserve_cacheline" doesn't really reflect what happens, because more 
> doorbells can use the same cacheline (while they are on different 
> physical pages) and there is no concept of unreserving the cacheline. 
> The idea is to try to avoid having lots of doorbells on the same 
> cacheline if possible to make the monitoring more efficient on the HW, 
> so I'd keep the "select_cacheline" naming for this function. We should 
> probably also look at modifying the function to use something more 
> elaborated than a simple round robin scheme to ensure doorbells are 
> equally distribuited on cachelines, but that can probably wait until we 
> use more doorbells.

Renamed.


> >  /*
> >   * Borrow the first client to set up & tear down each unused doorbell
> >   * in turn, to ensure that all doorbell h/w is (re)initialised.
> >   */
> > -static void guc_init_doorbell_hw(struct intel_guc *guc)
> > +static int guc_init_doorbell_hw(struct intel_guc *guc)
> >  {
> >  	struct i915_guc_client *client = guc->execbuf_client;
> > -	uint16_t db_id;
> > -	int i, err;
> > +	int err;
> > +	int i;
> > 
> > -	guc_disable_doorbell(guc, client);
> > +	if (has_doorbell(client))
> > +		destroy_doorbell(client);
> > 
> > -	for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
> > -		/* Skip if doorbell is OK */
> > -		if (guc_doorbell_check(guc, i))
> > +	for (i = 0; i < GUC_NUM_DOORBELLS; ++i) {
> > +		if (doorbell_ok(guc, i))
> >  			continue;
> > 
> 
> As mentioned in the previous patch version, here we assume that all the 
> doorbells should be disabled an we want to reset HW that has been left 
> enabled from previous users, so the doorbell_bitmap should be clear. 
> Maybe adding a simple
> 
> 	GEM_BUG_ON(test_bit(i, guc->doorbell_bitmap));
> 
> will help making sure that we're not leaving bits set when 
> releasing/resetting.

The functions don't actually even touch doorbell_bitmap, they just nuke
all active clients. That's what the previous code did, as I read it.


> > -		err = guc_update_doorbell_id(guc, client, i);
> > -		if (err)
> > -			DRM_DEBUG_DRIVER("Doorbell %d update failed, err %d\n",
> > -					i, err);
> > +		err = __reset_doorbell(client, i);
> 
> If the reset fails the doorbell is in a bad state, so it might be worth 
> to ensure that the bit is set in the bitmask to make sure we don't 
> assign that doorbell to any client, but we'd have to make sure to clear 
> the bitmask on GuC reset (see comment above).

Clearing the bits now in guc_init_doorbell_hw().


> > +		WARN(err, "Doorbell %d reset failed, err %d\n", i, err);
> >  	}
> > 
> > -	db_id = select_doorbell_register(guc, client->priority);
> > -	WARN_ON(db_id == GUC_INVALID_DOORBELL_ID);
> > -
> > -	err = guc_update_doorbell_id(guc, client, db_id);
> > +	err = __reserve_doorbell(client);
> >  	if (err)
> > -		DRM_WARN("Failed to restore doorbell to %d, err %d\n",
> > -			 db_id, err);
> > +		return err;
> 
> Continuing the discussion from the previous patch revision:
> 
> <snip>
>  > Shouldn't this be create_doorbell() instead of reserve?
> 
> That would end up calling the __update_doorbell, which we can't as the
> desc is for some strange reason not mapped yet.
> </snip>
> 
> As far as I can see, everything should already be allocated and mapped 
> here. Aren't we anyway already calling __update_doorbell both in the 
> client_alloc and in __reset_doorbell above?
> Also, where are we re-creating the doorbell that we destroyed at the 
> beginning of the function?
> 

Hmm, refactored the extra destroy cycle out.

<SNIP>

> > @@ -753,27 +781,35 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
> >  	guc_proc_desc_init(guc, client);
> >  	guc_ctx_desc_init(guc, client);
> > 
> > -	/* For runtime client allocation we need to enable the doorbell. Not
> > -	 * required yet for the static execbuf_client as this special kernel
> > -	 * client is enabled from i915_guc_submission_enable().
> > -	 *
> > -	 * guc_update_doorbell_id(guc, client, db_id);
> > -	 */
> > +	/* For runtime client allocation we need to enable the doorbell. */
> 
> this comment is a bit unclear now because you're not deferring the 
> initialization of execbuf_client to guc_init_doorbell_hw anymore, so 
> there is no difference between execbuf_client and "runtime" clients. 
> Maybe we can just remove the comment.

Removed.

> Note that creating the doorbell here will cause it to be destroyed and 
> re-allocated when i915_guc_submission_enable is called. A worth 
> compromise IMO to avoid special cases, but worth pointing out :)

Refactored a bit.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


More information about the Intel-gfx mailing list