[Intel-gfx] [PATCH v2] drm/i915: Sanitize GuC client initialization
Joonas Lahtinen
joonas.lahtinen at linux.intel.com
Tue Feb 21 15:38:24 UTC 2017
On ke, 2017-02-15 at 18:28 -0800, Daniele Ceraolo Spurio wrote:
>
> On 14/02/17 05:53, Joonas Lahtinen wrote:
> >
> > Started adding proper teardown to guc_client_alloc, ended up removing
> > quite a few dead ends where errors communicating with the GuC were
> > silently ignored. There also seemed to be quite a few erronous
> > teardown actions performed in case of an error (ordering wrong).
> >
> > v2:
> > - Increase function symmetry/proximity (Michal/Daniele)
> > - Fix __reserve_doorbell accounting for high priority (Daniele)
> > - Call __update_doorbell_desc! (Daniele)
> > - Isolate __guc_{,de}allocate_doorbell (Michal/Daniele)
> >
> > Signed-off-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
> > Cc: Oscar Mateo <oscar.mateo at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
<SNIP>
> > +static bool has_doorbell(struct i915_guc_client *client)
> > +{
> > + if (client->doorbell_id == GUC_DOORBELL_INVALID)
> > + return false;
> >
> > - /* Activate the new doorbell */
> > - __set_bit(new_id, doorbell_bitmap);
> > + return test_bit(client->doorbell_id, client->guc->doorbell_bitmap);
>
> Should we warn/gem_bug if client->doorbell_id != GUC_DOORBELL_INVALID
> and the bit in guc->doorbell_bitmap is not set? Not sure if you plan to
> decouple them more in the future, but with the current code it should
> always be impossible to have a valid doorbell without the bit in the
> bitmask being set. Maybe we can just return client->doorbell_id !=
> GUC_DOORBELL_INVALID here if we have no plan for a case where they can
> be out of sync.
Kinda a leftover from restructuring, the selection and reservation were
a different thing at some point. Changed into GEM_BUG_ON.
> > +static int __destroy_doorbell(struct i915_guc_client *client)
> > {
> > - (void)guc_update_doorbell_id(guc, client, GUC_INVALID_DOORBELL_ID);
> > + struct guc_doorbell_info *doorbell;
> >
> > - /* XXX: wait for any interrupts */
> > - /* XXX: wait for workqueue to drain */
> > + doorbell = __get_doorbell(client);
> > + doorbell->db_status = GUC_DOORBELL_DISABLED;
> > + doorbell->cookie = 0;
> > +
>
> Not from this patch (but since we're at it...), I did a bit of digging
> and I've found that doorbell release flow requires SW to poll the
> GEN8_DRBREGL(db_id) register after updating doorbell->db_status to wait
> for the GEN8_DRB_VALID bit to go to zero. This ensures that any pending
> events are processed before we call into GuC to release the doorbell.
> Note that GuC will fail the DEALLOCATE_DOORBELL action if the bit has
> not gone to zero yet. This is currently not an issue, probably because
> we use a single doorbell and we don't ring it near release time, but the
> situation could change in the future so I believe it is worth to fix it
> now. I can follow up with an additional patch if you want to keep this
> one as refactoring only.
Ack, add a follow-up on top of your series.
> > -static uint32_t select_doorbell_cacheline(struct intel_guc *guc)
> > + return 0;
> > +}
> > +
> > +static unsigned long __reserve_cacheline(struct intel_guc* guc)
>
> "reserve_cacheline" doesn't really reflect what happens, because more
> doorbells can use the same cacheline (while they are on different
> physical pages) and there is no concept of unreserving the cacheline.
> The idea is to try to avoid having lots of doorbells on the same
> cacheline if possible to make the monitoring more efficient on the HW,
> so I'd keep the "select_cacheline" naming for this function. We should
> probably also look at modifying the function to use something more
> elaborated than a simple round robin scheme to ensure doorbells are
> equally distribuited on cachelines, but that can probably wait until we
> use more doorbells.
Renamed.
> > /*
> > * Borrow the first client to set up & tear down each unused doorbell
> > * in turn, to ensure that all doorbell h/w is (re)initialised.
> > */
> > -static void guc_init_doorbell_hw(struct intel_guc *guc)
> > +static int guc_init_doorbell_hw(struct intel_guc *guc)
> > {
> > struct i915_guc_client *client = guc->execbuf_client;
> > - uint16_t db_id;
> > - int i, err;
> > + int err;
> > + int i;
> >
> > - guc_disable_doorbell(guc, client);
> > + if (has_doorbell(client))
> > + destroy_doorbell(client);
> >
> > - for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
> > - /* Skip if doorbell is OK */
> > - if (guc_doorbell_check(guc, i))
> > + for (i = 0; i < GUC_NUM_DOORBELLS; ++i) {
> > + if (doorbell_ok(guc, i))
> > continue;
> >
>
> As mentioned in the previous patch version, here we assume that all the
> doorbells should be disabled an we want to reset HW that has been left
> enabled from previous users, so the doorbell_bitmap should be clear.
> Maybe adding a simple
>
> GEM_BUG_ON(test_bit(i, guc->doorbell_bitmap));
>
> will help making sure that we're not leaving bits set when
> releasing/resetting.
The functions don't actually even touch doorbell_bitmap, they just nuke
all active clients. That's what the previous code did, as I read it.
> > - err = guc_update_doorbell_id(guc, client, i);
> > - if (err)
> > - DRM_DEBUG_DRIVER("Doorbell %d update failed, err %d\n",
> > - i, err);
> > + err = __reset_doorbell(client, i);
>
> If the reset fails the doorbell is in a bad state, so it might be worth
> to ensure that the bit is set in the bitmask to make sure we don't
> assign that doorbell to any client, but we'd have to make sure to clear
> the bitmask on GuC reset (see comment above).
Clearing the bits now in guc_init_doorbell_hw().
> > + WARN(err, "Doorbell %d reset failed, err %d\n", i, err);
> > }
> >
> > - db_id = select_doorbell_register(guc, client->priority);
> > - WARN_ON(db_id == GUC_INVALID_DOORBELL_ID);
> > -
> > - err = guc_update_doorbell_id(guc, client, db_id);
> > + err = __reserve_doorbell(client);
> > if (err)
> > - DRM_WARN("Failed to restore doorbell to %d, err %d\n",
> > - db_id, err);
> > + return err;
>
> Continuing the discussion from the previous patch revision:
>
> <snip>
> > Shouldn't this be create_doorbell() instead of reserve?
>
> That would end up calling the __update_doorbell, which we can't as the
> desc is for some strange reason not mapped yet.
> </snip>
>
> As far as I can see, everything should already be allocated and mapped
> here. Aren't we anyway already calling __update_doorbell both in the
> client_alloc and in __reset_doorbell above?
> Also, where are we re-creating the doorbell that we destroyed at the
> beginning of the function?
>
Hmm, refactored the extra destroy cycle out.
<SNIP>
> > @@ -753,27 +781,35 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
> > guc_proc_desc_init(guc, client);
> > guc_ctx_desc_init(guc, client);
> >
> > - /* For runtime client allocation we need to enable the doorbell. Not
> > - * required yet for the static execbuf_client as this special kernel
> > - * client is enabled from i915_guc_submission_enable().
> > - *
> > - * guc_update_doorbell_id(guc, client, db_id);
> > - */
> > + /* For runtime client allocation we need to enable the doorbell. */
>
> this comment is a bit unclear now because you're not deferring the
> initialization of execbuf_client to guc_init_doorbell_hw anymore, so
> there is no difference between execbuf_client and "runtime" clients.
> Maybe we can just remove the comment.
Removed.
> Note that creating the doorbell here will cause it to be destroyed and
> re-allocated when i915_guc_submission_enable is called. A worth
> compromise IMO to avoid special cases, but worth pointing out :)
Refactored a bit.
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
More information about the Intel-gfx
mailing list