[Intel-gfx] [PATCH v2] drm/i915/guc: Move GuC wq_reserve_space to alloc_request_extras
Yu Dai
yu.dai at intel.com
Thu Dec 10 16:22:51 PST 2015
On 12/10/2015 09:14 AM, Dave Gordon wrote:
> On 09/12/15 18:50, yu.dai at intel.com wrote:
> > From: Alex Dai <yu.dai at intel.com>
> >
> > Split the GuC work queue space reservation out of submission and move
> > it to ring_alloc_request_extras. The reason is that a failure in the
> > later i915_add_request() cannot be handled. If a timeout happens, the
> > driver can now return early in order to handle the error.
> >
> > v1: Move wq_reserve_space to ring_reserve_space
> > v2: Move wq_reserve_space to alloc_request_extras (Chris Wilson)
> >
> > Signed-off-by: Alex Dai <yu.dai at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_guc_submission.c | 21 +++++++++------------
> > drivers/gpu/drm/i915/intel_guc.h | 1 +
> > drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++
> > 3 files changed, 16 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> > index 226e9c0..f7bd038 100644
> > --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> > @@ -472,25 +472,21 @@ static void guc_fini_ctx_desc(struct intel_guc *guc,
> > sizeof(desc) * client->ctx_index);
> > }
> >
> > -/* Get valid workqueue item and return it back to offset */
> > -static int guc_get_workqueue_space(struct i915_guc_client *gc, u32 *offset)
> > +int i915_guc_wq_reserve_space(struct i915_guc_client *gc)
>
> I think the name is misleading, because we don't actually reserve
> anything here, just check that there is some free space in the WQ.
>
> (We certainly don't WANT to reserve anything, because that would be
> difficult to clean up in the event of submission failure for any other
> reason. So I think it's only the name needs changing. Although ...
I was trying to use a name similar to ring_reserve_space. It follows the
same pattern as filling the ring buffer: reserve, emit and advance. The
only difference is that it reserves 4 dwords (or rather, checks for
space) instead of a variable number of bytes as in the ring buffer case.
Maybe 'check' is the better word here, because 4 dwords in the WQ is all
we need for one submission via GuC.
> > {
> > struct guc_process_desc *desc;
> > void *base;
> > u32 size = sizeof(struct guc_wq_item);
> > int ret = -ETIMEDOUT, timeout_counter = 200;
> >
> > + if (!gc)
> > + return 0;
> > +
> > base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
> > desc = base + gc->proc_desc_offset;
> >
> > while (timeout_counter-- > 0) {
> > if (CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size) >= size) {
>
> ... as an alternative strategy, we could cache the calculated freespace
> in the client structure; then if we already know there's at least 1 slot
> free from last time we checked, we could then just decrement the cached
> value and avoid the kmap+spinwait overhead. Only when we reach 0 would
> we have to go through this code to refresh our view of desc->head and
> recalculate the actual current freespace. [NB: clear cached value on reset?]
>
> Does that sound like a useful optimisation?
I think it is a good idea.
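A rough sketch of the caching idea, to make sure we mean the same thing.
The wq_space_cached field and the function names here are invented for
illustration (they are not in the patch); the slow path stands in for
the kmap + spin-wait in the real code, and a reset would simply zero the
cached value to force a refresh:

```c
#include <assert.h>
#include <stdint.h>

#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))
#define WQ_ITEM_SIZE 16u	/* one guc_wq_item: 4 dwords */

struct client {
	uint32_t wq_tail;
	uint32_t wq_size;
	uint32_t wq_space_cached;	/* hypothetical cached freespace */
	uint32_t *shared_head;		/* stands in for desc->head */
	int refresh_count;		/* for the sketch: slow-path hits */
};

static int wq_reserve_cached(struct client *gc)
{
	if (gc->wq_space_cached >= WQ_ITEM_SIZE) {
		/* Fast path: no kmap, no spin-wait. */
		gc->wq_space_cached -= WQ_ITEM_SIZE;
		return 0;
	}

	/* Slow path: re-read the GuC-updated head and recompute.
	 * The real code would kmap the page and spin with a timeout. */
	gc->refresh_count++;
	uint32_t space = CIRC_SPACE(gc->wq_tail, *gc->shared_head,
				    gc->wq_size);
	if (space < WQ_ITEM_SIZE)
		return -1;	/* -ETIMEDOUT in the driver */
	gc->wq_space_cached = space - WQ_ITEM_SIZE;
	return 0;
}
```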
> > - *offset = gc->wq_tail;
> > -
> > - /* advance the tail for next workqueue item */
> > - gc->wq_tail += size;
> > - gc->wq_tail &= gc->wq_size - 1;
> > -
> > /* this will break the loop */
> > timeout_counter = 0;
> > ret = 0;
> > @@ -512,11 +508,12 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
> > struct guc_wq_item *wqi;
> > void *base;
> > u32 tail, wq_len, wq_off = 0;
> > - int ret;
> >
> > - ret = guc_get_workqueue_space(gc, &wq_off);
> > - if (ret)
> > - return ret;
> > + wq_off = gc->wq_tail;
> > +
> > + /* advance the tail for next workqueue item */
> > + gc->wq_tail += sizeof(struct guc_wq_item);
> > + gc->wq_tail &= gc->wq_size - 1;
>
> I was a bit unhappy about this code just assuming that there *must* be
> space (because we KNOW we've checked above) -- unless someone violated
> the proper calling sequence (TDR?). OTOH, it would be too expensive to
> go through the map-and-calculate code all over again just to catch an
> unlikely scenario. But, if we cache the last-calculated value as above,
> then the check could be cheap :) For example, just add a pre_checked
> size field that's set by the pre-check and then checked and decremented
> on submission; there shouldn't be more than one submission in progress
> at a time, because dev->struct_mutex is held across the whole sequence
> (but it's not an error to see two pre-checks in a row, because a request
> can be abandoned partway).
I don't understand the concern here. As I said above, filling the GuC WQ
follows the same pattern as filling the ring buffer: reserve (check),
emit and advance. These two lines are the GuC equivalent of
intel_logical_ring_emit and intel_logical_ring_advance. Maybe if I move
these two lines to the end of guc_add_workqueue_item, the sequence will
be easier to follow.
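The emit/advance split I mean, as a simplified sketch: write the item at
the current tail offset, and move the tail only once the item is
complete. Types and names here are stand-ins for illustration (the
sketch indexes in dwords for simplicity, where the driver works in
bytes):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WQ_ITEM_DWORDS 4

struct wq {
	uint32_t buf[64];	/* stand-in for the mapped WQ pages */
	uint32_t tail;		/* in dwords here, for simplicity  */
	uint32_t size;		/* in dwords, power of two         */
};

/* "emit": place one 4-dword item at the pre-checked tail offset. */
static uint32_t wq_emit_item(struct wq *wq,
			     const uint32_t item[WQ_ITEM_DWORDS])
{
	uint32_t off = wq->tail;

	memcpy(&wq->buf[off], item, WQ_ITEM_DWORDS * sizeof(uint32_t));
	return off;
}

/* "advance": move the tail last, so a partially written item is never
 * exposed at the old tail position. */
static void wq_advance(struct wq *wq)
{
	wq->tail = (wq->tail + WQ_ITEM_DWORDS) & (wq->size - 1);
}
```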
> > /* For now workqueue item is 4 DWs; workqueue buffer is 2 pages. So we
> > * should not have the case where structure wqi is across page, neither
> > diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> > index 0e048bf..59c8e21 100644
> > --- a/drivers/gpu/drm/i915/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/intel_guc.h
> > @@ -123,5 +123,6 @@ int i915_guc_submit(struct i915_guc_client *client,
> > struct drm_i915_gem_request *rq);
> > void i915_guc_submission_disable(struct drm_device *dev);
> > void i915_guc_submission_fini(struct drm_device *dev);
> > +int i915_guc_wq_reserve_space(struct i915_guc_client *client);
> >
> > #endif
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index f96fb51..7d53d27 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -667,6 +667,12 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
> > return ret;
> > }
> >
> > + /* Reserve GuC WQ space here (one request needs one WQ item) because
> > + * the later i915_add_request() call can't fail. */
> > + ret = i915_guc_wq_reserve_space(request->i915->guc.execbuf_client);
> > + if (ret)
> > + return ret;
> > +
> > return 0;
> > }
>
> Worth checking for GuC submission before that call? Maybe not ...
Adding an if () block here is purely a code-alignment issue. The
execbuf_client is valid only when GuC submission is enabled, and that
check is already done inside the function call.
Alex