[Intel-gfx] [PATCH v2] drm/i915/guc: Move GuC wq_reserve_space to alloc_request_extras

Dave Gordon david.s.gordon at intel.com
Thu Dec 10 09:14:28 PST 2015


On 09/12/15 18:50, yu.dai at intel.com wrote:
> From: Alex Dai <yu.dai at intel.com>
>
> Split the GuC work queue space reservation from submission and move it
> to ring_alloc_request_extras. The reason is that a failure in the
> later i915_add_request() call won't be handled. If a timeout happens
> here instead, the driver can return early in order to handle the error.
>
> v1: Move wq_reserve_space to ring_reserve_space
> v2: Move wq_reserve_space to alloc_request_extras (Chris Wilson)
>
> Signed-off-by: Alex Dai <yu.dai at intel.com>
> ---
>   drivers/gpu/drm/i915/i915_guc_submission.c | 21 +++++++++------------
>   drivers/gpu/drm/i915/intel_guc.h           |  1 +
>   drivers/gpu/drm/i915/intel_lrc.c           |  6 ++++++
>   3 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 226e9c0..f7bd038 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -472,25 +472,21 @@ static void guc_fini_ctx_desc(struct intel_guc *guc,
>   			     sizeof(desc) * client->ctx_index);
>   }
>
> -/* Get valid workqueue item and return it back to offset */
> -static int guc_get_workqueue_space(struct i915_guc_client *gc, u32 *offset)
> +int i915_guc_wq_reserve_space(struct i915_guc_client *gc)

I think the name is misleading, because we don't actually reserve 
anything here, just check that there is some free space in the WQ.

(We certainly don't WANT to reserve anything, because that would be 
difficult to clean up in the event of a submission failure for any other 
reason.) So I think it's only the name that needs changing. Although ...

>   {
>   	struct guc_process_desc *desc;
>   	void *base;
>   	u32 size = sizeof(struct guc_wq_item);
>   	int ret = -ETIMEDOUT, timeout_counter = 200;
>
> +	if (!gc)
> +		return 0;
> +
>   	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
>   	desc = base + gc->proc_desc_offset;
>
>   	while (timeout_counter-- > 0) {
>   		if (CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size) >= size) {

... as an alternative strategy, we could cache the calculated freespace 
in the client structure; then, if we already know there's at least one 
slot free from the last time we checked, we could just decrement the 
cached value and avoid the kmap+spinwait overhead. Only when the cached 
value reaches zero would we have to go through this code to refresh our 
view of desc->head and recalculate the actual current freespace. [NB: 
the cached value would need clearing on reset?]

Does that sound like a useful optimisation?
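
To illustrate, a rough sketch of what I mean (untested; "wq_space" is a 
hypothetical new field in struct i915_guc_client, zeroed at init and on 
reset, the existing sleep between polls is elided, and I've renamed the 
function to "check" rather than "reserve", per the comment above):

static int guc_wq_check_space(struct i915_guc_client *gc)
{
	struct guc_process_desc *desc;
	void *base;
	const u32 size = sizeof(struct guc_wq_item);
	int ret = -ETIMEDOUT, timeout_counter = 200;

	/* Fast path: freespace left over from the last calculation.
	 * An abandoned request wastes one cached slot until the next
	 * refresh, but that's harmless (merely conservative). */
	if (gc->wq_space >= size) {
		gc->wq_space -= size;
		return 0;
	}

	/* Slow path: refresh our view of desc->head and recalculate */
	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
	desc = base + gc->proc_desc_offset;

	while (timeout_counter-- > 0) {
		gc->wq_space = CIRC_SPACE(gc->wq_tail, desc->head,
					  gc->wq_size);
		if (gc->wq_space >= size) {
			gc->wq_space -= size;
			ret = 0;
			break;
		}
		/* existing sleep/backoff between polls goes here */
	}

	kunmap_atomic(base);
	return ret;
}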

> -			*offset = gc->wq_tail;
> -
> -			/* advance the tail for next workqueue item */
> -			gc->wq_tail += size;
> -			gc->wq_tail &= gc->wq_size - 1;
> -
>   			/* this will break the loop */
>   			timeout_counter = 0;
>   			ret = 0;
> @@ -512,11 +508,12 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
>   	struct guc_wq_item *wqi;
>   	void *base;
>   	u32 tail, wq_len, wq_off = 0;
> -	int ret;
>
> -	ret = guc_get_workqueue_space(gc, &wq_off);
> -	if (ret)
> -		return ret;
> +	wq_off = gc->wq_tail;
> +
> +	/* advance the tail for next workqueue item */
> +	gc->wq_tail += sizeof(struct guc_wq_item);
> +	gc->wq_tail &= gc->wq_size - 1;

I was a bit unhappy about this code just assuming that there *must* be 
space (because we KNOW we've checked above) -- unless someone violated 
the proper calling sequence (TDR?). OTOH, it would be too expensive to 
go through the map-and-calculate code all over again just to catch an 
unlikely scenario. But, if we cache the last-calculated value as above, 
then the check could be cheap :) For example, just add a pre_checked 
size field that's set by the pre-check and then checked and decremented 
on submission; there shouldn't be more than one submission in progress 
at a time, because dev->struct_mutex is held across the whole sequence 
(but it's not an error to see two pre-checks in a row, because a request 
can be abandoned partway).
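
Again a rough sketch (untested; "wq_space_pre_checked" is another 
hypothetical field, and -ENOSPC is just a plausible error code). In the 
pre-check, once freespace has been confirmed:

	/* overwriting a previous pre-check is fine: the request that
	 * made it may have been abandoned partway */
	gc->wq_space_pre_checked = sizeof(struct guc_wq_item);

and then in guc_add_workqueue_item():

	/* space was pre-checked in alloc_request_extras; if this fires,
	 * someone has violated the proper calling sequence (TDR?) */
	if (WARN_ON(gc->wq_space_pre_checked < sizeof(struct guc_wq_item)))
		return -ENOSPC;
	gc->wq_space_pre_checked -= sizeof(struct guc_wq_item);

	wq_off = gc->wq_tail;
	...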

>   	/* For now workqueue item is 4 DWs; workqueue buffer is 2 pages. So we
>   	 * should not have the case where structure wqi is across page, neither
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 0e048bf..59c8e21 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -123,5 +123,6 @@ int i915_guc_submit(struct i915_guc_client *client,
>   		    struct drm_i915_gem_request *rq);
>   void i915_guc_submission_disable(struct drm_device *dev);
>   void i915_guc_submission_fini(struct drm_device *dev);
> +int i915_guc_wq_reserve_space(struct i915_guc_client *client);
>
>   #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f96fb51..7d53d27 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -667,6 +667,12 @@ int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request
>   			return ret;
>   	}
>
> +	/* Reserve GuC WQ space here (one request needs one WQ item) because
> +	 * the later i915_add_request() call can't fail. */
> +	ret = i915_guc_wq_reserve_space(request->i915->guc.execbuf_client);
> +	if (ret)
> +		return ret;
> +
>   	return 0;
>   }

Worth checking for GuC submission being enabled before that call?
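i.e. something like this (assuming the i915.enable_guc_submission 
module parameter is the right gate):

	if (i915.enable_guc_submission) {
		ret = i915_guc_wq_reserve_space(
				request->i915->guc.execbuf_client);
		if (ret)
			return ret;
	}

Maybe not, though -- the !gc early-return inside the function already 
covers the disabled case, since execbuf_client will be NULL then.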

.Dave.

