[Intel-gfx] [PATCH 14/18] drm/i915/guc: Prepare for nonblocking execbuf submission
Chris Wilson
chris at chris-wilson.co.uk
Mon Sep 5 08:08:25 UTC 2016
On Fri, Sep 02, 2016 at 07:20:52PM +0100, Dave Gordon wrote:
> On 30/08/16 09:18, Chris Wilson wrote:
> >Currently the presumption is that the request construction and its
> >submission to the GuC are all under the same holding of struct_mutex. We
> >wish to relax this to separate the request construction and the later
> >submission to the GuC. This requires us to reserve some space in the
> >GuC command queue for the future submission. For flexibility to handle
> >out-of-order request submission we do not preallocate the next slot in
> >the GuC command queue during request construction, just ensuring that
> >there is enough space later.
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_guc_submission.c | 55 ++++++++++++++----------------
> > drivers/gpu/drm/i915/intel_guc.h | 3 ++
> > 2 files changed, 29 insertions(+), 29 deletions(-)
>
> Hmm .. the functional changes look mostly OK, apart from some
> locking, but there seems to be a great deal of unnecessary churn,
> such as combining statements which had been kept separate for
> clarity :(
>
> >diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> >index 2332f9c98bdd..a047e61adc2a 100644
> >--- a/drivers/gpu/drm/i915/i915_guc_submission.c
> >+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> >@@ -434,20 +434,23 @@ int i915_guc_wq_check_space(struct drm_i915_gem_request *request)
> > {
> > const size_t wqi_size = sizeof(struct guc_wq_item);
> > struct i915_guc_client *gc = request->i915->guc.execbuf_client;
> >- struct guc_process_desc *desc;
> >+ struct guc_process_desc *desc = gc->client_base + gc->proc_desc_offset;
> > u32 freespace;
> >+ int ret;
> >
> >- GEM_BUG_ON(gc == NULL);
> >-
> >- desc = gc->client_base + gc->proc_desc_offset;
> >-
> >+ spin_lock(&gc->lock);
> > freespace = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
> >- if (likely(freespace >= wqi_size))
> >- return 0;
> >-
> >- gc->no_wq_space += 1;
> >+ freespace -= gc->wq_rsvd;
> >+ if (likely(freespace >= wqi_size)) {
> >+ gc->wq_rsvd += wqi_size;
> >+ ret = 0;
> >+ } else {
> >+ gc->no_wq_space++;
> >+ ret = -EAGAIN;
> >+ }
> >+ spin_unlock(&gc->lock);
> >
> >- return -EAGAIN;
> >+ return ret;
> > }
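As an aside, the reserve/consume accounting is easy to model in userspace if anyone wants to poke at the arithmetic. A sketch (the CIRC_SPACE stand-in and the struct/function names here are mine, not the driver's):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's CIRC_SPACE(): free bytes in a
 * power-of-two ring, always leaving one byte unused. */
#define CIRC_SPACE(head, tail, size) \
	(((tail) - ((head) + 1)) & ((size) - 1))

struct wq_state {
	uint32_t tail;	/* next write offset, bytes (gc->wq_tail) */
	uint32_t head;	/* GuC's read offset, bytes (desc->head) */
	uint32_t size;	/* ring size, power of two (gc->wq_size) */
	uint32_t rsvd;	/* reserved but not yet written (gc->wq_rsvd) */
};

/* Request construction: check-then-reserve, as done under gc->lock. */
static int wq_reserve(struct wq_state *wq, uint32_t wqi_size)
{
	uint32_t freespace = CIRC_SPACE(wq->tail, wq->head, wq->size);

	freespace -= wq->rsvd;
	if (freespace >= wqi_size) {
		wq->rsvd += wqi_size;
		return 0;
	}
	return -1;	/* -EAGAIN in the driver */
}

/* Submission: consume the reservation and advance the tail. */
static void wq_consume(struct wq_state *wq, uint32_t wqi_size)
{
	assert(wq->rsvd >= wqi_size);	/* GEM_BUG_ON(gc->wq_rsvd < wqi_size) */
	wq->tail = (wq->tail + wqi_size) & (wq->size - 1);
	wq->rsvd -= wqi_size;
}
```

The invariant is that wq_rsvd never exceeds the CIRC_SPACE-reported freespace, since every reservation is granted against freespace minus outstanding reservations, and consumption shrinks both by the same amount.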
> >
> > static void guc_add_workqueue_item(struct i915_guc_client *gc,
> >@@ -457,22 +460,9 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
> > const size_t wqi_size = sizeof(struct guc_wq_item);
> > const u32 wqi_len = wqi_size/sizeof(u32) - 1;
> > struct intel_engine_cs *engine = rq->engine;
> >- struct guc_process_desc *desc;
> > struct guc_wq_item *wqi;
> > void *base;
> >- u32 freespace, tail, wq_off, wq_page;
> >-
> >- desc = gc->client_base + gc->proc_desc_offset;
> >-
> >- /* Free space is guaranteed, see i915_guc_wq_check_space() above */
>
> This comment seems to have been lost. It still applies (mutatis
> mutandis), so it should be relocated to some part of the new version
> ...
>
> >- freespace = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
> >- GEM_BUG_ON(freespace < wqi_size);
> >-
> >- /* The GuC firmware wants the tail index in QWords, not bytes */
> >- tail = rq->tail;
> >- GEM_BUG_ON(tail & 7);
> >- tail >>= 3;
> >- GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
>
> This *commented* sequence of statements seems clearer than the
> replacement below ...
>
> >+ u32 wq_off, wq_page;
> >
> > /* For now workqueue item is 4 DWs; workqueue buffer is 2 pages. So we
> > * should not have the case where structure wqi is across page, neither
> >@@ -482,18 +472,19 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
> > * workqueue buffer dw by dw.
> > */
> > BUILD_BUG_ON(wqi_size != 16);
>
> This would be a good place to note that:
>
> /* Reserved space is guaranteed, see i915_guc_wq_check_space() above */
>
> >+ GEM_BUG_ON(gc->wq_rsvd < wqi_size);
> >
> > /* postincrement WQ tail for next time */
> > wq_off = gc->wq_tail;
> >+ GEM_BUG_ON(wq_off & (wqi_size - 1));
> > gc->wq_tail += wqi_size;
> > gc->wq_tail &= gc->wq_size - 1;
> >- GEM_BUG_ON(wq_off & (wqi_size - 1));
> >+ gc->wq_rsvd -= wqi_size;
> >
> > /* WQ starts from the page after doorbell / process_desc */
> > wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
> >- wq_off &= PAGE_SIZE - 1;
> > base = kmap_atomic(i915_gem_object_get_page(gc->vma->obj, wq_page));
> >- wqi = (struct guc_wq_item *)((char *)base + wq_off);
> >+ wqi = (struct guc_wq_item *)((char *)base + offset_in_page(wq_off));
> >
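The split here is just "skip the doorbell/process-desc page, then index by page and offset within page". In userspace terms (a sketch; the PAGE_SHIFT and GUC_DB_SIZE values are assumed, and wq_locate is my name for it):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1u << PAGE_SHIFT)
#define GUC_DB_SIZE	PAGE_SIZE	/* doorbell + process desc occupy page 0 */

/* Split a workqueue byte offset into (object page index, offset in page),
 * mirroring the wq_page / offset_in_page(wq_off) computation before
 * kmap_atomic() in guc_add_workqueue_item(). */
static void wq_locate(uint32_t wq_off, uint32_t *page, uint32_t *off)
{
	*page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
	*off = wq_off & (PAGE_SIZE - 1);	/* offset_in_page() */
}
```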
> > /* Now fill in the 4-word work queue item */
> > wqi->header = WQ_TYPE_INORDER |
> >@@ -504,7 +495,7 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
> > /* The GuC wants only the low-order word of the context descriptor */
> > wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx, engine);
> >
> >- wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
> >+ wqi->ring_tail = rq->tail >> 3 << WQ_RING_TAIL_SHIFT;
>
> This line is particularly ugly. I think it's the >> chevron << effect.
> Parenthesisation would help, but it would be nicer as separate lines.
> Also, there's no explanation of the "3" here, unlike the original
> version above.
Correct. The code was and still is ugly.
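If we did want to unchevron it, splitting it back into a helper reads better (just a sketch; WQ_RING_TAIL_SHIFT value assumed from intel_guc_fwif.h):

```c
#include <assert.h>
#include <stdint.h>

#define WQ_RING_TAIL_SHIFT	11	/* assumed, per intel_guc_fwif.h */

/* The GuC firmware wants the ring tail in qwords, not bytes,
 * packed into the ring_tail field of the work queue item. */
static uint32_t wqi_ring_tail(uint32_t rq_tail_bytes)
{
	uint32_t tail_qw;

	assert((rq_tail_bytes & 7) == 0);	/* GEM_BUG_ON(tail & 7) */
	tail_qw = rq_tail_bytes >> 3;		/* bytes -> qwords */

	return tail_qw << WQ_RING_TAIL_SHIFT;
}
```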
> > wqi->fence_id = rq->fence.seqno;
> >
> > kunmap_atomic(base);
> >@@ -591,8 +582,10 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
> > struct i915_guc_client *client = guc->execbuf_client;
> > int b_ret;
> >
> >+ spin_lock(&client->lock);
> > guc_add_workqueue_item(client, rq);
> > b_ret = guc_ring_doorbell(client);
> >+ spin_unlock(&client->lock);
> >
> > client->submissions[engine_id] += 1;
>
> Outside the spinlock? Do we still have the BKL during submit(), just
> not during i915_guc_wq_check_space() ? If so, then
> guc_ring_doorbell() doesn't strictly need to be inside the spinlock
> (or the lock could be inside guc_add_workqueue_item()); but if not
> then the update of submissions[] should be inside.
Nope, this code is just garbage, so I didn't bother polishing it.
> > client->retcode = b_ret;
> >@@ -770,6 +763,8 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
> > if (!client)
> > return NULL;
> >
> >+ spin_lock_init(&client->lock);
> >+
> > client->owner = ctx;
> > client->guc = guc;
> > client->engines = engines;
> >@@ -1019,9 +1014,11 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
> > engine->submit_request = i915_guc_submit;
> >
> > /* Replay the current set of previously submitted requests */
> >- list_for_each_entry(request, &engine->request_list, link)
> >+ list_for_each_entry(request, &engine->request_list, link) {
> >+ client->wq_rsvd += sizeof(struct guc_wq_item);
>
> Presumably this is being called in some context that ensures that we
> don't need to hold the spinlock while updating wq_rsvd ? Maybe that
> should be mentioned ...
Obvious.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre