[Intel-gfx] [PATCH v6 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

Fri Jun 19 10:50:36 PDT 2015

On Fri, Jun 19, 2015 at 06:37:10PM +0100, Arun Siluvery wrote:
> Some of the WA are to be applied during context save but before restore and
> some at the end of context save/restore but before executing the instructions
> in the ring, WA batch buffers are created for this purpose and these WA cannot
> be applied using normal means. Each context has two registers to load the
> offsets of these batch buffers. If they are non-zero, HW understands that it
> need to execute these batches.
> 
> v1: In this version two separate ring_buffer objects were used to load WA
> instructions for indirect and per context batch buffers and they were part
> of every context.
> 
> v2: Chris suggested to include additional page in context and use it to load
> these WA instead of creating separate objects. This will simplify lot of things
> as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
> is planning to use a similar setup to share data between GuC and driver and
> WA batch buffers can probably share that page. However after discussions with
> Dave who is implementing GuC changes, he suggested to use an independent page
> for the reasons - GuC area might grow and these WA are initialized only once and
> are not changed afterwards so we can share them share across all contexts.
> 
> The page is updated with WA during render ring init. This has an advantage of
> not adding more special cases to default_context.
> 
> We don't know upfront the number of WA we will applying using these batch buffers.
> For this reason the size was fixed earlier but it is not a good idea. To fix this,
> the functions that load instructions are modified to report the no of commands
> inserted and the size is now calculated after the batch is updated. A macro is
> introduced to add commands to these batch buffers which also checks for overflow
> and returns error.
> We have a full page dedicated for these WA so that should be sufficient for
> good number of WA, anything more means we have major issues.
> The list for Gen8 is small, same for Gen9 also, maybe few more gets added
> going forward but not close to filling entire page. Chris suggested a two-pass
> approach but we agreed to go with single page setup as it is a one-off routine
> and simpler code wins.
> 
> One additional option is offset field which is helpful if we would like to
> have multiple batches at different offsets within the page and select them
> based on some criteria. This is not a requirement at this point but could
> help in future (Dave).
> 
> Chris provided some helpful macros and suggestions which further simplified
> the code, they will also help in reducing code duplication when WA for
> other Gen are added. Add detailed comments explaining restrictions.
> 
> (Many thanks to Chris, Dave and Thomas for their reviews and inputs)
> 
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Dave Gordon <david.s.gordon at intel.com>
> Signed-off-by: Rafael Barbalho <rafael.barbalho at intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>

Sigh, after all that, I found one minor thing, but nevertheless
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

> +#define wa_ctx_emit(batch, cmd) {	\
> +		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
> +			return -ENOSPC;					\
> +		}							\
> +		batch[index++] = (cmd);					\
> +	}

We should have wrapped this in do { } while(0) - think of all those
trialing semicolons we have in the code! Fortunately we haven't used
this in a if (foo) wa_ctx_emit(bar); else wa_ctx_emit(baz); yet.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre