[Intel-gfx] [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

Siluvery, Arun arun.siluvery at linux.intel.com
Wed Jun 17 14:36:17 PDT 2015


On 17/06/2015 21:21, Chris Wilson wrote:
> On Wed, Jun 17, 2015 at 07:48:16PM +0100, Siluvery, Arun wrote:
>> On 16/06/2015 21:25, Chris Wilson wrote:
>>> On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:
>>>> +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
>>>> +				    uint32_t offset,
>>>> +				    uint32_t *num_dwords)
>>>> +{
>>>> +	uint32_t index;
>>>> +	struct page *page;
>>>> +	uint32_t *cmd;
>>>> +
>>>> +	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
>>>> +	cmd = kmap_atomic(page);
>>>> +
>>>> +	index = offset;
>>>> +
>>>> +	/* FIXME: fill one cacheline with NOOPs.
>>>> +	 * Replace these instructions with WA
>>>> +	 */
>>>> +	while (index < (offset + 16))
>>>> +		cmd[index++] = MI_NOOP;
>>>> +
>>>> +	/*
>>>> +	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
>>>> +	 * execution depends on the length specified in terms of cache lines
>>>> +	 * in the register CTX_RCS_INDIRECT_CTX
>>>> +	 */
>>>> +
>>>> +	kunmap_atomic(cmd);
>>>> +
>>>> +	if (index > (PAGE_SIZE / sizeof(uint32_t)))
>>>> +		return -EINVAL;
>>>
>>> Check before you GPF!
>>>
>>> You just overran the buffer and corrupted memory, if you didn't succeed
>>> in trapping a segfault.
>>>
>>> To be generic, align to the cacheline then check you have enough room
>>> for your own data.
>>> -Chris
>>>
>> Hi Chris,
>>
>> The placement of the condition is not correct. I don't completely
>> follow your suggestion, could you please elaborate? Here we don't know
>> upfront how much more data will be written.
>
> Hmm, are we anticipating an unbounded number of workarounds? At some
> point you have to have a rough upper bound in order to do the bo
> allocation. If we are really unsure, then we do need to split this into
> two passes, one to count the number of dwords and the second to allocate
> and actually fill the cmd[].
>
Since we have a full page dedicated to this, it should be sufficient
for a good number of WAs; if we need more than one page, we have major
issues.
The list for Gen8 is small, and the same goes for Gen9. A few more may
get added going forward, but nowhere near enough to fill an entire page.
Some of them will even be restricted to specific steppings/revisions.
For these reasons I think a single-page setup is sufficient.
Do you anticipate any other use cases that require allocating more than
one page?

A two-pass approach can be implemented, but it adds complexity that may
not be required in this case. Please let me know your thoughts.
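
For reference, a rough userspace sketch of what the two-pass variant
could look like. This is not driver code: wa_emitter, wa_emit(),
wa_build() and wa_build_two_pass() are made-up names, malloc() stands in
for the bo allocation, and MI_NOOP is defined locally as 0 on the
assumption that it matches the i915 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MI_NOOP          0u   /* assumed to match the i915 encoding */
#define CACHELINE_DWORDS 16u

/* Hypothetical emitter: with buf == NULL it only counts dwords. */
struct wa_emitter {
	uint32_t *buf;
	size_t count;
};

static void wa_emit(struct wa_emitter *e, uint32_t cmd)
{
	if (e->buf)
		e->buf[e->count] = cmd;
	e->count++;
}

/* Stand-in for the real workaround list: one cacheline of NOOPs. */
static void wa_build(struct wa_emitter *e)
{
	unsigned int i;

	for (i = 0; i < CACHELINE_DWORDS; i++)
		wa_emit(e, MI_NOOP);
}

/*
 * Pass 1 counts the dwords, pass 2 fills a buffer of exactly that size.
 * Returns NULL on allocation failure; the caller frees the buffer.
 */
static uint32_t *wa_build_two_pass(size_t *num_dwords)
{
	struct wa_emitter e = { NULL, 0 };
	uint32_t *buf;

	wa_build(&e);                 /* pass 1: count only */
	*num_dwords = e.count;

	buf = malloc(e.count * sizeof(*buf));
	if (!buf)
		return NULL;

	e.buf = buf;
	e.count = 0;
	wa_build(&e);                 /* pass 2: write for real */
	assert(e.count == *num_dwords);

	return buf;
}
```

The extra cost is that the WA list is walked twice and both passes have
to agree on the count, which is the complexity I would like to avoid if
a single page is enough.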

>> I have made below changes to check after writing every command and
>> return error as soon as we reach the end.
>>
>> /* relies on a local 'index' being in scope at the call site */
>> #define wa_ctx_emit(batch, cmd) do {                                  \
>>                 if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) { \
>>                         kunmap_atomic(batch);                           \
>>                         return -ENOSPC;                                 \
>>                 }                                                       \
>>                 batch[index++] = (cmd);                                 \
>>         } while (0)
>> Is this acceptable?
>> I think this is the only remaining issue; all other comments are addressed.
>
> It's the lesser of evils for sure. Still feel dubious that we don't know
> upfront how much data we need to allocate.
Yes, but with a single-pass approach, do you see any way it can be improved?
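
One possible refinement, going by the earlier "align to the cacheline
then check you have enough room" comment, is to reserve space for a
whole group of dwords before writing any of them. A rough userspace
sketch follows; wa_batch, wa_batch_reserve(), wa_batch_align() and
wa_batch_emit() are hypothetical names, and the padding value 0 assumes
MI_NOOP encodes as 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define WA_CACHELINE_DW 16u

/* Hypothetical bookkeeping for a single-page WA batch. */
struct wa_batch {
	uint32_t *cmd;    /* start of the mapped page */
	uint32_t index;   /* next free dword */
	uint32_t limit;   /* capacity in dwords */
};

/* Return 0 if n more dwords fit, -ENOSPC otherwise. Writes nothing. */
static int wa_batch_reserve(const struct wa_batch *b, uint32_t n)
{
	assert(b->index <= b->limit);
	return (n <= b->limit - b->index) ? 0 : -ENOSPC;
}

/* Pad with zero dwords (assumed MI_NOOP) up to the next cacheline. */
static int wa_batch_align(struct wa_batch *b)
{
	uint32_t aligned = (b->index + WA_CACHELINE_DW - 1) &
			   ~(WA_CACHELINE_DW - 1);
	int ret = wa_batch_reserve(b, aligned - b->index);

	if (ret)
		return ret;
	while (b->index < aligned)
		b->cmd[b->index++] = 0;
	return 0;
}

/* Copy n dwords only after confirming that all of them fit. */
static int wa_batch_emit(struct wa_batch *b, const uint32_t *cmds,
			 uint32_t n)
{
	uint32_t i;
	int ret = wa_batch_reserve(b, n);

	if (ret)
		return ret;
	for (i = 0; i < n; i++)
		b->cmd[b->index++] = cmds[i];
	return 0;
}
```

In the driver, limit would be PAGE_SIZE / sizeof(uint32_t), and the
caller would do the kunmap_atomic() on error instead of the macro
returning from the middle of the function.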

regards
Arun

> -Chris
>


