[Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

Dave Gordon david.s.gordon at intel.com
Mon Jun 15 10:29:18 PDT 2015


On 15/06/15 15:10, Siluvery, Arun wrote:
> On 12/06/2015 18:03, Dave Gordon wrote:
>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>> In Per context w/a batch buffer,
>>>>> WaRsRestoreWithPerCtxtBb
>>>>>
>>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>
>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho at intel.com>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>    drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>    2 files changed, 85 insertions(+)

[snip]

>>>>> +    /*
>>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>> +     * in the same cacheline.
>>>>> +     */
>>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>> +        cmd[index++] = MI_NOOP;
>>>>> +
>>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>>> +    cmd[index++] = INSTPM;
>>>>> +    cmd[index++] = scratch_addr;
>>>>> +    cmd[index++] = 0;
>>>>> +
>>>>> +    /*
>>>>> +     * BSpec says there should not be any commands programmed
>>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>> +     * do not add any new commands
>>>>> +     */
>>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +
>>>>>        /* padding */
>>>>>            while (index < end)
>>>>>            cmd[index++] = MI_NOOP;
>>>>>
>>>>
>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>
>>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>>> Since the diff context is only few lines it didn't showup in the diff.
>>
>> The second comment above says "no commands between LOAD_REG_REG and
>> BB_END", so the point of my comment was that the BB_END is *NOT*
>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>
> true, but they are no-ops so they shouldn't really affect anything. I
> guess the spec means no valid commands.

I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
and I've even seen cases where a workaround specifically requires "a
NOOP with the set-no-op-id-register bit set" to fix some particular bug.
The only special thing about NOOP is that it doesn't get captured in IPEHR.

I think the w/a requires this:

0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE)

no gaps, no insertions, all together, all on one cacheline. Those
instructions take up 8 DWords (32 bytes) so the sequence doesn't
necessarily have to start on a cacheline boundary, as long as it's
entirely within the same line. But it's simpler to start on a new line.
You seem to have:

0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END

so the condition in the comment is not fulfilled. If this works, maybe
the comment is wrong.

>> And therefore also, these instructions do *not* all end up in the same
>> cacheline, thus contradicting the first comment above.
>
> I don't understand why. As per the requirement the commands from the
> first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be
> part of same cacheline (in this case second line).

OK, they're all in the same line; I didn't look back at the full context
enough and thought 'end' would point to the end of the buffer, rather
than the end of the cacheline .. because it /does/ point to the end of
the buffer, it just happens to be the end of the very same cacheline as
well.

So I really don't like the way the sizes of the two workaround batches
have been defined in terms of cache lines. Also I'm not keen on one bit
of code allocating the object and defining the sizes of the sub-areas
within it, and then separate functions filling in each of the sequences
within these areas, "knowing" that the areas are /just the right size/.
It would be simpler to maintain if the "size in cachelines" values in
lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
with the number of instructions written by gen8_init_perctx_bb() and
gen8_init_indirectctx_bb().

How about having each of these return the number of bytes they've
appended to the (u32 *)buffer that they've been given, and let the
caller manage mapping/unmapping, alignment, padding, etc, and fill in
the size fields accordingly *after* the content has been defined?

.Dave.

>> Padding *after* a BB_END would be redundant.
> 
> yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
> abruptly terminating the batch which is why I am padding with no-ops, I
> can change this if that is preferred.
>>
>> .Dave.



More information about the Intel-gfx mailing list