[Intel-gfx] [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

Mon Jun 15 11:09:57 PDT 2015

On 15/06/2015 18:29, Dave Gordon wrote:
> On 15/06/15 15:10, Siluvery, Arun wrote:
>> On 12/06/2015 18:03, Dave Gordon wrote:
>>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>>> In Per context w/a batch buffer,
>>>>>> WaRsRestoreWithPerCtxtBb
>>>>>>
>>>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>>
>>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho at intel.com>
>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>     2 files changed, 85 insertions(+)
>
> [snip]
>
>>>>>> +    /*
>>>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>>> +     * in the same cacheline.
>>>>>> +     */
>>>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>>> +        cmd[index++] = MI_NOOP;
>>>>>> +
>>>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>>>> +    cmd[index++] = INSTPM;
>>>>>> +    cmd[index++] = scratch_addr;
>>>>>> +    cmd[index++] = 0;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * BSpec says there should not be any commands programmed
>>>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>>> +     * do not add any new commands
>>>>>> +     */
>>>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +
>>>>>>         /* padding */
>>>>>>             while (index < end)
>>>>>>             cmd[index++] = MI_NOOP;
>>>>>>
>>>>>
>>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>>
>>>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>>>> Since the diff context is only few lines it didn't showup in the diff.
>>>
>>> The second comment above says "no commands between LOAD_REG_REG and
>>> BB_END", so the point of my comment was that the BB_END is *NOT*
>>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>>
>> true, but they are no-ops so they shouldn't really affect anything. I
>> guess the spec means no valid commands.
>
> I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
> and I've even seen cases where a workaround specifically requires "a
> NOOP with the set-no-op-id-register bit set" to fix some particular bug.
> The only special thing about NOOP is that it doesn't get captured in IPEHR.
>
> I think the w/a requires this:
>
> 0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE)
>
> no gaps, no insertions, all together, all on one cacheline. Those
> instructions take up 8 DWords (32 bytes) so the sequence doesn't
> necessarily have to start on a cacheline boundary, as long as it's
> entirely within the same line. But it's simpler to start on a new line.
> You seem to have:
>
> 0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END
>
> so the condition in the comment is not fulfilled. If this works, maybe
> the comment is wrong.
>
>>> And therefore also, these instructions do *not* all end up in the same
>>> cacheline, thus contradicting the first comment above.
>>
>> I don't understand why. As per the requirement the commands from the
>> first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be
>> part of same cacheline (in this case second line).
>
> OK, they're all in the same line; I didn't look back at the full context
> enough and thought 'end' would point to the end of the buffer, rather
> than the end of the cacheline .. because it /does/ point to the end of
> the buffer, it just happens to be the end of the very same cacheline as
> well.
>
> So I really don't like the way the sizes of the two workaround batches
> have been defined in terms of cache lines. Also I'm not keen on one bit
> of code allocating the object and defining the sizes of the sub-areas
> within it, and then separate functions filling in each of the sequences
> within these areas, "knowing" that the areas are /just the right size/.
> It would be simpler to maintain if the "size in cachelines" values in
> lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
> with the number of instructions written by gen8_init_perctx_bb() and
> gen8_init_indirectctx_bb().
>
> How about having each of these return the number of bytes they've
> appended to the (u32 *)buffer that they've been given, and let the
> caller manage mapping/unmapping, alignment, padding, etc, and fill in
> the size fields accordingly *after* the content has been defined?

This is an issue, editing the size if more WA are added is not good, it 
can be changed as you suggested.

regards
Arun

>
> .Dave.
>
>>> Padding *after* a BB_END would be redundant.
>>
>> yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
>> abruptly terminating the batch which is why I am padding with no-ops, I
>> can change this if that is preferred.
>>>
>>> .Dave.
>
>
>