[Intel-xe] [PATCH 2/3] drm/xe: Add missing TLB invalidation to emit_pipe_invalidate()

Fri Jun 2 08:20:14 UTC 2023

On 02/06/2023 00:09, Souza, Jose wrote:
> On Thu, 2023-06-01 at 12:26 -0700, Matt Roper wrote:
>> On Wed, May 31, 2023 at 10:46:20AM +0200, Thomas Hellström wrote:
>>> On 5/30/23 20:40, Souza, Jose wrote:
>>>> On Mon, 2023-05-29 at 17:07 +0200, Thomas Hellström wrote:
>>>>> On 5/29/23 16:56, Thomas Hellström wrote:
>>>>>> On 5/29/23 16:48, Souza, Jose wrote:
>>>>>>> On Mon, 2023-05-29 at 11:08 +0200, Thomas Hellström wrote:
>>>>>>>> On Fri, 2023-05-26 at 12:06 -0700, José Roberto de Souza wrote:
>>>>>>>>> i915 invalidates TLB before emit BB start, so doing the same in Xe.
>>>>>>>> Hi, José,
>>>>>>>>
>>>>>>>> Do you see an issue because of missing TLB flushes? In that case that
>>>>>>>> needs to be added to the commit message. We do TLB flushes on unbind,
>>>>>>>> but not sure if we do them on rebind, so if that's the issue we need to
>>>>>>>> figure out whether we should do them also on rebind or, like in this
>>>>>>>> patch, on each exec.
>>>>>>> I have a group of tests that results flips randomly. It fails when
>>>>>>> the rendered buffer is compared to expected result.
>>>>>>> Anything that add a bit of delay after the exec fixes those tests so
>>>>>>> I was looking for any missing flush in Xe KMD and Mesa.
>>>>>>>
>>>>>>> This one did not fixed it but as i915 was doing it I thought would be
>>>>>>> good to do in Xe too.
>>>>>> I think missing TLB invalidations are more likely to cause random
>>>>>> overwrites of freed memory. Let me do a quick check on these. But the
>>>>>> problem you're describing indeed sounds more like a missing render
>>>>>> cache flush.
>>>>>>
>>>>> It indeed looks like we're doing proper TLB invalidation on both rebind
>>>>> and unbind, so this patch shouldn't really be needed.
>>>>>
>>>>> (look for "invalidation_fence_init()")
>>>> With this patch + PIPE_CONTROL flush at the end of batch buffer in Mesa, fixed the groups of the tests that were flipping results.
>>>> Do a XE_GUC_ACTION_TLB_INVALIDATION is the same as a PIPE_CONTROL_TLB_INVALIDATE?
>> +Cc Matt Brost and Niranjana
>>
>> It's worth noting that the removal of PIPE_CONTROL_TLB_INVALIDATE was a
>> deliberate act in this patch:
>>
>>          commit 41db3304cff95475c6f6667ae27fab3c144a49df
>>          Author:     Matthew Brost <matthew.brost at intel.com>
>>          AuthorDate: Thu Jan 26 10:40:41 2023 -0800
>>          Commit:     Rodrigo Vivi <rodrigo.vivi at intel.com>
>>          CommitDate: Tue May 23 14:13:47 2023 -0400
>>
>>              drm/xe: Drop TLB invalidation from ring operations
>>
>> so the expectation is that this specific flag shouldn't be necessary due
>> to the GuC-based invalidation we're doing.
>>
>> At the hardware level, there are a lot of different TLBs that cache page
>> table lookups.  Each engine has a TLB, the GuC itself has another TLB,
>> the OA unit has another one, and on igpu platforms, there are also
>> additional AuxCCS TLBs for the compression lookups that most engines can
>> do.  You can trigger TLB invalidations in various ways:  via GPU
>> instructions (like PIPE_CONTROL), MMIO registers (like
>> XEHP_GFX_TLB_INV_CR), or via a GuC request
>> (XE_GUC_ACTION_TLB_INVALIDATION).  The details of invalidation vary by
>> platform (e.g., you can do selective range-based invalidation on some
>> platforms, but not on others).  The GuC just provides a single
>> interface, but that interface lets you choose whether you're doing full
>> vs range invalidation, whether you're invalidating engine TLBs or
>> the GuC's own TLB, etc.  If using the GuC interface to invalidate engine
>> TLBs, it just requires a single request and takes care of invalidating
>> all of the individual engines on its end.
>>
> Huum then commit 1d759ab5967cfc ("drm/xe: Reinstate render / compute cache invalidation in ring ops") brought the emit_pipe_invalidate() back in
> __emit_job_gen12_render_compute() but not the rest.
>
> I reverted this patch in Xe kernel and started to emit a pipe_control at the end of batch buffer with:
>
> enum anv_pipe_bits flush = ANV_PIPE_FLUSH_BITS |
> 			   ANV_PIPE_L3_FABRIC_FLUSH |
> 			   ANV_PIPE_TLB_INVALIDATE;
>
> #define ANV_PIPE_FLUSH_BITS ( \
>     ANV_PIPE_DEPTH_CACHE_FLUSH_BIT | \
>     ANV_PIPE_DATA_CACHE_FLUSH_BIT | \
>     ANV_PIPE_HDC_PIPELINE_FLUSH_BIT | \
>     ANV_PIPE_UNTYPED_DATAPORT_CACHE_FLUSH_BIT | \
>     ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | \
>     ANV_PIPE_TILE_CACHE_FLUSH_BIT)
>
> And it fixed the tests.
> TLB_INVALIDATE is definitely needed as I also tested with below set of bits and it did not worked:
>
>
> enum anv_pipe_bits flush = ANV_PIPE_FLUSH_BITS |
> 			   ANV_PIPE_L3_FABRIC_FLUSH |
> 			   ANV_PIPE_DEPTH_STALL_BIT |
>                             ANV_PIPE_CONTROL_FLUSH_ENABLE |
>                             ANV_PIPE_STORE_DATA_INDEX |
>                             ANV_PIPE_END_OF_PIPE_SYNC_BIT;
>
> enum anv_pipe_bits flush = ANV_PIPE_FLUSH_BITS |
> 			   ANV_PIPE_L3_FABRIC_FLUSH |
> 			   ANV_PIPE_DEPTH_STALL_BIT |
>                             ANV_PIPE_CONTROL_FLUSH_ENABLE |
>                             ANV_PIPE_STORE_DATA_INDEX |
>                             ANV_PIPE_END_OF_PIPE_SYNC_BIT |
>                             ANV_PIPE_COMMAND_CACHE_INVALIDATE_ENABLE |
>                             ANV_PIPE_CS_STALL_BIT |
>                             ANV_PIPE_INVALIDATE_BITS;
>
> So it might be leaving something in the engine TLB that could also be exploited by malicious software.
>
>> Are the mesa issues being seen on all platforms or just specific ones?
> I can reproduce on gfx12.0 platforms and DG2 but did not tested the fix in DG2 yet.

I just wanted to point out this PIPE_CONTROL bit has a bit of a 
misleading name.

It does more than invalidation :

     "If ENABLED, PIPE_CONTROL command will flush the in flight data 
written out by render engine to Global Observation point on flush done. 
Also Requires stall bit ([20] of DW1) set."

Not sure if that changes you mind about whether it should be in 
emit_flush or not :)

-Lionel

>
>>
>> Matt
>>
>>> I would think so, yes, except that the GUC TLB invalidation is GT-wide and
>>> I'm not sure whether PIPE_CONTROL_TLB_INVALIDATE is per hw engine or per-GT.
>>> But given this really has an impact, it might be that we need to invalidate
>>> TLB also after a bind where we previously pointed to the scratch page.
>>>
>>> If that's indeed the culprit we should look at issuing a
>>> PIPE_CONTROL_TLB_INVALIDATE on the exec following a bind or rebind, and
>>> leave the GuC TLB invalidations for unbinds, and then this patch makes sense
>>> as a start.
>>>
>>>
>>>> Do you see any PIPE_CONTROL flush at the end of batch buffers that i915 does but Xe don't?
>>> The emit_fini_breadcrumb() called from __i915_request_submit() indeed seems
>>> to emit the flushes needed, whereas the corresponding
>>> emit_pipe_imm_ggtt() in xe doesn't.
>>>
>>> /Thomas
>>>
>>>
>>>>> /Thomas
>>>>>
>>>>>> /Thomas
>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>> Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 1 +
>>>>>>>>>     drivers/gpu/drm/xe/xe_ring_ops.c          | 6 ++++--
>>>>>>>>>     2 files changed, 5 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
>>>>>>>>> b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
>>>>>>>>> index 0f9c5b0b8a3ba..7c7320efea739 100644
>>>>>>>>> --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
>>>>>>>>> +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
>>>>>>>>> @@ -73,6 +73,7 @@
>>>>>>>>>     #define
>>>>>>>>> PIPE_CONTROL_STORE_DATA_INDEX                        (1<<21)
>>>>>>>>>     #define
>>>>>>>>> PIPE_CONTROL_CS_STALL                                (1<<20)
>>>>>>>>>     #define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET           (1<<19)
>>>>>>>>> +#define PIPE_CONTROL_TLB_INVALIDATE                  (1<<18)
>>>>>>>>>     #define
>>>>>>>>> PIPE_CONTROL_PSD_SYNC                                (1<<17)
>>>>>>>>>     #define
>>>>>>>>> PIPE_CONTROL_QW_WRITE                                (1<<14)
>>>>>>>>>     #define PIPE_CONTROL_DEPTH_STALL                     (1<<13)
>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
>>>>>>>>> b/drivers/gpu/drm/xe/xe_ring_ops.c
>>>>>>>>> index d2fa0b4c8bcc0..4f3ef39109b9b 100644
>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>>>>>>>>> @@ -37,7 +37,8 @@
>>>>>>>>>                    PIPE_CONTROL_INDIRECT_STATE_DISABLE | \
>>>>>>>>>                    PIPE_CONTROL_FLUSH_ENABLE | \
>>>>>>>>>                    PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
>>>>>>>>> -               PIPE_CONTROL_DC_FLUSH_ENABLE)
>>>>>>>>> +               PIPE_CONTROL_DC_FLUSH_ENABLE | \
>>>>>>>>> +               PIPE_CONTROL_TLB_INVALIDATE)
>>>>>>>>>       static u32 preparser_disable(bool state)
>>>>>>>>>     {
>>>>>>>>> @@ -117,7 +118,8 @@ static int emit_pipe_invalidate(u32 mask_flags,
>>>>>>>>> u32 *dw, int i)
>>>>>>>>>                    PIPE_CONTROL_CONST_CACHE_INVALIDATE |
>>>>>>>>>                    PIPE_CONTROL_STATE_CACHE_INVALIDATE |
>>>>>>>>>                    PIPE_CONTROL_QW_WRITE |
>>>>>>>>> -               PIPE_CONTROL_STORE_DATA_INDEX;
>>>>>>>>> +               PIPE_CONTROL_STORE_DATA_INDEX |
>>>>>>>>> +               PIPE_CONTROL_TLB_INVALIDATE;
>>>>>>>>>              flags &= ~mask_flags;