[Intel-xe] [PATCH 1/2] drm/xe: Invalidate TLB also on bind if in scratch page mode
Souza, Jose
jose.souza at intel.com
Fri Jun 9 15:54:52 UTC 2023
On Fri, 2023-06-09 at 15:51 +0000, Matthew Brost wrote:
> On Wed, Jun 07, 2023 at 07:47:28PM +0200, Thomas Hellström wrote:
> > For scratch table mode we need to cover the case where a scratch PTE might
> > have been pre-fetched and cached and used instead of that of the newly
> > bound vma.
> > For compute vms, invalidate TLB globally using GuC before signalling
> > bind complete. For !long-running vms, invalidate TLB at batch start.
> >
> > Also document how TLB invalidation works.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 1 +
> > drivers/gpu/drm/xe/xe_pt.c | 17 +++++++++++++++--
> > drivers/gpu/drm/xe/xe_ring_ops.c | 15 ++++++++++++---
> > 3 files changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > index 0f9c5b0b8a3b..d2d41f717525 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > @@ -73,6 +73,7 @@
> > #define PIPE_CONTROL_STORE_DATA_INDEX (1<<21)
> > #define PIPE_CONTROL_CS_STALL (1<<20)
> > #define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET (1<<19)
> > +#define PIPE_CONTROL_TLB_INVALIDATE (1<<18)
> > #define PIPE_CONTROL_PSD_SYNC (1<<17)
> > #define PIPE_CONTROL_QW_WRITE (1<<14)
> > #define PIPE_CONTROL_DEPTH_STALL (1<<13)
> > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > index bef265715000..e817fa9fe65e 100644
> > --- a/drivers/gpu/drm/xe/xe_pt.c
> > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > @@ -1297,7 +1297,20 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
> >
> > xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
> >
> > - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> > + /*
> > + * If rebind, we have to invalidate TLB on !LR vms to invalidate
> > + * cached PTEs pointing to freed memory. On LR vms this is done
> > + * automatically when the context is re-enabled by the rebind worker,
> > + * or in fault mode it was invalidated on PTE zapping.
> > + *
> > + * If !rebind, and scratch enabled VMs, there is a chance the scratch
> > + * PTE is already cached in the TLB so it needs to be invalidated.
> > + * On !LR VMs this is done in the ring ops preceding a batch, but on
> > + * non-faulting LR, in particular on user-space batch buffer chaining,
> > + * it needs to be done here.
> > + */
> > + if ((rebind && !xe_vm_no_dma_fences(vm)) ||
> > + (!rebind && vm->scratch_bo[tile->id] && xe_vm_in_compute_mode(vm))) {
> > ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
> > if (!ifence)
> > return ERR_PTR(-ENOMEM);
> > @@ -1313,7 +1326,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_engine *e,
> > LLIST_HEAD(deferred);
> >
> > /* TLB invalidation must be done before signaling rebind */
> > - if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
> > + if (ifence) {
> > int err = invalidation_fence_init(tile->primary_gt, ifence, fence,
> > vma);
> > if (err) {
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index 2deee7a2bb14..c20fe41c0729 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -15,6 +15,7 @@
> > #include "xe_macros.h"
> > #include "xe_sched_job.h"
> > #include "xe_vm_types.h"
> > +#include "xe_vm.h"
> >
> > /*
> > * 3D-related flags that can't be set on _engines_ that lack access to the 3D
> > @@ -107,7 +108,7 @@ static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
> > return i;
> > }
> >
> > -static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> > +static int emit_pipe_invalidate(u32 mask_flags, u32 extra_flags, u32 *dw, int i)
> > {
> > u32 flags = PIPE_CONTROL_CS_STALL |
> > PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> > @@ -117,7 +118,8 @@ static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
> > PIPE_CONTROL_CONST_CACHE_INVALIDATE |
> > PIPE_CONTROL_STATE_CACHE_INVALIDATE |
> > PIPE_CONTROL_QW_WRITE |
> > - PIPE_CONTROL_STORE_DATA_INDEX;
> > + PIPE_CONTROL_STORE_DATA_INDEX |
> > + extra_flags;
> >
> > flags &= ~mask_flags;
> >
> > @@ -250,14 +252,21 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> > struct xe_gt *gt = job->engine->gt;
> > struct xe_device *xe = gt_to_xe(gt);
> > bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> > + struct xe_vm *vm = job->engine->vm;
> > u32 mask_flags = 0;
> > + u32 extra_flags = 0;
> >
> > dw[i++] = preparser_disable(true);
> > if (lacks_render)
> > mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
> > else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
> > mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
> > - i = emit_pipe_invalidate(mask_flags, dw, i);
> > +
> > + /* See xe_pt.c for a discussion on TLB invalidations. */
> > + if (!xe_vm_no_dma_fences(vm) && vm->scratch_bo[gt_to_tile(gt)->id])
> > + extra_flags = PIPE_CONTROL_TLB_INVALIDATE;
>
> I think we need a similar if statement + emit_flush_invalidate call in
> the functions that emit jobs for different classes too, right?
Handled in the new version: https://patchwork.freedesktop.org/series/119124/
>
> e.g. emit_job_gen12_copy, emit_job_gen12_video
>
> Matt
>
> > +
> > + i = emit_pipe_invalidate(mask_flags, extra_flags, dw, i);
> >
> > /* hsdes: 1809175790 */
> > if (has_aux_ccs(xe))
> > --
> > 2.39.2
> >
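For anyone following the thread, here is a rough standalone sketch of the two decisions the quoted patch makes: when a bind gets an invalidation fence, and when the batch-start PIPE_CONTROL gets the TLB invalidate bit. It is not driver code. struct vm_model and the three function names are hypothetical stand-ins, the three booleans are kept independent purely for illustration, and only a subset of the base PIPE_CONTROL flags is modelled. The bit values are the ones shown in the xe_gpu_commands.h hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the quoted xe_gpu_commands.h hunk. */
#define PIPE_CONTROL_STORE_DATA_INDEX	(1u << 21)
#define PIPE_CONTROL_CS_STALL		(1u << 20)
#define PIPE_CONTROL_TLB_INVALIDATE	(1u << 18)
#define PIPE_CONTROL_QW_WRITE		(1u << 14)

/* Hypothetical stand-in for the VM state the patch consults. */
struct vm_model {
	bool no_dma_fences;	/* models xe_vm_no_dma_fences(vm) */
	bool has_scratch;	/* models vm->scratch_bo[tile->id] != NULL */
	bool compute_mode;	/* models xe_vm_in_compute_mode(vm) */
};

/* Mirrors the condition guarding the ifence allocation in __xe_pt_bind_vma(). */
static bool bind_needs_invalidation_fence(const struct vm_model *vm, bool rebind)
{
	return (rebind && !vm->no_dma_fences) ||
	       (!rebind && vm->has_scratch && vm->compute_mode);
}

/* Mirrors the condition added to __emit_job_gen12_render_compute(). */
static uint32_t batch_start_extra_flags(const struct vm_model *vm)
{
	if (!vm->no_dma_fences && vm->has_scratch)
		return PIPE_CONTROL_TLB_INVALIDATE;
	return 0;
}

/*
 * Simplified flag composition from emit_pipe_invalidate(): extra_flags are
 * OR-ed in first, then mask_flags are cleared, so a masked bit is dropped
 * even if it was requested through extra_flags.
 */
static uint32_t pipe_invalidate_flags(uint32_t mask_flags, uint32_t extra_flags)
{
	uint32_t flags = PIPE_CONTROL_CS_STALL |
			 PIPE_CONTROL_QW_WRITE |
			 PIPE_CONTROL_STORE_DATA_INDEX |
			 extra_flags;

	flags &= ~mask_flags;

	return flags;
}
```

In this model a first bind on a scratch-enabled compute VM takes the invalidation fence path, while a scratch-enabled VM that uses dma fences relies on the TLB invalidate emitted at batch start instead.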