[PATCH v2 23/25] drm/xe/device: implement transient flush
Nirmoy Das
nirmoy.das at linux.intel.com
Wed Apr 3 12:13:52 UTC 2024
Hi Bala,
On 4/3/2024 1:22 PM, Balasubramani Vivekanandan wrote:
> From: Nirmoy Das <nirmoy.das at intel.com>
>
> Display surfaces can be tagged as transient by mapping them using one of
> the various L3:XD PAT index modes on Xe2. The expectation is that KMD
> needs to request transient data flush at the start of flip sequence to
> ensure all transient data in L3 cache is flushed to memory. Add a
> routine for this which we can then call from the display code.
>
> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
> Co-developed-by: Matthew Auld <matthew.auld at intel.com>
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan at intel.com>
> ---
> drivers/gpu/drm/xe/regs/xe_gt_regs.h | 3 ++
> drivers/gpu/drm/xe/xe_device.c | 52 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_device.h | 2 ++
> 3 files changed, 57 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 6617c86a096b..7afe810b3441 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -306,6 +306,9 @@
>
> #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658)
>
> +#define XE2_TDF_CTRL XE_REG(0xb418)
> +#define TRANSIENT_FLUSH_REQUEST REG_BIT(0)
> +
> #define XEHP_MERT_MOD_CTRL XE_REG_MCR(0xcf28)
> #define RENDER_MOD_CTRL XE_REG_MCR(0xcf2c)
> #define COMP_MOD_CTRL XE_REG_MCR(0xcf30)
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 01bd5ccf05ca..0c9769fe04f6 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -641,6 +641,58 @@ void xe_device_wmb(struct xe_device *xe)
> xe_mmio_write32(gt, SOFTWARE_FLAGS_SPR33, 0);
> }
>
> +/**
> + * xe_device_td_flush() - Flush transient L3 cache entries
> + * @xe: The device
> + *
> + * Display engine has direct access to memory and is never coherent with L3/L4
> + * caches (or CPU caches), however KMD is responsible for specifically flushing
> + * transient L3 GPU cache entries prior to the flip sequence to ensure scanout
> + * can happen from such a surface without seeing corruption.
> + *
> + * Display surfaces can be tagged as transient by mapping them using one of the
> + * various L3:XD PAT index modes on Xe2.
> + *
> + * Note: On non-discrete xe2 platforms, like LNL, the entire L3 cache is flushed
> + * at the end of each submission via PIPE_CONTROL for compute/render, since SA
> + * Media is not coherent with L3 and we want to support render-vs-media
> + * usecases. For other engines like copy/blt the HW internally forces uncached
> + * behaviour, hence why we can skip the TDF on such platforms.
> + */
> +void xe_device_td_flush(struct xe_device *xe)
> +{
> + struct xe_gt *gt;
> + int err;
> + u8 id;
> +
> + if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20)
> + return;
> +
> + for_each_gt(gt, xe, id) {
> + if (xe_gt_is_media_type(gt))
> + continue;
> +
> + err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> + if (err)
> + return;
This can be written as if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)) directly,
so the err variable is no longer needed. Sorry, this was my oversight from this morning.
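
For reference, a minimal standalone sketch of the shape I mean. The struct layout and the *_stub helpers below are hypothetical stand-ins so that it compiles outside the kernel; they are not the real xe API, and the single assignment only stands in for the TDF register write and wait:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct xe_gt; the fields exist only for this sketch. */
struct xe_gt {
	int id;
	int fw_fails;	/* non-zero: pretend the forcewake request fails */
	int flushed;	/* set once the (simulated) flush has run */
};

/* Stub standing in for xe_force_wake_get(): returns 0 on success, non-zero on error. */
static int xe_force_wake_get_stub(struct xe_gt *gt)
{
	return gt->fw_fails ? -1 : 0;
}

/* Stub standing in for xe_force_wake_put(); nothing to release in the sketch. */
static void xe_force_wake_put_stub(struct xe_gt *gt)
{
	(void)gt;
}

/* Same control flow as the loop in the patch, minus the local err variable. */
static void td_flush_sketch(struct xe_gt *gts, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		struct xe_gt *gt = &gts[i];

		/* Test the return value directly; bail out as the patch does. */
		if (xe_force_wake_get_stub(gt))
			return;

		gt->flushed = 1;	/* stands in for the TDF write and wait */

		xe_force_wake_put_stub(gt);
	}
}
```

Behaviour is unchanged from the posted version: a forcewake failure still returns early and leaves any remaining GTs untouched.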
Regards,
Nirmoy
> +
> + xe_mmio_write32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST);
> + /*
> + * FIXME: We can likely do better here with our choice of
> + * timeout. Currently we just assume the worst case, but really
> + * we should make this dependent on how much actual L3 there is
> + * for this system. Recommendation is to allow ~64us in the worst
> + * case for 8M of L3 (assumes all entries are transient and need
> + * to be flushed).
> + */
> + if (xe_mmio_wait32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST, 0,
> + 150, NULL, false))
> + xe_gt_err_once(gt, "TD flush timeout\n");
> +
> + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> + }
> +}
> +
> u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
> {
> return xe_device_has_flat_ccs(xe) ?
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index d413bc2c6be5..d3430f4b820a 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -176,4 +176,6 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
> u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
> u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
>
> +void xe_device_td_flush(struct xe_device *xe);
> +
> #endif