[Intel-xe] [PATCH] drm/xe: Fix and clarify PAT Index selection.
Matt Roper
matthew.d.roper at intel.com
Fri Jul 28 21:26:42 UTC 2023
On Thu, Jul 27, 2023 at 05:37:45PM -0400, Rodrigo Vivi wrote:
> PAT selection here in the Xe driver had inherited a lot
> of history from i915 and ended up with some strange cases
> like doing logical 'or' with 0, or even unnecessarily selecting
> pat_index = 7, instead of simply opting for the 0.
>
> XXX: More work needs to be done in MTL and PVC for the right
> selection, starting from the fact that pat_index = 2 is now the
> uncached and pat_index = 3 is WB with some kind of coherence.
>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.h | 24 ++++++++++++++++++++----
> drivers/gpu/drm/xe/xe_migrate.c | 4 ++--
> drivers/gpu/drm/xe/xe_pt.c | 10 +++++-----
> 3 files changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index b29750a47d23..8c80c0784010 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -37,10 +37,26 @@
> #define XE_BO_INTERNAL_TEST BIT(30)
> #define XE_BO_INTERNAL_64K BIT(31)
>
> -#define PPAT_UNCACHED GENMASK_ULL(4, 3)
> -#define PPAT_CACHED_PDE 0
> -#define PPAT_CACHED BIT_ULL(7)
> -#define PPAT_DISPLAY_ELLC BIT_ULL(4)
> +/*
> + * PAT Index Mechanism
> + *
> + * Some bits encoded in the page table entries and directories,
> + * are put together to form the "PAT_INDEX". The value of
> + * PAT Index points to some cache options such as:
> + *
> + * PAT_INDEX = 0 --> WB Cache
> + * PAT_INDEX = 1 --> WC Cache
> + * PAT_INDEX = 2 --> WT Cache
> + * PAT_INDEX = 3 --> Uncached
> + * (The full per-platform table can be found at xe_pat.c)
> + *
> + * The bits that form the PAT index for PPGTT are:
> + * PTE - [62][ 7][4][3]
> + * PDE/PDP - [62][12][4][3]
> + */
> +#define PPAT_INDEX (BIT_ULL(62) | BIT_ULL(7) | BIT_ULL(4) | BIT_ULL(3))
> +#define PPAT_WT BIT_ULL(4) /* PAT_INDEX = 2 */
> +#define PPAT_UNCACHED GENMASK_ULL(4, 3) /* PAT_INDEX = 3 */
Which index corresponds to what behavior is platform-specific and will
likely change with every major hardware change. We should eliminate
things like PPAT_WT or PPAT_UNCACHED completely. If there's a
platform-specific PAT index which is used internally by the driver in a
bunch of places (e.g., "give me fully uncached behavior") we should
cache that index in something like xe_gt->pat_uc. It can't be a
driver-wide constant.
Generally the PAT selection is either going to come from userspace (and
winds up getting passed all the way down to the pte/pde encoding
functions), or we're working with some kernel-allocated buffer (like a
GuC CTB or a golden LRC) and the exact characteristics that make sense
need to be selected on a platform-by-platform basis at init time.
>
> #define XE_PTE_SHIFT 12
> #define XE_PAGE_SIZE (1 << XE_PTE_SHIFT)
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index aad76a6a8094..5d33ee1aa3a0 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -257,7 +257,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>
> level = 2;
> ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
> - flags = XE_PAGE_RW | XE_PAGE_PRESENT | PPAT_CACHED |
> + flags = XE_PAGE_RW | XE_PAGE_PRESENT |
Although a constant PPAT_CACHED isn't correct, it also isn't correct to
just leave the PAT index as 0 either since that will result in different
behavior on different platforms.
> XE_PPGTT_PTE_DM | XE_PDPE_PS_1G;
>
> /*
> @@ -465,7 +465,7 @@ static void emit_pte(struct xe_migrate *m,
> addr += vram_region_gpu_offset(bo->ttm.resource);
> addr |= XE_PPGTT_PTE_DM;
> }
> - addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW;
> + addr |= XE_PAGE_PRESENT | XE_PAGE_RW;
> bb->cs[bb->len++] = lower_32_bits(addr);
> bb->cs[bb->len++] = upper_32_bits(addr);
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 3129fdb0551c..20f2d1139e8c 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -65,10 +65,10 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
> pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
>
> - /* FIXME: I don't think the PPAT handling is correct for MTL */
> + /* FIXME: the PPAT handling is incorrect for MTL and PVC */
>
> if (level != XE_CACHE_NONE)
The important thing to accomplish for these encoding functions is to
eliminate these symbolic values like XE_CACHE_NONE completely.
Eventually we should just take the PAT index as a parameter and encode
it directly without trying to interpret what it means. The upper layers
of the code (and/or userspace) are the ones responsible for selecting
which index is actually appropriate.
Matt
> - pde |= PPAT_CACHED_PDE;
> + pde &= ~PPAT_INDEX;
> else
> pde |= PPAT_UNCACHED;
>
> @@ -86,17 +86,17 @@ static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
> if (unlikely(vma && xe_vma_is_null(vma)))
> pte |= XE_PTE_NULL;
>
> - /* FIXME: I don't think the PPAT handling is correct for MTL */
> + /* FIXME: the PPAT handling is incorrect for MTL and PVC */
>
> switch (cache) {
> case XE_CACHE_NONE:
> pte |= PPAT_UNCACHED;
> break;
> case XE_CACHE_WT:
> - pte |= PPAT_DISPLAY_ELLC;
> + pte |= PPAT_WT;
> break;
> default:
> - pte |= PPAT_CACHED;
> + pte &= ~PPAT_INDEX;
> break;
> }
>
> --
> 2.41.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
More information about the Intel-xe
mailing list