[Intel-xe] [PATCH] drm/xe: Fix and clarify PAT Index selection.

Matt Roper matthew.d.roper at intel.com
Fri Jul 28 21:26:42 UTC 2023


On Thu, Jul 27, 2023 at 05:37:45PM -0400, Rodrigo Vivi wrote:
> PAT selection here in the Xe driver had inherited a lot
> of history from i915 and ended up with some strange cases
> like doing logical 'or' with 0, or even unnecessarily selecting
> pat_index = 7, instead of simply opting for the 0.
> 
> XXX: More work needs to be done in MTL and PVC for the right
> selection, starting from the fact that pat_index = 2 is now the
> uncached and pat_index = 3 is WB with some kind of coherence.
> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.h      | 24 ++++++++++++++++++++----
>  drivers/gpu/drm/xe/xe_migrate.c |  4 ++--
>  drivers/gpu/drm/xe/xe_pt.c      | 10 +++++-----
>  3 files changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index b29750a47d23..8c80c0784010 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -37,10 +37,26 @@
>  #define XE_BO_INTERNAL_TEST		BIT(30)
>  #define XE_BO_INTERNAL_64K		BIT(31)
>  
> -#define PPAT_UNCACHED                   GENMASK_ULL(4, 3)
> -#define PPAT_CACHED_PDE                 0
> -#define PPAT_CACHED                     BIT_ULL(7)
> -#define PPAT_DISPLAY_ELLC               BIT_ULL(4)
> +/*
> + * PAT Index Mechanism
> + *
> + * Some bits encoded in the page table entries and directories,
> + * are put together to form the "PAT_INDEX". The value of
> + * PAT Index points to some cache options such as:
> + *
> + *   PAT_INDEX = 0 --> WB Cache
> + *   PAT_INDEX = 1 --> WC Cache
> + *   PAT_INDEX = 2 --> WT Cache
> + *   PAT_INDEX = 3 --> Uncached
> + *   (The full per-platform table can be found at xe_pat.c)
> + *
> + * The bits that form the PAT index for PPGTT are:
> + *   PTE     - [62][ 7][4][3]
> + *   PDE/PDP - [62][12][4][3]
> + */
> +#define PPAT_INDEX			(BIT_ULL(62) | BIT_ULL(7) | BIT_ULL(4) | BIT_ULL(3))
> +#define PPAT_WT				BIT_ULL(4) /* PAT_INDEX = 2 */
> +#define PPAT_UNCACHED			GENMASK_ULL(4, 3) /* PAT_INDEX = 3 */

Which index corresponds to what behavior is platform-specific and will
likely change with every major hardware change.  We should eliminate
things like PPAT_WT or PPAT_UNCACHED completely.  If there's a
platform-specific PAT index which is used internally by the driver in a
bunch of places (e.g., "give me fully uncached behavior") we should
cache that index in something like xe_gt->pat_uc.  It can't be a
driver-wide constant.

Generally the PAT selection is either going to come from userspace (and
winds up getting passed all the way down to the pte/pde encoding
functions), or we're working with some kernel-allocated buffer (like a
GuC CTB or a golden LRC) and the exact characteristics that make sense
need to be selected on a platform-by-platform basis at init time.

>  
>  #define XE_PTE_SHIFT			12
>  #define XE_PAGE_SIZE			(1 << XE_PTE_SHIFT)
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index aad76a6a8094..5d33ee1aa3a0 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -257,7 +257,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  
>  		level = 2;
>  		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
> -		flags = XE_PAGE_RW | XE_PAGE_PRESENT | PPAT_CACHED |
> +		flags = XE_PAGE_RW | XE_PAGE_PRESENT |

Although a constant PPAT_CACHED isn't correct, it also isn't correct to
just leave the PAT index as 0 either since that will result in different
behavior on different platforms.

>  			XE_PPGTT_PTE_DM | XE_PDPE_PS_1G;
>  
>  		/*
> @@ -465,7 +465,7 @@ static void emit_pte(struct xe_migrate *m,
>  				addr += vram_region_gpu_offset(bo->ttm.resource);
>  				addr |= XE_PPGTT_PTE_DM;
>  			}
> -			addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW;
> +			addr |= XE_PAGE_PRESENT | XE_PAGE_RW;
>  			bb->cs[bb->len++] = lower_32_bits(addr);
>  			bb->cs[bb->len++] = upper_32_bits(addr);
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 3129fdb0551c..20f2d1139e8c 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -65,10 +65,10 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
>  	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>  	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
>  
> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
> +	/* FIXME: the PPAT handling is incorrect for MTL and PVC */
>  
>  	if (level != XE_CACHE_NONE)

The important thing to accomplish for these encoding functions is to
eliminate these symbolic values like XE_CACHE_NONE completely.
Eventually we should just take the PAT index as a parameter and encode
it directly without trying to interpret what it means.  The upper layers
of the code (and/or userspace) are the ones responsible for selecting
which index is actually appropriate.


Matt

> -		pde |= PPAT_CACHED_PDE;
> +		pde &= ~PPAT_INDEX;
>  	else
>  		pde |= PPAT_UNCACHED;
>  
> @@ -86,17 +86,17 @@ static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
>  	if (unlikely(vma && xe_vma_is_null(vma)))
>  		pte |= XE_PTE_NULL;
>  
> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
> +	/* FIXME: the PPAT handling is incorrect for MTL and PVC */
>  
>  	switch (cache) {
>  	case XE_CACHE_NONE:
>  		pte |= PPAT_UNCACHED;
>  		break;
>  	case XE_CACHE_WT:
> -		pte |= PPAT_DISPLAY_ELLC;
> +		pte |= PPAT_WT;
>  		break;
>  	default:
> -		pte |= PPAT_CACHED;
> +		pte &= ~PPAT_INDEX;
>  		break;
>  	}
>  
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the Intel-xe mailing list