[PATCH v3] drm/xe: Align all 64k VRAM buffers physically when multiple of 64k.
Souza, Jose
jose.souza at intel.com
Thu Aug 22 16:16:19 UTC 2024
On Thu, 2024-08-22 at 10:55 -0400, Rodrigo Vivi wrote:
> On Thu, Aug 22, 2024 at 04:23:46PM +0200, Maarten Lankhorst wrote:
> > For CCS formats on affected platforms, CCS can be used freely, but
> > display engine requires a multiple of 64k physical pages. No other
> > changes are needed.
> >
> > At the BO creation time we don't know if the BO will be used for CCS
> > or not. If the scanout flag is set, and the BO is a multiple of 64k,
>
> I don't see this happening in the code anymore. Where's the check for
> the scanout flag? What am I missing?
>
> > we take the safe route and force the physical alignment of 64k pages.
>
> If I had understood it correctly, this is one of the things that
> Jose was asking us to avoid since the alignment on the address
> was not actually needed. But I might have misunderstood him.
>
> Cc: José Roberto de Souza <jose.souza at intel.com>
Physical address alignment doesn't affect Mesa. It could affect performance, but from asking around, it would probably improve performance at the cost
of fragmenting physical memory placement.
+ if ((bo->flags & XE_BO_FLAG_INTERNAL_64K) &&
+ (xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)) {
This skips the block that checks whether the VMA address is aligned to 64k, as agreed, but I think it deserves a comment. It is not simple to understand
now and will only get more complicated with time.
I will ask Zhang from the Mesa team to try it out.
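A minimal standalone sketch of how such a comment could read is below. It is only an illustration: the SKETCH_ names and flag values are placeholders rather than the driver's definitions, the rest of the condition is cut off in the quoted hunk and is not reproduced, and the comment wording is one reading of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only, not the driver's definitions. */
#define SKETCH_BO_FLAG_INTERNAL_64K  (1u << 0)
#define SKETCH_VRAM_FLAGS_NEED64K    (1u << 0)
#define SKETCH_64K_PAGE_MASK         0xffffull

/*
 * Returns true when the bind should be rejected.
 *
 * The 64K alignment of the object offset and of the GPU virtual address
 * is only enforced on platforms whose VRAM requires 64K pages. On other
 * platforms a BO may still carry the 64K flag purely to get 64K physical
 * alignment, and its bind address is not restricted.
 */
static bool sketch_bind_rejects(uint32_t bo_flags, uint32_t vram_flags,
				uint64_t obj_offset, uint64_t addr)
{
	if ((bo_flags & SKETCH_BO_FLAG_INTERNAL_64K) &&
	    (vram_flags & SKETCH_VRAM_FLAGS_NEED64K)) {
		if ((obj_offset & SKETCH_64K_PAGE_MASK) ||
		    (addr & SKETCH_64K_PAGE_MASK))
			return true;
	}

	return false;
}
```

With these placeholders, a 4k-aligned bind address is rejected only when both the BO flag and the platform flag are set.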
>
> >
> > If the BO is not a multiple of 64k, or the scanout flag was not set
> > at BO creation, we reject it for usage as CCS in display. The physical
> > pages are likely not aligned correctly, and this will cause corruption
> > when used as FB.
> >
> > This is a slightly different approach from my previous patch. Instead
> > of requiring a scanout flag at FB creation, we now make all buffers of
> > the right size physically aligned correctly, so no change from userspace
> > is needed.
>
> could be a v3: mark and a more imperative language. Something like:
>
> v3: Instead of requiring scanout flag at FB creation, we now make...
>
> >
> > It will be interesting to see if it affects performance in any way,
> > could potentially even improve things with 64k PTE's.
> >
> > Inspired by Zbigniew's patch.
>
> These 2 sentences above should probably be added below the '---' mark,
> and not as part of the commit message.
>
> >
> > Signed-off-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> > Co-developed-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
>
> Zbigniew, we need your signoff-by here as well if this is the case.
>
> > Cc: Matthew Auld <matthew.auld at intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> > Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila at intel.com>
> > ---
> > Changes since previous version:
> > - Drop DISPLAY_NEED64K.
> > - Hardcode check for I915_FORMAT_MOD_4_TILED_BMG_CCS, only one affected.
>
> I like these 2 changes for cleaner code.
>
> >
> > drivers/gpu/drm/xe/display/intel_fb_bo.c | 5 +++++
> > drivers/gpu/drm/xe/xe_bo.c | 10 ++++++++++
> > drivers/gpu/drm/xe/xe_vm.c | 3 ++-
> > 3 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/display/intel_fb_bo.c b/drivers/gpu/drm/xe/display/intel_fb_bo.c
> > index f835492f73fb4..de613325e0bb0 100644
> > --- a/drivers/gpu/drm/xe/display/intel_fb_bo.c
> > +++ b/drivers/gpu/drm/xe/display/intel_fb_bo.c
> > @@ -28,6 +29,10 @@ int intel_fb_bo_framebuffer_init(struct intel_framebuffer *intel_fb,
> > struct xe_device *xe = to_xe_device(bo->ttm.base.dev);
> > int ret;
> >
> > + if (XE_IOCTL_DBG(xe, mode_cmd->modifier[0] == I915_FORMAT_MOD_4_TILED_BMG_CCS &&
> > + !(bo->flags & XE_BO_FLAG_NEEDS_64K)))
>
> parenthesis misalignment
>
> > + return -EINVAL;
> > +
> > xe_bo_get(bo);
> >
> > ret = ttm_bo_reserve(&bo->ttm, true, false, NULL);
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index 6ed0e19552159..dd54cbc14e9d8 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -2019,6 +2019,16 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> >
> > bo_flags |= args->placement << (ffs(XE_BO_FLAG_SYSTEM) - 1);
> >
> > + /*
> > + * Lets see what happens if we simply align any buffer that's
> > + * a multiple of 64k to 64k in places where it's not officially
> > + * needed.
> > + */
> > + if ((bo_flags & XE_BO_FLAG_VRAM_MASK) &&
> > + !(xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) &&
> > + !(args->size % SZ_64K))
>
> shouldn't we be checking for the scanout here as well?
>
> > + bo_flags |= XE_BO_FLAG_NEEDS_64K;
> > +
> > if (args->flags & DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM) {
> > if (XE_IOCTL_DBG(xe, !(bo_flags & XE_BO_FLAG_VRAM_MASK)))
> > return -EINVAL;
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index d1bfd0b6e9558..af215f6d6588b 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -2878,7 +2878,8 @@ static int xe_vm_bind_ioctl_validate_bo(struct xe_device *xe, struct xe_bo *bo,
> > return -EINVAL;
> > }
> >
> > - if (bo->flags & XE_BO_FLAG_INTERNAL_64K) {
> > + if ((bo->flags & XE_BO_FLAG_INTERNAL_64K) &&
> > + (xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)) {
> > if (XE_IOCTL_DBG(xe, obj_offset &
> > XE_64K_PAGE_MASK) ||
> > XE_IOCTL_DBG(xe, addr & XE_64K_PAGE_MASK) ||
> > --
> > 2.45.2
> >
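For reference, the create-time decision in the xe_bo.c hunk above reduces to a single predicate. The sketch below restates it in standalone form, assuming placeholder SKETCH_ names and bit positions that are not the driver's definitions. It mirrors the three conditions exactly as posted and does not add a scanout check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only, not the driver's definitions. */
#define SKETCH_SZ_64K              (64u * 1024u)
#define SKETCH_BO_FLAG_VRAM0       (1u << 1)
#define SKETCH_BO_FLAG_VRAM1       (1u << 2)
#define SKETCH_BO_FLAG_VRAM_MASK   (SKETCH_BO_FLAG_VRAM0 | SKETCH_BO_FLAG_VRAM1)
#define SKETCH_VRAM_FLAGS_NEED64K  (1u << 0)

/*
 * Returns true when the posted hunk would set the 64K flag on the BO:
 * the BO may be placed in VRAM, the platform does not already require
 * 64K VRAM pages, and the size is a multiple of 64K.
 */
static bool sketch_wants_64k(uint32_t bo_flags, uint32_t vram_flags,
			     uint64_t size)
{
	return (bo_flags & SKETCH_BO_FLAG_VRAM_MASK) &&
	       !(vram_flags & SKETCH_VRAM_FLAGS_NEED64K) &&
	       !(size % SKETCH_SZ_64K);
}
```

Under these assumptions a 128k VRAM buffer gets the flag, while a 4k VRAM buffer or a 64k system-memory buffer does not.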