BO alignment for kernel page size > 4kB

Tue Aug 5 09:24:08 UTC 2025

Hi Simon,

It's probably best to open an issue on GitLab; this is all Anv-specific.
Let's not bother the entire project with it.

-Lionel

On 05/08/2025 11:13, Simon Richter wrote:
> Hi,
>
> there is a proposed patch[1] to the xe driver to make it work for 
> larger kernel page sizes. Part of this patch is to return the CPU page 
> size as DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT, so Mesa will pad size 
> requests accordingly.
>
> However, that is necessary only for CPU-visible BOs; local-only BOs
> are still fine with 4kB. The query parameter has no context about
> whether the allocation will be CPU visible, so I think it's the wrong
> place for it.
>
> We can (and do) also fix up the size inside the kernel with the 
> detected alignment, but that means that Mesa doesn't know about it.
>
> I've looked into the xe_gem_create function, and inserting an extra 
> alignment requirement there seems somewhat doable, but I'm not 
> entirely sure if that is sufficient, and I don't entirely follow the 
> meaning of all the relevant flags here.
>
> My proposed strategy:
>
> 1. the xe driver will report 4kB as DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT, 
> even if CPU page size is larger, because that is the requirement from 
> the GPU.
>
> 2. the xe driver will silently fix up the size if 
> DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM is set, to allow older 
> versions of Mesa to work.
>
> 3. xe_gem_create will align the size to the result of
> sysconf(_SC_PAGESIZE) if ANV_BO_ALLOC_MAPPED or
> ANV_BO_ALLOC_LOCAL_MEM_CPU_VISIBLE is set.
>
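
For illustration, the size fix-up described in steps 1 to 3 above could be
sketched roughly as follows. This is only a sketch under assumptions, not
the actual Mesa or kernel code: every sketch_* name is hypothetical, the
4kB constant is the GPU-side minimum taken from step 1, and the single
cpu_visible boolean stands in for "ANV_BO_ALLOC_MAPPED or
ANV_BO_ALLOC_LOCAL_MEM_CPU_VISIBLE is set".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Assumption: GPU-side minimum alignment of 4kB, as in step 1. */
#define SKETCH_GPU_MIN_ALIGNMENT UINT64_C(4096)

/* Round size up to a multiple of alignment (must be a power of two). */
static uint64_t
sketch_align_up(uint64_t size, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: local-only BOs keep the GPU minimum, while
 * CPU-visible BOs also have to respect the CPU page size. */
static uint64_t
sketch_bo_alignment(bool cpu_visible)
{
   uint64_t alignment = SKETCH_GPU_MIN_ALIGNMENT;

   if (cpu_visible) {
      long page_size = sysconf(_SC_PAGESIZE);

      /* Only ever raise the alignment, never go below the GPU minimum. */
      if (page_size > 0 && (uint64_t)page_size > alignment)
         alignment = (uint64_t)page_size;
   }

   return alignment;
}

/* Hypothetical helper: padded size for a BO allocation request. */
static uint64_t
sketch_bo_size(uint64_t requested, bool cpu_visible)
{
   return sketch_align_up(requested, sketch_bo_alignment(cpu_visible));
}
```

On a kernel with 64kB pages, sketch_bo_size(1, true) would pad the request
to 64kB, while sketch_bo_size(1, false) stays at 4kB. That difference is
what is lost if every allocation is padded to the CPU page size.
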
> However: DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM is not set in the 
> ioctl if device->physical->vram_non_mappable.size is zero, or 
> ANV_BO_ALLOC_NO_LOCAL_MEM is set.
>
> So if we found an aperture for all of VRAM (which is quite likely on 
> platforms that have larger kernel page sizes), then Mesa will not tell 
> us that the memory must be CPU visible -- so we need to fix up the 
> size of all allocations, and we're achieving the same result as just 
> reporting a larger DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT.
>
> Does it make sense to set vram_non_mappable.size to 1 here, to force 
> Mesa to tell us if the mapping is meant to be CPU accessible?
>
> What is the role of ANV_BO_ALLOC_NO_LOCAL_MEM? To me, the logic looks 
> reversed -- if we're told not to use (device) local memory, we *don't* 
> tell the kernel that the memory should be CPU visible. Is that a bug, 
> am I misinterpreting the function of this flag, or is there some other 
> mechanism I'm unaware of that makes this work (I see this flag is used 
> for fences, which most certainly are CPU visible)?
>
> Or should we just not care to optimize device local allocations, and 
> pad everything both in the kernel and in userspace?
>
>    Simon
>
> [1] 
> https://lore.kernel.org/all/20250604-upstream-xe-non-4k-v2-v2-0-ce7905da7b08@aosc.io/