[PATCH 3/3] drm/xe: Allow scratch page under fault mode for certain platform

Matthew Brost matthew.brost at intel.com
Tue Feb 25 22:05:36 UTC 2025


On Tue, Feb 04, 2025 at 01:45:58PM -0500, Oak Zeng wrote:
> Normally scratch page is not allowed when a vm is operate under page
> fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
> and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
> is fault mode relies on recoverable page to work, while scratch page
> can mute recoverable page fault.
> 
> On xe2 and xe3, out of bound prefetch can cause page fault and further
> system hang because xekmd can't resolve such page fault. SYCL and OCL
> language runtime requires out of bound prefetch to be silently dropped
> without causing any functional problem, thus the existing behavior
> doesn't meet language runtime requirement.
> 
> At the same time, HW prefetching can cause page fault interrupt. Due to
> page fault interrupt overhead (i.e., need Guc and KMD involved to fix
> the page fault), HW prefetching can be slowed by many orders of magnitude.
> 
> Fix those problems by allowing scratch page under fault mode for xe2 and
> xe3. With scratch page in place, HW prefetching could always hit scratch
> page instead of causing interrupt.
> 
> A side effect is, scratch page could hide application program error.
> Application out of bound accesses are hided by scratch page mapping,
> instead of get reported to user.
> 
> igt test: https://patchwork.freedesktop.org/series/144334/. Test result on
> BMG:
> 

I'd include the IGT information in the cover letter, not the patch.

> root at DUT1130BMGFRD:/home/szeng/dii-tools/igt-public/build/tests# ./xe_exec_fault_mode --run-subtest scratch-fault
> IGT-Version: 1.30-gde1a3cb42 (x86_64) (Linux: 6.13.0-xe x86_64)
> Using IGT_SRANDOM=1738684805 for randomisation
> Opened device: /dev/dri/card0
> Starting subtest: scratch-fault
> Subtest scratch-fault: SUCCESS (0.080s)
> 
> Without this series, the test result is:
> 
> root at DUT1130BMGFRD:/home/szeng/dii-tools/igt-public/build/tests# ./xe_exec_fault_mode --run-subtest scratch-fault
> IGT-Version: 1.30-gde1a3cb42 (x86_64) (Linux: 6.13.0-xe x86_64)
> Using IGT_SRANDOM=1738686046 for randomisation
> Opened device: /dev/dri/card0
> Starting subtest: scratch-fault
> (xe_exec_fault_mode:5047) CRITICAL: Test assertion failure function test_exec, file ../tests/intel/xe_exec_fault_mode.c:349:
> (xe_exec_fault_mode:5047) CRITICAL: Failed assertion: __xe_wait_ufence(fd, &exec_sync[i], 0xdeadbeefdeadbeefull, exec_queues[i % n_exec_queues], &timeout) == 0
> (xe_exec_fault_mode:5047) CRITICAL: Last errno: 62, Timer expired
> (xe_exec_fault_mode:5047) CRITICAL: error: -62 != 0
> Stack trace:
>   #0 ../lib/igt_core.c:2266 __igt_fail_assert()
>   #1 ../tests/intel/xe_exec_fault_mode.c:346 test_exec()
>   #2 ../tests/intel/xe_exec_fault_mode.c:537 __igt_unique____real_main407()
>   #3 ../tests/intel/xe_exec_fault_mode.c:407 main()
>   #4 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
>   #5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
>   #6 [_start+0x2e]
> Subtest scratch-fault failed.
> 
> v2: Refine commit message (Thomas)
> 
> Signed-off-by: Oak Zeng <oak.zeng at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_vm.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 813d893d9b63..c0372f083d42 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1752,6 +1752,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
>  	if (XE_IOCTL_DBG(xe, args->extensions))
>  		return -EINVAL;
>  
> +	if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> +			 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE &&
> +			 !(NEEDS_SCRATCH(xe))))

Same comment as patch #1, I'd drop this macro.

Do we need a query uAPI so the UMD can test upon process open if the VM
supports scratch page + faults? Or should we just not restrict VM
scratch page + faults ever and have it choose based on platform
recommnedation?

Matt

> +		return -EINVAL;
> +
>  	if (XE_WA(xe_root_mmio_gt(xe), 14016763929))
>  		args->flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE;
>  
> @@ -1765,10 +1770,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
>  	if (XE_IOCTL_DBG(xe, args->flags & ~ALL_DRM_XE_VM_CREATE_FLAGS))
>  		return -EINVAL;
>  
> -	if (XE_IOCTL_DBG(xe, args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE &&
> -			 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
> -		return -EINVAL;
> -
>  	if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) &&
>  			 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE))
>  		return -EINVAL;
> -- 
> 2.26.3
> 


More information about the Intel-xe mailing list