[PATCH] tests/intel/xe_exec_system_allocator: Add prefetch-sys-benchmark

Matthew Brost matthew.brost at intel.com
Fri Aug 8 18:21:30 UTC 2025


On Fri, Aug 08, 2025 at 12:57:57PM +0200, Francois Dugast wrote:
> On Thu, Aug 07, 2025 at 08:24:02PM -0700, Matthew Brost wrote:
> > Add prefetch-sys-benchmark, which uses prefetch to system rather than
> > faulting it back, and measures prefetch-to-system bandwidth in GB/s.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> >  tests/intel/xe_exec_system_allocator.c | 42 +++++++++++++++++++++++---
> >  1 file changed, 38 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tests/intel/xe_exec_system_allocator.c b/tests/intel/xe_exec_system_allocator.c
> > index dd5303855d..2703b7f1e1 100644
> > --- a/tests/intel/xe_exec_system_allocator.c
> > +++ b/tests/intel/xe_exec_system_allocator.c
> > @@ -775,6 +775,7 @@ partial(int fd, struct drm_xe_engine_class_instance *eci, unsigned int flags)
> >  #define THREADS			(0x1 << 23)
> >  #define PROCESSES		(0x1 << 24)
> >  #define PREFETCH_BENCHMARK	(0x1 << 25)
> > +#define PREFETCH_SYS_BENCHMARK	(0x1 << 26)
> 
> Nits: It would be less ambiguous to use the same term (SYS or SRAM) both
> here and in printf(), and to rename PREFETCH_BENCHMARK so that it includes
> the direction.
>

Let me clean this up at merge.
 
> Besides, test_exec() is becoming huge. This code section might be a good
> candidate to extract into a helper, as it is almost a duplicate. Anyway, this
> larger refactoring would be for a follow-up, so:
> 

I agree. Let me take a pass at cleaning up this function. I want to
retain the functionality of one large data path called in many different
ways, but clean it up for readability.
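
As a rough illustration only (not part of this patch), the duplicated
timing and reporting could be folded into a small helper along these
lines. The names prefetch_stats, prefetch_stats_add(),
prefetch_stats_gbps() and prefetch_stats_report() are hypothetical, and
plain printf() stands in for igt_info() so the sketch builds on its own:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical accumulator for one prefetch direction (VRAM or system). */
struct prefetch_stats {
	uint64_t total_ns;	/* summed elapsed time of all prefetches */
	uint64_t total_bytes;	/* summed size of all prefetched ranges */
};

/* Record one timed prefetch of 'bytes' bytes that took 'ns' nanoseconds. */
static void prefetch_stats_add(struct prefetch_stats *s, uint64_t ns,
			       uint64_t bytes)
{
	assert(s);
	s->total_ns += ns;
	s->total_bytes += bytes;
}

/*
 * Bytes per nanosecond is numerically equal to 1e9 bytes per second,
 * i.e. decimal GB/s. Returns 0.0 if no time has been recorded.
 */
static double prefetch_stats_gbps(const struct prefetch_stats *s)
{
	assert(s);
	return s->total_ns ?
		(double)s->total_bytes / (double)s->total_ns : 0.0;
}

/* Print one summary line; 'label' names the direction, e.g. "VRAM". */
static void prefetch_stats_report(const char *label,
				  const struct prefetch_stats *s)
{
	printf("Prefetch %s execution took %.3fms, %.2f GB/s\n",
	       label, 1e-6 * (double)s->total_ns, prefetch_stats_gbps(s));
}
```

With something like this, each direction would keep its own struct and
the two igt_info() calls would collapse into two report calls that
differ only in the label.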

Matt

> Reviewed-by: Francois Dugast <francois.dugast at intel.com>
> 
> >  
> >  #define N_MULTI_FAULT		4
> >  
> > @@ -955,6 +956,10 @@ partial(int fd, struct drm_xe_engine_class_instance *eci, unsigned int flags)
> >   * Description: Prefetch a 64M buffer 128 times, measure bandwidth of prefetch
> >   * Test category: performance test
> >   *
> > + * SUBTEST: prefetch-sys-benchmark
> > + * Description: Prefetch a 64M buffer 128 times, measure bandwidth of prefetch in both directions
> > + * Test category: performance test
> > + *
> >   * SUBTEST: threads-shared-vm-shared-alloc-many-stride-malloc
> >   * Description: Create multiple threads with a shared VM triggering faults on different hardware engines to same addresses
> >   * Test category: stress test
> > @@ -1022,7 +1027,7 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
> >  	struct aligned_alloc_type aligned_alloc_type;
> >  	uint32_t mem_region = vram_if_possible(fd, eci->gt_id);
> >  	uint32_t region = mem_region & 4 ? 2 : mem_region & 2 ? 1 : 0;
> > -	uint64_t prefetch_ns = 0;
> > +	uint64_t prefetch_ns = 0, prefetch_system_ns = 0;
> >  	const char *pf_count_stat = "svm_pagefault_count";
> >  
> >  	if (flags & MULTI_FAULT) {
> > @@ -1350,8 +1355,25 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
> >  				} else {
> >  					igt_assert_eq(data[idx].data,
> >  						      READ_VALUE(&data[idx]));
> > -					if (flags & PREFETCH_BENCHMARK)
> > +					if (flags & PREFETCH_SYS_BENCHMARK) {
> > +						struct timespec tv = {};
> > +						u64 start, end;
> > +
> > +						sync[0].addr = to_user_pointer(bind_ufence);
> > +
> > +						start = igt_nsec_elapsed(&tv);
> > +						xe_vm_prefetch_async(fd, vm, 0, 0, addr, bo_size, sync,
> > +								     1, 0);
> > +						end = igt_nsec_elapsed(&tv);
> > +
> > +						xe_wait_ufence(fd, bind_ufence, USER_FENCE_VALUE, 0,
> > +							       FIVE_SEC);
> > +						bind_ufence[0] = 0;
> > +
> > +						prefetch_system_ns += (end - start);
> > +					} else if (flags & PREFETCH_BENCHMARK) {
> >  						memset(data, 5, bo_size);
> > +					}
> >  
> >  					if (flags & MULTI_FAULT) {
> >  						for (j = 1; j < N_MULTI_FAULT; ++j) {
> > @@ -1438,11 +1460,17 @@ test_exec(int fd, struct drm_xe_engine_class_instance *eci,
> >  		prev_idx = idx;
> >  	}
> >  
> > -	if (flags & PREFETCH_BENCHMARK)
> > -		igt_info("Prefetch execution took %.3fms, %.1f5 GB/s\n",
> > +	if (flags & PREFETCH_BENCHMARK) {
> > +		igt_info("Prefetch VRAM execution took %.3fms, %.1f5 GB/s\n",
> >  			 1e-6 * prefetch_ns,
> >  			 bo_size * n_execs  / (float)prefetch_ns);
> >  
> > +		if (flags & PREFETCH_SYS_BENCHMARK)
> > +			igt_info("Prefetch SRAM execution took %.3fms, %.1f5 GB/s\n",
> > +				 1e-6 * prefetch_system_ns,
> > +				 bo_size * n_execs  / (float)prefetch_system_ns);
> > +	}
> > +
> >  	if (!(flags & FAULT) && flags & PREFETCH &&
> >  	    (flags & MMAP || !(flags & (NEW | THREADS | PROCESSES)))) {
> >  		int pf_count_after = xe_gt_stats_get_count(fd, eci->gt_id,
> > @@ -1917,6 +1945,12 @@ igt_main
> >  			test_exec(fd, hwe, 1, 128, SZ_64M, 0, 0, NULL,
> >  				  NULL, PREFETCH | PREFETCH_BENCHMARK);
> >  
> > +	igt_subtest_f("prefetch-sys-benchmark")
> > +		xe_for_each_engine(fd, hwe)
> > +			test_exec(fd, hwe, 1, 128, SZ_64M, 0, 0, NULL,
> > +				  NULL, PREFETCH | PREFETCH_BENCHMARK |
> > +				  PREFETCH_SYS_BENCHMARK);
> > +
> >  	igt_subtest("threads-shared-vm-shared-alloc-many-stride-malloc")
> >  		threads(fd, 1, 128, 0, 256, SHARED_ALLOC, true);
> >  
> > -- 
> > 2.34.1
> > 

