[PATCH i-g-t 1/1] lib/xe/xe_query: Wait for xe_supports_faults

Fri May 10 15:58:05 UTC 2024

Hi Cavitt,
On 2024-05-09 at 16:30:46 +0000, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Kamil Konieczny <kamil.konieczny at linux.intel.com> 
> Sent: Thursday, May 9, 2024 9:21 AM
> To: igt-dev at lists.freedesktop.org
> Cc: Cavitt, Jonathan <jonathan.cavitt at intel.com>; Gupta, saurabhg <saurabhg.gupta at intel.com>; Welty, Brian <brian.welty at intel.com>; Mistat, Tomasz <tomasz.mistat at intel.com>; Girotra, Himanshu <himanshu.girotra at intel.com>
> Subject: Re: [PATCH i-g-t 1/1] lib/xe/xe_query: Wait for xe_supports_faults
> > 
> > Hi Jonathan,
> > On 2024-05-08 at 12:35:45 -0700, Jonathan Cavitt wrote:
> > 
> > Could you update the subject of the patch, or split it into two
> > patches?
> 
> How does "xe: Check xe_supports_faults for EBUSY" sound?
> 

tests/intel/xe_exec_fault_mode: account for EBUSY in support check

looks OK. If you want to use the lib/ prefix instead, something like:

lib/xe/xe_query: return errno from xe_supports_faults check

or some other, better description.

> > 
> > > It's possible for xe_supports_faults to return false if the system is
> > > busy with multiple running tests.  This is because the check looks for
> > > all active VMs and searches for VMs that do not have faults enabled,
> > > returning false if any exist.  Recently, this check has been changed to
> > > return EBUSY when the check fails in this way, so wait for up to ten
> > > seconds for all the active VMs to flush out before proceeding.
> > > 
> > > Suggested-by: Brian Welty <brian.welty at intel.com>
> > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > ---
> > >  lib/xe/xe_query.c                | 15 ++++++++-------
> > >  lib/xe/xe_query.h                |  2 +-
> > >  tests/intel/xe_exec_fault_mode.c |  9 ++++++++-
> > >  3 files changed, 17 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/lib/xe/xe_query.c b/lib/xe/xe_query.c
> > > index 6df8f42649..145dee8142 100644
> > > --- a/lib/xe/xe_query.c
> > > +++ b/lib/xe/xe_query.c
> > > @@ -300,27 +300,28 @@ void xe_device_put(int fd)
> > >   * xe_supports_faults:
> > >   * @fd: xe device fd
> > >   *
> > > - * Returns true if xe device @fd allows creating vm in fault mode otherwise
> > > - * false.
> > > + * Returns the return value of the ioctl.  This can either be 0 if the
> > > + * xe device @fd allows creating a vm in fault mode, or an error value
> > > + * if it does not.
> > 
> > It is not consistent with your description above, as you can get
> > non-zero return -EBUSY and after a wait it will return 0.
> > 

Please disregard my comment here.

> > >   *
> > >   * NOTE: This function temporarily creates a VM in fault mode. Hence, while
> > >   * this function is executing, no non-fault mode VMs can be created.
> > >   */
> > > -bool xe_supports_faults(int fd)
> > > +int xe_supports_faults(int fd)
> > >  {
> > > -	bool supports_faults;
> > > +	int ret;
> > >  
> > >  	struct drm_xe_vm_create create = {
> > >  		.flags = DRM_XE_VM_CREATE_FLAG_LR_MODE |
> > >  			 DRM_XE_VM_CREATE_FLAG_FAULT_MODE,
> > >  	};
> > >  
> > > -	supports_faults = !igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
> > > +	ret = igt_ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create);
> > >  
> > > -	if (supports_faults)
> > > +	if (!ret)
> > >  		xe_vm_destroy(fd, create.vm_id);
> > >  
> > > -	return supports_faults;
> > > +	return ret;
> > >  }
> > 
> > Above part looks ok.
> > 
> > >  
> > >  static void xe_device_destroy_cache(void)
> > > diff --git a/lib/xe/xe_query.h b/lib/xe/xe_query.h
> > > index f91d16bdf5..54115f8f7c 100644
> > > --- a/lib/xe/xe_query.h
> > > +++ b/lib/xe/xe_query.h
> > > @@ -94,7 +94,7 @@ uint64_t xe_visible_available_vram_size(int fd, int gt);
> > >  uint32_t xe_get_default_alignment(int fd);
> > >  uint32_t xe_va_bits(int fd);
> > >  uint16_t xe_dev_id(int fd);
> > > -bool xe_supports_faults(int fd);
> > > +int xe_supports_faults(int fd);
> > >  const char *xe_engine_class_string(uint32_t engine_class);
> > >  bool xe_has_engine_class(int fd, uint16_t engine_class);
> > >  bool xe_has_media_gt(int fd);
> > > diff --git a/tests/intel/xe_exec_fault_mode.c b/tests/intel/xe_exec_fault_mode.c
> > > index 0b3f4cb8de..c1402889d9 100644
> > > --- a/tests/intel/xe_exec_fault_mode.c
> > > +++ b/tests/intel/xe_exec_fault_mode.c
> > > @@ -406,8 +406,15 @@ igt_main
> > >  	int fd;
> > >  
> > >  	igt_fixture {
> > > +		struct timespec tv = {};
> > > +		bool supports_faults;
> > > +		int ret;
> > 
> > Add newline.
> > 
> > >  		fd = drm_open_driver(DRIVER_XE);
> > > -		igt_require(xe_supports_faults(fd));
> > > +		do {
> > > +			ret = xe_supports_faults(fd);
> > > +		} while (ret == -EBUSY && igt_seconds_elapsed(&tv) < 10);
> > 
> > Add a newline. Btw, 10 seconds seems like a lot; could you lower it
> > (2 seconds?) or make it depend on simulation?
> 
> I was under the impression that igt_run_in_simulation was not available
> upstream.  I just checked and this is apparently false.  My bad.  I'll try to
> remember this for the next revision.
> 
> > 
> > > +		supports_faults = !ret;
> > > +		igt_require(supports_faults);
> > 
> > This is also acceptable, though I would prefer to fail when the
> > hardware supports faults but the test cannot proceed due to
> > background VM activity.
> > 
> > Regards,
> > Kamil
> 
> Thank you for the revision notes.  I'll get to work on them soon, but I'm currently
> waiting on a reply to some other revision notes before I proceed to make any major
> changes.  I want to make sure everyone is okay with the changes before I proceed.
> -Jonathan Cavitt
> 

No problem. If you could link to the discussion on lore.kernel.org
(if there is one in drm/xe), it would help.

Regards,
Kamil

> > 
> > >  	}
> > >  
> > >  	for (const struct section *s = sections; s->name; s++) {
> > > -- 
> > > 2.25.1
> > > 
> >