[Intel-xe] [PATCH i-g-t] test/xe_pm: Assert on unbounded VSP

Rodrigo Vivi rodrigo.vivi at kernel.org
Tue Apr 25 14:32:40 UTC 2023


On Mon, Apr 24, 2023 at 09:50:00AM -0400, Gupta, Anshuman wrote:
> 
> 
> > -----Original Message-----
> > From: Rodrigo Vivi <rodrigo.vivi at kernel.org>
> > Sent: Monday, April 24, 2023 5:52 PM
> > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > Cc: Chehab, Mauro <mauro.chehab at intel.com>; intel-
> > xe at lists.freedesktop.org; Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > Subject: Re: [Intel-xe] [PATCH i-g-t] test/xe_pm: Assert on unbounded VSP
> > 
> > On Fri, Apr 21, 2023 at 02:30:42PM +0000, Gupta, Anshuman wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Rodrigo Vivi <rodrigo.vivi at kernel.org>
> > > > Sent: Friday, April 21, 2023 7:25 PM
> > > > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > > > Cc: intel-xe at lists.freedesktop.org; Chehab, Mauro
> > > > <mauro.chehab at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > > Subject: Re: [Intel-xe] [PATCH i-g-t] test/xe_pm: Assert on
> > > > unbounded VSP
> > > >
> > > > On Fri, Apr 21, 2023 at 03:45:11PM +0530, Anshuman Gupta wrote:
> > > > > Discrete GFX cards gfx may have multiple PCIe endpoints, they
> > > > > connects to root port via pcie upstream switch port(USP) and
> > > > > virtual pcie switch port(VSP), sometimes VSP pcie devices doesn't
> > > > > bind to pcieport driver. Without pcieport driver pcie PM comes
> > > > > without any warranty and with unbounded VSP runtime pm and system
> > > > > wide suspend/resume IGT
> > > > test
> > > > > cases may get pass as false positive.
> > > >
> > > > I'm not sure if I'm following here... you mean that in cases like
> > > > this we cannot suspend/resume at all? What are the false positive?
> > > > I'm not following the whole scenario here...
> > > Recently Linux core kernel revert the L1 substate save/restore feature
> > from Linux.
> > > That eventually requires L1 substate to be  disable from BIOS,
> > > otherwise s2idle fails on a HOST with DG2 card as DG2 was not able to exit
> > form L1 substate. But due to the unbounded VSP our CI missed to report
> > such issue.
> > > As VSP were unbounded and DG2 Link was not transitioning to low power
> > > Link State and our tests were passing instead of failing.
> > 
> > But then on this case we are just avoiding the test now?
> > Why don't we cherry-pick the linux core kernel fix as a temporary topic/core-
> > for-CI instead of blocking the test?
> We don't have any fix yet, Atm Linux core Kernel does not support L1 substate during Suspend/Resume.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/pcie/aspm.c?id=a7152be79b627428c628da2a887ca4b2512a78fd
> > 
> > Or why are we not handling this case in kernel side then?
> > Why block even s3 tests if this is only about the s2idle?
> This is about D0/D3 and Link state for our discrete graphics card not directly related to s2idle/S3.
> Due to unbounded VSP, PM core will always keep the bridge devices in D0, L0 state,
> There is no point to test D3 and Suspend/Resume with unbounded VSP.
> Unbounded VSP was the one of reason on i915 that we missed runtime pm wakeref before accessing lmem. 

I have 2 problems with this patch:

1. It will fail the PM operations, while the problem is not the PM operations itself.
2. It doesn't protect the PM operations. I'm failing to see any protection in the kernel level.
It will be just errors that will be ignored.

So, we need:
1. A test that tackles the right issue.
2. A protection at the kernel level. If we cannot do PM operations in this case,
   we cannot simply fail in the test and ignore the kernel failure.

> 
> Br,
> Anshuman Gupta.
> 
> > 
> > >
> > > Br,
> > > Anshuman Gupta.
> > > >
> > > > >
> > > > > Therefore assert the xe_pm igt tests on unbounded VSP.
> > > > >
> > > > > Signed-off-by: Anshuman Gupta <anshuman.gupta at intel.com>
> > > > > ---
> > > > >  tests/xe/xe_pm.c | 11 ++++++++++-
> > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tests/xe/xe_pm.c b/tests/xe/xe_pm.c index
> > > > > 23b8246ed..78d9374cf 100644
> > > > > --- a/tests/xe/xe_pm.c
> > > > > +++ b/tests/xe/xe_pm.c
> > > > > @@ -33,6 +33,7 @@
> > > > >  typedef struct {
> > > > >  	int fd_xe;
> > > > >  	struct pci_device *pci_xe;
> > > > > +	struct pci_device *pci_parent;
> > > > >  	struct pci_device *pci_root;
> > > > >  } device_t;
> > > > >
> > > > > @@ -344,6 +345,7 @@ NULL));
> > > > >  igt_main
> > > > >  {
> > > > >  	struct drm_xe_engine_class_instance *hwe;
> > > > > +	struct drm_xe_query_config *config;
> > > > >  	device_t device;
> > > > >  	char d3cold_allowed[2];
> > > > >  	const struct s_state {
> > > > > @@ -369,8 +371,15 @@ igt_main
> > > > >  		device.fd_xe = drm_open_driver(DRIVER_XE);
> > > > >  		device.pci_xe = igt_device_get_pci_device(device.fd_xe);
> > > > >  		device.pci_root =
> > > > igt_device_get_pci_root_port(device.fd_xe);
> > > > > +		device.pci_parent =
> > > > pci_device_get_parent_bridge(device.pci_xe);
> > > > >
> > > > > -		xe_device_get(device.fd_xe);
> > > > > +		config = xe_device_get(device.fd_xe)->config;
> > > > > +
> > > > > +		if (config->info[XE_QUERY_CONFIG_FLAGS] &
> > > > XE_QUERY_CONFIG_FLAGS_HAS_VRAM) {
> > > > > +			igt_assert(device.pci_parent);
> > > > > +
> > > > 	igt_assert_f(!pci_device_has_kernel_driver(device.pci_parent),
> > > > > +				     "pci bridge device does not bind with
> > > > pcieport driver\n");
> > > > > +		}
> > > > >
> > > > >  		/* Always perform initial once-basic exec checking for health
> > > > */
> > > > >  		xe_for_each_hw_engine(device.fd_xe, hwe)
> > > > > --
> > > > > 2.25.1
> > > > >


More information about the Intel-xe mailing list