[Intel-xe] [PATCH i-g-t] test/xe_pm: Assert on unbounded VSP

Gupta, Anshuman anshuman.gupta at intel.com
Thu Apr 27 09:06:47 UTC 2023



> -----Original Message-----
> From: Rodrigo Vivi <rodrigo.vivi at kernel.org>
> Sent: Tuesday, April 25, 2023 8:03 PM
> To: Gupta, Anshuman <anshuman.gupta at intel.com>
> Cc: Chehab, Mauro <mauro.chehab at intel.com>; intel-
> xe at lists.freedesktop.org; Vivi, Rodrigo <rodrigo.vivi at intel.com>
> Subject: Re: [Intel-xe] [PATCH i-g-t] test/xe_pm: Assert on unbounded VSP
> 
> On Mon, Apr 24, 2023 at 09:50:00AM -0400, Gupta, Anshuman wrote:
> >
> >
> > > -----Original Message-----
> > > From: Rodrigo Vivi <rodrigo.vivi at kernel.org>
> > > Sent: Monday, April 24, 2023 5:52 PM
> > > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > > Cc: Chehab, Mauro <mauro.chehab at intel.com>; intel-
> > > xe at lists.freedesktop.org; Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > Subject: Re: [Intel-xe] [PATCH i-g-t] test/xe_pm: Assert on
> > > unbounded VSP
> > >
> > > On Fri, Apr 21, 2023 at 02:30:42PM +0000, Gupta, Anshuman wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Rodrigo Vivi <rodrigo.vivi at kernel.org>
> > > > > Sent: Friday, April 21, 2023 7:25 PM
> > > > > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > > > > Cc: intel-xe at lists.freedesktop.org; Chehab, Mauro
> > > > > <mauro.chehab at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > > > Subject: Re: [Intel-xe] [PATCH i-g-t] test/xe_pm: Assert on
> > > > > unbounded VSP
> > > > >
> > > > > On Fri, Apr 21, 2023 at 03:45:11PM +0530, Anshuman Gupta wrote:
> > > > > > Discrete GFX cards gfx may have multiple PCIe endpoints, they
> > > > > > connects to root port via pcie upstream switch port(USP) and
> > > > > > virtual pcie switch port(VSP), sometimes VSP pcie devices
> > > > > > doesn't bind to pcieport driver. Without pcieport driver pcie
> > > > > > PM comes without any warranty and with unbounded VSP runtime
> > > > > > pm and system wide suspend/resume IGT
> > > > > test
> > > > > > cases may get pass as false positive.
> > > > >
> > > > > I'm not sure if I'm following here... you mean that in cases
> > > > > like this we cannot suspend/resume at all? What are the false
> positive?
> > > > > I'm not following the whole scenario here...
> > > > Recently Linux core kernel revert the L1 substate save/restore
> > > > feature
> > > from Linux.
> > > > That eventually requires L1 substate to be  disable from BIOS,
> > > > otherwise s2idle fails on a HOST with DG2 card as DG2 was not able
> > > > to exit
> > > form L1 substate. But due to the unbounded VSP our CI missed to
> > > report such issue.
> > > > As VSP were unbounded and DG2 Link was not transitioning to low
> > > > power Link State and our tests were passing instead of failing.
> > >
> > > But then on this case we are just avoiding the test now?
> > > Why don't we cherry-pick the linux core kernel fix as a temporary
> > > topic/core- for-CI instead of blocking the test?
> > We don't have any fix yet, Atm Linux core Kernel does not support L1
> substate during Suspend/Resume.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/com
> >
> mit/drivers/pci/pcie/aspm.c?id=a7152be79b627428c628da2a887ca4b2512a78
> f
> > d
> > >
> > > Or why are we not handling this case in kernel side then?
> > > Why block even s3 tests if this is only about the s2idle?
> > This is about D0/D3 and Link state for our discrete graphics card not directly
> related to s2idle/S3.
> > Due to unbounded VSP, PM core will always keep the bridge devices in
> > D0, L0 state, There is no point to test D3 and Suspend/Resume with
> unbounded VSP.
> > Unbounded VSP was the one of reason on i915 that we missed runtime pm
> wakeref before accessing lmem.
> 
> I have 2 problems with this patch:
> 
> 1. It will fail the PM operations, while the problem is not the PM operations
> itself.
> 2. It doesn't protect the PM operations. I'm failing to see any protection in
> the kernel level.
> It will be just errors that will be ignored.
> 
> So, we need:
> 1. A test that tackles the right issue.
> 2. A protection at the kernel level. If we cannot do PM operations in this
> case,
>    we cannot simply fail in the test and ignore the kernel failure.
Thanks for the review , considering above having a WARN_ON inside
Xe driver for unbounded bridge device will be sufficient ?
Thanks,
Anshuman Gupta.
> 
> >
> > Br,
> > Anshuman Gupta.
> >
> > >
> > > >
> > > > Br,
> > > > Anshuman Gupta.
> > > > >
> > > > > >
> > > > > > Therefore assert the xe_pm igt tests on unbounded VSP.
> > > > > >
> > > > > > Signed-off-by: Anshuman Gupta <anshuman.gupta at intel.com>
> > > > > > ---
> > > > > >  tests/xe/xe_pm.c | 11 ++++++++++-
> > > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/tests/xe/xe_pm.c b/tests/xe/xe_pm.c index
> > > > > > 23b8246ed..78d9374cf 100644
> > > > > > --- a/tests/xe/xe_pm.c
> > > > > > +++ b/tests/xe/xe_pm.c
> > > > > > @@ -33,6 +33,7 @@
> > > > > >  typedef struct {
> > > > > >  	int fd_xe;
> > > > > >  	struct pci_device *pci_xe;
> > > > > > +	struct pci_device *pci_parent;
> > > > > >  	struct pci_device *pci_root;  } device_t;
> > > > > >
> > > > > > @@ -344,6 +345,7 @@ NULL));
> > > > > >  igt_main
> > > > > >  {
> > > > > >  	struct drm_xe_engine_class_instance *hwe;
> > > > > > +	struct drm_xe_query_config *config;
> > > > > >  	device_t device;
> > > > > >  	char d3cold_allowed[2];
> > > > > >  	const struct s_state {
> > > > > > @@ -369,8 +371,15 @@ igt_main
> > > > > >  		device.fd_xe = drm_open_driver(DRIVER_XE);
> > > > > >  		device.pci_xe =
> igt_device_get_pci_device(device.fd_xe);
> > > > > >  		device.pci_root =
> > > > > igt_device_get_pci_root_port(device.fd_xe);
> > > > > > +		device.pci_parent =
> > > > > pci_device_get_parent_bridge(device.pci_xe);
> > > > > >
> > > > > > -		xe_device_get(device.fd_xe);
> > > > > > +		config = xe_device_get(device.fd_xe)->config;
> > > > > > +
> > > > > > +		if (config->info[XE_QUERY_CONFIG_FLAGS] &
> > > > > XE_QUERY_CONFIG_FLAGS_HAS_VRAM) {
> > > > > > +			igt_assert(device.pci_parent);
> > > > > > +
> > > > > 	igt_assert_f(!pci_device_has_kernel_driver(device.pci_parent),
> > > > > > +				     "pci bridge device does not bind
> with
> > > > > pcieport driver\n");
> > > > > > +		}
> > > > > >
> > > > > >  		/* Always perform initial once-basic exec checking for
> > > > > > health
> > > > > */
> > > > > >  		xe_for_each_hw_engine(device.fd_xe, hwe)
> > > > > > --
> > > > > > 2.25.1
> > > > > >


More information about the Intel-xe mailing list