[PATCH i-g-t 4/4] lib/igt_device_scan: Fix scan vs bind/unbind/reload

Zbigniew Kempczyński zbigniew.kempczynski at intel.com
Fri Dec 20 06:59:33 UTC 2024


On Thu, Dec 19, 2024 at 10:35:00AM -0600, Lucas De Marchi wrote:
> On Wed, Dec 18, 2024 at 07:34:19AM +0100, Zbigniew Kempczyński wrote:
> > On Tue, Dec 17, 2024 at 09:13:24PM -0800, Lucas De Marchi wrote:
> > > There's no guarantee a card will end up with the same device node when
> > > modules are loaded/unloaded and drivers bound/unbound. There's some
> > > fundamental issue with the igt's the way it is and it's also puzzling
> > > from the logs it looks like the device vanished from the bus, when in
> > > reality is just the SW state out of sync with what the kernel is
> > > exporting.
> > > 
> > > Re-scanning when trying to match a device is not expensive compared to
> > > what most tests are doing, so simply force it to occur whenever trying
> > > to match a card.
> > 
> > I also should comment the above. It is generally true, but I've noticed
> > getting attributes might be expensive. Even it may take up to few
> > seconds, that's why I've added some attributes we don't fetch from udev
> > (see is_on_blacklist()). If I'm not wrong getting 'config' was a cause
> > to limit attributes we fetch.
> 
> why would we get all attributes and exclude some?  Shouldn't we get only
> the attributes we actually use? AFAIK this logic is basically used by
> --device/IGT_DEVICE, right? What filters we normally use?

At moment of developing I didn't know how this code might be used by others
to write filters. I had only in my mind sriov scenario and I assumed
more attributes might be useful in the meaning - lets see how it goes.

Currently we use three attributes: driver, sriov_numvfs and physfn.
I would rewrite this for support both - fetching all attributes without
excluding anything (for lsgpu -p) and second for normal use so we would
get only attributes we use in filters. We would achieve two goals - keep
lsgpu list of props/attrs almost intact (with exception we'll fetch them
all now) and have current state of drm device nodes in the test.

Question - will you write this or you want from me to do this? I can
do this on the beginning of the year. If you're wondering what for
-p in lsgpu - it is convinient for me to use it instead of udevadm.

> 
> I usually pass the pci slot (because I know that won't change
> dynamically and cause surprises). Apparently CI passes vendor/devid:
> 
> 	export IGT_DEVICE=pci:vendor=$1,device=$2
> 
> (but it seems to vary depending on pipeline)
> 
> Some devs pass the device node directly too as in a lot of places
> there's only ever card0 possible.
> 
> Are there other uses?

In Intel likely not, outside - I don't know.

BTW I've checked the problematic attribute 'config' and it seems it
is immediately fetched on nvidia card now. Still it is binary, so it
will require to be ignored (...) or ascii encoded.

--
Zbigniew

> 
> thanks
> Lucas De Marchi
> 
> > 
> > --
> > Zbigniew
> > 
> > > 
> > > Example for xe_wedged:
> > > 
> > > 	$ sudo ./build/tests/xe_wedged --device pci:0000:03:00.0 --r wedged-at-any-timeout
> > > 	(xe_wedged:11173) drmtest-DEBUG: Looking for devices to open using filter 0: pci:0000:03:00.0
> > > 	(xe_wedged:11173) drmtest-DEBUG: Filter matched /dev/dri/card1 | /dev/dri/renderD129
> > > 
> > > 	... [ wedge and rebind here ]
> > > 
> > > 	(xe_wedged:11173) drmtest-DEBUG: Looking for devices to open using filter 0: pci:0000:03:00.0
> > > 	(xe_wedged:11173) drmtest-DEBUG: Filter matched /dev/dri/card2 | /dev/dri/renderD130
> > > 
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > ---
> > >  lib/igt_device_scan.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/lib/igt_device_scan.c b/lib/igt_device_scan.c
> > > index 8e2297087..956719fba 100644
> > > --- a/lib/igt_device_scan.c
> > > +++ b/lib/igt_device_scan.c
> > > @@ -1983,7 +1983,7 @@ static bool __igt_device_card_match(const char *filter,
> > >  	 * Scan devices in case the user hasn't yet,
> > >  	 * but leave a decision on forced rescan on the user side.
> > >  	 */
> > > -	igt_devices_scan(false);
> > > +	igt_devices_scan(true);
> > > 
> > >  	if (igt_device_filter_apply(filter) == false)
> > >  		return false;
> > > --
> > > 2.47.0
> > > 


More information about the igt-dev mailing list