[PATCH i-g-t 4/4] lib/igt_device_scan: Fix scan vs bind/unbind/reload
Lucas De Marchi
lucas.demarchi at intel.com
Fri Dec 20 18:52:10 UTC 2024
On Fri, Dec 20, 2024 at 07:59:33AM +0100, Zbigniew Kempczyński wrote:
>On Thu, Dec 19, 2024 at 10:35:00AM -0600, Lucas De Marchi wrote:
>> On Wed, Dec 18, 2024 at 07:34:19AM +0100, Zbigniew Kempczyński wrote:
>> > On Tue, Dec 17, 2024 at 09:13:24PM -0800, Lucas De Marchi wrote:
>> > > There's no guarantee a card will end up with the same device node when
>> > > modules are loaded/unloaded and drivers bound/unbound. There's some
>> > > fundamental issue with the igt's the way it is and it's also puzzling
>> > > from the logs it looks like the device vanished from the bus, when in
>> > > reality is just the SW state out of sync with what the kernel is
>> > > exporting.
>> > >
>> > > Re-scanning when trying to match a device is not expensive compared to
>> > > what most tests are doing, so simply force it to occur whenever trying
>> > > to match a card.
>> >
>> > I also should comment the above. It is generally true, but I've noticed
>> > getting attributes might be expensive. Even it may take up to few
>> > seconds, that's why I've added some attributes we don't fetch from udev
>> > (see is_on_blacklist()). If I'm not wrong getting 'config' was a cause
>> > to limit attributes we fetch.
>>
>> why would we get all attributes and exclude some? Shouldn't we get only
>> the attributes we actually use? AFAIK this logic is basically used by
>> --device/IGT_DEVICE, right? What filters we normally use?
>
>At moment of developing I didn't know how this code might be used by others
>to write filters. I had only in my mind sriov scenario and I assumed
>more attributes might be useful in the meaning - lets see how it goes.
>
>Currently we use three attributes: driver, sriov_numvfs and physfn.
>I would rewrite this for support both - fetching all attributes without
>excluding anything (for lsgpu -p) and second for normal use so we would
ok... for tools like lsgpu we will never scan more than once.
>get only attributes we use in filters. We would achieve two goals - keep
>lsgpu list of props/attrs almost intact (with exception we'll fetch them
>all now) and have current state of drm device nodes in the test.
>
>Question - will you write this or you want from me to do this? I can
>do this on the beginning of the year. If you're wondering what for
>-p in lsgpu - it is convinient for me to use it instead of udevadm.
if you can do it, it's better
>
>>
>> I usually pass the pci slot (because I know that won't change
>> dynamically and cause surprises). Apparently CI passes vendor/devid:
>>
>> export IGT_DEVICE=pci:vendor=$1,device=$2
>>
>> (but it seems to vary depending on pipeline)
>>
>> Some devs pass the device node directly too as in a lot of places
>> there's only ever card0 possible.
>>
>> Are there other uses?
>
>In Intel likely not, outside - I don't know.
>
>BTW I've checked the problematic attribute 'config' and it seems it
>is immediately fetched on nvidia card now. Still it is binary, so it
>will require to be ignored (...) or ascii encoded.
thanks
Lucas De Marchi
More information about the igt-dev
mailing list