Add env info to igt_runner (was: Re: [PATCH i-g-t 4/4] lib/igt_device_scan: Fix scan vs bind/unbind/reload)

Fri Dec 20 17:05:22 UTC 2024

Hi Lucas,
On 2024-12-19 at 11:24:09 -0600, Lucas De Marchi wrote:
> hijacking the thread and adding some people to Cc for the igt_runner question.
> Previously In-Reply-To: <rnw3q6mhthnwyvowvszr2gllyjtbb2mozk4em272xlmkvm7pyl at szbhtg3sd7d7>
> 
> On Thu, Dec 19, 2024 at 10:35:00AM -0600, Lucas De Marchi wrote:
> > On Wed, Dec 18, 2024 at 07:34:19AM +0100, Zbigniew Kempczyński wrote:
> > > On Tue, Dec 17, 2024 at 09:13:24PM -0800, Lucas De Marchi wrote:
> > > > There's no guarantee a card will end up with the same device node when
> > > > modules are loaded/unloaded and drivers bound/unbound. There's a
> > > > fundamental issue with the way igt handles this, and it's also puzzling
> > > > that from the logs it looks like the device vanished from the bus, when in
> > > > reality it is just the SW state out of sync with what the kernel is
> > > > exporting.
> > > > 
> > > > Re-scanning when trying to match a device is not expensive compared to
> > > > what most tests are doing, so simply force it to occur whenever trying
> > > > to match a card.
> > > 
> > > I should also comment on the above. It is generally true, but I've noticed
> > > that getting attributes might be expensive. It may even take up to a few
> > > seconds, which is why I've added a list of attributes we don't fetch from udev
> > > (see is_on_blacklist()). If I'm not wrong, fetching 'config' was the reason
> > > to limit the attributes we fetch.
> > 
> > why would we get all attributes and exclude some?  Shouldn't we get only
> > the attributes we actually use? AFAIK this logic is basically used by
> > --device/IGT_DEVICE, right? What filters do we normally use?
> > 
> > I usually pass the pci slot (because I know that won't change
> > dynamically and cause surprises). Apparently CI passes vendor/devid:
> > 
> > 	export IGT_DEVICE=pci:vendor=$1,device=$2
> > 
> > (but it seems to vary depending on pipeline)
> > 
> > Some devs pass the device node directly too as in a lot of places
> > there's only ever card0 possible.
> 
> 
> Could we dump the env and args somewhere so we know how igt_runner or
> individual tests are being called without looking at the CI pipeline
> sources? I was thinking about either having that info in the stdout
> output of igt_runner or in the json. Another possibility would be in
> dmesg, but I'm not sure it's a good option. Thoughts?

Not only that, but also the parameters used to start igt_runner,
the contents of the .igtrc file (if any), the current wall time,
the testlist prepared to run, and free memory and free disk space.
Also the metadata file for igt_resume: together with the prepared
testlist it would make it possible to re-execute the run.
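
To make the host-side part of that list concrete, here is a rough sketch in
Python (only for illustration, igt_runner itself is C) of how the wall time,
free memory and free disk could be captured at startup. The function names and
the key names are hypothetical and are not part of any existing results
format; free memory is assumed to mean MemAvailable from /proc/meminfo.

```python
import shutil
import time


def parse_mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in kB) from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line looks like: "MemAvailable:   123456 kB"
            return int(line.split()[1])
    return None


def host_snapshot(results_dir="."):
    """Collect wall time, available memory and free disk (hypothetical keys)."""
    try:
        with open("/proc/meminfo") as f:
            mem_kib = parse_mem_available_kib(f.read())
    except OSError:
        # Not on Linux or /proc not mounted: record the value as unknown.
        mem_kib = None
    return {
        "wall_time": time.time(),
        "mem_available_kib": mem_kib,
        "disk_free_bytes": shutil.disk_usage(results_dir).free,
    }
```

The disk figure is taken for the filesystem holding the results directory,
since that is the one a run would fill up.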

Also the kernel config from /boot? Or should it be in the shard
run info (to avoid duplication)?

Maybe some other info too, such as igt_facts or lspci output?
Should we also ask the display team and our CI?

+cc Jari from display

Regards,
Kamil

> 
> My preferred option would be to have e.g.:
> 
> {
>   "__type__": "TestrunResult",
>   "results_version": 10,
>   "name": "xe-2403-995cd30a4e222b6a7b4b40c36219e4937fd7109e\/bat-bmg-1\/0",
>   "uname": "Linux bat-bmg-1 6.13.0-rc3-xe+ #1 SMP PREEMPT_DYNAMIC Thu Dec 19 14:40:51 UTC 2024 x86_64",
>   "time_elapsed": {
>     "__type__": "TimeAttribute",
>     "start": 1734621126.8734231,
>     "end": 1734621288.5994539
>   },
>   "environment": {
>     "IGT_DEVICE": ...
>     <any IGT_* env var>
>   },
>   "argv": [ ... ]
> 
> 
> Lucas De Marchi
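
As an illustration of the "environment" and "argv" fields proposed in the JSON
above, a minimal sketch of the collection step could look like the following.
It is written in Python only to keep it short; the function name
collect_run_info is hypothetical, and filtering on the "IGT_" prefix is an
assumption taken from the "<any IGT_* env var>" placeholder.

```python
import json
import os
import sys


def collect_run_info(environ=None, argv=None, prefix="IGT_"):
    """Build the proposed "environment" and "argv" entries.

    Only variables whose name starts with `prefix` are kept, so unrelated
    (and possibly sensitive) environment variables are not written out.
    """
    environ = os.environ if environ is None else environ
    argv = sys.argv if argv is None else argv
    return {
        "environment": {
            name: value
            for name, value in sorted(environ.items())
            if name.startswith(prefix)
        },
        "argv": list(argv),
    }


def to_json(info):
    """Serialize the collected info for merging into the results file."""
    return json.dumps(info, indent=2)
```

Calling to_json(collect_run_info()) at startup would give a fragment that can
be merged into the results file next to "uname" and "time_elapsed".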