[PATCH i-g-t v11] igt-runner fact checking

Kamil Konieczny kamil.konieczny at linux.intel.com
Mon Dec 9 17:16:00 UTC 2024


Hi Zbigniew,
On 2024-12-09 at 11:53:35 +0100, Zbigniew Kempczyński wrote:
> On Thu, Dec 05, 2024 at 11:51:43AM +0100, Peter Senna Tschudin wrote:
> > When using igt-runner, collect facts before each test and after the
> > last test, and report when facts change. The facts are:
> >  - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> >  - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> >  - Kernel taints: kernel.is_tainted.taint_warn: true
> >  - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
> > 
> > This change imposes little execution overhead and adds just a few
> > lines of logging. The facts will be printed on normal igt-runner
> > output. Here is a real example from our CI shwoing
> > hotreplug-lateclose changing the DRM card number and tainting the
> > kernel on the abort path:
> > 
> >  [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
> >  [245.383596] Starting subtest: hotreplug-lateclose
> >  [249.843361] Aborting: Lockdep not active
> >  [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> >  [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> >  [249.859075] Closing watchdogs
> > 
> > CC: Ryszard Knop <ryszard.knop at intel.com>
> > CC: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > CC: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> > CC: Lucas De Marchi <lucas.demarchi at intel.com>
> > CC: luciano.coelho at intel.com
> > CC: nirmoy.das at intel.com
> > CC: stuart.summers at intel.com
> > CC: himal.prasad.ghimiray at intel.com
> > CC: dominik.karol.piatkowski at intel.com
> > CC: katarzyna.piecielska at intel.com
> > Reviewed-by: Dominik Karol Piątkowski <dominik.karol.piatkowski at intel.com>
> > Signed-off-by: Peter Senna Tschudin <peter.senna at linux.intel.com>
> 
> According to our offline discussion today according acceptance criteria
> from my pov:
> 
> - split patch to library, tool and test part (3 commits + cover letter)
> - add switch (set it to disabled by default) to enable the facts collection
>   by CI, it can selectively enable your feature and this will give the
>   time to warm up. I have in my mind how long I've been working on bug
>   in udev scanning race (commit: a23e8aed0b54018339647d0817267431bd2b7075)
> - get r-b from CI folks (+Ryszard in cc). According to my request they
>   will need to enable using facts in CI. I don't know is any other
>   vendor interested with using facts (I guess on embedded likely not)
>   because there was no feedback from them, but I assume we don't want
>   to break their CI if there will be any issue with facts. I prefer
>   to be polite in meaning - "there's new code, it's hard to say it
>   is bug free, so we're allow you to enable this if you want. In the
>   meantime we'll be using this in our CI so that's will give the time
>   to warm it up". I have no such resistance in tests, sometimes in library
>   because there's more samples which might catch this on premerge. For
>   runner there's only one sample per machine (unless it will reboot)
>   but this is too small set to me. Selective enabling will give us
>   more samples in the some time period in which at least we won't break
>   other vendors CI.
> - there's minor issue I'm asking people when I'm doing review for them
>   - use /* */ comments instead of //.

The only exception is SPDX line to make checkpatch.pl happy.

> 
> Ping me when you'll address all of my requests.
> 
> --
> Zbigniew
> 

>From this ("commit 268845156b299ac61e68dc52c104bbd79f929f79")
it looks like amdgpu devs are using igt_runner so adding them
to Cc, so they could also join a discussion for igt_runner
to collect facts and watch which ones are changing
(and which one we are missing if any?)

Cc: Leo Li <sunpeng.li at amd.com>
Cc: Alex Hung <alex.hung at amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira at amd.com>
Cc: Nicholas Choi <nicholas.choi at amd.com>
Cc: George Zhang <george.zhang at amd.com>

Btw sorry for late notify.

Regards,
Kamil

> > ---
> > v11:
> >  - fix typo
> > 
> > v10:
> >  - fix memory leaks from asprintf (Thank you Dominik Karol!)
> >  - fix comments for consistency (Thank you Dominik Karol!)
> > 
> > v9:
> >  - do not report new hardware when loading/unloading kmod changes the
> >    string of the GPU name. I accidentally reintroduced this issue
> >    when refactoring to use linked lists.
> >  - add tools/lsfacts: 9 lines of code that print either the facts or
> >    that no facts were found.
> >  - fix code comments describing functions
> >  - fix white space issues
> > 
> > v8:
> >  - fix white space issues
> > 
> > v7:
> >  - refactor to use linked lists provided by igt_lists
> >  - Added function arguments to code comments
> >  - updated commit message
> > 
> > v6:
> >  - sort includes in igt_facts.c alphabetically
> >  - add facts for kernel taints using igt_kernel_tainted() and
> >    igt_explain_taints()
> > 
> > v5:
> >  - fix the broken patch format from v4
> > 
> > v4:
> >  - fix a bug on delete_fact()
> >  - drop glib and calls to g_ functions
> >  - change commit message to indicate that report only on fact changes
> >  - use consistent format for reporting changes
> >  - fix SPDX header format
> > 
> > v3:
> >  - refreshed commit message
> >  - changed format SPDX string
> >  - removed license text
> >  - replace last_test assignment when null by two ternary operators
> >  - added function descriptions following example found elsewhere in
> >    the code
> >  - added igt_assert to catch failures to realloc()
> > 
> > v2:
> >  - add lib/tests/igt_facts.c for basic unit testing
> >  - bugfix: do not report a new gpu when the driver changes the gpu name
> >  - bugfix: do not report the pci_id twice on the gpu name
> > 
> >  lib/igt_facts.c       | 755 ++++++++++++++++++++++++++++++++++++++++++
> >  lib/igt_facts.h       |  47 +++
> >  lib/meson.build       |   1 +
> >  lib/tests/igt_facts.c |  15 +
> >  lib/tests/meson.build |   1 +
> >  runner/executor.c     |  10 +
> >  tools/lsfacts.c       |  25 ++
> >  tools/meson.build     |   1 +
> >  8 files changed, 855 insertions(+)
> >  create mode 100644 lib/igt_facts.c
> >  create mode 100644 lib/igt_facts.h
> >  create mode 100644 lib/tests/igt_facts.c
> >  create mode 100644 tools/lsfacts.c
> > 

...cut...



More information about the igt-dev mailing list