[PATCH i-g-t v11] igt-runner fact checking
Kamil Konieczny
kamil.konieczny at linux.intel.com
Mon Dec 9 17:16:00 UTC 2024
Hi Zbigniew,
On 2024-12-09 at 11:53:35 +0100, Zbigniew Kempczyński wrote:
> On Thu, Dec 05, 2024 at 11:51:43AM +0100, Peter Senna Tschudin wrote:
> > When using igt-runner, collect facts before each test and after the
> > last test, and report when facts change. The facts are:
> > - GPUs on PCI bus: hardware.pci.gpu_at_addr.0000:03:00.0: 8086:e20b Intel Battlemage (Gen20)
> > - Associations between PCI GPU and DRM card: hardware.pci.drm_card_at_addr.0000:03:00.0: card1
> > - Kernel taints: kernel.is_tainted.taint_warn: true
> > - GPU kernel modules loaded: kernel.kmod_is_loaded.i915: true
> >
> > This change imposes little execution overhead and adds just a few
> > lines of logging. The facts will be printed on normal igt-runner
> > output. Here is a real example from our CI shwoing
> > hotreplug-lateclose changing the DRM card number and tainting the
> > kernel on the abort path:
> >
> > [245.316207] [056/121] (816s left) core_hotunplug (hotreplug-lateclose)
> > [245.383596] Starting subtest: hotreplug-lateclose
> > [249.843361] Aborting: Lockdep not active
> > [249.858249] [FACT core_hotunplug (hotreplug-lateclose)] changed: hardware.pci.drm_card_at_addr.0000:00:02.0: card0 -> card1
> > [249.858392] [FACT core_hotunplug (hotreplug-lateclose)] new: kernel.is_tainted.taint_die: true
> > [249.859075] Closing watchdogs
> >
> > CC: Ryszard Knop <ryszard.knop at intel.com>
> > CC: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
> > CC: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> > CC: Lucas De Marchi <lucas.demarchi at intel.com>
> > CC: luciano.coelho at intel.com
> > CC: nirmoy.das at intel.com
> > CC: stuart.summers at intel.com
> > CC: himal.prasad.ghimiray at intel.com
> > CC: dominik.karol.piatkowski at intel.com
> > CC: katarzyna.piecielska at intel.com
> > Reviewed-by: Dominik Karol Piątkowski <dominik.karol.piatkowski at intel.com>
> > Signed-off-by: Peter Senna Tschudin <peter.senna at linux.intel.com>
>
> According to our offline discussion today according acceptance criteria
> from my pov:
>
> - split patch to library, tool and test part (3 commits + cover letter)
> - add switch (set it to disabled by default) to enable the facts collection
> by CI, it can selectively enable your feature and this will give the
> time to warm up. I have in my mind how long I've been working on bug
> in udev scanning race (commit: a23e8aed0b54018339647d0817267431bd2b7075)
> - get r-b from CI folks (+Ryszard in cc). According to my request they
> will need to enable using facts in CI. I don't know is any other
> vendor interested with using facts (I guess on embedded likely not)
> because there was no feedback from them, but I assume we don't want
> to break their CI if there will be any issue with facts. I prefer
> to be polite in meaning - "there's new code, it's hard to say it
> is bug free, so we're allow you to enable this if you want. In the
> meantime we'll be using this in our CI so that's will give the time
> to warm it up". I have no such resistance in tests, sometimes in library
> because there's more samples which might catch this on premerge. For
> runner there's only one sample per machine (unless it will reboot)
> but this is too small set to me. Selective enabling will give us
> more samples in the some time period in which at least we won't break
> other vendors CI.
> - there's minor issue I'm asking people when I'm doing review for them
> - use /* */ comments instead of //.
The only exception is SPDX line to make checkpatch.pl happy.
>
> Ping me when you'll address all of my requests.
>
> --
> Zbigniew
>
>From this ("commit 268845156b299ac61e68dc52c104bbd79f929f79")
it looks like amdgpu devs are using igt_runner so adding them
to Cc, so they could also join a discussion for igt_runner
to collect facts and watch which ones are changing
(and which one we are missing if any?)
Cc: Leo Li <sunpeng.li at amd.com>
Cc: Alex Hung <alex.hung at amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira at amd.com>
Cc: Nicholas Choi <nicholas.choi at amd.com>
Cc: George Zhang <george.zhang at amd.com>
Btw sorry for late notify.
Regards,
Kamil
> > ---
> > v11:
> > - fix typo
> >
> > v10:
> > - fix memory leaks from asprintf (Thank you Dominik Karol!)
> > - fix comments for consistency (Thank you Dominik Karol!)
> >
> > v9:
> > - do not report new hardware when loading/unloading kmod changes the
> > string of the GPU name. I accidentally reintroduced this issue
> > when refactoring to use linked lists.
> > - add tools/lsfacts: 9 lines of code that print either the facts or
> > that no facts were found.
> > - fix code comments describing functions
> > - fix white space issues
> >
> > v8:
> > - fix white space issues
> >
> > v7:
> > - refactor to use linked lists provided by igt_lists
> > - Added function arguments to code comments
> > - updated commit message
> >
> > v6:
> > - sort includes in igt_facts.c alphabetically
> > - add facts for kernel taints using igt_kernel_tainted() and
> > igt_explain_taints()
> >
> > v5:
> > - fix the broken patch format from v4
> >
> > v4:
> > - fix a bug on delete_fact()
> > - drop glib and calls to g_ functions
> > - change commit message to indicate that report only on fact changes
> > - use consistent format for reporting changes
> > - fix SPDX header format
> >
> > v3:
> > - refreshed commit message
> > - changed format SPDX string
> > - removed license text
> > - replace last_test assignment when null by two ternary operators
> > - added function descriptions following example found elsewhere in
> > the code
> > - added igt_assert to catch failures to realloc()
> >
> > v2:
> > - add lib/tests/igt_facts.c for basic unit testing
> > - bugfix: do not report a new gpu when the driver changes the gpu name
> > - bugfix: do not report the pci_id twice on the gpu name
> >
> > lib/igt_facts.c | 755 ++++++++++++++++++++++++++++++++++++++++++
> > lib/igt_facts.h | 47 +++
> > lib/meson.build | 1 +
> > lib/tests/igt_facts.c | 15 +
> > lib/tests/meson.build | 1 +
> > runner/executor.c | 10 +
> > tools/lsfacts.c | 25 ++
> > tools/meson.build | 1 +
> > 8 files changed, 855 insertions(+)
> > create mode 100644 lib/igt_facts.c
> > create mode 100644 lib/igt_facts.h
> > create mode 100644 lib/tests/igt_facts.c
> > create mode 100644 tools/lsfacts.c
> >
...cut...
More information about the igt-dev
mailing list