[igt-dev] [RFC PATCH v3 0/8] Add multi-process subtests for multi-GPUs

Petri Latvala petri.latvala at intel.com
Mon Nov 28 11:26:24 UTC 2022


On Fri, Nov 25, 2022 at 01:01:40PM +0100, Kamil Konieczny wrote:
> Add one simple macro igt_fork_dyn() and two new helpers in
> igt_core to enable running dynamic tests on two or more GPUs in
> parallel.
> To test this idea I added two subtests, gem_basic@create-close
> and gem_exec_gttfill@multigpu-basic.
> It is open-coded for ease of debugging but can be converted
> into a macro if this idea is accepted.
> 
> v3: added log for opened device extension from Mauro with
>   some modifications
>   added tests for fork_dyn so it works as igt_fork
>   added prefix log to help debug problems
>   rework gttfill multigpu-basic subtest

In its current form there are still multiple hurdles.

The socket communications strictly cannot cope with dynamic subtest
outputs being interleaved. The parsing is done in one pass through
the comms dump, tracking whether the output is inside a dynamic
subtest and which one. If a second dynamic subtest starts while
another is still being tracked, the parser gets confused.

I don't know how the text-based parsing handles that.
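
As a rough illustration of the single-pass tracking described above,
here is a minimal sketch. It is not the igt_runner code; the function
name and the simplified "Starting dynamic subtest: ..." and
"Dynamic subtest ...: SUCCESS" line formats are assumptions made for
the example. It only shows why one "current subtest" variable
misattributes output once two dynamic subtests overlap.

```python
def attribute_output(lines):
    """Single pass over a log, tracking one 'current' dynamic subtest.

    Hypothetical simplification: the line formats are assumed, and the
    real parser works on a different data stream.
    """
    start_prefix = "Starting dynamic subtest: "
    result_prefix = "Dynamic subtest "
    current = None   # name of the dynamic subtest being tracked
    owned = {}       # subtest name -> output lines attributed to it
    for line in lines:
        if line.startswith(start_prefix):
            # A new start overwrites whatever was being tracked.
            current = line[len(start_prefix):]
            owned.setdefault(current, [])
        elif line.startswith(result_prefix):
            # Any result line ends the tracked block.
            current = None
        elif current is not None:
            owned[current].append(line)
    return owned


sequential = [
    "Starting dynamic subtest: gpu-0",
    "gpu-0 work",
    "Dynamic subtest gpu-0: SUCCESS",
    "Starting dynamic subtest: gpu-1",
    "gpu-1 work",
    "Dynamic subtest gpu-1: SUCCESS",
]

interleaved = [
    "Starting dynamic subtest: gpu-0",
    "Starting dynamic subtest: gpu-1",
    "gpu-0 work",
    "gpu-1 work",
    "Dynamic subtest gpu-0: SUCCESS",
    "Dynamic subtest gpu-1: SUCCESS",
]
```

With the sequential log each subtest keeps its own line. With the
interleaved log both work lines land under gpu-1 and gpu-0 ends up
with no output at all.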

But that point matters little until another question is answered:
What is the point of having dynamic subtests per GPU index? The
results won't be comparable across systems.

Compare with, for example, the kernel selftests. A selftest "hugepages"
might not be there for a platform, or for a different kernel version,
but if it is, it's comparable to another "hugepages" result. You can
say "it works on this platform, but doesn't on that platform" and
point to a kernel bug.

What can you say if gpu-0 fails on host "fi-multi-dg2" but passes on
"fi-multi-dg1"? What can you even say if gpu-0 fails and gpu-1 passes?
Is that the interesting data? In my opinion
"create-close-multigpu@gpu-0" is not the interesting part, it's just
"create-close-multigpu".

(That's not even getting to the implied requirements on the host system
for this test setup of kinda maybe requiring those devices to be
identical.)
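
One way to picture reporting only "create-close-multigpu" is to fold
the per-GPU outcomes into a single result. The sketch below is purely
illustrative: the function name and the folding policy (any failure
fails the whole subtest, all skips means skip) are assumptions, not
something the series or IGT defines.

```python
def fold_results(per_gpu):
    """Collapse per-GPU outcomes into one result for the whole subtest.

    Hypothetical policy, assumed for illustration only.
    """
    outcomes = set(per_gpu.values())
    if "FAIL" in outcomes:
        return "FAIL"        # one failing GPU fails the subtest
    if not outcomes or outcomes == {"SKIP"}:
        return "SKIP"        # nothing ran, or every GPU skipped
    return "SUCCESS"


print(fold_results({"gpu-0": "SUCCESS", "gpu-1": "FAIL"}))
```

The single folded name stays comparable between hosts regardless of
how many GPUs each one has, while the per-GPU detail can remain in the
log output for debugging.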


-- 
Petri Latvala




> 
> See some logs below.
> 
> Cc: Anna Karas <anna.karas at intel.com>
> Cc: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> Cc: Mauro Carvalho Chehab <mauro.chehab at linux.intel.com>
> Cc: Petri Latvala <petri.latvala at intel.com>
> 
> Starting subtest: multigpu-basic
> <g:0> Setup 1025 batches in 3398.88ms
> <g:1> Setup 1025 batches in 3392.46ms
> [..skipped..]
> <g:0> Total: 33 cycles
> <g:1> Total: 33 cycles
> Subtest multigpu-basic: SUCCESS (36.248s)
> 
> Kamil Konieczny (7):
>   lib/igt_core: add fork for dynamic tests
>   lib/igt_core: add prefix to logging
>   lib/tests/igt_fork: add tests for igt_fork_dyn
>   lib/tests/igt_fork: change comments into prints
>   tests/i915/gem_basic: add multi-gpu tests
>   tests/i915/gem_exec_gttfill: add new subtest multigpu-basic
>   HAX test few multi-gpu subtests
> 
> Mauro Carvalho Chehab (1):
>   lib/igt_core: store GPU string or opened device name
> 
>  lib/drmtest.c                         |   4 +-
>  lib/igt_core.c                        | 248 ++++++++++++++++++++++++--
>  lib/igt_core.h                        |  27 +++
>  lib/tests/igt_fork.c                  |  93 +++++++---
>  tests/i915/gem_basic.c                |  28 ++-
>  tests/i915/gem_exec_gttfill.c         |  31 +++-
>  tests/intel-ci/fast-feedback.testlist | 176 +-----------------
>  7 files changed, 386 insertions(+), 221 deletions(-)
> 
> -- 
> 2.34.1
> 

