[igt-dev] [RFC i-g-t v2 0/3] Add multi-process subtests for multi-GPUs

Mauro Carvalho Chehab mauro.chehab at linux.intel.com
Tue Oct 11 13:43:10 UTC 2022


On Tue, 11 Oct 2022 13:46:32 +0300
Petri Latvala <petri.latvala at intel.com> wrote:

> On Tue, Oct 11, 2022 at 11:31:11AM +0200, Mauro Carvalho Chehab wrote:
> > On Tue, 11 Oct 2022 11:17:03 +0300
> > Petri Latvala <petri.latvala at intel.com> wrote:
> > 
> > > On Fri, Oct 07, 2022 at 08:48:58PM +0200, Kamil Konieczny wrote:
> > > > Add one simple macro, igt_fork_dyn(), and two new helpers in
> > > > igt_core to enable running dynamic tests on two or more GPUs in
> > > > parallel.
> > > > To test this idea I modified two subtests, gem_basic@create-close
> > > > and gem_exec_gttfill@basic.
> > > > It is open-coded for ease of debugging but can be converted
> > > > into a macro if the idea is accepted.
> > > > 
> > > > TODO: add Mauro's log extension to igt_core:
> > > >   https://patchwork.freedesktop.org/series/109171/
> > > >   "add sysfs node at subtest results when available"
> > > > 
> > > > See some logs below.
> > > > 
> > > > Cc: Anna Karas <anna.karas at intel.com>
> > > > Cc: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> > > > Cc: Mauro Carvalho Chehab <mauro.chehab at linux.intel.com>
> > > > Cc: Petri Latvala <petri.latvala at intel.com>
> > > > 
> > > > This is the log from a gem_exec_gttfill run on a single-GPU machine:
> > > > 
> > > > IGT-Version: 1.26-NO-GIT (x86_64) (Linux: 6.0.0-rc5-CI_DRM_12145-g2dc9ea03abff x86_64)
> > > > Starting subtest: basic
> > > > Starting dynamic subtest: basic-gpu-0
> > > > Starting dynamic subtest: basic-gpu-1
> > > > Test requirement not met in function start_helpers, file ../tests/i915/gem_exec_gttfill.c:229:
> > > > Test requirement: i915 > 0
> > > > Last errno: 2, No such file or directory
> > > > Dynamic subtest basic-gpu-1: SKIP (0.025s)
> > > > Setup 1025 batches in 1051.24ms
> > > > engine[2]: 2 cycles
> > > > engine[1]: 1 cycles
> > > > engine[0]: 3 cycles
> > > > engine[3]: 2 cycles
> > > > engine[4]: 2 cycles
> > > > Total: 10 cycles
> > > > Dynamic subtest basic-gpu-0: SUCCESS (2.960s)
> > > > Subtest basic: SUCCESS (2.967s)
> > > > 
> > > > Results from a machine with two discrete GPUs:
> > > > 
> > > > Starting subtest: basic
> > > > Starting dynamic subtest: basic-gpu-0
> > > > Starting dynamic subtest: basic-gpu-1
> > > > Setup 1025 batches in 3518.56ms
> > > > Setup 1025 batches in 3494.03ms
> > > > ...
> > > > Dynamic subtest basic-gpu-0: SUCCESS (35.349s)
> > > > Dynamic subtest basic-gpu-1: SUCCESS (35.374s)
> > > > Subtest basic: SUCCESS (35.401s)  
> > > 
> > > Having child processes report results breaks a surprising amount of
> > > things. Only the main process should enter/exit subtests or dynamic
> > > subtests.
> > 
> > Ok, but still it makes sense to have per-subtest results somehow.
> > Perhaps we'll need a new IGT macro to report multi-GPU child test
> > results.
> > 
> > > There isn't much value here in having the separate GPUs in separate
> > > dynamic subtests. Conceptually, dynamic subtests are entry points that
> > > are not enumerable at compile time, and this change always wants to
> > > run all of them anyway.
> > 
> > The usage of one or multiple GPUs is a runtime decision, based on the IGT_DEVICE
> > handling logic. That should not be decided at compile time.
> 
> 
> The point was them being entry points. There's never a need to do
> --dynamic-subtest basic-gpu-0; when you execute these, you want all of
> them.

Yes.

> > > Instead, this should just have everything in the subtest and manually
> > > print which GPU is doing what.
> > 
> > This exercise actually raises an interesting point: on a multi-GPU run,
> > what should the "global" test result be when the same test has different
> > results depending on the GPU?
> 
> Depends on the test.

For some tests, yes, but I guess that for most of them what we want is to
ensure that multiple GPUs are exercised at the same time, in order
to identify potential contention and lack of serialization when using
multiple GPUs. For those, if a specific subtest is skipped on some of
the GPUs in the list, it should be OK to return success if the
non-skipped GPUs return success.
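A minimal sketch of the shape suggested earlier in the thread (everything inside one subtest, only the main process reporting results, output tagged with the GPU index), assuming a hypothetical per-GPU worker callback. The names `gpu_result`, `run_on_all_gpus()` and `demo_worker()` are illustrative, not IGT API; `demo_worker()` is a placeholder that pretends only GPU 0 is present, as in the single-GPU log above. Plain POSIX `fork()`/`waitpid()` is used so the sketch compiles on its own and so the parent can see each child's individual outcome; IGT's own `igt_fork()`/`igt_waitchildren()` helpers would be the natural starting point in a real test.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_GPUS 16

/* Hypothetical per-GPU outcome, carried back in the child's exit status. */
enum gpu_result { GPU_SUCCESS = 0, GPU_SKIP = 1, GPU_FAIL = 2 };

static const char *const gpu_result_name[] = { "SUCCESS", "SKIP", "FAIL" };

/* Hypothetical worker: exercises GPU number `gpu` and returns its outcome. */
typedef enum gpu_result (*gpu_worker_fn)(int gpu);

/* Placeholder worker: pretends only GPU 0 exists, like a single-GPU box. */
static enum gpu_result demo_worker(int gpu)
{
	if (gpu > 0) {
		printf("gpu %d: device not present, skipping\n", gpu);
		return GPU_SKIP;
	}
	printf("gpu %d: running workload\n", gpu);
	return GPU_SUCCESS;
}

/*
 * Fork one child per GPU so all GPUs are exercised concurrently.
 * Children only print (tagged with their GPU index) and exit; the parent
 * is the only process that collects and reports results.
 */
static void run_on_all_gpus(int num_gpus, gpu_worker_fn worker,
			    enum gpu_result *results)
{
	pid_t pids[MAX_GPUS];

	assert(num_gpus > 0 && num_gpus <= MAX_GPUS && results != NULL);
	fflush(stdout); /* children must not inherit unflushed parent output */

	for (int gpu = 0; gpu < num_gpus; gpu++) {
		pids[gpu] = fork();
		if (pids[gpu] < 0) {
			perror("fork");
			exit(EXIT_FAILURE);
		}
		if (pids[gpu] == 0) {
			enum gpu_result r = worker(gpu);

			fflush(stdout); /* _exit() does not flush stdio */
			_exit((int)r);
		}
	}

	for (int gpu = 0; gpu < num_gpus; gpu++) {
		int status;

		/* A crash, a signal or an unknown exit code counts as FAIL. */
		if (waitpid(pids[gpu], &status, 0) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) > GPU_FAIL)
			results[gpu] = GPU_FAIL;
		else
			results[gpu] = (enum gpu_result)WEXITSTATUS(status);

		printf("gpu %d: %s\n", gpu, gpu_result_name[results[gpu]]);
	}
}
```

Each child only does its work and exits; turning the per-GPU outcomes into a subtest result stays with the parent.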

> The rule of thumb is:
> 
> FAIL - Kernel has a bug.
> SKIP - Cannot test, the HW configuration is not what we need.
> 
> For example,
> 
> 1) There's only one GPU.
> 
> SKIP.
> 
> 2) The test wants multiple identical GPUs. There are multiple GPUs, but they're different.
> 
> SKIP.
> 
> 3) The test wants different GPUs. There are multiple GPUs, but they're identical.
> 
> SKIP.
> 
> Such checks should of course be done before launching anything.

Yeah, things like the above are clear and should be done before the
actual test.
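The pre-launch checks from the three examples above could look roughly like this, assuming a hypothetical `struct gpu_info` filled in from the device scan, with a GPU model identified by its PCI vendor/device ID pair. It also assumes that "different GPUs" means at least two of them are different models. All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one detected GPU. */
struct gpu_info {
	unsigned int vendor_id;
	unsigned int device_id;
};

/* Hypothetical statement of what the test needs from the hardware. */
enum gpu_topology_req {
	NEED_MULTIPLE,           /* any two or more GPUs */
	NEED_MULTIPLE_IDENTICAL, /* two or more GPUs of the same model */
	NEED_MULTIPLE_DIFFERENT, /* two or more GPUs, not all the same model */
};

static bool same_model(const struct gpu_info *a, const struct gpu_info *b)
{
	return a->vendor_id == b->vendor_id && a->device_id == b->device_id;
}

/*
 * Returns true if the configuration is usable. false means the whole
 * subtest should SKIP before any child process is launched.
 */
static bool topology_ok(const struct gpu_info *gpus, size_t n,
			enum gpu_topology_req req)
{
	bool all_identical = true;

	assert(n == 0 || gpus != NULL);

	if (n < 2)
		return false; /* case 1: only one GPU */

	for (size_t i = 1; i < n; i++)
		if (!same_model(&gpus[0], &gpus[i]))
			all_identical = false;

	switch (req) {
	case NEED_MULTIPLE_IDENTICAL:
		return all_identical;  /* case 2: mixed models -> SKIP */
	case NEED_MULTIPLE_DIFFERENT:
		return !all_identical; /* case 3: all identical -> SKIP */
	case NEED_MULTIPLE:
	default:
		return true;
	}
}
```

In an IGT test, the boolean would typically feed igt_require() inside the subtest, before any children are forked, so the whole subtest SKIPs up front.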

Yet there are some cases where subtests are conditionally
skipped, like the ones which use igt_skip_on*(), depending on whether
or not the GPU supports that specific subtest.

In such cases, we'll have some subtests returning SKIP on some GPUs
of the GPU set, and (hopefully) returning SUCCESS on the others.

IMO, it makes sense to have a "default" propagation rule for such
cases (while allowing the default to be redefined if needed).

> For failures, any failures on one gpu should make the "global" result
> a fail too.

Yes, agreed.
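A sketch of the default propagation rule discussed here: any FAIL makes the global result FAIL; otherwise SKIPs on some GPUs are ignored as long as at least one GPU succeeded; if every GPU skipped, the result is SKIP. The `require_all` flag is a hypothetical way for a test to redefine the default when it needs every GPU to run. The enum and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-GPU result of one subtest. */
enum result { RES_SUCCESS, RES_SKIP, RES_FAIL };

/*
 * Combine per-GPU results into the subtest's "global" result.
 * Default policy:
 *   - any FAIL on any GPU          -> FAIL
 *   - otherwise any SUCCESS        -> SUCCESS (SKIPs elsewhere ignored)
 *   - otherwise (every GPU SKIPs)  -> SKIP
 * With require_all set (hypothetical override), a partial SKIP -> SKIP.
 */
static enum result aggregate_results(const enum result *per_gpu, size_t n,
				     bool require_all)
{
	bool any_success = false;
	bool any_skip = false;

	assert(n > 0 && per_gpu != NULL);

	for (size_t i = 0; i < n; i++) {
		switch (per_gpu[i]) {
		case RES_FAIL:
			return RES_FAIL; /* one failing GPU fails the subtest */
		case RES_SUCCESS:
			any_success = true;
			break;
		case RES_SKIP:
			any_skip = true;
			break;
		}
	}

	if (require_all && any_skip)
		return RES_SKIP;
	return any_success ? RES_SUCCESS : RES_SKIP;
}
```

Keeping the policy in one function makes the default easy to audit, while the flag leaves room for tests that genuinely need every GPU in the set.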

Regards,
Mauro

