[PATCH i-g-t v11 4/6] lib/intel_multigpu: Add multi_fork_foreach_gpu_chip

Wed Mar 6 19:03:11 UTC 2024

Hi Kamil,

On Wednesday, 6 March 2024 14:31:56 CET Kamil Konieczny wrote:
> Create macro for testing drm driver in multigpu scenarios with
> the help of few new multigpu functions. Also while at this,
> add documentation for public functions and use new helper in
> existing multi-gpu macro.

Is that correct?  I can't see any existing macro modified.

> 
> v2: corrected description (Kamil), rebased on corrected patches
>   after Dominik review
> v3: corrected descriptions, renamed function to print_gpus
>   and did small refactoring (Kamil)
>   fixed typo in function name (Dominik)
> v4: fixed typos (Dominik)
> v10: rename new macro into multi_fork_foreach_gpu_chip (Zbigniew)
> 
> Cc: "Zbigniew Kempczyński" <zbigniew.kempczynski at intel.com>
> Reviewed-by: "Dominik Karol Piątkowski" <dominik.karol.piatkowski at intel.com>
> Signed-off-by: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> ---
>  lib/intel_multigpu.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/intel_multigpu.h | 10 ++++++
>  2 files changed, 82 insertions(+)
> 
> diff --git a/lib/intel_multigpu.c b/lib/intel_multigpu.c
> index c4e7cf542..8b7d6f3fc 100644
> --- a/lib/intel_multigpu.c
> +++ b/lib/intel_multigpu.c
> @@ -4,6 +4,7 @@
>   */
>  
>  #include "drmtest.h"
> +#include "i915/gem.h"
>  #include "igt_core.h"
>  #include "igt_device_scan.h"
>  #include "intel_multigpu.h"
> @@ -60,3 +61,74 @@ void gem_require_multigpu(int gpus_wanted)
>  	print_gpus(gpus_wanted, gpu_filters);
>  	igt_skip_on(gpu_filters < gpus_wanted);
>  }
> +
> +/**
> + * multigpu_acquire_view:
> + * @gpus_wanted: minimum number of GPUs required

That's what I think makes a non-iterating function multigpu specific -- an 
argument for passing a minimum number of user required GPUs.  But still not 
Intel specific, I believe, then I'm not happy with the source file name.

> + * @chipset: chipset, e.g. DRIVER_XE or DRIVER_INTEL
> + *
> + * Function prepares filtered view for GPU cards with given chipset.
> + * On error skips and prints available GPUs found on PCI bus.
> + * Returns: number of GPUs found
> + */
> +int multigpu_acquire_view(int gpus_wanted, unsigned int chipset)
> +{
> +	int gpu_count = drm_prepare_filtered_multigpu(chipset);
> +
> +	igt_assert(gpus_wanted > 1);

Why?  Can't we imagine a test that uses the same code for exercising any 
number of GPUs, including 1 as a special case?  I would call such test 
"multigpu ready".

> +	if (gpu_count >= gpus_wanted)
> +		return gpu_count;
> +
> +	print_gpus(gpus_wanted, gpu_count);
> +	igt_skip_on(gpu_count < gpus_wanted);
> +
> +	return gpu_count; /* non-reachable */
> +}
> +
> +/**
> + * multigpu_open_another:
> + * @idx: index of GPU card, starting from 0
> + * @chipset: chipset, e.g. DRIVER_XE or DRIVER_INTEL
> + *
> + * Function opens GPU card with drm_open_driver_another(). For i915 it 
checks
> + * if opened card is functional. Skips on errors.
> + *
> + * Returns: opened fd for @idx card
> + */
> +int multigpu_open_another(int idx, unsigned int chipset)
> +{
> +	int fd;
> +
> +	igt_assert(idx >= 0);

I think the above check belongs to drm_open_driver_another().

> +	fd = drm_open_driver_another(idx, chipset);
> +
> +	if (chipset == DRIVER_INTEL)
> +		igt_require_gem(fd);
> +
> +	return fd;

This function doesn't look like multigpu specific, nor even Intel specific, as 
other vendors may easily add their own chipset specific requirements.  Do we 
really want to give it a name multigpu_* and place it in an intel_* library?

Besides, we may not need it (see below).

> +}
> +
> +/**
> + * multigpu_open_filtered_card:
> + * @idx: index of GPU card, starting from 0
> + * @chipset: chipset, e.g. DRIVER_XE or DRIVER_INTEL
> + *
> + * Function opens GPU card prepared with filtered view. It also checks if
> + * opened card agrees with desired chipset or checks if opened card is
> + * functional. Skips on errors.
> + *
> + * Returns: opened fd for @idx card
> + */
> +int multigpu_open_filtered_card(int idx, unsigned int chipset)
> +{
> +	int fd;
> +
> +	igt_assert(idx >= 0);
> +	fd = drm_open_filtered_card(idx);
> +	if (chipset == DRIVER_XE)
> +		igt_require_xe(fd);
> +	else if (chipset == DRIVER_INTEL)
> +		igt_require_gem(fd);

The same here (we may not need it if we call drm_open_filtered_card() after 
multigpu_acquire_view() (or drm_prepare_filtered_multigpu()) -- see below).

> +
> +	return fd;
> +}
> diff --git a/lib/intel_multigpu.h b/lib/intel_multigpu.h
> index 5acfcd5f6..a99e7a85e 100644
> --- a/lib/intel_multigpu.h
> +++ b/lib/intel_multigpu.h
> @@ -12,6 +12,10 @@
>  void gem_require_multigpu(int count);
>  int gem_multigpu_count_class(int chipset);
>  
> +int multigpu_acquire_view(int min_gpus_wanted, unsigned int chipset);
> +int multigpu_open_another(int idx, unsigned int chipset);
> +int multigpu_open_filtered_card(int idx, unsigned int chipset);
> +
>  #define igt_foreach_gpu(fd__, chipset__) \
>  	for (int igt_unique(i) = 0, fd__; \
>  		(fd__ = __drm_open_driver_another(igt_unique(i)++, 
chipset__)) >= 0; \
> @@ -23,4 +27,10 @@ int gem_multigpu_count_class(int chipset);
>  			__fd >= 0; \
>  			drm_close_driver(__fd), __fd = -1) \
>  
> +#define multi_fork_foreach_gpu_chip(__fd, __gpu_index, __chipset) \
> +	igt_multi_fork(__gpu_index, multigpu_acquire_view(2, __chipset)) \

Instead of hardcoding the minimum number of 2 GPUs, maybe let this macro 
accept a gpus_wanted parameter, and wrap another macro around it that passes 2 
as gpus_wanted.

> +		for (int __fd = multigpu_open_filtered_card(__gpu_index, 
__chipset); \

Again, your multigpu_open_filtered_card() may skip, while igt_multi_fork() 
documentation says it expects no skips.

We could avoid skips inside the body of igt_multi_fork() if we moved required 
checks to a variant of multigpu_acquire_view() that sets up filters for GPUs 
that not only match the selected chipset but also satisfy those checks.  Since 
devices are now iterated inside drm_prepare_filtered_multigpu(), we  could 
implement a new variant of this function that either performs some predefined 
chipset based checks, like you proposed for multigpu_open_another() and 
multigpu_open_filtered_card(), or accepts an argument that points to an 
arbitrary user provided function, either boolean that returns OK/not OK for a 
GPU file descriptor, or void that skips if not OK.  That way, potential skips 
would occur outside of igt_multi_fork() body, and you could drop those 
multigpu_open_*() helpers which seem of limited use -- test functions 
themselves should detect potential device issues.

> +			__fd >= 0; \
> +			drm_close_driver(__fd), __fd = -1) \

Another trailing backslash, not necessarily intended.

Thanks,
Janusz

> +
>  #endif /* __INTEL_MULTIGPU_H */
>