[PATCH i-g-t v11 4/6] lib/intel_multigpu: Add multi_fork_foreach_gpu_chip

Tue Mar 12 13:18:15 UTC 2024

Hi Janusz,
On 2024-03-06 at 20:03:11 +0100, Janusz Krzysztofik wrote:
> Hi Kamil,
> 
> On Wednesday, 6 March 2024 14:31:56 CET Kamil Konieczny wrote:
> > Create macro for testing drm driver in multigpu scenarios with
> > the help of few new multigpu functions. Also while at this,
> > add documentation for public functions and use new helper in
> > existing multi-gpu macro.
> 
> Is that correct?  I can't see any existing macro modified.
> 

Good point, I moved doc to patch where it belongs, I will
correct description here.

> > 
> > v2: corrected description (Kamil), rebased on corrected patches
> >   after Dominik review
> > v3: corrected descriptions, renamed function to print_gpus
> >   and did small refactoring (Kamil)
> >   fixed typo in function name (Dominik)
> > v4: fixed typos (Dominik)
> > v10: rename new macro into multi_fork_foreach_gpu_chip (Zbigniew)
> > 
> > Cc: "Zbigniew Kempczyński" <zbigniew.kempczynski at intel.com>
> > Reviewed-by: "Dominik Karol Piątkowski" <dominik.karol.piatkowski at intel.com>
> > Signed-off-by: Kamil Konieczny <kamil.konieczny at linux.intel.com>
> > ---
> >  lib/intel_multigpu.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
> >  lib/intel_multigpu.h | 10 ++++++
> >  2 files changed, 82 insertions(+)
> > 
> > diff --git a/lib/intel_multigpu.c b/lib/intel_multigpu.c
> > index c4e7cf542..8b7d6f3fc 100644
> > --- a/lib/intel_multigpu.c
> > +++ b/lib/intel_multigpu.c
> > @@ -4,6 +4,7 @@
> >   */
> >  
> >  #include "drmtest.h"
> > +#include "i915/gem.h"
> >  #include "igt_core.h"
> >  #include "igt_device_scan.h"
> >  #include "intel_multigpu.h"
> > @@ -60,3 +61,74 @@ void gem_require_multigpu(int gpus_wanted)
> >  	print_gpus(gpus_wanted, gpu_filters);
> >  	igt_skip_on(gpu_filters < gpus_wanted);
> >  }
> > +
> > +/**
> > + * multigpu_acquire_view:
> > + * @gpus_wanted: minimum number of GPUs required
> 
> That's what I think makes a non-iterating function multigpu specific -- an 
> argument for passing a minimum number of user required GPUs.  But still not 
> Intel specific, I believe, then I'm not happy with the source file name.
> 

Yes, I renemed it to igt_multigpu.*

> > + * @chipset: chipset, e.g. DRIVER_XE or DRIVER_INTEL
> > + *
> > + * Function prepares filtered view for GPU cards with given chipset.
> > + * On error skips and prints available GPUs found on PCI bus.
> > + * Returns: number of GPUs found
> > + */
> > +int multigpu_acquire_view(int gpus_wanted, unsigned int chipset)
> > +{
> > +	int gpu_count = drm_prepare_filtered_multigpu(chipset);
> > +
> > +	igt_assert(gpus_wanted > 1);
> 
> Why?  Can't we imagine a test that uses the same code for exercising any 
> number of GPUs, including 1 as a special case?  I would call such test 
> "multigpu ready".
>  

Should I warn/info about it being 1 here?
What about:

	igt_assert(gpus_wanted > 0);
	if(gpus_wanted == 1)
            igt_info("requested only one GPU for multi-gpu test\n");

Or maybe not here but just before return, when gpu_count == 1?

> > +	if (gpu_count >= gpus_wanted)
> > +		return gpu_count;
> > +
> > +	print_gpus(gpus_wanted, gpu_count);
> > +	igt_skip_on(gpu_count < gpus_wanted);
> > +
> > +	return gpu_count; /* non-reachable */
> > +}
> > +
> > +/**
> > + * multigpu_open_another:
> > + * @idx: index of GPU card, starting from 0
> > + * @chipset: chipset, e.g. DRIVER_XE or DRIVER_INTEL
> > + *
> > + * Function opens GPU card with drm_open_driver_another(). For i915 it 
> checks
> > + * if opened card is functional. Skips on errors.
> > + *
> > + * Returns: opened fd for @idx card
> > + */
> > +int multigpu_open_another(int idx, unsigned int chipset)
> > +{
> > +	int fd;
> > +
> > +	igt_assert(idx >= 0);
> 
> I think the above check belongs to drm_open_driver_another().
> 

Right, I will remove it.

> > +	fd = drm_open_driver_another(idx, chipset);
> > +
> > +	if (chipset == DRIVER_INTEL)
> > +		igt_require_gem(fd);
> > +
> > +	return fd;
> 
> This function doesn't look like multigpu specific, nor even Intel specific, as 
> other vendors may easily add their own chipset specific requirements.  Do we 
> really want to give it a name multigpu_* and place it in an intel_* library?
> 
> Besides, we may not need it (see below).
> 

Note that it do some extra work - it checks for has_gem(fd)
but I think that it should also check DRIVER_ANY and is_i915()

> > +}
> > +
> > +/**
> > + * multigpu_open_filtered_card:
> > + * @idx: index of GPU card, starting from 0
> > + * @chipset: chipset, e.g. DRIVER_XE or DRIVER_INTEL
> > + *
> > + * Function opens GPU card prepared with filtered view. It also checks if
> > + * opened card agrees with desired chipset or checks if opened card is
> > + * functional. Skips on errors.
> > + *
> > + * Returns: opened fd for @idx card
> > + */
> > +int multigpu_open_filtered_card(int idx, unsigned int chipset)
> > +{
> > +	int fd;
> > +
> > +	igt_assert(idx >= 0);
> > +	fd = drm_open_filtered_card(idx);
> > +	if (chipset == DRIVER_XE)
> > +		igt_require_xe(fd);
> > +	else if (chipset == DRIVER_INTEL)
> > +		igt_require_gem(fd);
> 
> The same here (we may not need it if we call drm_open_filtered_card() after 
> multigpu_acquire_view() (or drm_prepare_filtered_multigpu()) -- see below).
> 

Same here - it do some more extra work.

> > +
> > +	return fd;
> > +}
> > diff --git a/lib/intel_multigpu.h b/lib/intel_multigpu.h
> > index 5acfcd5f6..a99e7a85e 100644
> > --- a/lib/intel_multigpu.h
> > +++ b/lib/intel_multigpu.h
> > @@ -12,6 +12,10 @@
> >  void gem_require_multigpu(int count);
> >  int gem_multigpu_count_class(int chipset);
> >  
> > +int multigpu_acquire_view(int min_gpus_wanted, unsigned int chipset);
> > +int multigpu_open_another(int idx, unsigned int chipset);
> > +int multigpu_open_filtered_card(int idx, unsigned int chipset);
> > +
> >  #define igt_foreach_gpu(fd__, chipset__) \
> >  	for (int igt_unique(i) = 0, fd__; \
> >  		(fd__ = __drm_open_driver_another(igt_unique(i)++, 
> chipset__)) >= 0; \
> > @@ -23,4 +27,10 @@ int gem_multigpu_count_class(int chipset);
> >  			__fd >= 0; \
> >  			drm_close_driver(__fd), __fd = -1) \
> >  
> > +#define multi_fork_foreach_gpu_chip(__fd, __gpu_index, __chipset) \
> > +	igt_multi_fork(__gpu_index, multigpu_acquire_view(2, __chipset)) \
> 
> Instead of hardcoding the minimum number of 2 GPUs, maybe let this macro 
> accept a gpus_wanted parameter, and wrap another macro around it that passes 2 
> as gpus_wanted.
> 
> > +		for (int __fd = multigpu_open_filtered_card(__gpu_index, 
> __chipset); \
> 
> Again, your multigpu_open_filtered_card() may skip, while igt_multi_fork() 
> documentation says it expects no skips.
> 

It is because it treats such skips as failures, not as a skips.

> We could avoid skips inside the body of igt_multi_fork() if we moved required 
> checks to a variant of multigpu_acquire_view() that sets up filters for GPUs 
> that not only match the selected chipset but also satisfy those checks.  Since 
> devices are now iterated inside drm_prepare_filtered_multigpu(), we  could 
> implement a new variant of this function that either performs some predefined 
> chipset based checks, like you proposed for multigpu_open_another() and 
> multigpu_open_filtered_card(), or accepts an argument that points to an 
> arbitrary user provided function, either boolean that returns OK/not OK for a 
> GPU file descriptor, or void that skips if not OK.  That way, potential skips 
> would occur outside of igt_multi_fork() body, and you could drop those 
> multigpu_open_*() helpers which seem of limited use -- test functions 
> themselves should detect potential device issues.
> 

It is, first line of this macro:
  igt_multi_fork(__gpu_index, multigpu_acquire_view(2, __chipset)) \

has multigpu_acuire_view() and it should do exactly what you wrote,
it should make checks that all opens later should end in success.
They can still fail later (for example - has_gem() for i915).

> > +			__fd >= 0; \
> > +			drm_close_driver(__fd), __fd = -1) \
> 
> Another trailing backslash, not necessarily intended.

Right, I will fix it.

> 
> Thanks,
> Janusz
> 

Thank you for your comments,

Regards,
Kamil

> > +
> >  #endif /* __INTEL_MULTIGPU_H */
> > 
> 
> 
> 
>