[PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI Modules for i915/Xe Driver

Lucas De Marchi lucas.demarchi at intel.com
Wed Sep 11 16:18:51 UTC 2024


+ linux-modules
+ Luis

On Wed, Sep 11, 2024 at 01:00:47AM GMT, Bommu, Krishnaiah wrote:
>
>
>> -----Original Message-----
>> From: De Marchi, Lucas <lucas.demarchi at intel.com>
>> Sent: Tuesday, September 10, 2024 9:13 PM
>> To: Vivi, Rodrigo <rodrigo.vivi at intel.com>
>> Cc: Bommu, Krishnaiah <krishnaiah.bommu at intel.com>; intel-
>> xe at lists.freedesktop.org; intel-gfx at lists.freedesktop.org; Kamil Konieczny
>> <kamil.konieczny at linux.intel.com>; Ceraolo Spurio, Daniele
>> <daniele.ceraolospurio at intel.com>; Upadhyay, Tejas
>> <tejas.upadhyay at intel.com>; Tvrtko Ursulin <tursulin at ursulin.net>; Joonas
>> Lahtinen <joonas.lahtinen at linux.intel.com>; Nikula, Jani
>> <jani.nikula at intel.com>; Thomas Hellström
>> <thomas.hellstrom at linux.intel.com>; Teres Alexis, Alan Previn
>> <alan.previn.teres.alexis at intel.com>; Winkler, Tomas
>> <tomas.winkler at intel.com>; Usyskin, Alexander
>> <alexander.usyskin at intel.com>
>> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
>> Modules for i915/Xe Driver
>>
>> On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
>> >On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
>> >> This update addresses the unload/reload sequence of MEI modules in
>> >> relation to the i915/Xe graphics driver. On platforms where the MEI
>> >> hardware is integrated with the graphics device (e.g., DG2/BMG), the
>> >> i915/xe driver is depend on the MEI modules. Conversely, on newer
>> >> platforms like MTL and LNL, where the MEI hardware is separate, this
>> dependency does not exist.
>> >>
>> >> The changes introduced ensure that MEI modules are unloaded and
>> >> reloaded in the correct order based on platform-specific
>> >> dependencies. This is achieved by adding a MODULE_SOFTDEP directive to
>> the i915 and Xe module code.
>>
>>
>> can you explain what causes the modules to be loaded today? Also, is this to fix
>> anything related to *loading* order or just unload?
>>
>> >>
>> >> These changes enhance the robustness of MEI module handling across
>> >> different hardware platforms, ensuring that the i915/Xe driver can be
>> >> cleanly unloaded and reloaded without issues.
>> >>
>> >> v2: updated commit message
>> >>
>> >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu at intel.com>
>> >> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
>> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> >> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> >> Cc: Tejas Upadhyay <tejas.upadhyay at intel.com>
>> >> ---
>> >>  drivers/gpu/drm/i915/i915_module.c | 2 ++
>> >>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
>> >>  2 files changed, 4 insertions(+)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/i915_module.c
>> >> b/drivers/gpu/drm/i915/i915_module.c
>> >> index 65acd7bf75d0..2ad079ad35db 100644
>> >> --- a/drivers/gpu/drm/i915/i915_module.c
>> >> +++ b/drivers/gpu/drm/i915/i915_module.c
>> >> @@ -75,6 +75,8 @@ static const struct {  };  static int
>> >> init_progress;
>> >>
>> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >> +
>> >>  static int __init i915_init(void)
>> >>  {
>> >>  	int err, i;
>> >> diff --git a/drivers/gpu/drm/xe/xe_module.c
>> >> b/drivers/gpu/drm/xe/xe_module.c index bfc3deebdaa2..5633ea1841b7
>> >> 100644
>> >> --- a/drivers/gpu/drm/xe/xe_module.c
>> >> +++ b/drivers/gpu/drm/xe/xe_module.c
>> >> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>> >>  	init_funcs[i].exit();
>> >>  }
>> >>
>> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >
>> >I'm honestly not very comfortable with this.
>> >
>> >1. This is not true for every device supported by these modules.
>> >2. This is not true for every (and the most basic) functionality of these drivers.
>> >
>> >Shouldn't this be done in the the mei side?
>>
>> I don't think it's possible to do from the mei side. Would mei depend on both xe
>> and i915 (and thus cause both to be loaded regardless of the platform?). For a
>> runtime dependency like this that depends on the platform, I think the best way
>> would be a weakdep + either a request_module() or something else that causes
>> the module to load (is that what comp_* is doing today?)
>>
>> >
>> >Couldn't at probe we identify the need of them and if needed we return
>> >-EPROBE to attempt a retry after the mei drivers were probed?
>>
>> I'm not sure this is fixing anything for probe. I think we already wait on the other
>> component to be ready without blocking the rest of the driver functionality.
>>
>> A weakdep wouldn't cause the module to be loaded where it's not needed, but
>> need some clarification if this is trying to fix anything load-related or just unload.
>
>This change is fixing unload.
>During xe load I am seeing mei_gsc modules was loaded, but not unloaded during the unload xe

so, first thing: if things are correct in the kernel, we shouldn't need to **unload** the module
after unbinding the device. Why are we unloading xe and the other
modules for tests? 

>
>root at DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after system reboot
>root at DUT6127BMGFRD:/home/gta#
>root at DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  2
>mei                   167936  5 mei_hdcp,mei_pxp,mei_me
>root at DUT6127BMGFRD:/home/gta# lsmod | grep xe
>root at DUT6127BMGFRD:/home/gta#
>root at DUT6127BMGFRD:/home/gta# modprobe xe
>root at DUT6127BMGFRD:/home/gta#
>root at DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_gsc_proxy          16384  0
>mei_gsc                12288  1

			       ^ which means there's one user, which
			         should be xe

>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  3 mei_gsc
>mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>root at DUT6127BMGFRD:/home/gta#
>root at DUT6127BMGFRD:/home/gta#
>root at DUT6127BMGFRD:/home/gta#
>root at DUT6127BMGFRD:/home/gta# init 3
>root at DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
>root at DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
>root at DUT6127BMGFRD:/home/gta# modprobe -r xe
>root at DUT6127BMGFRD:/home/gta#
>root at DUT6127BMGFRD:/home/gta# lsmod | grep xe
>root at DUT6127BMGFRD:/home/gta# lsmod | grep mei
>mei_gsc_proxy          16384  0
>mei_gsc                12288  0

			       ^ great, so the refcount went to 0,
			         confirming it was xe. It should go to 0
				 even before you unload the module,
				 when unbind.

A couple of points:

1) why do we care about unloading mei_gsc. Just loading xe
    again (or even not even unloading it, just unbind/rebind),
    should still work if the xe <-> mei_gsc integration is done
    correctly.

2) If for some reason we do want to remove the module, then we will
    need some work in kernel/module/  to start tracking runtime module
    dependencies, i.e. when one module does a module_get(foo->owner), it
    would add to a list and output on sysfs together with the holders list.
    This way you would be able to track the runtime deps and remove them
    if their refcount went to 0 after removing xe.

(2) is doable, but previous attempts were not successful [1]. Is  there
something else to make the simpler solution (1) to work?

thanks
Lucas De Marchi

[1] https://lore.kernel.org/linux-modules/cover.1652113087.git.mchehab@kernel.org/

>mei_hdcp               28672  0
>mei_pxp                16384  0
>mei_me                 49152  3 mei_gsc
>mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>root at DUT6127BMGFRD:/home/gta#
>
>Regards,
>Krishna.
>
>>
>> Lucas De Marchi
>>
>> >
>> >Cc: Alexander Usyskin <alexander.usyskin at intel.com>
>> >Cc: Tomas Winkler <tomas.winkler at intel.com>
>> >Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
>> >Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> >Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> >Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> >Cc: Jani Nikula <jani.nikula at intel.com>
>> >Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> >Cc: Tvrtko Ursulin <tursulin at ursulin.net>
>> >
>> >> +
>> >>  static int __init xe_init(void)
>> >>  {
>> >>  	int err, i;
>> >> --
>> >> 2.25.1
>> >>


More information about the Intel-gfx mailing list