[PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI Modules for i915/Xe Driver

Lucas De Marchi lucas.demarchi at intel.com
Thu Sep 12 20:42:52 UTC 2024


On Thu, Sep 12, 2024 at 11:58:37AM GMT, Bommu, Krishnaiah wrote:
>
>
>> -----Original Message-----
>> From: De Marchi, Lucas <lucas.demarchi at intel.com>
>> Sent: Wednesday, September 11, 2024 9:49 PM
>> To: Bommu, Krishnaiah <krishnaiah.bommu at intel.com>
>> Cc: Vivi, Rodrigo <rodrigo.vivi at intel.com>; intel-xe at lists.freedesktop.org; intel-
>> gfx at lists.freedesktop.org; Kamil Konieczny <kamil.konieczny at linux.intel.com>;
>> Ceraolo Spurio, Daniele <daniele.ceraolospurio at intel.com>; Upadhyay, Tejas
>> <tejas.upadhyay at intel.com>; Tvrtko Ursulin <tursulin at ursulin.net>; Joonas
>> Lahtinen <joonas.lahtinen at linux.intel.com>; Nikula, Jani
>> <jani.nikula at intel.com>; Thomas Hellström
>> <thomas.hellstrom at linux.intel.com>; Teres Alexis, Alan Previn
>> <alan.previn.teres.alexis at intel.com>; Winkler, Tomas
>> <tomas.winkler at intel.com>; Usyskin, Alexander
>> <alexander.usyskin at intel.com>; linux-modules at vger.kernel.org; Luis
>> Chamberlain <mcgrof at kernel.org>
>> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
>> Modules for i915/Xe Driver
>>
>> + linux-modules
>> + Luis
>>
>> On Wed, Sep 11, 2024 at 01:00:47AM GMT, Bommu, Krishnaiah wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: De Marchi, Lucas <lucas.demarchi at intel.com>
>> >> Sent: Tuesday, September 10, 2024 9:13 PM
>> >> To: Vivi, Rodrigo <rodrigo.vivi at intel.com>
>> >> Cc: Bommu, Krishnaiah <krishnaiah.bommu at intel.com>; intel-
>> >> xe at lists.freedesktop.org; intel-gfx at lists.freedesktop.org; Kamil
>> >> Konieczny <kamil.konieczny at linux.intel.com>; Ceraolo Spurio, Daniele
>> >> <daniele.ceraolospurio at intel.com>; Upadhyay, Tejas
>> >> <tejas.upadhyay at intel.com>; Tvrtko Ursulin <tursulin at ursulin.net>;
>> >> Joonas Lahtinen <joonas.lahtinen at linux.intel.com>; Nikula, Jani
>> >> <jani.nikula at intel.com>; Thomas Hellström
>> >> <thomas.hellstrom at linux.intel.com>; Teres Alexis, Alan Previn
>> >> <alan.previn.teres.alexis at intel.com>; Winkler, Tomas
>> >> <tomas.winkler at intel.com>; Usyskin, Alexander
>> >> <alexander.usyskin at intel.com>
>> >> Subject: Re: [PATCH v2] drm: Ensure Proper Unload/Reload Order of MEI
>> >> Modules for i915/Xe Driver
>> >>
>> >> On Tue, Sep 10, 2024 at 11:03:30AM GMT, Rodrigo Vivi wrote:
>> >> >On Mon, Sep 09, 2024 at 09:33:17AM +0530, Bommu Krishnaiah wrote:
>> >> >> This update addresses the unload/reload sequence of MEI modules in
>> >> >> relation to the i915/Xe graphics driver. On platforms where the
>> >> >> MEI hardware is integrated with the graphics device (e.g.,
>> >> >> DG2/BMG), the i915/xe driver depends on the MEI modules.
>> >> >> Conversely, on newer platforms like MTL and LNL, where the MEI
>> >> >> hardware is separate, this dependency does not exist.
>> >> >>
>> >> >> The changes introduced ensure that MEI modules are unloaded and
>> >> >> reloaded in the correct order based on platform-specific
>> >> >> dependencies. This is achieved by adding a MODULE_SOFTDEP
>> >> >> directive to the i915 and Xe module code.
>> >>
>> >>
>> >> can you explain what causes the modules to be loaded today? Also, is
>> >> this to fix anything related to *loading* order or just unload?
>> >>
>> >> >>
>> >> >> These changes enhance the robustness of MEI module handling across
>> >> >> different hardware platforms, ensuring that the i915/Xe driver can
>> >> >> be cleanly unloaded and reloaded without issues.
>> >> >>
>> >> >> v2: updated commit message
>> >> >>
>> >> >> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu at intel.com>
>> >> >> Cc: Kamil Konieczny <kamil.konieczny at linux.intel.com>
>> >> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> >> >> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> >> >> Cc: Tejas Upadhyay <tejas.upadhyay at intel.com>
>> >> >> ---
>> >> >>  drivers/gpu/drm/i915/i915_module.c | 2 ++
>> >> >>  drivers/gpu/drm/xe/xe_module.c     | 2 ++
>> >> >>  2 files changed, 4 insertions(+)
>> >> >>
>> >> >> diff --git a/drivers/gpu/drm/i915/i915_module.c b/drivers/gpu/drm/i915/i915_module.c
>> >> >> index 65acd7bf75d0..2ad079ad35db 100644
>> >> >> --- a/drivers/gpu/drm/i915/i915_module.c
>> >> >> +++ b/drivers/gpu/drm/i915/i915_module.c
>> >> >> @@ -75,6 +75,8 @@ static const struct {
>> >> >>  };
>> >> >>  static int init_progress;
>> >> >>
>> >> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >> >> +
>> >> >>  static int __init i915_init(void)
>> >> >>  {
>> >> >>  	int err, i;
>> >> >> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
>> >> >> index bfc3deebdaa2..5633ea1841b7 100644
>> >> >> --- a/drivers/gpu/drm/xe/xe_module.c
>> >> >> +++ b/drivers/gpu/drm/xe/xe_module.c
>> >> >> @@ -127,6 +127,8 @@ static void xe_call_exit_func(unsigned int i)
>> >> >>  	init_funcs[i].exit();
>> >> >>  }
>> >> >>
>> >> >> +MODULE_SOFTDEP("pre: mei_gsc_proxy mei_gsc");
>> >> >
>> >> >I'm honestly not very comfortable with this.
>> >> >
>> >> >1. This is not true for every device supported by these modules.
>> >> >2. This is not true for every (and the most basic) functionality of
>> >> >   these drivers.
>> >> >
>> >> >Shouldn't this be done on the mei side?
>> >>
>> >> I don't think it's possible to do from the mei side. Would mei depend
>> >> on both xe and i915 (and thus cause both to be loaded regardless of
>> >> the platform)? For a runtime dependency like this that depends on
>> >> the platform, I think the best way would be a weakdep + either a
>> >> request_module() or something else that causes the module to load (is
>> >> that what comp_* is doing today?)
>> >>
>> >> >
>> >> >Couldn't we identify at probe whether they are needed and, if so,
>> >> >return -EPROBE_DEFER to retry after the mei drivers have been probed?
>> >>
>> >> I'm not sure this is fixing anything for probe. I think we already
>> >> wait on the other component to be ready without blocking the rest of
>> >> the driver functionality.
>> >>
>> >> A weakdep wouldn't cause the module to be loaded where it's not
>> >> needed, but I need some clarification on whether this is trying to fix
>> >> anything load-related or just unload.
>> >
>> >This change is fixing unload.
>> >During xe load I see that the mei_gsc modules are loaded, but they are
>> >not unloaded when xe is unloaded
>>
>> so, first thing: if things are correct in the kernel, we shouldn't need to
>> **unload** the module after unbinding the device. Why are we unloading xe
>> and the other modules for tests?
>
>While running gta@xe_module_load@reload-no-display I see a failure, and these changes are meant to address it. Previously I tried to fix this from IGT, but following the IGT review suggestion I am now trying to fix the issue in the kernel.
>IGT patch: https://patchwork.freedesktop.org/series/137343/


It seems like a mistake in igt to try to remove the mei_gsc module.
For dgfx it's even worse: what happens if another card is using the
module? What happens if I have an RPL + BMG, with i915 driving the former
while xe drives the latter?

You shouldn't need to remove it. This works for me with BMG (unbinding
all devices for simplicity since we are removing the module; if we
don't remove the module, we can test with only the device we care
about):


# modprobe xe
# unbind
Unbinding /sys/bus/pci/devices/0000:00:02.0 (8086:a782)... ok
Unbinding /sys/bus/pci/devices/0000:03:00.0 (8086:e20b)... ok
# lsmod | grep -e xe -e mei_gsc
xe                   3584000  0
drm_gpuvm              45056  1 xe
video                  77824  1 xe
i2c_algo_bit           12288  1 xe
drm_ttm_helper         16384  1 xe
gpu_sched              61440  1 xe
drm_suballoc_helper    16384  1 xe
drm_display_helper    270336  1 xe
drm_kunit_helpers      16384  1 xe
drm_buddy              20480  1 xe
ttm                   114688  2 drm_ttm_helper,xe
mei_gsc_proxy          16384  0
mei_gsc                12288  0
drm_exec               16384  2 drm_gpuvm,xe
kunit                  73728  2 xe,drm_kunit_helpers
drm_kms_helper        241664  4 drm_display_helper,drm_ttm_helper,xe,drm_kunit_helpers
mei_me                 65536  3 mei_gsc
mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
drm                   737280  11 gpu_sched,drm_kms_helper,drm_exec,drm_gpuvm,drm_suballoc_helper,drm_display_helper,drm_buddy,drm_ttm_helper,xe,drm_kunit_helpers,ttm
# modprobe -r xe
# modprobe xe probe_display=0
# unbind
Unbinding /sys/bus/pci/devices/0000:00:02.0 (8086:a782)... ok
Unbinding /sys/bus/pci/devices/0000:03:00.0 (8086:e20b)... ok
# modprobe -r xe
# modprobe xe

I didn't check whether mei_gsc continues to work after the reload, but I
assume so, since its refcount is incremented:

mei_gsc                12288  1
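
The same refcount can be read from sysfs instead of parsing lsmod output.
A minimal sketch, assuming the kernel exposes /sys/module/<name>/refcnt
(it does when module unloading is enabled). The function name
module_refcnt and the SYSFS_ROOT override are hypothetical, added only
for illustration:

```shell
# Print the reference count of a loaded module, as exposed in sysfs.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) for dry runs.
module_refcnt() {
    mr_file="${SYSFS_ROOT:-/sys}/module/$1/refcnt"

    if [ ! -r "$mr_file" ]; then
        echo "$1: not loaded (or refcnt not exposed)" >&2
        return 1
    fi
    cat "$mr_file"
}
```

Calling "module_refcnt mei_gsc" should print the same number that lsmod
shows in its "Used by" column for mei_gsc.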


The unbind function is this:

function unbind {
         # PCI class codes: VGA controller and generic display controller
         vga="0300"
         display="0380"
         pci_vendor="8086"

         # lspci -n prints: <slot> <class>: <vendor>:<device> [rest]
         while read -r pci_slot class devid xxx; do
                 sysdev=/sys/bus/pci/devices/0000:$pci_slot

                 echo -n "Unbinding $sysdev ($devid)... "
                 if [ ! -e "$sysdev/driver" ]; then
                         echo "(skip: not bound)"
                         continue
                 fi

                 echo -n auto > "$sysdev/power/control"
                 echo -n "0000:$pci_slot" > "$sysdev/driver/unbind"
                 echo "ok"
         done < <(lspci -d "${pci_vendor}::${display}" -n; lspci -d "${pci_vendor}::${vga}" -n)
}
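
For the unbind/rebind case, where the module is not removed at all, a
counterpart could look like the sketch below. It is not part of the
sequence I tested above. The function name rebind_device, its arguments
and the SYSFS_ROOT override are hypothetical; the only kernel interface
it relies on is the standard /sys/bus/pci/drivers/<driver>/bind file.

```shell
# Bind one PCI device (e.g. 0000:03:00.0) to a driver (e.g. xe) via sysfs.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) for dry runs.
rebind_device() {
    rb_dev="$1"
    rb_drv="$2"
    rb_root="${SYSFS_ROOT:-/sys}"
    rb_drvdir="$rb_root/bus/pci/drivers/$rb_drv"

    printf 'Binding %s to %s... ' "$rb_dev" "$rb_drv"
    if [ ! -d "$rb_drvdir" ]; then
        echo "(skip: driver not loaded)"
        return 1
    fi
    if [ -e "$rb_root/bus/pci/devices/$rb_dev/driver" ]; then
        echo "(skip: already bound)"
        return 0
    fi

    # Writing the device address to the driver's bind file triggers probe.
    printf '%s' "$rb_dev" > "$rb_drvdir/bind" && echo "ok"
}
```

Something like "rebind_device 0000:03:00.0 xe" after the unbind would
then exercise probe again without touching any module refcount.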


So... for igt: I *think* simply removing the array of modules to
unload first would fix it.

Lucas De Marchi

>
>> >root at DUT6127BMGFRD:/home/gta# lsmod | grep xe ------>>>just after system reboot
>> >root at DUT6127BMGFRD:/home/gta#
>> >root at DUT6127BMGFRD:/home/gta# lsmod | grep mei
>> >mei_hdcp               28672  0
>> >mei_pxp                16384  0
>> >mei_me                 49152  2
>> >mei                   167936  5 mei_hdcp,mei_pxp,mei_me
>> >root at DUT6127BMGFRD:/home/gta# lsmod | grep xe
>> >root at DUT6127BMGFRD:/home/gta#
>> >root at DUT6127BMGFRD:/home/gta# modprobe xe
>> >root at DUT6127BMGFRD:/home/gta#
>> >root at DUT6127BMGFRD:/home/gta# lsmod | grep mei
>> >mei_gsc_proxy          16384  0
>> >mei_gsc                12288  1
>>
>> 			       ^ which means there's one user, which
>> 			         should be xe
>>
>> >mei_hdcp               28672  0
>> >mei_pxp                16384  0
>> >mei_me                 49152  3 mei_gsc
>> >mei                   167936  8 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>> >root at DUT6127BMGFRD:/home/gta#
>> >root at DUT6127BMGFRD:/home/gta#
>> >root at DUT6127BMGFRD:/home/gta#
>> >root at DUT6127BMGFRD:/home/gta# init 3
>> >root at DUT6127BMGFRD:/home/gta# echo -n auto > /sys/bus/pci/devices/0000\:03\:00.0/power/control
>> >root at DUT6127BMGFRD:/home/gta# echo -n "0000:03:00.0" > /sys/bus/pci/drivers/xe/unbind
>> >root at DUT6127BMGFRD:/home/gta# modprobe -r xe
>> >root at DUT6127BMGFRD:/home/gta#
>> >root at DUT6127BMGFRD:/home/gta# lsmod | grep xe
>> >root at DUT6127BMGFRD:/home/gta# lsmod | grep mei
>> >mei_gsc_proxy          16384  0
>> >mei_gsc                12288  0
>>
>> 			       ^ great, so the refcount went to 0,
>> 			         confirming it was xe. It should go to 0
>> 				 even before you unload the module,
>> 				 when unbind.
>>
>> A couple of points:
>>
>> 1) why do we care about unloading mei_gsc? Just loading xe
>>     again (or not even unloading it, just unbind/rebind)
>>     should still work if the xe <-> mei_gsc integration is done
>>     correctly.
>>
>> 2) If for some reason we do want to remove the module, then we will
>>     need some work in kernel/module/  to start tracking runtime module
>>     dependencies, i.e. when one module does a module_get(foo->owner), it
>>     would add to a list and output on sysfs together with the holders list.
>>     This way you would be able to track the runtime deps and remove them
>>     if their refcount went to 0 after removing xe.
>>
>> (2) is doable, but previous attempts were not successful [1]. Is there
>> something else needed to make the simpler solution (1) work?
>>
>
>For reference on why I am making these changes, please see the review comments on this patch: https://patchwork.freedesktop.org/series/137343/
>
>Regards,
>Krishna.
>
>> thanks
>> Lucas De Marchi
>>
>> [1] https://lore.kernel.org/linux-modules/cover.1652113087.git.mchehab@kernel.org/
>>
>> >mei_hdcp               28672  0
>> >mei_pxp                16384  0
>> >mei_me                 49152  3 mei_gsc
>> >mei                   167936  7 mei_gsc_proxy,mei_gsc,mei_hdcp,mei_pxp,mei_me
>> >root at DUT6127BMGFRD:/home/gta#
>> >
>> >Regards,
>> >Krishna.
>> >
>> >>
>> >> Lucas De Marchi
>> >>
>> >> >
>> >> >Cc: Alexander Usyskin <alexander.usyskin at intel.com>
>> >> >Cc: Tomas Winkler <tomas.winkler at intel.com>
>> >> >Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
>> >> >Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> >> >Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>> >> >Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> >> >Cc: Jani Nikula <jani.nikula at intel.com>
>> >> >Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> >> >Cc: Tvrtko Ursulin <tursulin at ursulin.net>
>> >> >
>> >> >> +
>> >> >>  static int __init xe_init(void)
>> >> >>  {
>> >> >>  	int err, i;
>> >> >> --
>> >> >> 2.25.1
>> >> >>

