[Intel-gfx] [PATCH 2/2] drm/i915: Tear down properly on early i915_init exit
Daniel Vetter
daniel at ffwll.ch
Mon Jul 19 08:28:22 UTC 2021
On Sat, Jul 17, 2021 at 12:48 AM Jason Ekstrand <jason at jlekstrand.net> wrote:
> In i915_exit(), we check i915_pci_driver.driver.owner to detect if
> i915_init exited early and don't tear anything down. However, we didn't
> have proper tear-down paths for early exits in i915_init().
>
> Most of the time, you would never notice this as driver init failures
> are extremely rare and generally the sign of a bigger bug. However,
> when the mock self-tests are run, they run as part of i915_init() and
> exit early once they complete. They run after i915_globals_init() and
> before we set up anything else. The IGT test then unloads the module,
> invoking i915_exit() which, thanks to our i915_pci_driver.driver.owner
> check, doesn't actually tear anything down. Importantly, this means
> i915_globals_exit() never gets called even though i915_globals_init()
> was and we leak the globals.
>
> The most annoying part is that you don't actually notice the failure as
> part of the self-tests since leaking a bit of memory, while bad, doesn't
> result in anything observable from userspace. Instead, the next time we
> load the driver (usually for next IGT test), i915_globals_init() gets
> invoked again, we go to allocate a bunch of new memory slabs, those
> implicitly create debugfs entries, and debugfs warns that we're trying
> to create directories and files that already exist. Since this all
> happens as part of the next driver load, it shows up in the dmesg-warn
> of whatever IGT test ran after the mock selftests.
My idea was to onion-unwind in i915_exit, but that means we need to
carry state over or have checks for every step, which is a bit
annoying.
Yours unwinds even if i915_init returns 0, i.e. success, if we had
some selftests, which is most unusual and I think deserves an
explainer here in the commit message and maybe somewhere in the code.
> Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
> Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
> Cc: Daniel Vetter <daniel at ffwll.ch>
> ---
> drivers/gpu/drm/i915/i915_globals.c | 4 ++--
> drivers/gpu/drm/i915/i915_pci.c | 23 +++++++++++++++++------
> 2 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
> index 77f1911c463b8..87267e1d2ad92 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -138,7 +138,7 @@ void i915_globals_unpark(void)
> atomic_inc(&active);
> }
>
> -static void __exit __i915_globals_flush(void)
> +static void __i915_globals_flush(void)
> {
> atomic_inc(&active); /* skip shrinking */
>
> @@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void)
> atomic_dec(&active);
> }
>
> -void __exit i915_globals_exit(void)
> +void i915_globals_exit(void)
> {
> GEM_BUG_ON(atomic_read(&active));
>
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 50ed93b03e582..783f547be0990 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1199,13 +1199,20 @@ static int __init i915_init(void)
> bool use_kms = true;
> int err;
>
> + /* We use this to detect early returns from i915_init() so we don't
> + * tear anything down in i915_exit()
> + */
> + i915_pci_driver.driver.owner = NULL;
Setting this seems redundant? Or if you want to make it explicit, just
have a dedicated bool with a big comment explaining that only when we
load the full pci driver do we tear down stuff in i915_exit. You could
then set after pci_register_driver was successful. Some screaming name
like driver_fully_loaded or something like that ...
> +
> err = i915_globals_init();
> if (err)
> return err;
>
> err = i915_mock_selftests();
> - if (err)
> - return err > 0 ? 0 : err;
> + if (err) {
> + err = err > 0 ? 0 : err;
> + goto globals_exit;
> + }
>
> /*
> * Enable KMS by default, unless explicitly overriden by
Imo move this up, but if you want I can send out my diff so you score
an r-b: tag :-)
> @@ -1228,13 +1235,17 @@ static int __init i915_init(void)
> i915_pmu_init();
>
> err = pci_register_driver(&i915_pci_driver);
> - if (err) {
> - i915_pmu_exit();
> - return err;
> - }
> + if (err)
> + goto pmu_exit;
>
> i915_perf_sysctl_register();
> return 0;
> +
We unwind even on success, which is most unusual. I think that
deserves a comment.
> +pmu_exit:
> + i915_pmu_exit();
> +globals_exit:
> + i915_globals_exit();
> + return err;
> }
>
> static void __exit i915_exit(void)
> --
> 2.31.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the Intel-gfx
mailing list