[Intel-gfx] [PATCH 2/2] drm/i915: Tear down properly on early i915_init exit

Daniel Vetter daniel at ffwll.ch
Mon Jul 19 08:28:22 UTC 2021


On Sat, Jul 17, 2021 at 12:48 AM Jason Ekstrand <jason at jlekstrand.net> wrote:
> In i915_exit(), we check i915_pci_driver.driver.owner to detect whether
> i915_init() exited early and, if so, skip all tear-down.  However, we
> didn't have proper tear-down paths for early exits in i915_init().
>
> Most of the time, you would never notice this, as driver init failures
> are extremely rare and generally the sign of a bigger bug.  However,
> when the mock self-tests are run, they run as part of i915_init() and
> exit early once they complete.  They run after i915_globals_init() and
> before we set up anything else.  The IGT test then unloads the module,
> invoking i915_exit() which, thanks to our i915_pci_driver.driver.owner
> check, doesn't actually tear anything down.  Importantly, this means
> i915_globals_exit() never gets called even though i915_globals_init()
> was and we leak the globals.
>
> The most annoying part is that you don't actually notice the failure as
> part of the self-tests, since leaking a bit of memory, while bad, doesn't
> result in anything observable from userspace.  Instead, the next time we
> load the driver (usually for the next IGT test), i915_globals_init() gets
> invoked again, we go to allocate a bunch of new memory slabs, those
> implicitly create debugfs entries, and debugfs warns that we're trying
> to create directories and files that already exist.  Since this all
> happens as part of the next driver load, it shows up in the dmesg-warn
> of whatever IGT test ran after the mock selftests.
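
To make the failure sequence concrete, here is a minimal userspace model in C of the pre-patch flow. It is a sketch under stated assumptions, not kernel code. caches_exist, debugfs_warnings, owner_set, old_init() and old_exit() are hypothetical stand-ins. caches_exist survives a "reload" the way the leaked slab caches do, and owner_set plays the role of i915_pci_driver.driver.owner.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-lifetime state (hypothetical): survives a module unload/reload,
 * like the slab caches and their debugfs entries. */
static bool caches_exist;
static int debugfs_warnings;

/* Module-lifetime state (hypothetical): stands in for driver.owner. */
static bool owner_set;

static void create_caches(void)
{
	if (caches_exist)
		debugfs_warnings++;	/* leaked caches from the last load collide */
	caches_exist = true;
}

/* Pre-patch i915_init(): after the mock selftests it returns with no unwind. */
int old_init(bool selftests_only)
{
	owner_set = false;	/* fresh module load: owner starts out NULL */
	create_caches();	/* i915_globals_init() */
	if (selftests_only)
		return 0;	/* selftests done, reported as success */
	owner_set = true;	/* pci_register_driver() sets the owner */
	return 0;
}

/* Pre-patch i915_exit(): skips all teardown if init exited early. */
void old_exit(void)
{
	if (!owner_set)
		return;
	owner_set = false;
	caches_exist = false;	/* i915_globals_exit() */
}
```

Calling old_init(true), old_exit() and then old_init(false) leaves debugfs_warnings at 1. This mirrors the warning that lands in the dmesg of whichever test runs next.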

My idea was to onion-unwind in i915_exit, but that means we need to
carry state over or have checks for every step, which is a bit
annoying.
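
A rough sketch of the onion-unwind-in-i915_exit alternative, again as a hypothetical userspace model. Each init step records that it ran, and exit undoes the recorded steps in reverse order. The per-step flags are the carried-over state that makes this approach annoying. Error paths inside init are left out for brevity, and all names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* One hypothetical "this step ran" flag per init step. */
static bool globals_done, pmu_done, pci_done;
static int resources_live;	/* set-up steps not yet undone */

int onion_init(bool selftests_only)
{
	globals_done = true;	/* i915_globals_init() */
	resources_live++;
	if (selftests_only)
		return 0;	/* early exit: exit must still undo globals */

	pmu_done = true;	/* i915_pmu_init() */
	resources_live++;
	pci_done = true;	/* pci_register_driver() */
	resources_live++;
	return 0;
}

/* Unwind in reverse order, checking each step's flag. */
void onion_exit(void)
{
	if (pci_done) {
		pci_done = false;
		resources_live--;
	}
	if (pmu_done) {
		pmu_done = false;
		resources_live--;
	}
	if (globals_done) {
		globals_done = false;
		resources_live--;
	}
}
```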

Yours unwinds even when i915_init() returns 0, i.e. success, after
running the mock selftests. That is most unusual, and I think it
deserves an explanation here in the commit message and maybe somewhere
in the code.
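
For comparison, here is a userspace model of the patched i915_init(). It assumes stand-in functions for the globals and PMU steps and integer parameters in place of the return values of i915_mock_selftests() and pci_register_driver(). The selftest path is the unusual part: it returns 0 to the module loader yet still falls through the unwind labels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in state: true while each piece is set up. */
static bool globals_live, pmu_live;

static int model_globals_init(void) { globals_live = true; return 0; }
static void model_globals_exit(void) { globals_live = false; }
static void model_pmu_init(void) { pmu_live = true; }
static void model_pmu_exit(void) { pmu_live = false; }

/*
 * selftest_ret models i915_mock_selftests(): > 0 means the selftests ran
 * and completed, < 0 is a failure, 0 means none were requested.
 * register_err models the return value of pci_register_driver().
 */
int model_init_new(int selftest_ret, int register_err)
{
	int err;

	err = model_globals_init();
	if (err)
		return err;

	err = selftest_ret;
	if (err) {
		/* Completed selftests report success, but we still unwind. */
		err = err > 0 ? 0 : err;
		goto globals_exit;
	}

	model_pmu_init();

	err = register_err;
	if (err)
		goto pmu_exit;

	return 0;

pmu_exit:
	model_pmu_exit();
globals_exit:
	model_globals_exit();
	return err;
}
```

Each label undoes one step and falls through to the next, so a failure at any point only releases what was already set up, in reverse order.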

> Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
> Fixes: 32eb6bcfdda9 ("drm/i915: Make request allocation caches global")
> Cc: Daniel Vetter <daniel at ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_globals.c |  4 ++--
>  drivers/gpu/drm/i915/i915_pci.c     | 23 +++++++++++++++++------
>  2 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
> index 77f1911c463b8..87267e1d2ad92 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -138,7 +138,7 @@ void i915_globals_unpark(void)
>         atomic_inc(&active);
>  }
>
> -static void __exit __i915_globals_flush(void)
> +static void __i915_globals_flush(void)
>  {
>         atomic_inc(&active); /* skip shrinking */
>
> @@ -148,7 +148,7 @@ static void __exit __i915_globals_flush(void)
>         atomic_dec(&active);
>  }
>
> -void __exit i915_globals_exit(void)
> +void i915_globals_exit(void)
>  {
>         GEM_BUG_ON(atomic_read(&active));
>
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 50ed93b03e582..783f547be0990 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1199,13 +1199,20 @@ static int __init i915_init(void)
>         bool use_kms = true;
>         int err;
>
> +       /* We use this to detect early returns from i915_init() so we don't
> +        * tear anything down in i915_exit()
> +        */
> +       i915_pci_driver.driver.owner = NULL;

Setting this seems redundant? If you want to make it explicit, just
add a dedicated bool with a big comment explaining that i915_exit()
only tears things down once the full PCI driver has been loaded. You
could then set it after pci_register_driver() succeeds, and give it
some screaming name like driver_fully_loaded or something like that ...
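
One possible shape for the dedicated-flag suggestion, sketched as a userspace model. driver_fully_loaded is the name floated above. globals_live, pmu_live, flag_model_init() and flag_model_exit() are hypothetical. The flag is set only after the stand-in for pci_register_driver() succeeds, so exit never has to infer whether init bailed out early.

```c
#include <assert.h>
#include <stdbool.h>

static bool globals_live, pmu_live;	/* hypothetical stand-in state */

/*
 * Only set once pci_register_driver() has succeeded, i.e. once the full
 * PCI driver is loaded. Module exit tears things down only when this is
 * true; every earlier return from init must unwind its own state first.
 */
static bool driver_fully_loaded;

int flag_model_init(bool selftests_only, int register_err)
{
	int err = 0;

	globals_live = true;		/* i915_globals_init() */
	if (selftests_only)
		goto globals_exit;	/* selftests done: report success */

	pmu_live = true;		/* i915_pmu_init() */
	err = register_err;		/* pci_register_driver() */
	if (err)
		goto pmu_exit;

	driver_fully_loaded = true;
	return 0;

pmu_exit:
	pmu_live = false;
globals_exit:
	globals_live = false;
	return err;
}

void flag_model_exit(void)
{
	if (!driver_fully_loaded)
		return;			/* init already cleaned up after itself */
	pmu_live = false;
	globals_live = false;
	driver_fully_loaded = false;
}
```

Because every early return from init has already unwound, a clear flag in exit reliably means there is nothing left to tear down.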

> +
>         err = i915_globals_init();
>         if (err)
>                 return err;
>
>         err = i915_mock_selftests();
> -       if (err)
> -               return err > 0 ? 0 : err;
> +       if (err) {
> +               err = err > 0 ? 0 : err;
> +               goto globals_exit;
> +       }
>
>         /*
>          * Enable KMS by default, unless explicitly overriden by

IMO move this up, but if you want, I can send out my diff so you score
an r-b tag :-)

> @@ -1228,13 +1235,17 @@ static int __init i915_init(void)
>         i915_pmu_init();
>
>         err = pci_register_driver(&i915_pci_driver);
> -       if (err) {
> -               i915_pmu_exit();
> -               return err;
> -       }
> +       if (err)
> +               goto pmu_exit;
>
>         i915_perf_sysctl_register();
>         return 0;
> +

We unwind even on success, which is most unusual. I think that
deserves a comment.

> +pmu_exit:
> +       i915_pmu_exit();
> +globals_exit:
> +       i915_globals_exit();
> +       return err;
>  }
>
>  static void __exit i915_exit(void)
> --
> 2.31.1
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

