[Intel-gfx] [PATCH 3/6] drm/i915: Use a table for i915_init/exit
Jason Ekstrand
jason at jlekstrand.net
Wed Jul 21 15:12:11 UTC 2021
On Wed, Jul 21, 2021 at 4:06 AM Tvrtko Ursulin
<tvrtko.ursulin at linux.intel.com> wrote:
>
>
> On 20/07/2021 19:13, Jason Ekstrand wrote:
> > If the driver was not fully loaded, we may still have globals lying
> > around. If we don't tear those down in i915_exit(), we'll leak a bunch
> > of memory slabs. This can happen two ways: use_kms = false and if we've
> > run mock selftests. In either case, we have an early exit from
> > i915_init which happens after i915_globals_init() and we need to clean
> > up those globals.
> >
> > The mock selftests case is especially sticky. The load isn't entirely
> > a no-op. We actually do quite a bit inside those selftests including
> > allocating a bunch of mock objects and running tests on them. Once all
> > those tests are complete, we exit early from i915_init(). Perviously,
> > i915_init() would return a non-zero error code on failure and a zero
> > error code on success. In the success case, we would get to i915_exit()
> > and check i915_pci_driver.driver.owner to detect if i915_init exited early
> > and do nothing. In the failure case, we would fail i915_init() but
> > there would be no opportunity to clean up globals.
> >
> > The most annoying part is that you don't actually notice the failure as
> > part of the self-tests since leaking a bit of memory, while bad, doesn't
> > result in anything observable from userspace. Instead, the next time we
> > load the driver (usually for next IGT test), i915_globals_init() gets
> > invoked again, we go to allocate a bunch of new memory slabs, those
> > implicitly create debugfs entries, and debugfs warns that we're trying
> > to create directories and files that already exist. Since this all
> > happens as part of the next driver load, it shows up in the dmesg-warn
> > of whatever IGT test ran after the mock selftests.
> >
> > While the obvious thing to do here might be to call i915_globals_exit()
> > after selftests, that's not actually safe. The dma-buf selftests call
> > i915_gem_prime_export which creates a file. We call dma_buf_put() on
> > the resulting dmabuf which calls fput() on the file. However, fput()
> > isn't immediate and gets flushed right before syscall returns. This
> > means that all the fput()s from the selftests don't happen until right
> > before the module load syscall used to fire off the selftests returns
> > which is after i915_init(). If we call i915_globals_exit() in
> > i915_init() after selftests, we end up freeing slabs out from under
> > objects which won't get released until fput() is flushed at the end of
> > the module load syscall.
> >
> > The solution here is to let i915_init() return success early and detect
> > the early success in i915_exit() and only tear down globals and nothing
> > else. This way the module loads successfully, regardless of the success
> > or failure of the tests. Because we've not enumerated any PCI devices,
> > no device nodes are created and it's entirely useless from userspace.
> > The only thing the module does at that point is hold on to a bit of
> > memory until we unload it and i915_exit() is called. Importantly, this
> > means that everything from our selftests has the ability to properly
> > flush out between i915_init() and i915_exit() because there is at least
> > one syscall boundary in between.
> >
> > In order to handle all the delicate init/exit cases, we convert the
> > whole thing to a table of init/exit pairs and track the init status in
> > the new init_progress global. This allows us to ensure that i915_exit()
> > always tears down exactly the things that i915_init() successfully
> > initialized. We also allow early-exit of i915_init() without failure by
> > an init function returning > 0. This is useful for nomodeset, and
> > selftests. For the mock selftests, we convert them to always return 1
> > so we get the desired behavior of the driver always succeeding to load
> > the driver and then properly tearing down the partially loaded driver.
> >
> > Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_pci.c | 104 ++++++++++++------
> > drivers/gpu/drm/i915/i915_perf.c | 3 +-
> > drivers/gpu/drm/i915/i915_perf.h | 2 +-
> > drivers/gpu/drm/i915/i915_pmu.c | 4 +-
> > drivers/gpu/drm/i915/i915_pmu.h | 4 +-
> > .../gpu/drm/i915/selftests/i915_selftest.c | 2 +-
> > 6 files changed, 80 insertions(+), 39 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > index 4e627b57d31a2..64ebd89eae6ce 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -1185,27 +1185,9 @@ static void i915_pci_shutdown(struct pci_dev *pdev)
> > i915_driver_shutdown(i915);
> > }
> >
> > -static struct pci_driver i915_pci_driver = {
> > - .name = DRIVER_NAME,
> > - .id_table = pciidlist,
> > - .probe = i915_pci_probe,
> > - .remove = i915_pci_remove,
> > - .shutdown = i915_pci_shutdown,
> > - .driver.pm = &i915_pm_ops,
> > -};
> > -
> > -static int __init i915_init(void)
> > +static int i915_check_nomodeset(void)
> > {
> > bool use_kms = true;
> > - int err;
> > -
> > - err = i915_globals_init();
> > - if (err)
> > - return err;
> > -
> > - err = i915_mock_selftests();
> > - if (err)
> > - return err > 0 ? 0 : err;
> >
> > /*
> > * Enable KMS by default, unless explicitly overriden by
> > @@ -1222,31 +1204,87 @@ static int __init i915_init(void)
> > if (!use_kms) {
> > /* Silently fail loading to not upset userspace. */
> > DRM_DEBUG_DRIVER("KMS disabled.\n");
> > - return 0;
> > + return 1;
> > }
> >
> > - i915_pmu_init();
> > + return 0;
> > +}
> >
> > - err = pci_register_driver(&i915_pci_driver);
> > - if (err) {
> > - i915_pmu_exit();
> > - i915_globals_exit();
> > - return err;
> > +static struct pci_driver i915_pci_driver = {
> > + .name = DRIVER_NAME,
> > + .id_table = pciidlist,
> > + .probe = i915_pci_probe,
> > + .remove = i915_pci_remove,
> > + .shutdown = i915_pci_shutdown,
> > + .driver.pm = &i915_pm_ops,
> > +};
> > +
> > +static int i915_register_pci_driver(void)
> > +{
> > + return pci_register_driver(&i915_pci_driver);
> > +}
> > +
> > +static void i915_unregister_pci_driver(void)
> > +{
> > + pci_unregister_driver(&i915_pci_driver);
> > +}
> > +
> > +static const struct {
> > + int (*init)(void);
> > + void (*exit)(void);
> > +} init_funcs[] = {
> > + { i915_globals_init, i915_globals_exit },
> > + { i915_mock_selftests, NULL },
> > + { i915_check_nomodeset, NULL },
> > + { i915_pmu_init, i915_pmu_exit },
> > + { i915_register_pci_driver, i915_unregister_pci_driver },
> > + { i915_perf_sysctl_register, i915_perf_sysctl_unregister },
> > +};
> > +static int init_progress;
> > +
> > +static int __init i915_init(void)
> > +{
> > + int err, i;
> > +
> > + for (i = 0; i < ARRAY_SIZE(init_funcs); i++) {
> > + err = init_funcs[i].init();
> > + if (err < 0) {
> > + while (i--) {
> > + if (init_funcs[i].exit)
> > + init_funcs[i].exit();
> > + }
> > + return err;
> > + } else if (err > 0) {
> > + /*
> > + * Early-exit success is reserved for things which
> > + * don't have an exit() function because we have no
> > + * idea how far they got or how to partially tear
> > + * them down.
> > + */
> > + WARN_ON(init_funcs[i].exit);
>
> I'm not completely happy with the subtlety of where the knowledge of who
> needs the module to remain loaded and why ends up. It's partly in the
> change of return code from i915_mock_selftests and partly here. But I
> don't have any better ideas, which wouldn't have downsides of their own,
> on how to express this cleanly so just passing grumbling.
I can't say I really like it either which is why I threw in this
WARN_ON and comment. This really should be a quite exceptional case.
> I mean ideally it should be only that specific dma buf test case which
> sends out a specific return value requesting not to unload, when it
> knows it has used fput. But that would need the i915 selftests runner to
> accept the positive error and no idea if that would have some other
> consequences without going very deep.
>
> > +
> > + /*
> > + * We don't want to advertise devices with an only
> > + * partially initialized driver.
> > + */
> > + WARN_ON(i915_pci_driver.driver.owner);
> > + break;
> > + }
> > }
> >
> > - i915_perf_sysctl_register();
> > + init_progress = i;
> > +
> > return 0;
> > }
> >
> > static void __exit i915_exit(void)
> > {
> > - if (!i915_pci_driver.driver.owner)
> > - return;
> > + int i;
> >
> > - i915_perf_sysctl_unregister();
> > - pci_unregister_driver(&i915_pci_driver);
> > - i915_pmu_exit();
> > - i915_globals_exit();
> > + for (i = init_progress - 1; i >= 0; i--) {
>
> Perhaps GEM_BUG_ON(i >= ARRAY_SIZE(init_funcs)) here just in case?
Can do.
--Jason
> > + if (init_funcs[i].exit)
> > + init_funcs[i].exit();
> > + }
> > }
> >
> > module_init(i915_init);
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index b4ec114a4698b..48ddb363b3bda 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -4483,9 +4483,10 @@ static int destroy_config(int id, void *p, void *data)
> > return 0;
> > }
> >
> > -void i915_perf_sysctl_register(void)
> > +int i915_perf_sysctl_register(void)
> > {
> > sysctl_header = register_sysctl_table(dev_root);
> > + return 0;
> > }
> >
> > void i915_perf_sysctl_unregister(void)
> > diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h
> > index 882fdd0a76800..1d1329e5af3ae 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.h
> > +++ b/drivers/gpu/drm/i915/i915_perf.h
> > @@ -23,7 +23,7 @@ void i915_perf_fini(struct drm_i915_private *i915);
> > void i915_perf_register(struct drm_i915_private *i915);
> > void i915_perf_unregister(struct drm_i915_private *i915);
> > int i915_perf_ioctl_version(void);
> > -void i915_perf_sysctl_register(void);
> > +int i915_perf_sysctl_register(void);
> > void i915_perf_sysctl_unregister(void);
> >
> > int i915_perf_open_ioctl(struct drm_device *dev, void *data,
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index 34d37d46a1262..eca92076f31d2 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -1088,7 +1088,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
> >
> > static enum cpuhp_state cpuhp_slot = CPUHP_INVALID;
> >
> > -void i915_pmu_init(void)
> > +int i915_pmu_init(void)
> > {
> > int ret;
> >
> > @@ -1101,6 +1101,8 @@ void i915_pmu_init(void)
> > ret);
> > else
> > cpuhp_slot = ret;
> > +
> > + return 0;
> > }
> >
> > void i915_pmu_exit(void)
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
> > index 60f9595f902cd..449057648f39b 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.h
> > +++ b/drivers/gpu/drm/i915/i915_pmu.h
> > @@ -147,14 +147,14 @@ struct i915_pmu {
> > };
> >
> > #ifdef CONFIG_PERF_EVENTS
> > -void i915_pmu_init(void);
> > +int i915_pmu_init(void);
> > void i915_pmu_exit(void);
> > void i915_pmu_register(struct drm_i915_private *i915);
> > void i915_pmu_unregister(struct drm_i915_private *i915);
> > void i915_pmu_gt_parked(struct drm_i915_private *i915);
> > void i915_pmu_gt_unparked(struct drm_i915_private *i915);
> > #else
> > -static inline void i915_pmu_init(void) {}
> > +static inline int i915_pmu_init(void) { return 0; }
> > static inline void i915_pmu_exit(void) {}
> > static inline void i915_pmu_register(struct drm_i915_private *i915) {}
> > static inline void i915_pmu_unregister(struct drm_i915_private *i915) {}
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_selftest.c b/drivers/gpu/drm/i915/selftests/i915_selftest.c
> > index 1bc11c09faef5..935d065725345 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_selftest.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_selftest.c
> > @@ -187,7 +187,7 @@ int i915_mock_selftests(void)
> > err = run_selftests(mock, NULL);
> > if (err) {
> > i915_selftest.mock = err;
> > - return err;
> > + return 1;
>
> I checked igt_kselftest_execute and it looks like it will handle this
> change in behaviour so that's fine.
>
> > }
> >
> > if (i915_selftest.mock < 0) {
> >
>
> Regards,
>
> Tvrtko
More information about the Intel-gfx
mailing list