[PATCH 0/6] Fix crash after reloading a driver using ttm
Daniel Vetter
daniel at ffwll.ch
Tue Apr 16 12:18:31 UTC 2019
On Tue, Apr 16, 2019 at 11:06:54AM +0000, Koenig, Christian wrote:
> Am 16.04.19 um 12:54 schrieb Karol Herbst:
> > On Tue, Apr 16, 2019 at 11:12 AM Koenig, Christian
> > <Christian.Koenig at amd.com> wrote:
> >> Am 16.04.19 um 11:10 schrieb Karol Herbst:
> >>> On Tue, Apr 16, 2019 at 8:38 AM Christian König
> >>> <ckoenig.leichtzumerken at gmail.com> wrote:
> >>>> Am 16.04.19 um 02:35 schrieb Karol Herbst:
> >>>>> Kobjects are supposed to be dynamically allocated, but with recent changes
> >>>>> this rule was violated. Reverting those commits fixes crashes when a drm
> >>>>> driver using TTM gets loaded again.
> >>>>>
> >>>>> The object in question is "ttm_mem_glob" declared inside
> >>>>> "include/drm/ttm/ttm_memory.h" and instatiated inside
> >>>>> "drivers/gpu/drm/ttm/ttm_memory.c".
> >>>>>
> >>>>> from "Documentation/kobject.txt":
> >>>>> "Because kobjects are dynamic, they must not be declared statically or on
> >>>>> the stack, but instead, always allocated dynamically. Future versions of
> >>>>> the kernel will contain a run-time check for kobjects that are created
> >>>>> statically and will warn the developer of this improper usage."
> >>>>>
> >>>>> Unloading ttm before reloading the driver workarounds that crash, because
> >>>>> the memory backing the kobject member "kobj" is cleaned up. The kobject_del
> >>>>> and kobject_put function never free or clean up the kobject object leaving
> >>>>> it in an undefined state.
> >>>>>
> >>>>> I reverted a few more commits to make it less painful for me to rever this
> >>>>> rather big change.
> >>>> Well, NAK. By reverting those change you also re-introduced the problems
> >>>> we originally fixed with those patches.
> >>>>
> >>>> Please work on a proper fix instead,
> >>>> Christian.
> >>> And which problem was that besides duplicated code (or maybe even a
> >>> bit of memory consumption if multiple ttm driver were used)? If I had
> >>> to choose between duplicated code and a crash, I choose the former.
> >>>
> >>> Maybe I missed the real reason why those changes are made, but the
> >>> commit messages don't really seem to tell me.
> >> The old implementation crashed because different drivers tried to
> >> allocate the same kobj.
> >>
> >> Crashing in one way is not better than crashing in another way.
> >>
> >> Christian.
> >>
> > how can that old crash be triggered? By loading two ttm based drivers
> > at the same time or by other means?
>
> Yes, exactly. Even worse it could be triggered by one driver
> instantiating multiple times at the same time, e.g two AMD GPUs in the
> same system.
>
> On the other hand I completely agree that using kobj static is
> completely nuts. The problem is that using a kobj was the wrong approach
> in the first place.
>
> In other words when you have something which controls a global behavior
> of a module, what do you use? Right, a module parameter.
>
> Point is that we can't get away from those kobj without breaking the
> uapi, so that is something which can't be done easily.
Randome idea: Push the kobj setup into drm.ko (and shrugg it off as
another lesson in how maybe uapi shouldn't have been designed, but hey not
our worst mistake by far). I think that'd be totally ok.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the dri-devel
mailing list