[PATCH 0/6] Fix crash after reloading a driver using ttm

Koenig, Christian Christian.Koenig at amd.com
Tue Apr 16 12:29:28 UTC 2019


Am 16.04.19 um 14:18 schrieb Daniel Vetter:
> On Tue, Apr 16, 2019 at 11:06:54AM +0000, Koenig, Christian wrote:
>> Am 16.04.19 um 12:54 schrieb Karol Herbst:
>>> On Tue, Apr 16, 2019 at 11:12 AM Koenig, Christian
>>> <Christian.Koenig at amd.com> wrote:
>>>> Am 16.04.19 um 11:10 schrieb Karol Herbst:
>>>>> On Tue, Apr 16, 2019 at 8:38 AM Christian König
>>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>>> Am 16.04.19 um 02:35 schrieb Karol Herbst:
>>>>>>> Kobjects are supposed to be dynamically allocated, but with recent changes
>>>>>>> this rule was violated. Reverting those commits fixes crashes when a drm
>>>>>>> driver using TTM gets loaded again.
>>>>>>>
>>>>>>> The object in question is "ttm_mem_glob" declared inside
>>>>>>> "include/drm/ttm/ttm_memory.h" and instatiated inside
>>>>>>> "drivers/gpu/drm/ttm/ttm_memory.c".
>>>>>>>
>>>>>>> from "Documentation/kobject.txt":
>>>>>>> "Because kobjects are dynamic, they must not be declared statically or on
>>>>>>> the stack, but instead, always allocated dynamically.  Future versions of
>>>>>>> the kernel will contain a run-time check for kobjects that are created
>>>>>>> statically and will warn the developer of this improper usage."
>>>>>>>
>>>>>>> Unloading ttm before reloading the driver workarounds that crash, because
>>>>>>> the memory backing the kobject member "kobj" is cleaned up. The kobject_del
>>>>>>> and kobject_put function never free or clean up the kobject object leaving
>>>>>>> it in an undefined state.
>>>>>>>
>>>>>>> I reverted a few more commits to make it less painful for me to rever this
>>>>>>> rather big change.
>>>>>> Well, NAK. By reverting those change you also re-introduced the problems
>>>>>> we originally fixed with those patches.
>>>>>>
>>>>>> Please work on a proper fix instead,
>>>>>> Christian.
>>>>> And which problem was that besides duplicated code (or maybe even a
>>>>> bit of memory consumption if multiple ttm driver were used)? If I had
>>>>> to choose between duplicated code and a crash, I choose the former.
>>>>>
>>>>> Maybe I missed the real reason why those changes are made, but the
>>>>> commit messages don't really seem to tell me.
>>>> The old implementation crashed because different drivers tried to
>>>> allocate the same kobj.
>>>>
>>>> Crashing in one way is not better than crashing in another way.
>>>>
>>>> Christian.
>>>>
>>> how can that old crash be triggered? By loading two ttm based drivers
>>> at the same time or by other means?
>> Yes, exactly. Even worse it could be triggered by one driver
>> instantiating multiple times at the same time, e.g two AMD GPUs in the
>> same system.
>>
>> On the other hand I completely agree that using kobj static is
>> completely nuts. The problem is that using a kobj was the wrong approach
>> in the first place.
>>
>> In other words when you have something which controls a global behavior
>> of a module, what do you use? Right, a module parameter.
>>
>> Point is that we can't get away from those kobj without breaking the
>> uapi, so that is something which can't be done easily.
> Randome idea: Push the kobj setup into drm.ko (and shrugg it off as
> another lesson in how maybe uapi shouldn't have been designed, but hey not
> our worst mistake by far). I think that'd be totally ok.

Yeah, thought about that as well.

But I would rather re-design this from the scratch instead of just 
moving it over.

And yes I agree with a bit of luck that UAPI is actually not used at 
all, so we could remove it sooner or later.

Regards,
Christian.

> -Daniel



More information about the dri-devel mailing list