[PATCH 0/6] Fix crash after reloading a driver using ttm

Christian König ckoenig.leichtzumerken at gmail.com
Tue Apr 16 11:25:49 UTC 2019


Am 16.04.19 um 13:20 schrieb Karol Herbst:
> On Tue, Apr 16, 2019 at 1:07 PM Koenig, Christian
> <Christian.Koenig at amd.com> wrote:
>> Am 16.04.19 um 12:54 schrieb Karol Herbst:
>>> On Tue, Apr 16, 2019 at 11:12 AM Koenig, Christian
>>> <Christian.Koenig at amd.com> wrote:
>>>> Am 16.04.19 um 11:10 schrieb Karol Herbst:
>>>>> On Tue, Apr 16, 2019 at 8:38 AM Christian König
>>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>>> Am 16.04.19 um 02:35 schrieb Karol Herbst:
>>>>>>> Kobjects are supposed to be dynamically allocated, but with recent changes
>>>>>>> this rule was violated. Reverting those commits fixes crashes when a drm
>>>>>>> driver using TTM gets loaded again.
>>>>>>>
>>>>>>> The object in question is "ttm_mem_glob" declared inside
>>>>>>> "include/drm/ttm/ttm_memory.h" and instatiated inside
>>>>>>> "drivers/gpu/drm/ttm/ttm_memory.c".
>>>>>>>
>>>>>>> from "Documentation/kobject.txt":
>>>>>>> "Because kobjects are dynamic, they must not be declared statically or on
>>>>>>> the stack, but instead, always allocated dynamically.  Future versions of
>>>>>>> the kernel will contain a run-time check for kobjects that are created
>>>>>>> statically and will warn the developer of this improper usage."
>>>>>>>
>>>>>>> Unloading ttm before reloading the driver workarounds that crash, because
>>>>>>> the memory backing the kobject member "kobj" is cleaned up. The kobject_del
>>>>>>> and kobject_put function never free or clean up the kobject object leaving
>>>>>>> it in an undefined state.
>>>>>>>
>>>>>>> I reverted a few more commits to make it less painful for me to rever this
>>>>>>> rather big change.
>>>>>> Well, NAK. By reverting those change you also re-introduced the problems
>>>>>> we originally fixed with those patches.
>>>>>>
>>>>>> Please work on a proper fix instead,
>>>>>> Christian.
>>>>> And which problem was that besides duplicated code (or maybe even a
>>>>> bit of memory consumption if multiple ttm driver were used)? If I had
>>>>> to choose between duplicated code and a crash, I choose the former.
>>>>>
>>>>> Maybe I missed the real reason why those changes are made, but the
>>>>> commit messages don't really seem to tell me.
>>>> The old implementation crashed because different drivers tried to
>>>> allocate the same kobj.
>>>>
>>>> Crashing in one way is not better than crashing in another way.
>>>>
>>>> Christian.
>>>>
>>> how can that old crash be triggered? By loading two ttm based drivers
>>> at the same time or by other means?
>> Yes, exactly. Even worse it could be triggered by one driver
>> instantiating multiple times at the same time, e.g two AMD GPUs in the
>> same system.
>>
>> On the other hand I completely agree that using kobj static is
>> completely nuts. The problem is that using a kobj was the wrong approach
>> in the first place.
>>
>> In other words when you have something which controls a global behavior
>> of a module, what do you use? Right, a module parameter.
>>
>> Point is that we can't get away from those kobj without breaking the
>> uapi, so that is something which can't be done easily.
>>
>> Regards,
>> Christian.
> okay, thanks for the explanation!
>
> Currently I will be testing your first patch you sent out on top of
> 5.0.7, so that we could propose that for stable. The second didn't
> apply, but it didn't look relevant for the actual fix. Will reply to
> the patch with my results then.

The second is just a further clean, please ignore that one for backporting.

Let me know if the first one works on 5.0.7, if yes I'm going to add a 
CC stable tag before pushing.

Thanks,
Christian.

>
> Thanks
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



More information about the dri-devel mailing list