[PATCH 0/6] Fix crash after reloading a driver using ttm

Koenig, Christian Christian.Koenig at amd.com
Tue Apr 16 09:12:44 UTC 2019


On 16.04.19 at 11:10, Karol Herbst wrote:
> On Tue, Apr 16, 2019 at 8:38 AM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> On 16.04.19 at 02:35, Karol Herbst wrote:
>>> Kobjects are supposed to be dynamically allocated, but with recent changes
>>> this rule was violated. Reverting those commits fixes crashes when a drm
>>> driver using TTM gets loaded again.
>>>
>>> The object in question is "ttm_mem_glob", declared inside
>>> "include/drm/ttm/ttm_memory.h" and instantiated inside
>>> "drivers/gpu/drm/ttm/ttm_memory.c".
>>>
>>> From "Documentation/kobject.txt":
>>> "Because kobjects are dynamic, they must not be declared statically or on
>>> the stack, but instead, always allocated dynamically.  Future versions of
>>> the kernel will contain a run-time check for kobjects that are created
>>> statically and will warn the developer of this improper usage."
>>>
>>> Unloading ttm before reloading the driver works around that crash, because
>>> the memory backing the kobject member "kobj" is cleaned up. The kobject_del
>>> and kobject_put functions never free or clean up the kobject itself, leaving
>>> it in an undefined state.
>>>
>>> I reverted a few more commits to make it less painful for me to revert this
>>> rather big change.
>> Well, NAK. By reverting those changes you also reintroduce the problems
>> we originally fixed with those patches.
>>
>> Please work on a proper fix instead,
>> Christian.
> And which problem was that, besides duplicated code (or maybe even a
> bit of memory consumption if multiple ttm drivers were used)? If I had
> to choose between duplicated code and a crash, I would choose the former.
>
> Maybe I missed the real reason why those changes were made, but the
> commit messages don't really seem to tell me.

The old implementation crashed because different drivers tried to 
allocate the same kobj.

Crashing in one way is not better than crashing in another way.

Christian.
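
The re-init failure described in the quoted cover letter can be modelled in
userspace. What follows is a minimal sketch, not kernel code: fake_kobject,
fake_mem_global and every helper below are hypothetical stand-ins, and the
real kobject_init() only logs the warning seen in the dmesg output instead of
returning an error. It illustrates why a statically allocated object keeps
its stale "initialized" state across a driver reload while ttm stays loaded,
and two possible ways around that: clearing the static storage on release, or
allocating the object dynamically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct kobject (not the kernel type). */
struct fake_kobject {
	bool state_initialized;
	int refcount;
};

/* Hypothetical stand-in for the TTM memory global that embeds the kobject. */
struct fake_mem_global {
	struct fake_kobject kobj;
};

/*
 * Model of kobject_init(): refuses an object that is already marked
 * initialized. Assumption: returning -1 here replaces the kernel's warning.
 */
static int fake_kobject_init(struct fake_kobject *k)
{
	if (k->state_initialized)
		return -1;
	k->state_initialized = true;
	k->refcount = 1;
	return 0;
}

/* Model of kobject_put(): drops a reference but never scrubs the storage. */
static void fake_kobject_put(struct fake_kobject *k)
{
	assert(k->refcount > 0);
	k->refcount--;
}

/* Static instance: lives as long as the (still loaded) module does. */
static struct fake_mem_global static_glob;

static int static_glob_init(void)
{
	return fake_kobject_init(&static_glob.kobj);
}

/* Release without clearing: the next init sees stale state. */
static void static_glob_release(void)
{
	fake_kobject_put(&static_glob.kobj);
}

/* Option 1: zero the static storage so a later init starts clean. */
static void static_glob_clear(void)
{
	memset(&static_glob, 0, sizeof(static_glob));
}

/* Option 2: allocate dynamically; every create starts from zeroed memory. */
static struct fake_mem_global *dyn_glob_create(void)
{
	struct fake_mem_global *glob = calloc(1, sizeof(*glob));

	if (glob && fake_kobject_init(&glob->kobj) != 0) {
		free(glob);
		return NULL;
	}
	return glob;
}

static void dyn_glob_destroy(struct fake_mem_global *glob)
{
	fake_kobject_put(&glob->kobj);
	free(glob);
}
```

In this model static_glob_init() succeeds on the first load, fails after
static_glob_release(), and succeeds again once static_glob_clear() has run;
dyn_glob_create() succeeds on every call. Neither option by itself addresses
several drivers trying to allocate the same kobj, which is the problem named
above for the old implementation.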

>
>>> dmesg output:
>>> [54758.418036] kobject (00000000687a067d): tried to init an initialized object, something is seriously wrong.
>>> [54758.418040] CPU: 6 PID: 26746 Comm: insmod Tainted: G     U     OE     5.0.6-200.fc29.x86_64 #1
>>> [54758.418041] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.12.1 10/02/2018
>>> [54758.418041] Call Trace:
>>> [54758.418049]  dump_stack+0x5c/0x80
>>> [54758.418054]  kobject_init.cold.9+0x31/0x3f
>>> [54758.418057]  kobject_init_and_add+0x35/0xa0
>>> [54758.418063]  ttm_mem_global_init+0x8f/0x2b0 [ttm]
>>> [54758.418067]  ? __debugfs_create_file+0xe1/0x110
>>> [54758.418071]  ttm_bo_device_init+0x198/0x2a0 [ttm]
>>> [54758.418144]  nouveau_ttm_init+0xbf/0x340 [nouveau]
>>> [54758.418206]  nouveau_drm_device_init+0x125/0x7d0 [nouveau]
>>> [54758.418210]  ? pci_bus_read_config_word+0x49/0x70
>>> [54758.418266]  nouveau_drm_probe+0x26f/0x2c0 [nouveau]
>>> [54758.418270]  local_pci_probe+0x41/0x90
>>> [54758.418272]  pci_device_probe+0x118/0x1a0
>>> [54758.418275]  really_probe+0xf8/0x3b0
>>> [54758.418277]  driver_probe_device+0xb3/0xf0
>>> [54758.418278]  __driver_attach+0xdd/0x110
>>> [54758.418280]  ? driver_probe_device+0xf0/0xf0
>>> [54758.418282]  bus_for_each_dev+0x77/0xc0
>>> [54758.418285]  ? klist_add_tail+0x3b/0x60
>>> [54758.418287]  bus_add_driver+0x152/0x230
>>> [54758.418288]  ? 0xffffffffc1027000
>>> [54758.418290]  driver_register+0x6b/0xb0
>>> [54758.418291]  ? 0xffffffffc1027000
>>> [54758.418294]  do_one_initcall+0x46/0x1c3
>>> [54758.418296]  ? _cond_resched+0x15/0x30
>>> [54758.418299]  ? kmem_cache_alloc_trace+0x154/0x1d0
>>> [54758.418302]  do_init_module+0x5a/0x210
>>> [54758.418304]  load_module+0x2096/0x22d0
>>> [54758.418308]  ? ima_post_read_file+0xf4/0x100
>>> [54758.418310]  ? __do_sys_finit_module+0xa8/0x110
>>> [54758.418312]  __do_sys_finit_module+0xa8/0x110
>>> [54758.418315]  do_syscall_64+0x5b/0x160
>>> [54758.418317]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [54758.418319] RIP: 0033:0x7fc0b38b6edd
>>> [54758.418321] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7b 7f 0c 00 f7 d8 64 89 01 48
>>> [54758.418322] RSP: 002b:00007ffc7f1620d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> [54758.418323] RAX: ffffffffffffffda RBX: 00005629a3996ac0 RCX: 00007fc0b38b6edd
>>> [54758.418324] RDX: 0000000000000000 RSI: 00005629a3996260 RDI: 0000000000000003
>>> [54758.418325] RBP: 00005629a3996260 R08: 0000000000000000 R09: 0000000000000000
>>> [54758.418326] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
>>> [54758.418326] R13: 00005629a3996a80 R14: 0000000000000000 R15: 00005629a3996260
>>> [54758.418346] BUG: unable to handle kernel paging request at 00000004ee194700
>>> [54758.418348] #PF error: [WRITE]
>>> [54758.418349] PGD 0 P4D 0
>>> [54758.418352] Oops: 0002 [#1] SMP PTI
>>> [54758.418354] CPU: 6 PID: 26746 Comm: insmod Tainted: G     U     OE     5.0.6-200.fc29.x86_64 #1
>>> [54758.418355] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.12.1 10/02/2018
>>> [54758.418360] RIP: 0010:ttm_mem_global_init+0x1fe/0x2b0 [ttm]
>>> [54758.418361] Code: 00 00 00 48 89 5d 40 48 89 ab a0 00 00 00 e8 79 ba 95 f8 85 c0 0f 85 a2 00 00 00 8b 83 90 00 00 00 8d 50 01 89 93 90 00 00 00 <48> 89 ac c3 80 00 00 00 85 d2 0f 85 c6 99 00 00 48 8b 83 98 00 00
>>> [54758.418363] RSP: 0018:ffffc24e53c1f988 EFLAGS: 00010246
>>> [54758.418364] RAX: 00000000a5a2f300 RBX: ffffffffc101ae80 RCX: 0000000000000000
>>> [54758.418366] RDX: 00000000a5a2f301 RSI: 0000000000000000 RDI: ffff9ebe78b67330
>>> [54758.418367] RBP: ffff9ebe42c79d00 R08: 0000000000000044 R09: 0000000000000228
>>> [54758.418368] R10: 0000000000000000 R11: ffff9ebb62720bb8 R12: 0000000000000000
>>> [54758.418370] R13: ffff9ebe27d71640 R14: ffff9eba9f258600 R15: ffffffffc1453880
>>> [54758.418371] FS:  00007fc0b379c280(0000) GS:ffff9ebeae380000(0000) knlGS:0000000000000000
>>> [54758.418373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [54758.418374] CR2: 00000004ee194700 CR3: 000000045069e005 CR4: 00000000003606e0
>>> [54758.418375] Call Trace:
>>> [54758.418381]  ttm_bo_device_init+0x198/0x2a0 [ttm]
>>> [54758.418439]  nouveau_ttm_init+0xbf/0x340 [nouveau]
>>> [54758.418495]  nouveau_drm_device_init+0x125/0x7d0 [nouveau]
>>> [54758.418499]  ? pci_bus_read_config_word+0x49/0x70
>>> [54758.418552]  nouveau_drm_probe+0x26f/0x2c0 [nouveau]
>>> [54758.418556]  local_pci_probe+0x41/0x90
>>> [54758.418558]  pci_device_probe+0x118/0x1a0
>>> [54758.418561]  really_probe+0xf8/0x3b0
>>> [54758.418563]  driver_probe_device+0xb3/0xf0
>>> [54758.418565]  __driver_attach+0xdd/0x110
>>> [54758.418567]  ? driver_probe_device+0xf0/0xf0
>>> [54758.418570]  bus_for_each_dev+0x77/0xc0
>>> [54758.418573]  ? klist_add_tail+0x3b/0x60
>>> [54758.418574]  bus_add_driver+0x152/0x230
>>> [54758.418576]  ? 0xffffffffc1027000
>>> [54758.418578]  driver_register+0x6b/0xb0
>>> [54758.418580]  ? 0xffffffffc1027000
>>> [54758.418583]  do_one_initcall+0x46/0x1c3
>>> [54758.418585]  ? _cond_resched+0x15/0x30
>>> [54758.418587]  ? kmem_cache_alloc_trace+0x154/0x1d0
>>> [54758.418591]  do_init_module+0x5a/0x210
>>> [54758.418593]  load_module+0x2096/0x22d0
>>> [54758.418596]  ? ima_post_read_file+0xf4/0x100
>>> [54758.418599]  ? __do_sys_finit_module+0xa8/0x110
>>> [54758.418601]  __do_sys_finit_module+0xa8/0x110
>>> [54758.418604]  do_syscall_64+0x5b/0x160
>>> [54758.418607]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [54758.418609] RIP: 0033:0x7fc0b38b6edd
>>> [54758.418610] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7b 7f 0c 00 f7 d8 64 89 01 48
>>> [54758.418612] RSP: 002b:00007ffc7f1620d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> [54758.418614] RAX: ffffffffffffffda RBX: 00005629a3996ac0 RCX: 00007fc0b38b6edd
>>> [54758.418615] RDX: 0000000000000000 RSI: 00005629a3996260 RDI: 0000000000000003
>>> [54758.418616] RBP: 00005629a3996260 R08: 0000000000000000 R09: 0000000000000000
>>> [54758.418618] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
>>> [54758.418619] R13: 00005629a3996a80 R14: 0000000000000000 R15: 00005629a3996260
>>> [54758.418620] Modules linked in: nouveau(OE+) acpi_call(OE) ttm rfcomm ccm xt_CHECKSUM ipt_MASQUERADE tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc vfat fat btusb btrtl btbcm btintel uvcvideo videobuf2_vmalloc bluetooth videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media ecdh_generic arc4 snd_hda_codec_hdmi ath10k_pci ath10k_core snd_hda_codec_realtek iTCO_wdt mei_wdt iTCO_vendor_support snd_hda_codec_generic mac80211 intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec dell_laptop ledtrig_audio dell_smm_hwmon snd_hda_core kvm_intel snd_hwdep ath snd_seq snd_seq_device intel_cstate cfg80211 snd_pcm dell_wmi intel_uncore
>>> [54758.418653]  dell_smbios dcdbas intel_rapl_perf snd_timer joydev dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof snd idma64 soundcore i2c_i801 rfkill rtsx_pci_ms memstick mei_me mei processor_thermal_device intel_lpss_pci intel_soc_dts_iosf intel_pch_thermal intel_lpss int3400_thermal acpi_pad acpi_thermal_rel int3403_thermal intel_hid pcc_cpufreq int340x_thermal_zone sparse_keymap binfmt_misc zram dm_crypt hid_multitouch i915 kvmgt mdev vfio kvm irqbypass i2c_algo_bit crct10dif_pclmul drm_kms_helper rtsx_pci_sdmmc crc32_pclmul mmc_core crc32c_intel mxm_wmi drm nvme ghash_clmulni_intel nvme_core serio_raw rtsx_pci i2c_hid video wmi i2c_dev lz4 lz4_compress [last unloaded: nouveau]
>>> [54758.418679] CR2: 00000004ee194700
>>> [54758.418681] ---[ end trace c5175234e6efc034 ]---
>>> [54758.418686] RIP: 0010:ttm_mem_global_init+0x1fe/0x2b0 [ttm]
>>> [54758.418688] Code: 00 00 00 48 89 5d 40 48 89 ab a0 00 00 00 e8 79 ba 95 f8 85 c0 0f 85 a2 00 00 00 8b 83 90 00 00 00 8d 50 01 89 93 90 00 00 00 <48> 89 ac c3 80 00 00 00 85 d2 0f 85 c6 99 00 00 48 8b 83 98 00 00
>>> [54758.418689] RSP: 0018:ffffc24e53c1f988 EFLAGS: 00010246
>>> [54758.418690] RAX: 00000000a5a2f300 RBX: ffffffffc101ae80 RCX: 0000000000000000
>>> [54758.418691] RDX: 00000000a5a2f301 RSI: 0000000000000000 RDI: ffff9ebe78b67330
>>> [54758.418692] RBP: ffff9ebe42c79d00 R08: 0000000000000044 R09: 0000000000000228
>>> [54758.418694] R10: 0000000000000000 R11: ffff9ebb62720bb8 R12: 0000000000000000
>>> [54758.418695] R13: ffff9ebe27d71640 R14: ffff9eba9f258600 R15: ffffffffc1453880
>>> [54758.418696] FS:  00007fc0b379c280(0000) GS:ffff9ebeae380000(0000) knlGS:0000000000000000
>>> [54758.418698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [54758.418699] CR2: 00000004ee194700 CR3: 000000045069e005 CR4: 00000000003606e0
>>>
>>> Karol Herbst (6):
>>>     Revert "drm: Remove drm_global.{c,h} v2"
>>>     Revert "drm/ttm: initialize globals during device init (v2)"
>>>     Revert "drm/ttm: Fix bo_global and mem_global kfree error"
>>>     Revert "drm/ttm: use a static ttm_bo_global instance"
>>>     Revert "drm/ttm: make the device list mutex static"
>>>     Revert "drm/ttm: use a static ttm_mem_global instance"
>>>
>>>    drivers/gpu/drm/Makefile                      |   2 +-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 103 ++++++++++++-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h       |   2 +
>>>    drivers/gpu/drm/ast/ast_drv.h                 |   2 +
>>>    drivers/gpu/drm/ast/ast_ttm.c                 |  64 ++++++++
>>>    drivers/gpu/drm/bochs/bochs.h                 |   2 +
>>>    drivers/gpu/drm/bochs/bochs_mm.c              |  61 ++++++++
>>>    drivers/gpu/drm/cirrus/cirrus_drv.h           |   2 +
>>>    drivers/gpu/drm/cirrus/cirrus_ttm.c           |  64 ++++++++
>>>    drivers/gpu/drm/drm_drv.c                     |   2 +
>>>    drivers/gpu/drm/drm_global.c                  | 137 ++++++++++++++++++
>>>    .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h   |   2 +
>>>    drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c   |  57 ++++++++
>>>    drivers/gpu/drm/mgag200/mgag200_drv.h         |   2 +
>>>    drivers/gpu/drm/mgag200/mgag200_ttm.c         |  64 ++++++++
>>>    drivers/gpu/drm/nouveau/nouveau_drv.h         |   2 +
>>>    drivers/gpu/drm/nouveau/nouveau_ttm.c         |  67 +++++++++
>>>    drivers/gpu/drm/qxl/qxl_drv.h                 |   3 +
>>>    drivers/gpu/drm/qxl/qxl_ttm.c                 |  61 ++++++++
>>>    drivers/gpu/drm/radeon/radeon.h               |   3 +
>>>    drivers/gpu/drm/radeon/radeon_ttm.c           |  65 +++++++++
>>>    drivers/gpu/drm/ttm/ttm_bo.c                  |  67 +++------
>>>    drivers/gpu/drm/ttm/ttm_memory.c              |  14 +-
>>>    drivers/gpu/drm/virtio/virtgpu_drv.h          |   3 +
>>>    drivers/gpu/drm/virtio/virtgpu_ttm.c          |  62 ++++++++
>>>    drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |  15 +-
>>>    drivers/gpu/drm/vmwgfx/vmwgfx_drv.h           |   6 +-
>>>    drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c      |  54 +++++++
>>>    drivers/staging/vboxvideo/vbox_drv.h          |   2 +
>>>    drivers/staging/vboxvideo/vbox_ttm.c          |  65 ++++++++-
>>>    include/drm/drmP.h                            |   1 +
>>>    include/drm/drm_global.h                      |  53 +++++++
>>>    include/drm/ttm/ttm_bo_driver.h               |  54 ++++++-
>>>    include/drm/ttm/ttm_memory.h                  |   4 +-
>>>    34 files changed, 1102 insertions(+), 65 deletions(-)
>>>    create mode 100644 drivers/gpu/drm/drm_global.c
>>>    create mode 100644 include/drm/drm_global.h
>>>
>>



