[PATCH 0/6] Fix crash after reloading a driver using ttm

Karol Herbst kherbst at redhat.com
Tue Apr 16 09:10:58 UTC 2019


On Tue, Apr 16, 2019 at 8:38 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> On 16.04.19 at 02:35, Karol Herbst wrote:
> > Kobjects are supposed to be dynamically allocated, but with recent changes
> > this rule was violated. Reverting those commits fixes crashes when a drm
> > driver using TTM gets loaded again.
> >
> > The object in question is "ttm_mem_glob", declared inside
> > "include/drm/ttm/ttm_memory.h" and instantiated inside
> > "drivers/gpu/drm/ttm/ttm_memory.c".
> >
> > From "Documentation/kobject.txt":
> > "Because kobjects are dynamic, they must not be declared statically or on
> > the stack, but instead, always allocated dynamically.  Future versions of
> > the kernel will contain a run-time check for kobjects that are created
> > statically and will warn the developer of this improper usage."
> >
> > Unloading ttm before reloading the driver works around that crash, because
> > the memory backing the kobject member "kobj" is cleaned up. The kobject_del
> > and kobject_put functions never free or clean up the kobject, leaving
> > it in an undefined state.
> >
> > I reverted a few more commits to make it less painful for me to revert this
> > rather big change.
>
> Well, NAK. By reverting those changes you also reintroduce the problems
> we originally fixed with those patches.
>
> Please work on a proper fix instead,
> Christian.

And which problem was that, besides duplicated code (or maybe even a
bit of extra memory consumption if multiple TTM drivers were used)? If I
had to choose between duplicated code and a crash, I would choose the former.

Maybe I missed the real reason why those changes were made, but the
commit messages don't really seem to tell me.

>
> >
> > dmesg output:
> > [54758.418036] kobject (00000000687a067d): tried to init an initialized object, something is seriously wrong.
> > [54758.418040] CPU: 6 PID: 26746 Comm: insmod Tainted: G     U     OE     5.0.6-200.fc29.x86_64 #1
> > [54758.418041] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.12.1 10/02/2018
> > [54758.418041] Call Trace:
> > [54758.418049]  dump_stack+0x5c/0x80
> > [54758.418054]  kobject_init.cold.9+0x31/0x3f
> > [54758.418057]  kobject_init_and_add+0x35/0xa0
> > [54758.418063]  ttm_mem_global_init+0x8f/0x2b0 [ttm]
> > [54758.418067]  ? __debugfs_create_file+0xe1/0x110
> > [54758.418071]  ttm_bo_device_init+0x198/0x2a0 [ttm]
> > [54758.418144]  nouveau_ttm_init+0xbf/0x340 [nouveau]
> > [54758.418206]  nouveau_drm_device_init+0x125/0x7d0 [nouveau]
> > [54758.418210]  ? pci_bus_read_config_word+0x49/0x70
> > [54758.418266]  nouveau_drm_probe+0x26f/0x2c0 [nouveau]
> > [54758.418270]  local_pci_probe+0x41/0x90
> > [54758.418272]  pci_device_probe+0x118/0x1a0
> > [54758.418275]  really_probe+0xf8/0x3b0
> > [54758.418277]  driver_probe_device+0xb3/0xf0
> > [54758.418278]  __driver_attach+0xdd/0x110
> > [54758.418280]  ? driver_probe_device+0xf0/0xf0
> > [54758.418282]  bus_for_each_dev+0x77/0xc0
> > [54758.418285]  ? klist_add_tail+0x3b/0x60
> > [54758.418287]  bus_add_driver+0x152/0x230
> > [54758.418288]  ? 0xffffffffc1027000
> > [54758.418290]  driver_register+0x6b/0xb0
> > [54758.418291]  ? 0xffffffffc1027000
> > [54758.418294]  do_one_initcall+0x46/0x1c3
> > [54758.418296]  ? _cond_resched+0x15/0x30
> > [54758.418299]  ? kmem_cache_alloc_trace+0x154/0x1d0
> > [54758.418302]  do_init_module+0x5a/0x210
> > [54758.418304]  load_module+0x2096/0x22d0
> > [54758.418308]  ? ima_post_read_file+0xf4/0x100
> > [54758.418310]  ? __do_sys_finit_module+0xa8/0x110
> > [54758.418312]  __do_sys_finit_module+0xa8/0x110
> > [54758.418315]  do_syscall_64+0x5b/0x160
> > [54758.418317]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [54758.418319] RIP: 0033:0x7fc0b38b6edd
> > [54758.418321] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7b 7f 0c 00 f7 d8 64 89 01 48
> > [54758.418322] RSP: 002b:00007ffc7f1620d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > [54758.418323] RAX: ffffffffffffffda RBX: 00005629a3996ac0 RCX: 00007fc0b38b6edd
> > [54758.418324] RDX: 0000000000000000 RSI: 00005629a3996260 RDI: 0000000000000003
> > [54758.418325] RBP: 00005629a3996260 R08: 0000000000000000 R09: 0000000000000000
> > [54758.418326] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
> > [54758.418326] R13: 00005629a3996a80 R14: 0000000000000000 R15: 00005629a3996260
> > [54758.418346] BUG: unable to handle kernel paging request at 00000004ee194700
> > [54758.418348] #PF error: [WRITE]
> > [54758.418349] PGD 0 P4D 0
> > [54758.418352] Oops: 0002 [#1] SMP PTI
> > [54758.418354] CPU: 6 PID: 26746 Comm: insmod Tainted: G     U     OE     5.0.6-200.fc29.x86_64 #1
> > [54758.418355] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.12.1 10/02/2018
> > [54758.418360] RIP: 0010:ttm_mem_global_init+0x1fe/0x2b0 [ttm]
> > [54758.418361] Code: 00 00 00 48 89 5d 40 48 89 ab a0 00 00 00 e8 79 ba 95 f8 85 c0 0f 85 a2 00 00 00 8b 83 90 00 00 00 8d 50 01 89 93 90 00 00 00 <48> 89 ac c3 80 00 00 00 85 d2 0f 85 c6 99 00 00 48 8b 83 98 00 00
> > [54758.418363] RSP: 0018:ffffc24e53c1f988 EFLAGS: 00010246
> > [54758.418364] RAX: 00000000a5a2f300 RBX: ffffffffc101ae80 RCX: 0000000000000000
> > [54758.418366] RDX: 00000000a5a2f301 RSI: 0000000000000000 RDI: ffff9ebe78b67330
> > [54758.418367] RBP: ffff9ebe42c79d00 R08: 0000000000000044 R09: 0000000000000228
> > [54758.418368] R10: 0000000000000000 R11: ffff9ebb62720bb8 R12: 0000000000000000
> > [54758.418370] R13: ffff9ebe27d71640 R14: ffff9eba9f258600 R15: ffffffffc1453880
> > [54758.418371] FS:  00007fc0b379c280(0000) GS:ffff9ebeae380000(0000) knlGS:0000000000000000
> > [54758.418373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [54758.418374] CR2: 00000004ee194700 CR3: 000000045069e005 CR4: 00000000003606e0
> > [54758.418375] Call Trace:
> > [54758.418381]  ttm_bo_device_init+0x198/0x2a0 [ttm]
> > [54758.418439]  nouveau_ttm_init+0xbf/0x340 [nouveau]
> > [54758.418495]  nouveau_drm_device_init+0x125/0x7d0 [nouveau]
> > [54758.418499]  ? pci_bus_read_config_word+0x49/0x70
> > [54758.418552]  nouveau_drm_probe+0x26f/0x2c0 [nouveau]
> > [54758.418556]  local_pci_probe+0x41/0x90
> > [54758.418558]  pci_device_probe+0x118/0x1a0
> > [54758.418561]  really_probe+0xf8/0x3b0
> > [54758.418563]  driver_probe_device+0xb3/0xf0
> > [54758.418565]  __driver_attach+0xdd/0x110
> > [54758.418567]  ? driver_probe_device+0xf0/0xf0
> > [54758.418570]  bus_for_each_dev+0x77/0xc0
> > [54758.418573]  ? klist_add_tail+0x3b/0x60
> > [54758.418574]  bus_add_driver+0x152/0x230
> > [54758.418576]  ? 0xffffffffc1027000
> > [54758.418578]  driver_register+0x6b/0xb0
> > [54758.418580]  ? 0xffffffffc1027000
> > [54758.418583]  do_one_initcall+0x46/0x1c3
> > [54758.418585]  ? _cond_resched+0x15/0x30
> > [54758.418587]  ? kmem_cache_alloc_trace+0x154/0x1d0
> > [54758.418591]  do_init_module+0x5a/0x210
> > [54758.418593]  load_module+0x2096/0x22d0
> > [54758.418596]  ? ima_post_read_file+0xf4/0x100
> > [54758.418599]  ? __do_sys_finit_module+0xa8/0x110
> > [54758.418601]  __do_sys_finit_module+0xa8/0x110
> > [54758.418604]  do_syscall_64+0x5b/0x160
> > [54758.418607]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [54758.418609] RIP: 0033:0x7fc0b38b6edd
> > [54758.418610] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7b 7f 0c 00 f7 d8 64 89 01 48
> > [54758.418612] RSP: 002b:00007ffc7f1620d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > [54758.418614] RAX: ffffffffffffffda RBX: 00005629a3996ac0 RCX: 00007fc0b38b6edd
> > [54758.418615] RDX: 0000000000000000 RSI: 00005629a3996260 RDI: 0000000000000003
> > [54758.418616] RBP: 00005629a3996260 R08: 0000000000000000 R09: 0000000000000000
> > [54758.418618] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
> > [54758.418619] R13: 00005629a3996a80 R14: 0000000000000000 R15: 00005629a3996260
> > [54758.418620] Modules linked in: nouveau(OE+) acpi_call(OE) ttm rfcomm ccm xt_CHECKSUM ipt_MASQUERADE tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc vfat fat btusb btrtl btbcm btintel uvcvideo videobuf2_vmalloc bluetooth videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media ecdh_generic arc4 snd_hda_codec_hdmi ath10k_pci ath10k_core snd_hda_codec_realtek iTCO_wdt mei_wdt iTCO_vendor_support snd_hda_codec_generic mac80211 intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec dell_laptop ledtrig_audio dell_smm_hwmon snd_hda_core kvm_intel snd_hwdep ath snd_seq snd_seq_device intel_cstate cfg80211 snd_pcm dell_wmi intel_uncore
> > [54758.418653]  dell_smbios dcdbas intel_rapl_perf snd_timer joydev dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof snd idma64 soundcore i2c_i801 rfkill rtsx_pci_ms memstick mei_me mei processor_thermal_device intel_lpss_pci intel_soc_dts_iosf intel_pch_thermal intel_lpss int3400_thermal acpi_pad acpi_thermal_rel int3403_thermal intel_hid pcc_cpufreq int340x_thermal_zone sparse_keymap binfmt_misc zram dm_crypt hid_multitouch i915 kvmgt mdev vfio kvm irqbypass i2c_algo_bit crct10dif_pclmul drm_kms_helper rtsx_pci_sdmmc crc32_pclmul mmc_core crc32c_intel mxm_wmi drm nvme ghash_clmulni_intel nvme_core serio_raw rtsx_pci i2c_hid video wmi i2c_dev lz4 lz4_compress [last unloaded: nouveau]
> > [54758.418679] CR2: 00000004ee194700
> > [54758.418681] ---[ end trace c5175234e6efc034 ]---
> > [54758.418686] RIP: 0010:ttm_mem_global_init+0x1fe/0x2b0 [ttm]
> > [54758.418688] Code: 00 00 00 48 89 5d 40 48 89 ab a0 00 00 00 e8 79 ba 95 f8 85 c0 0f 85 a2 00 00 00 8b 83 90 00 00 00 8d 50 01 89 93 90 00 00 00 <48> 89 ac c3 80 00 00 00 85 d2 0f 85 c6 99 00 00 48 8b 83 98 00 00
> > [54758.418689] RSP: 0018:ffffc24e53c1f988 EFLAGS: 00010246
> > [54758.418690] RAX: 00000000a5a2f300 RBX: ffffffffc101ae80 RCX: 0000000000000000
> > [54758.418691] RDX: 00000000a5a2f301 RSI: 0000000000000000 RDI: ffff9ebe78b67330
> > [54758.418692] RBP: ffff9ebe42c79d00 R08: 0000000000000044 R09: 0000000000000228
> > [54758.418694] R10: 0000000000000000 R11: ffff9ebb62720bb8 R12: 0000000000000000
> > [54758.418695] R13: ffff9ebe27d71640 R14: ffff9eba9f258600 R15: ffffffffc1453880
> > [54758.418696] FS:  00007fc0b379c280(0000) GS:ffff9ebeae380000(0000) knlGS:0000000000000000
> > [54758.418698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [54758.418699] CR2: 00000004ee194700 CR3: 000000045069e005 CR4: 00000000003606e0
> >
> > Karol Herbst (6):
> >    Revert "drm: Remove drm_global.{c,h} v2"
> >    Revert "drm/ttm: initialize globals during device init (v2)"
> >    Revert "drm/ttm: Fix bo_global and mem_global kfree error"
> >    Revert "drm/ttm: use a static ttm_bo_global instance"
> >    Revert "drm/ttm: make the device list mutex static"
> >    Revert "drm/ttm: use a static ttm_mem_global instance"
> >
> >   drivers/gpu/drm/Makefile                      |   2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 103 ++++++++++++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h       |   2 +
> >   drivers/gpu/drm/ast/ast_drv.h                 |   2 +
> >   drivers/gpu/drm/ast/ast_ttm.c                 |  64 ++++++++
> >   drivers/gpu/drm/bochs/bochs.h                 |   2 +
> >   drivers/gpu/drm/bochs/bochs_mm.c              |  61 ++++++++
> >   drivers/gpu/drm/cirrus/cirrus_drv.h           |   2 +
> >   drivers/gpu/drm/cirrus/cirrus_ttm.c           |  64 ++++++++
> >   drivers/gpu/drm/drm_drv.c                     |   2 +
> >   drivers/gpu/drm/drm_global.c                  | 137 ++++++++++++++++++
> >   .../gpu/drm/hisilicon/hibmc/hibmc_drm_drv.h   |   2 +
> >   drivers/gpu/drm/hisilicon/hibmc/hibmc_ttm.c   |  57 ++++++++
> >   drivers/gpu/drm/mgag200/mgag200_drv.h         |   2 +
> >   drivers/gpu/drm/mgag200/mgag200_ttm.c         |  64 ++++++++
> >   drivers/gpu/drm/nouveau/nouveau_drv.h         |   2 +
> >   drivers/gpu/drm/nouveau/nouveau_ttm.c         |  67 +++++++++
> >   drivers/gpu/drm/qxl/qxl_drv.h                 |   3 +
> >   drivers/gpu/drm/qxl/qxl_ttm.c                 |  61 ++++++++
> >   drivers/gpu/drm/radeon/radeon.h               |   3 +
> >   drivers/gpu/drm/radeon/radeon_ttm.c           |  65 +++++++++
> >   drivers/gpu/drm/ttm/ttm_bo.c                  |  67 +++------
> >   drivers/gpu/drm/ttm/ttm_memory.c              |  14 +-
> >   drivers/gpu/drm/virtio/virtgpu_drv.h          |   3 +
> >   drivers/gpu/drm/virtio/virtgpu_ttm.c          |  62 ++++++++
> >   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |  15 +-
> >   drivers/gpu/drm/vmwgfx/vmwgfx_drv.h           |   6 +-
> >   drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c      |  54 +++++++
> >   drivers/staging/vboxvideo/vbox_drv.h          |   2 +
> >   drivers/staging/vboxvideo/vbox_ttm.c          |  65 ++++++++-
> >   include/drm/drmP.h                            |   1 +
> >   include/drm/drm_global.h                      |  53 +++++++
> >   include/drm/ttm/ttm_bo_driver.h               |  54 ++++++-
> >   include/drm/ttm/ttm_memory.h                  |   4 +-
> >   34 files changed, 1102 insertions(+), 65 deletions(-)
> >   create mode 100644 drivers/gpu/drm/drm_global.c
> >   create mode 100644 include/drm/drm_global.h
> >
>
>

