TTM refcount problem.

Bas Nieuwenhuizen bas at basnieuwenhuizen.nl
Wed Oct 16 10:09:49 UTC 2019


On Mon, Jul 29, 2019 at 11:32 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> > Is this a known issue?
> No, that looks like a new one to me.
>
> Is that somehow reproducible?

I tried to find a reliable reproducer (only the occasional Vulkan CTS run
caught it), but could not find anything better.

However, this issue seems to be fixed by one of the following patches
from drm-misc-fixes:

"drm/ttm: fix handling in ttm_bo_add_mem_to_lru"
"drm/ttm: fix busy reference in ttm_mem_evict_first"

With those applied, I haven't seen the issue in 100 CTS runs.

Thanks,
Bas

>
> Christian.
>
> On 29.07.19 at 10:14, Bas Nieuwenhuizen wrote:
> > Hi all,
> >
> > I have a TTM refcount issue:
> >
> > [173774.309968] ------------[ cut here ]------------
> > [173774.309970] kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:202!
> > [173774.309982] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [173774.309985] CPU: 13 PID: 128214 Comm: kworker/13:2 Not tainted 5.2.0-rc1-g3f2e519b0974 #10
> > [173774.309986] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P1.50 09/05/2017
> > [173774.309995] Workqueue: events ttm_bo_delayed_workqueue [ttm]
> > [173774.310000] RIP: 0010:ttm_bo_ref_bug+0x5/0x10 [ttm]
> > [173774.310002] Code: c0 c3 b8 01 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 f0 ff 8f a4 00 00 00 c3 0f 1f 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 07 48 89
> > [173774.310003] RSP: 0018:ffffb42e5589bde8 EFLAGS: 00010246
> > [173774.310005] RAX: ffffb42e5589be40 RBX: ffff9395fd0cd908 RCX: ffff9395fd0cd8f8
> > [173774.310006] RDX: ffffb42e5589be40 RSI: ffff939b59b64f18 RDI: ffff9395fd0cd87c
> > [173774.310007] RBP: ffffffffc0930f40 R08: 0000000000140000 R09: ffffffffc091f100
> > [173774.310008] R10: ffff9399f69b0800 R11: 0000000000000001 R12: 0000000000000000
> > [173774.310009] R13: ffff9395fd0cd850 R14: 0000000000000001 R15: 0000000000000001
> > [173774.310010] FS:  0000000000000000(0000) GS:ffff939b7d340000(0000) knlGS:0000000000000000
> > [173774.310011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [173774.310012] CR2: 00007f4f64008838 CR3: 0000000643baa000 CR4: 00000000003406e0
> > [173774.310013] Call Trace:
> > [173774.310019]  ttm_bo_cleanup_refs+0x160/0x1e0 [ttm]
> > [173774.310025]  ttm_bo_delayed_delete+0xa8/0x1e0 [ttm]
> > [173774.310029]  ttm_bo_delayed_workqueue+0x17/0x40 [ttm]
> > [173774.310033]  process_one_work+0x1fd/0x430
> > [173774.310036]  worker_thread+0x2d/0x3d0
> > [173774.310038]  ? process_one_work+0x430/0x430
> > [173774.310040]  kthread+0x112/0x130
> > [173774.310042]  ? kthread_create_on_node+0x60/0x60
> > [173774.310045]  ret_from_fork+0x22/0x40
> > [173774.310048] Modules linked in: fuse nct6775 hwmon_vid
> > nls_iso8859_1 nls_cp437 vfat fat edac_mce_amd kvm_amd kvm irqbypass
> > amdgpu arc4 iwlmvm mac80211 snd_usb_audio uvcvideo snd_usbmidi_lib
> > videobuf2_vmalloc crct10dif_pclmul videobuf2_memops
> > snd_hda_codec_realtek videobuf2_v4l2 btusb gpu_sched snd_rawmidi
> > videobuf2_common snd_hda_codec_generic btrtl videodev crc32_pclmul
> > btbcm snd_seq_device ledtrig_audio ttm btintel ghash_clmulni_intel
> > wmi_bmof mxm_wmi snd_hda_codec_hdmi media bluetooth drm_kms_helper
> > iwlwifi snd_hda_intel drm aesni_intel snd_hda_codec joydev input_leds
> > aes_x86_64 snd_hda_core mousedev evdev crypto_simd cryptd ecdh_generic
> > led_class agpgart snd_hwdep mac_hid cdc_acm glue_helper ecc snd_pcm
> > igb syscopyarea pcspkr cfg80211 sysfillrect snd_timer sysimgblt snd
> > fb_sys_fops ccp ptp soundcore pps_core rng_core k10temp i2c_algo_bit
> > sp5100_tco dca i2c_piix4 rfkill wmi pcc_cpufreq button acpi_cpufreq
> > sch_fq_codel ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
> > sd_mod
> > [173774.310085]  hid_generic usbhid hid crc32c_intel ahci xhci_pci
> > libahci xhci_hcd libata usbcore scsi_mod usb_common
> > [173774.310094] ---[ end trace 1f8d21980c0b3fd5 ]---
> > [173774.310097] RIP: 0010:ttm_bo_ref_bug+0x5/0x10 [ttm]
> > [173774.310099] Code: c0 c3 b8 01 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 f0 ff 8f a4 00 00 00 c3 0f 1f 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 07 48 89
> > [173774.310100] RSP: 0018:ffffb42e5589bde8 EFLAGS: 00010246
> > [173774.310101] RAX: ffffb42e5589be40 RBX: ffff9395fd0cd908 RCX: ffff9395fd0cd8f8
> > [173774.310102] RDX: ffffb42e5589be40 RSI: ffff939b59b64f18 RDI: ffff9395fd0cd87c
> > [173774.310103] RBP: ffffffffc0930f40 R08: 0000000000140000 R09: ffffffffc091f100
> > [173774.310104] R10: ffff9399f69b0800 R11: 0000000000000001 R12: 0000000000000000
> > [173774.310104] R13: ffff9395fd0cd850 R14: 0000000000000001 R15: 0000000000000001
> > [173774.310106] FS:  0000000000000000(0000) GS:ffff939b7d340000(0000) knlGS:0000000000000000
> > [173774.310107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [173774.310107] CR2: 00007f4f64008838 CR3: 0000000643baa000 CR4: 00000000003406e0
> > [173774.310110] note: kworker/13:2[128214] exited with preempt_count 1
> >
> >
> > With amd-staging-drm-next:
> >
> > commit 20d6b9c3b7f40ec427af912d140f2be0de098d2d (origin/amd-staging-drm-next)
> > Author: Gustavo A. R. Silva <gustavo at embeddedor.com>
> > Date:   Mon Jul 22 12:47:16 2019 -0500
> >
> >      drm/amdkfd/kfd_mqd_manager_v10: Avoid fall-through warning
> >
> > with a Vega10.
> >
> > Is this a known issue?
> >
> > Thanks,
> > Bas
>
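For context on the trace: the BUG fires in ttm_bo_ref_bug(), which in ttm_bo.c of this era, as far as I can tell, just calls BUG() and is passed as the release callback to kref_put() on a BO's list_kref at call sites where that reference must never be the last one dropped (the call trace suggests ttm_bo_cleanup_refs() is one of them). Hitting it therefore means the list_kref count reached zero earlier than expected, i.e. some path dropped a list reference it never took. The C sketch below is a hypothetical userspace model of that pattern, not the TTM code: every model_* name is invented for illustration, and model_ref_bug() records the error instead of halting the machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a kref-style counter (not the kernel's struct kref). */
struct model_kref {
	int refcount;
};

/* Set by model_ref_bug() instead of calling BUG(), so the model can be inspected. */
static bool model_bug_hit = false;

static void model_kref_init(struct model_kref *k)
{
	k->refcount = 1; /* base reference, dropped only on final release */
}

static void model_kref_get(struct model_kref *k)
{
	assert(k->refcount > 0); /* taking a reference on a dead object is itself a bug */
	k->refcount++;
}

/* Drops one reference and calls release() only when the count reaches zero.
 * Returns 1 if release() was called, mirroring the kernel's kref_put(). */
static int model_kref_put(struct model_kref *k, void (*release)(struct model_kref *))
{
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Stand-in for ttm_bo_ref_bug(): used where the count must never hit zero. */
static void model_ref_bug(struct model_kref *k)
{
	(void)k;
	model_bug_hit = true;
	fprintf(stderr, "BUG: list reference dropped to zero\n");
}

/* Hypothetical buffer object: holds one extra list reference while on the LRU. */
struct model_bo {
	struct model_kref list_kref;
	bool on_lru;
};

static void model_add_to_lru(struct model_bo *bo)
{
	if (!bo->on_lru) {
		bo->on_lru = true;
		model_kref_get(&bo->list_kref); /* the list owns one reference */
	}
}

static void model_del_from_lru(struct model_bo *bo)
{
	if (bo->on_lru) {
		bo->on_lru = false;
		/* The base reference is still held, so this must not reach zero. */
		model_kref_put(&bo->list_kref, model_ref_bug);
	}
}
```

In the model, a balanced add/del pair leaves the base reference intact; an extra put, or an add path that forgets its get, lets the count reach zero through model_ref_bug(), which is the class of accounting error the kernel BUG reports.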

