TTM refcount problem.

Christian König ckoenig.leichtzumerken at gmail.com
Wed Oct 16 10:58:06 UTC 2019


On 16.10.19 at 12:09, Bas Nieuwenhuizen wrote:
> On Mon, Jul 29, 2019 at 11:32 AM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>>> Is this a known issue?
>> No, that looks like a new one to me.
>>
>> Is that somehow reproducible?
> I tried to find a reliable reproducer (only the occasional Vulkan CTS
> run caught it), but could not find anything better.
>
> However, this issue seems to be fixed by one of the following patches
> from drm-misc-fixes:
>
> "drm/ttm: fix handling in ttm_bo_add_mem_to_lru"
> "drm/ttm: fix busy reference in ttm_mem_evict_first"
>
> I haven't seen the issue in 100 CTS runs.

Thanks for the information.

I'm currently reworking the handling completely and trying to get rid of 
all the places where dropping a reference can only end in a BUG().

Issues like this one should then hopefully disappear completely.

Regards,
Christian.

>
> Thanks,
> Bas
>
>> Christian.
>>
>> On 29.07.19 at 10:14, Bas Nieuwenhuizen wrote:
>>> Hi all,
>>>
>>> I have a TTM refcount issue:
>>>
>>> [173774.309968] ------------[ cut here ]------------
>>> [173774.309970] kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:202!
>>> [173774.309982] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>> [173774.309985] CPU: 13 PID: 128214 Comm: kworker/13:2 Not tainted 5.2.0-rc1-g3f2e519b0974 #10
>>> [173774.309986] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P1.50 09/05/2017
>>> [173774.309995] Workqueue: events ttm_bo_delayed_workqueue [ttm]
>>> [173774.310000] RIP: 0010:ttm_bo_ref_bug+0x5/0x10 [ttm]
>>> [173774.310002] Code: c0 c3 b8 01 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 f0 ff 8f a4 00 00 00 c3 0f 1f 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 07 48 89
>>> [173774.310003] RSP: 0018:ffffb42e5589bde8 EFLAGS: 00010246
>>> [173774.310005] RAX: ffffb42e5589be40 RBX: ffff9395fd0cd908 RCX: ffff9395fd0cd8f8
>>> [173774.310006] RDX: ffffb42e5589be40 RSI: ffff939b59b64f18 RDI: ffff9395fd0cd87c
>>> [173774.310007] RBP: ffffffffc0930f40 R08: 0000000000140000 R09: ffffffffc091f100
>>> [173774.310008] R10: ffff9399f69b0800 R11: 0000000000000001 R12: 0000000000000000
>>> [173774.310009] R13: ffff9395fd0cd850 R14: 0000000000000001 R15: 0000000000000001
>>> [173774.310010] FS:  0000000000000000(0000) GS:ffff939b7d340000(0000) knlGS:0000000000000000
>>> [173774.310011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [173774.310012] CR2: 00007f4f64008838 CR3: 0000000643baa000 CR4: 00000000003406e0
>>> [173774.310013] Call Trace:
>>> [173774.310019]  ttm_bo_cleanup_refs+0x160/0x1e0 [ttm]
>>> [173774.310025]  ttm_bo_delayed_delete+0xa8/0x1e0 [ttm]
>>> [173774.310029]  ttm_bo_delayed_workqueue+0x17/0x40 [ttm]
>>> [173774.310033]  process_one_work+0x1fd/0x430
>>> [173774.310036]  worker_thread+0x2d/0x3d0
>>> [173774.310038]  ? process_one_work+0x430/0x430
>>> [173774.310040]  kthread+0x112/0x130
>>> [173774.310042]  ? kthread_create_on_node+0x60/0x60
>>> [173774.310045]  ret_from_fork+0x22/0x40
>>> [173774.310048] Modules linked in: fuse nct6775 hwmon_vid
>>> nls_iso8859_1 nls_cp437 vfat fat edac_mce_amd kvm_amd kvm irqbypass
>>> amdgpu arc4 iwlmvm mac80211 snd_usb_audio uvcvideo snd_usbmidi_lib
>>> videobuf2_vmalloc crct10dif_pclmul videobuf2_memops
>>> snd_hda_codec_realtek videobuf2_v4l2 btusb gpu_sched snd_rawmidi
>>> videobuf2_common snd_hda_codec_generic btrtl videodev crc32_pclmul
>>> btbcm snd_seq_device ledtrig_audio ttm btintel ghash_clmulni_intel
>>> wmi_bmof mxm_wmi snd_hda_codec_hdmi media bluetooth drm_kms_helper
>>> iwlwifi snd_hda_intel drm aesni_intel snd_hda_codec joydev input_leds
>>> aes_x86_64 snd_hda_core mousedev evdev crypto_simd cryptd ecdh_generic
>>> led_class agpgart snd_hwdep mac_hid cdc_acm glue_helper ecc snd_pcm
>>> igb syscopyarea pcspkr cfg80211 sysfillrect snd_timer sysimgblt snd
>>> fb_sys_fops ccp ptp soundcore pps_core rng_core k10temp i2c_algo_bit
>>> sp5100_tco dca i2c_piix4 rfkill wmi pcc_cpufreq button acpi_cpufreq
>>> sch_fq_codel ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
>>> sd_mod
>>> [173774.310085]  hid_generic usbhid hid crc32c_intel ahci xhci_pci
>>> libahci xhci_hcd libata usbcore scsi_mod usb_common
>>> [173774.310094] ---[ end trace 1f8d21980c0b3fd5 ]---
>>> [173774.310097] RIP: 0010:ttm_bo_ref_bug+0x5/0x10 [ttm]
>>> [173774.310099] Code: c0 c3 b8 01 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 f0 ff 8f a4 00 00 00 c3 0f 1f 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 07 48 89
>>> [173774.310100] RSP: 0018:ffffb42e5589bde8 EFLAGS: 00010246
>>> [173774.310101] RAX: ffffb42e5589be40 RBX: ffff9395fd0cd908 RCX: ffff9395fd0cd8f8
>>> [173774.310102] RDX: ffffb42e5589be40 RSI: ffff939b59b64f18 RDI: ffff9395fd0cd87c
>>> [173774.310103] RBP: ffffffffc0930f40 R08: 0000000000140000 R09: ffffffffc091f100
>>> [173774.310104] R10: ffff9399f69b0800 R11: 0000000000000001 R12: 0000000000000000
>>> [173774.310104] R13: ffff9395fd0cd850 R14: 0000000000000001 R15: 0000000000000001
>>> [173774.310106] FS:  0000000000000000(0000) GS:ffff939b7d340000(0000) knlGS:0000000000000000
>>> [173774.310107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [173774.310107] CR2: 00007f4f64008838 CR3: 0000000643baa000 CR4: 00000000003406e0
>>> [173774.310110] note: kworker/13:2[128214] exited with preempt_count 1
>>>
>>>
>>> With amd-staging-drm-next:
>>>
>>> commit 20d6b9c3b7f40ec427af912d140f2be0de098d2d (origin/amd-staging-drm-next)
>>> Author: Gustavo A. R. Silva <gustavo at embeddedor.com>
>>> Date:   Mon Jul 22 12:47:16 2019 -0500
>>>
>>>       drm/amdkfd/kfd_mqd_manager_v10: Avoid fall-through warning
>>>
>>> with a Vega10.
>>>
>>> Is this a known issue?
>>>
>>> Thanks,
>>> Bas
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
