[PATCH libdrm 0/4] Dynamically disable suites and tests.
Andrey Grodzovsky
Andrey.Grodzovsky at amd.com
Fri Nov 10 23:43:44 UTC 2017
On 11/10/2017 10:48 AM, Christian König wrote:
> Am 10.11.2017 um 16:36 schrieb Andrey Grodzovsky:
>>
>>
>> On 11/10/2017 07:17 AM, Christian König wrote:
>>> Series is Acked-by: Christian König <christian.koenig at amd.com>.
>>>
>>> Please note that I think your OOM killer test shows quite a bug we
>>> currently have in the kernel driver.
>>>
>>> A single allocation of 1TB shouldn't trigger the OOM killer, but
>>> rather be rejected immediately.
>>
>> Maybe we should add a second test which does incremental 1GB
>> allocations but still keep this test? With this test I get a
>> call stack as below, plus a crash of the test suite with a general
>> protection fault. As normal behavior I would have expected just an
>> errno returned from amdgpu_bo_alloc, which we could check in the
>> test.
>
> Yeah, totally agree. And when this works correctly we should really
> enable this test case by default as well.
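A rough sketch of how such an incremental test could be structured, not taken from the patch series. The allocator is passed in as callbacks so the loop can be dry-run without a GPU; in the real tester they would presumably wrap amdgpu_bo_alloc() and amdgpu_bo_free(). All names here (incremental_alloc, fake_heap, STEP_SIZE, ...) are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define STEP_SIZE (1ULL << 30) /* 1 GiB per allocation */

/* Hypothetical callbacks; the real test would wrap the libdrm BO calls. */
typedef int (*alloc_fn)(void *ctx, uint64_t size, void **handle);
typedef void (*free_fn)(void *ctx, void *handle);

/*
 * Allocate STEP_SIZE chunks until the allocator fails or max_steps is
 * reached. Returns the first error (a negative errno), or 0 if max_steps
 * allocations succeeded. *allocated receives the number of successful
 * allocations. Everything is freed again before returning.
 */
int incremental_alloc(void *ctx, alloc_fn do_alloc, free_fn do_free,
                      void **handles, size_t max_steps, size_t *allocated)
{
    size_t n = 0;
    int r = 0;

    while (n < max_steps) {
        r = do_alloc(ctx, STEP_SIZE, &handles[n]);
        if (r)
            break;
        n++;
    }

    *allocated = n;
    while (n > 0)
        do_free(ctx, handles[--n]);

    return r;
}

/* Fake allocator for a dry run: it only tracks a byte budget. */
struct fake_heap {
    uint64_t free_bytes;
};

int fake_alloc(void *ctx, uint64_t size, void **handle)
{
    struct fake_heap *h = ctx;

    if (size > h->free_bytes)
        return -ENOMEM;
    h->free_bytes -= size;
    *handle = h; /* any non-NULL token will do */
    return 0;
}

void fake_free(void *ctx, void *handle)
{
    struct fake_heap *h = ctx;

    (void)handle;
    h->free_bytes += STEP_SIZE;
}
```

The point of the structure is that the test only asserts on the returned errno and the allocation count, so a failing allocation shows up as a test result instead of a crash.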
>
> When I implemented scattered eviction I completely removed the check
> which limited the BO size. That was probably a bit too extreme.
>
> We still need to check the size here so that we don't create a BO
> larger than what makes sense for the domain it should be stored in.
Patch attached, tested with the DRM tester. The call stack is gone, but I
still get SIGSEGV and the tester crashes. Attaching a debugger shows the
SIGSEGV is received while the tester is still in the IOCTL:
r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_GEM_CREATE,
&args, sizeof(args));
dmesg
[ 104.608664 < 16.227791>] [drm:amdgpu_bo_do_create [amdgpu]] *ERROR*
BO size 1000000000000 > total memory in domain: 1073741824
[ 104.608911 < 0.000247>] [drm:amdgpu_bo_do_create [amdgpu]] *ERROR*
BO size 1000000000000 > total memory in domain: 3221225472
[ 104.609168 < 0.000257>] [drm:amdgpu_gem_object_create [amdgpu]]
*ERROR* Failed to allocate GEM object (1000000000000, 6, 4096, -12)
[ 104.609301 < 0.000133>] traps: lt-amdgpu_test[1142] general
protection ip:7f21c9ed6007 sp:7ffe08ae1e30 error:0 in
libdrm_amdgpu.so.1.0.0[7f21c9ed2000+b000]
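
For reference, the kind of check that produces the messages above can be modelled roughly as below. This is a standalone user-space sketch under my own assumptions, not the attached patch: the function name and the "fits in at least one requested domain" policy are hypothetical, and the two domain bits are meant to mirror the GTT and VRAM values in amdgpu_drm.h.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror AMDGPU_GEM_DOMAIN_GTT / AMDGPU_GEM_DOMAIN_VRAM. */
#define DOMAIN_GTT  0x2
#define DOMAIN_VRAM 0x4

/*
 * Hypothetical size check: accept the BO if at least one of the requested
 * domains is large enough to hold it, otherwise return -ENOMEM.
 * vram_total and gtt_total are the total domain sizes in bytes.
 */
int bo_size_check(uint64_t size, uint32_t domains,
                  uint64_t vram_total, uint64_t gtt_total)
{
    uint32_t checked = domains & (DOMAIN_VRAM | DOMAIN_GTT);

    /* Other domains are not modelled in this sketch. */
    if (!checked)
        return 0;

    if ((checked & DOMAIN_VRAM) && size <= vram_total)
        return 0;
    if ((checked & DOMAIN_GTT) && size <= gtt_total)
        return 0;

    return -ENOMEM;
}
```

With the numbers from the log (size 1000000000000, domain mask 6, domain totals of 1073741824 and 3221225472 bytes) this returns -ENOMEM, i.e. the -12 seen in the GEM error line.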
Thanks,
Andrey
>
> Regards,
> Christian.
>
>>
>> Thanks,
>> Andrey
>>
>> [169053.128981 <72032.811683>] ------------[ cut here ]------------
>> [169053.129006 < 0.000025>] WARNING: CPU: 0 PID: 22883 at
>> mm/page_alloc.c:3883 __alloc_pages_slowpath+0xf03/0x14e0
>> [169053.129007 < 0.000001>] Modules linked in: amdgpu chash ttm
>> drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect
>> sysimgblt edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul
>> crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel
>> snd_hda_codec_generic pcbc snd_hda_codec_hdmi snd_hda_intel
>> aesni_intel snd_hda_codec aes_x86_64 snd_hda_core crypto_simd
>> glue_helper snd_hwdep rfkill_gpio cryptd snd_pcm snd_seq_midi
>> snd_seq_midi_event serio_raw snd_rawmidi snd_seq cdc_ether usbnet
>> snd_seq_device joydev fam15h_power k10temp r8152 snd_timer mii
>> i2c_piix4 rtsx_pci_ms snd memstick soundcore shpchp 8250_dw
>> i2c_designware_platform i2c_designware_core mac_hid binfmt_misc nfsd
>> auth_rpcgss nfs_acl lockd grace sunrpc parport_pc ppdev lp parport
>> autofs4 rtsx_pci_sdmmc psmouse rtsx_pci sdhci_pci ahci sdhci libahci
>> [169053.129084 < 0.000077>] video i2c_hid hid_generic usbhid hid
>> [169053.129096 < 0.000012>] CPU: 0 PID: 22883 Comm: lt-amdgpu_test
>> Tainted: G W 4.14.0-rc3+ #1
>> [169053.129097 < 0.000001>] Hardware name: AMD Gardenia/Gardenia,
>> BIOS RGA1101C 07/20/2015
>> [169053.129099 < 0.000002>] task: ffff880048803d80 task.stack:
>> ffff880064688000
>> [169053.129103 < 0.000004>] RIP:
>> 0010:__alloc_pages_slowpath+0xf03/0x14e0
>> [169053.129105 < 0.000002>] RSP: 0018:ffff88006468f108 EFLAGS:
>> 00010246
>> [169053.129108 < 0.000003>] RAX: 0000000000000000 RBX:
>> 00000000014000c0 RCX: ffffffff81279065
>> [169053.129109 < 0.000001>] RDX: dffffc0000000000 RSI:
>> 000000000000000f RDI: ffffffff82609000
>> [169053.129111 < 0.000002>] RBP: ffff88006468f328 R08:
>> 0000000000000000 R09: ffffffffffff8576
>> [169053.129113 < 0.000002>] R10: 000000005c2044e7 R11:
>> 0000000000000000 R12: ffff88006468f3d8
>> [169053.129114 < 0.000001>] R13: ffff880048803d80 R14:
>> 000000000140c0c0 R15: 000000000000000f
>> [169053.129117 < 0.000003>] FS: 00007f707863b700(0000)
>> GS:ffff88006ce00000(0000) knlGS:0000000000000000
>> [169053.129119 < 0.000002>] CS: 0010 DS: 0000 ES: 0000 CR0:
>> 0000000080050033
>> [169053.129120 < 0.000001>] CR2: 0000000001250000 CR3:
>> 00000000644cf000 CR4: 00000000001406f0
>> [169053.129122 < 0.000002>] Call Trace:
>> [169053.129131 < 0.000009>] ? __module_address+0x145/0x190
>> [169053.129135 < 0.000004>] ? is_bpf_text_address+0xe/0x20
>> [169053.129140 < 0.000005>] ? __kernel_text_address+0x12/0x40
>> [169053.129144 < 0.000004>] ? unwind_get_return_address+0x36/0x50
>> [169053.129150 < 0.000006>] ? memcmp+0x5b/0x90
>> [169053.129152 < 0.000002>] ? warn_alloc+0x250/0x250
>> [169053.129156 < 0.000004>] ? get_page_from_freelist+0x147/0x10f0
>> [169053.129160 < 0.000004>] ? save_stack_trace+0x1b/0x20
>> [169053.129164 < 0.000004>] ? kasan_kmalloc+0xad/0xe0
>> [169053.129186 < 0.000022>] ? ttm_bo_mem_space+0x79/0x6b0 [ttm]
>> [169053.129196 < 0.000010>] ? ttm_bo_validate+0x178/0x220 [ttm]
>> [169053.129200 < 0.000004>] __alloc_pages_nodemask+0x3c4/0x400
>> [169053.129203 < 0.000003>] ? __alloc_pages_slowpath+0x14e0/0x14e0
>> [169053.129205 < 0.000002>] ? __save_stack_trace+0x66/0xd0
>> [169053.129209 < 0.000004>] ? rb_insert_color+0x32/0x3e0
>> [169053.129213 < 0.000004>] ? do_syscall_64+0xea/0x280
>> [169053.129217 < 0.000004>] alloc_pages_current+0x75/0x110
>> [169053.129221 < 0.000004>] kmalloc_order+0x1f/0x80
>> [169053.129223 < 0.000002>] kmalloc_order_trace+0x24/0xa0
>> [169053.129226 < 0.000003>] __kmalloc+0x264/0x280
>> [169053.129383 < 0.000157>] amdgpu_vram_mgr_new+0x11b/0x3b0 [amdgpu]
>> [169053.129391 < 0.000008>] ?
>> reservation_object_reserve_shared+0x64/0xf0
>> [169053.129401 < 0.000010>] ttm_bo_mem_space+0x196/0x6b0 [ttm]
>> [169053.129478 < 0.000077>] ? add_hole+0x20a/0x220 [drm]
>> [169053.129489 < 0.000011>] ttm_bo_validate+0x178/0x220 [ttm]
>> [169053.129498 < 0.000009>] ? ttm_bo_evict_mm+0x70/0x70 [ttm]
>> [169053.129508 < 0.000010>] ? ttm_check_swapping+0xf6/0x110 [ttm]
>> [169053.129541 < 0.000033>] ? drm_vma_offset_add+0x5b/0x80 [drm]
>> [169053.129572 < 0.000031>] ? drm_vma_offset_add+0x68/0x80 [drm]
>> [169053.129584 < 0.000012>] ttm_bo_init_reserved+0x546/0x630 [ttm]
>> [169053.129716 < 0.000132>] amdgpu_bo_do_create+0x28b/0x630 [amdgpu]
>> [169053.129816 < 0.000100>] ? amdgpu_fill_buffer+0x580/0x580
>> [amdgpu]
>> [169053.129952 < 0.000136>] ?
>> amdgpu_ttm_placement_from_domain+0x320/0x320 [amdgpu]
>> [169053.129956 < 0.000004>] ? try_to_wake_up+0xbe/0x720
>> [169053.130054 < 0.000098>] amdgpu_bo_create+0x85/0x400 [amdgpu]
>> [169053.130153 < 0.000099>] ? amdgpu_bo_do_create+0x630/0x630
>> [amdgpu]
>> [169053.130155 < 0.000002>] ? wake_up_process+0x15/0x20
>> [169053.130158 < 0.000003>] ? insert_work+0xf3/0x110
>> [169053.130257 < 0.000099>] amdgpu_gem_object_create+0x101/0x190
>> [amdgpu]
>> [169053.130356 < 0.000099>] ? amdgpu_gem_object_free+0xe0/0xe0
>> [amdgpu]
>> [169053.130360 < 0.000004>] ?
>> tty_insert_flip_string_fixed_flag+0xab/0x110
>> [169053.130468 < 0.000108>] amdgpu_gem_create_ioctl+0x364/0x460
>> [amdgpu]
>> [169053.130695 < 0.000227>] ? amdgpu_gem_object_close+0x320/0x320
>> [amdgpu]
>> [169053.130767 < 0.000072>] ? drm_dev_printk+0x120/0x120 [drm]
>> [169053.130840 < 0.000073>] ? __wake_up_common_lock+0xe9/0x170
>> [169053.130989 < 0.000149>] ? amdgpu_gem_object_close+0x320/0x320
>> [amdgpu]
>> [169053.131061 < 0.000072>] drm_ioctl_kernel+0xae/0xf0 [drm]
>> [169053.131115 < 0.000054>] drm_ioctl+0x466/0x520 [drm]
>> [169053.131238 < 0.000123>] ? amdgpu_gem_object_close+0x320/0x320
>> [amdgpu]
>> [169053.131291 < 0.000053>] ? drm_getunique+0xf0/0xf0 [drm]
>> [169053.131426 < 0.000135>] amdgpu_drm_ioctl+0x78/0xd0 [amdgpu]
>> [169053.131451 < 0.000025>] do_vfs_ioctl+0x12e/0x860
>> [169053.131466 < 0.000015>] ? apparmor_file_permission+0x1a/0x20
>> [169053.131489 < 0.000023>] ? ioctl_preallocate+0x130/0x130
>> [169053.131503 < 0.000014>] ? rw_verify_area+0x78/0x140
>> [169053.131520 < 0.000017>] ? vfs_write+0x1a2/0x260
>> [169053.131544 < 0.000024>] ? syscall_trace_enter+0x1fd/0x520
>> [169053.131568 < 0.000024>] ? sched_clock+0x9/0x10
>> [169053.131584 < 0.000016>] ? exit_to_usermode_loop+0xc0/0xc0
>> [169053.131607 < 0.000023>] ? __fget_light+0xa7/0xc0
>> [169053.131631 < 0.000024>] SyS_ioctl+0x79/0x90
>> [169053.131651 < 0.000020>] ?
>> __context_tracking_exit.part.4+0x53/0xc0
>> [169053.131672 < 0.000021>] ? do_vfs_ioctl+0x860/0x860
>> [169053.131683 < 0.000011>] do_syscall_64+0xea/0x280
>> [169053.131708 < 0.000025>] entry_SYSCALL64_slow_path+0x25/0x25
>> [169053.131720 < 0.000012>] RIP: 0033:0x7f70778eef07
>> [169053.131740 < 0.000020>] RSP: 002b:00007ffc509d13d8 EFLAGS:
>> 00000202 ORIG_RAX: 0000000000000010
>> [169053.131756 < 0.000016>] RAX: ffffffffffffffda RBX:
>> 000000000000001e RCX: 00007f70778eef07
>> [169053.131778 < 0.000022>] RDX: 00007ffc509d1490 RSI:
>> 00000000c0206440 RDI: 0000000000000004
>> [169053.131798 < 0.000020>] RBP: 00007ffc509d1410 R08:
>> 000000000124c660 R09: 0000000000000000
>> [169053.131815 < 0.000017>] R10: 000000000000006e R11:
>> 0000000000000202 R12: 000000000124b530
>> [169053.131835 < 0.000020>] R13: 00007ffc509d1800 R14:
>> 0000000000000000 R15: 0000000000000000
>> [169053.131854 < 0.000019>] Code: 89 85 c8 fe ff ff e9 5d fc ff ff
>> 8d 42 ff 45 31 f6 c6 85 d0 fe ff ff 01 89 85 c8 fe ff ff e9 45 fc ff
>> ff 41 89 c5 e9 10 fc ff ff <0f> ff e9 ba f1 ff ff 0f ff 89 d8 25 ff
>> ff f7 ff 89 85 8c fe ff
>> [169053.131933 < 0.000079>] ---[ end trace 8253dc1e92579724 ]---
>> [169053.132622 < 0.000689>] [drm:amdgpu_gem_object_create
>> [amdgpu]] *ERROR* Failed to allocate GEM object (1000000000000, 6,
>> 4096, -12)
>> [169053.132877 < 0.000255>] traps: lt-amdgpu_test[22883] general
>> protection ip:7f7077ff6007 sp:7ffc509d13e0 error:0 in
>> libdrm_amdgpu.so.1.0.0[7f7077ff2000+b000]
>>
>>
>>>
>>> Instead I expected that we need to do multiple 1GB allocations to
>>> trigger the next problem, which is that our TTM code doesn't apply
>>> a global limit.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 10.11.2017 um 05:29 schrieb Andrey Grodzovsky:
>>>> The following patch series introduces dynamic disabling/enabling
>>>> of tests in the amdgpu tester using the CUnit API. Today, test
>>>> suites that don't apply to specific HW just return success w/o
>>>> executing, while single tests that can't be executed properly are
>>>> commented out.
>>>>
>>>> Suites are disabled based on hooks they provide (e.g. incompatible
>>>> ASIC or missing blocks), while single tests are disabled explicitly,
>>>> since this is usually due to some bug preventing the tester or the
>>>> system from handling the test w/o crashing or killing the tester.
>>>>
>>>> This series also includes a minor cleanup and a new test for memory
>>>> over-allocation.
>>>>
>>>> Andrey Grodzovsky (4):
>>>> amdgpu: Add functions to disable suites and tests.
>>>> amdgpu: Use new suite/test disabling functionality.
>>>> amdgpu: Move memory alloc tests in bo suite.
>>>> amdgpu: Add memory over allocation test.
>>>>
>>>> tests/amdgpu/amdgpu_test.c | 169
>>>> +++++++++++++++++++++++++++++++++++++-----
>>>> tests/amdgpu/amdgpu_test.h | 46 ++++++++++++
>>>> tests/amdgpu/basic_tests.c | 49 ------------
>>>> tests/amdgpu/bo_tests.c | 69 +++++++++++++++++
>>>> tests/amdgpu/deadlock_tests.c | 8 +-
>>>> tests/amdgpu/uvd_enc_tests.c | 81 ++++++++------------
>>>> tests/amdgpu/vce_tests.c | 65 ++++++++--------
>>>> tests/amdgpu/vcn_tests.c | 74 ++++++++----------
>>>> 8 files changed, 363 insertions(+), 198 deletions(-)
>>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-amdgpu-Implement-BO-size-validation.patch
Type: text/x-patch
Size: 2140 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20171110/f9be8f01/attachment.bin>