[PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini
Michel Dänzer
michel at daenzer.net
Fri Oct 13 14:34:48 UTC 2017
On 12/10/17 07:11 PM, Christian König wrote:
> Am 12.10.2017 um 18:49 schrieb Michel Dänzer:
>> On 12/10/17 01:00 PM, Michel Dänzer wrote:
>>> [0] I also got this, but I don't know yet if it's related:
>> No, that seems to be a separate issue; I can still reproduce it with the
>> huge page related changes reverted. Unfortunately, it doesn't seem to
>> happen reliably on every piglit run.
>
> Can you enable KASAN in your kernel,
KASAN caught something else at the beginning of piglit, see the attached
dmesg excerpt. Not sure it's related though.
amdgpu_job_free_cb+0x13d/0x160 decodes to:

amd_sched_get_job_priority at .../drivers/gpu/drm/amd/amdgpu/../scheduler/gpu_scheduler.h:182
    static inline enum amd_sched_priority
    amd_sched_get_job_priority(struct amd_sched_job *job)
    {
            return (job->s_entity->rq - job->sched->sched_rq); <===
    }
(inlined by) amdgpu_job_free_cb at .../drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:107
    amdgpu_ring_priority_put(job->ring, amd_sched_get_job_priority(s_job));
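For readers unfamiliar with the helper, here is a minimal user-space sketch of what the flagged line computes: the priority is the index of the entity's run queue within the scheduler's run-queue array, obtained by pointer subtraction. All sketch_* names are hypothetical stand-ins, not the real amdgpu structures. The sketch assumes the 8-byte read KASAN reports is the load of s_entity->rq from an entity that lived in the already freed object; the report alone does not confirm that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the fields needed for the arithmetic. */
enum sketch_priority {
	SKETCH_PRIO_MIN,
	SKETCH_PRIO_NORMAL,
	SKETCH_PRIO_HIGH,
	SKETCH_PRIO_MAX		/* number of run queues, not a real priority */
};

struct sketch_rq { int unused; };

struct sketch_sched { struct sketch_rq sched_rq[SKETCH_PRIO_MAX]; };

struct sketch_entity { struct sketch_rq *rq; };

struct sketch_job {
	struct sketch_sched *sched;
	struct sketch_entity *s_entity;
};

static enum sketch_priority
sketch_get_job_priority(const struct sketch_job *job)
{
	/*
	 * Dereferencing job->s_entity is the step that becomes a
	 * use-after-free if the entity was freed before the job
	 * (assumption: this corresponds to the read KASAN flagged).
	 */
	ptrdiff_t idx = job->s_entity->rq - job->sched->sched_rq;

	/* The entity's rq must point into the scheduler's own array. */
	assert(idx >= 0 && idx < SKETCH_PRIO_MAX);
	return (enum sketch_priority)idx;
}
```

With an entity whose rq points at &sched.sched_rq[SKETCH_PRIO_HIGH], the helper returns SKETCH_PRIO_HIGH. The point is that the result depends on the entity still being alive when the job is freed.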
> and please look up at which line number amdgpu_vm_bo_invalidate+0x88
> is.
Looks like it's this line:
if (evicted && bo->tbo.resv == vm->root.base.bo->tbo.resv) {
Maybe vm or vm->root.base.bo is NULL?
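
The register dump in the attached GPF seems to support the NULL theory. It shows RAX = 0, RDI = 0x220, R9 = 0x44 and R12 = dffffc0000000000, and the Code: bytes around the marked instruction decode to a load into RAX from 0x58(%r15), a lea of 0x220(%rax) into RDI, a shift right by 3 into R9, and a byte compare at (%r9,%r12). That looks like the inline KASAN shadow check for address 0x220, i.e. a NULL base plus a field offset. Which pointer is NULL cannot be told without the struct layout, so treat that part as an assumption. The sketch below uses hypothetical sketch_* helper names and the x86-64 KASAN shadow offset and scale to show why such a check ends in a general protection fault rather than a page fault.

```c
#include <stdint.h>

/* x86-64 KASAN shadow mapping: one shadow byte per 8 bytes of memory. */
#define SKETCH_KASAN_SHADOW_OFFSET	0xdffffc0000000000ULL
#define SKETCH_KASAN_SHADOW_SCALE_SHIFT	3

/* Shadow address the inline check reads before the real access. */
static uint64_t sketch_kasan_shadow_addr(uint64_t addr)
{
	return (addr >> SKETCH_KASAN_SHADOW_SCALE_SHIFT) +
	       SKETCH_KASAN_SHADOW_OFFSET;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
static int sketch_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

For address 0x220 the shadow address is 0xdffffc0000000044, which is non-canonical, so the shadow read itself raises the GPF. That matches the "GPF could be caused by NULL-ptr deref" hint in the log.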
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
-------------- next part --------------
[ 89.594368] ==================================================================
[ 89.594440] BUG: KASAN: use-after-free in amdgpu_job_free_cb+0x13d/0x160 [amdgpu]
[ 89.594444] Read of size 8 at addr ffff880367cc22c0 by task kworker/8:1/142
[ 89.594449] CPU: 8 PID: 142 Comm: kworker/8:1 Tainted: G W 4.13.0-rc5+ #29
[ 89.594451] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
[ 89.594516] Workqueue: events amd_sched_job_finish [amdgpu]
[ 89.594517] Call Trace:
[ 89.594522] dump_stack+0xb8/0x152
[ 89.594524] ? dma_virt_map_sg+0x1fe/0x1fe
[ 89.594527] ? show_regs_print_info+0x62/0x62
[ 89.594531] print_address_description+0x6f/0x280
[ 89.594533] kasan_report+0x27a/0x370
[ 89.594596] ? amdgpu_job_free_cb+0x13d/0x160 [amdgpu]
[ 89.594599] __asan_report_load8_noabort+0x19/0x20
[ 89.594662] amdgpu_job_free_cb+0x13d/0x160 [amdgpu]
[ 89.594726] amd_sched_job_finish+0x36e/0x630 [amdgpu]
[ 89.594790] ? trace_event_raw_event_amd_sched_process_job+0x180/0x180 [amdgpu]
[ 89.594792] ? pick_next_task_fair+0x435/0x15c0
[ 89.594795] ? pwq_dec_nr_in_flight+0x1c2/0x4d0
[ 89.594797] ? cpu_load_update_active+0x330/0x330
[ 89.594800] ? __switch_to+0x685/0xda0
[ 89.594801] ? load_balance+0x3490/0x3490
[ 89.594803] process_one_work+0x8a5/0x1a30
[ 89.594805] ? wq_worker_sleeping+0x86/0x310
[ 89.594808] ? create_worker+0x590/0x590
[ 89.594810] ? __schedule+0x83b/0x1c80
[ 89.594813] ? schedule+0x10e/0x450
[ 89.594815] ? __schedule+0x1c80/0x1c80
[ 89.594817] ? alloc_worker+0x360/0x360
[ 89.594819] ? update_stack_state+0x402/0x780
[ 89.594820] ? update_stack_state+0x402/0x780
[ 89.594822] ? tsc_resume+0x10/0x10
[ 89.594824] worker_thread+0x21f/0x1920
[ 89.594825] ? sched_clock+0x9/0x10
[ 89.594826] ? sched_clock+0x9/0x10
[ 89.594828] ? sched_clock_local+0x43/0x130
[ 89.594831] ? process_one_work+0x1a30/0x1a30
[ 89.594832] ? pick_next_task_fair+0xcd3/0x15c0
[ 89.594833] ? cpu_load_update_active+0x330/0x330
[ 89.594835] ? __switch_to+0x685/0xda0
[ 89.594836] ? load_balance+0x3490/0x3490
[ 89.594838] ? compat_start_thread+0x80/0x80
[ 89.594839] ? sched_clock+0x9/0x10
[ 89.594840] ? sched_clock_local+0x43/0x130
[ 89.594843] ? set_rq_online.part.79+0x130/0x130
[ 89.594844] ? put_prev_entity+0x4e/0x370
[ 89.594846] ? __schedule+0x83b/0x1c80
[ 89.594847] ? kasan_kmalloc+0xad/0xe0
[ 89.594849] ? kmem_cache_alloc_trace+0xe9/0x1f0
[ 89.594851] ? firmware_map_remove+0x80/0x80
[ 89.594852] ? migrate_swap_stop+0x660/0x660
[ 89.594855] ? __schedule+0x1c80/0x1c80
[ 89.594856] ? default_wake_function+0x35/0x50
[ 89.594858] ? __wake_up_common+0xb9/0x150
[ 89.594859] ? print_dl_stats+0x80/0x80
[ 89.594861] kthread+0x310/0x3d0
[ 89.594863] ? process_one_work+0x1a30/0x1a30
[ 89.594864] ? kthread_create_on_node+0xc0/0xc0
[ 89.594866] ret_from_fork+0x25/0x30
[ 89.594869] Allocated by task 1701:
[ 89.594872] save_stack_trace+0x1b/0x20
[ 89.594873] save_stack+0x43/0xd0
[ 89.594874] kasan_kmalloc+0xad/0xe0
[ 89.594875] kmem_cache_alloc_trace+0xe9/0x1f0
[ 89.594913] amdgpu_driver_open_kms+0xec/0x3f0 [amdgpu]
[ 89.594925] drm_open+0x7ea/0x13a0 [drm]
[ 89.594936] drm_stub_open+0x2a7/0x420 [drm]
[ 89.594939] chrdev_open+0x24d/0x6f0
[ 89.594940] do_dentry_open+0x5b1/0xd30
[ 89.594941] vfs_open+0xf1/0x260
[ 89.594942] path_openat+0x130a/0x5240
[ 89.594944] do_filp_open+0x23e/0x3c0
[ 89.594945] do_sys_open+0x47a/0x800
[ 89.594946] SyS_open+0x1e/0x20
[ 89.594947] entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 89.594949] Freed by task 1737:
[ 89.594951] save_stack_trace+0x1b/0x20
[ 89.594952] save_stack+0x43/0xd0
[ 89.594953] kasan_slab_free+0x72/0xc0
[ 89.594954] kfree+0x94/0x1a0
[ 89.594992] amdgpu_driver_postclose_kms+0x495/0x830 [amdgpu]
[ 89.595002] drm_release+0x9bf/0x1350 [drm]
[ 89.595004] __fput+0x306/0x900
[ 89.595005] ____fput+0xe/0x10
[ 89.595006] task_work_run+0x14d/0x230
[ 89.595008] exit_to_usermode_loop+0x1f5/0x230
[ 89.595009] syscall_return_slowpath+0x1d8/0x240
[ 89.595011] entry_SYSCALL_64_fastpath+0xa7/0xa9
[ 89.595013] The buggy address belongs to the object at ffff880367cc2200
which belongs to the cache kmalloc-2048 of size 2048
[ 89.595015] The buggy address is located 192 bytes inside of
2048-byte region [ffff880367cc2200, ffff880367cc2a00)
[ 89.595017] The buggy address belongs to the page:
[ 89.595019] page:ffffea000d9f3000 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
[ 89.595022] flags: 0x17fffc000008100(slab|head)
[ 89.595026] raw: 017fffc000008100 0000000000000000 0000000000000000 00000001800f000f
[ 89.595028] raw: dead000000000100 dead000000000200 ffff88038e00ea00 0000000000000000
[ 89.595029] page dumped because: kasan: bad access detected
[ 89.595031] Memory state around the buggy address:
[ 89.595032] ffff880367cc2180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 89.595034] ffff880367cc2200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 89.595036] >ffff880367cc2280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 89.595037] ^
[ 89.595038] ffff880367cc2300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 89.595040] ffff880367cc2380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 89.595041] ==================================================================
[ 89.595042] Disabling lock debugging due to kernel taint
[ 239.309507] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (4294967296, 2, 4096, -12)
[ 239.340689] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (4294967296, 2, 4096, -12)
[ 323.869197] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (2147483648, 2, 4096, -12)
[ 323.869377] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (2147483648, 2, 4096, -12)
[ 349.580299] kasan: CONFIG_KASAN_INLINE enabled
[ 349.580305] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 349.580311] general protection fault: 0000 [#1] SMP KASAN
[ 349.580313] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic cpufreq_powersave cpufreq_userspace cpufreq_conservative binfmt_misc nls_ascii nls_cp437 vfat fat edac_mce_amd kvm amdkfd irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel amdgpu pcbc snd_hda_codec_hdmi efi_pstore chash ttm snd_hda_intel snd_hda_codec drm_kms_helper snd_hda_core snd_hwdep wmi_bmof drm snd_pcm aesni_intel i2c_algo_bit snd_timer aes_x86_64 crypto_simd glue_helper sp5100_tco snd fb_sys_fops ccp syscopyarea r8169 ppdev sysfillrect cryptd pcspkr efivars sysimgblt i2c_piix4 mii sg rng_core mfd_core soundcore wmi parport_pc parport i2c_designware_platform i2c_designware_core button acpi_cpufreq tcp_bbr sch_fq nct6775 hwmon_vid sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto raid10
[ 349.580366] raid1 raid0 multipath linear md_mod dm_mod sd_mod evdev hid_generic usbhid hid ahci crc32c_intel libahci xhci_pci libata xhci_hcd usbcore scsi_mod shpchp gpio_amdpt gpio_generic
[ 349.580385] CPU: 5 PID: 529 Comm: max-texture-siz Tainted: G B W 4.13.0-rc5+ #29
[ 349.580388] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
[ 349.580392] task: ffff88036d258000 task.stack: ffff880274260000
[ 349.580466] RIP: 0010:amdgpu_vm_bo_invalidate+0x277/0xd60 [amdgpu]
[ 349.580469] RSP: 0018:ffff880274266810 EFLAGS: 00010216
[ 349.580472] RAX: 0000000000000000 RBX: ffff8801d9ac2d00 RCX: 0000000000000000
[ 349.580474] RDX: ffff880389c74408 RSI: ffff880389c73da8 RDI: 0000000000000220
[ 349.580476] RBP: ffff880274266aa8 R08: 0000000000000010 R09: 0000000000000044
[ 349.580479] R10: ffff880274266d88 R11: 0000000000000010 R12: dffffc0000000000
[ 349.580481] R13: ffff8801d9ac2d20 R14: ffff880259586600 R15: ffff880389c74400
[ 349.580484] FS: 00007ff180510300(0000) GS:ffff88038e540000(0000) knlGS:0000000000000000
[ 349.580487] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 349.580489] CR2: 00007fe4e75c1000 CR3: 000000037f1bc000 CR4: 00000000003406e0
[ 349.580491] Call Trace:
[ 349.580502] ? ttm_mem_global_alloc_zone.constprop.4+0x1c6/0x290 [ttm]
[ 349.580509] ? ttm_bo_init_reserved+0x9b6/0x1300 [ttm]
[ 349.580570] ? amdgpu_bo_do_create+0x594/0x1420 [amdgpu]
[ 349.580634] ? amdgpu_vm_bo_rmv+0xd60/0xd60 [amdgpu]
[ 349.580641] ? ttm_mem_global_alloc_page+0x9e/0xf0 [ttm]
[ 349.580649] ? ttm_pool_populate+0x46e/0xd10 [ttm]
[ 349.580655] ? kasan_kmalloc+0xad/0xe0
[ 349.580664] ? ttm_pool_unpopulate+0x290/0x290 [ttm]
[ 349.580669] ? kvmalloc_node+0x75/0x80
[ 349.580676] ? ttm_dma_tt_init+0x285/0x620 [ttm]
[ 349.580679] ? kasan_unpoison_shadow+0x35/0x50
[ 349.580686] ? ttm_tt_init+0x300/0x300 [ttm]
[ 349.580691] ? nommu_map_page+0x5c/0xa0
[ 349.580754] ? amdgpu_ttm_backend_bind+0x10f/0xc70 [amdgpu]
[ 349.580817] ? amdgpu_ttm_tt_create+0x132/0x2c0 [amdgpu]
[ 349.580879] amdgpu_bo_move_notify+0x10f/0x350 [amdgpu]
[ 349.580942] ? amdgpu_bo_get_metadata+0x200/0x200 [amdgpu]
[ 349.580950] ttm_bo_handle_move_mem+0x782/0x22f0 [ttm]
[ 349.580955] ? reservation_object_reserve_shared+0x167/0x200
[ 349.580962] ? ttm_bo_add_move_fence.isra.17+0x29/0x160 [ttm]
[ 349.580969] ? ttm_bo_mem_space+0x518/0xdf0 [ttm]
[ 349.580974] ? free_one_page+0x1560/0x1560
[ 349.580983] ttm_bo_evict+0x3ff/0xf70 [ttm]
[ 349.580990] ? ttm_bo_release_list+0x910/0x910 [ttm]
[ 349.580998] ? ttm_bo_handle_move_mem+0x22f0/0x22f0 [ttm]
[ 349.581005] ? ttm_bo_add_to_lru+0x46c/0x870 [ttm]
[ 349.581009] ? kasan_kmalloc_large+0x9c/0xd0
[ 349.581017] ? ttm_mem_global_alloc_zone.constprop.4+0x1c6/0x290 [ttm]
[ 349.581020] ? kmem_cache_alloc+0xb7/0x1c0
[ 349.581081] ? amdgpu_ttm_bo_eviction_valuable+0x199/0x2b0 [amdgpu]
[ 349.581143] ? amdgpu_vram_mgr_new+0x4e4/0x780 [amdgpu]
[ 349.581152] ttm_mem_evict_first+0x312/0x4a0 [ttm]
[ 349.581215] ? amdgpu_vram_mgr_new+0x4e4/0x780 [amdgpu]
[ 349.581222] ? ttm_bo_evict+0xf70/0xf70 [ttm]
[ 349.581230] ttm_bo_mem_space+0x84e/0xdf0 [ttm]
[ 349.581233] ? __shmem_file_setup+0x250/0x520
[ 349.581241] ttm_bo_validate+0x322/0x580 [ttm]
[ 349.581244] ? unwind_dump+0x4e0/0x4e0
[ 349.581251] ? ttm_bo_evict_mm+0xa0/0xa0 [ttm]
[ 349.581255] ? cpufreq_default_governor+0x20/0x20
[ 349.581277] ? drm_add_edid_modes+0x44a0/0x67e0 [drm]
[ 349.581286] ttm_bo_init_reserved+0x9b6/0x1300 [ttm]
[ 349.581294] ? ttm_bo_validate+0x580/0x580 [ttm]
[ 349.581298] ? dentry_path_raw+0x10/0x10
[ 349.581302] ? proc_nr_files+0x30/0x30
[ 349.581305] ? shmem_get_inode+0x668/0x8f0
[ 349.581308] ? shmem_fh_to_dentry+0x160/0x160
[ 349.581313] ? entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 349.581318] ? _copy_to_user+0x90/0x90
[ 349.581321] ? alloc_file+0x16d/0x440
[ 349.581324] ? __shmem_file_setup+0x2e0/0x520
[ 349.581327] ? shmem_fill_super+0xa10/0xa10
[ 349.581344] ? drm_gem_private_object_init+0x189/0x300 [drm]
[ 349.581347] ? kasan_kmalloc+0xad/0xe0
[ 349.581412] amdgpu_bo_do_create+0x594/0x1420 [amdgpu]
[ 349.581486] ? amdgpu_fill_buffer+0xb80/0xb80 [amdgpu]
[ 349.581489] ? update_stack_state+0x402/0x780
[ 349.581559] ? amdgpu_ttm_placement_from_domain+0x8d0/0x8d0 [amdgpu]
[ 349.581565] ? show_initstate+0xb0/0xb0
[ 349.581570] ? bpf_prog_alloc+0x320/0x320
[ 349.581574] ? unwind_next_frame.part.5+0x1bb/0xc90
[ 349.581582] ? __free_insn_slot+0x6a0/0x6a0
[ 349.581585] ? unwind_dump+0x4e0/0x4e0
[ 349.581590] ? rb_erase+0x3540/0x3540
[ 349.581595] ? __mem_cgroup_threshold+0x7b0/0x7b0
[ 349.581599] ? memory_max_write+0x420/0x420
[ 349.581605] ? __kernel_text_address+0xbf/0xf0
[ 349.581608] ? unwind_get_return_address+0x66/0xb0
[ 349.581612] ? __save_stack_trace+0x92/0x100
[ 349.581685] amdgpu_bo_create+0xba/0xa00 [amdgpu]
[ 349.581759] ? amdgpu_bo_do_create+0x1420/0x1420 [amdgpu]
[ 349.581763] ? mem_cgroup_uncharge_swap+0xc0/0xc0
[ 349.581767] ? kmem_cache_alloc+0xb7/0x1c0
[ 349.581771] ? __anon_vma_prepare+0x37a/0x650
[ 349.581775] ? __handle_mm_fault+0x31ac/0x5070
[ 349.581778] ? handle_mm_fault+0x292/0x800
[ 349.581782] ? __do_page_fault+0x412/0xa00
[ 349.581785] ? do_page_fault+0x22/0x30
[ 349.581788] ? page_fault+0x28/0x30
[ 349.581792] ? memcg_oom_wake_function+0x6a0/0x6a0
[ 349.581866] amdgpu_gem_object_create+0x11f/0x240 [amdgpu]
[ 349.581942] ? amdgpu_gem_object_free+0x1d0/0x1d0 [amdgpu]
[ 349.581945] ? __alloc_pages_nodemask+0x3d8/0xe50
[ 349.582021] amdgpu_gem_create_ioctl+0x3bb/0xc10 [amdgpu]
[ 349.582097] ? amdgpu_gem_object_close+0x790/0x790 [amdgpu]
[ 349.582101] ? page_add_new_anon_rmap+0x203/0x3d0
[ 349.582105] ? __check_object_size+0x22e/0x560
[ 349.582180] ? amdgpu_gem_object_close+0x790/0x790 [amdgpu]
[ 349.582201] drm_ioctl_kernel+0x1ce/0x330 [drm]
[ 349.582221] ? drm_ioctl_permit+0x2c0/0x2c0 [drm]
[ 349.582225] ? kasan_check_write+0x14/0x20
[ 349.582246] drm_ioctl+0x79a/0xc30 [drm]
[ 349.582321] ? amdgpu_gem_object_close+0x790/0x790 [amdgpu]
[ 349.582342] ? drm_getstats+0x20/0x20 [drm]
[ 349.582347] ? do_mmap+0x641/0x10f0
[ 349.582419] amdgpu_drm_ioctl+0xd8/0x1b0 [amdgpu]
[ 349.582424] do_vfs_ioctl+0x197/0x1490
[ 349.582428] ? vm_mmap_pgoff+0x1fe/0x2c0
[ 349.582431] ? ioctl_preallocate+0x2c0/0x2c0
[ 349.582435] ? __fget_light+0x2be/0x410
[ 349.582438] ? iterate_fd+0x2e0/0x2e0
[ 349.582441] ? handle_mm_fault+0x292/0x800
[ 349.582445] ? __handle_mm_fault+0x5070/0x5070
[ 349.582449] ? __do_page_fault+0x43a/0xa00
[ 349.582453] SyS_ioctl+0x79/0x90
[ 349.582457] entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 349.582461] RIP: 0033:0x7ff17d1a7dc7
[ 349.582463] RSP: 002b:00007ffe7a4edbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 349.582467] RAX: ffffffffffffffda RBX: 00007ff17d45eb00 RCX: 00007ff17d1a7dc7
[ 349.582470] RDX: 00007ffe7a4edc40 RSI: 00000000c0206440 RDI: 0000000000000006
[ 349.582472] RBP: 0000000040000010 R08: 00007ff17d45ebc8 R09: 0000000000000060
[ 349.582475] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000040001000
[ 349.582477] R13: 00007ff17d45eb58 R14: 0000000000001000 R15: 00007ff17d45eb00
[ 349.582480] Code: 49 8b b6 20 02 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 f7 05 00 00 49 8b 47 58 48 8d b8 20 02 00 00 49 89 f9 49 c1 e9 03 <43> 80 3c 21 00 0f 85 e8 08 00 00 48 3b b0 20 02 00 00 0f 84 d9
[ 349.582596] RIP: amdgpu_vm_bo_invalidate+0x277/0xd60 [amdgpu] RSP: ffff880274266810
[ 349.582600] ---[ end trace cc9c171d2cdc0539 ]---
[ 360.319061] amdgpu 0000:23:00.0: Disabling VM faults because of PRT request!