[PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

Michel Dänzer michel at daenzer.net
Fri Oct 13 14:34:48 UTC 2017


On 12/10/17 07:11 PM, Christian König wrote:
> Am 12.10.2017 um 18:49 schrieb Michel Dänzer:
>> On 12/10/17 01:00 PM, Michel Dänzer wrote:
>>> [0] I also got this, but I don't know yet if it's related:
>> No, that seems to be a separate issue; I can still reproduce it with the
>> huge page related changes reverted. Unfortunately, it doesn't seem to
>> happen reliably on every piglit run.
> 
> Can you enable KASAN in your kernel,

KASAN caught something else at the beginning of the piglit run; see the
attached dmesg excerpt. I'm not sure it's related, though.
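The sketch below is a small userspace helper for reading the numbers in the attached KASAN use-after-free report. It assumes generic KASAN as in 4.13-era kernels: one shadow byte per 8-byte granule, 0xFB marking a freed slab object and 0xFC a slab redzone. The helper names are hypothetical. It shows that the bad read at ffff880367cc22c0 is 192 bytes into the kmalloc-2048 object at ffff880367cc2200. It also shows that the '^' in the "Memory state" dump points at shadow column 8 of the row starting at ffff880367cc2280, which is all 0xfb (freed).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shadow byte values used by generic KASAN for slab memory
 * (assumed from mm/kasan/kasan.h of kernels from this era). */
#define KASAN_KMALLOC_REDZONE 0xFC
#define KASAN_KMALLOC_FREE    0xFB

/* Each shadow byte describes one 8-byte granule of memory. */
#define KASAN_GRANULE 8
/* The "Memory state" dump prints 16 shadow bytes per row. */
#define SHADOW_BYTES_PER_ROW 16

/* Offset of the bad access within the slab object that contains it. */
static uint64_t offset_in_object(uint64_t bad_addr, uint64_t obj_start)
{
	return bad_addr - obj_start;
}

/* Memory address printed at the start of the dump row covering addr. */
static uint64_t shadow_row_start(uint64_t addr)
{
	return addr & ~(uint64_t)(SHADOW_BYTES_PER_ROW * KASAN_GRANULE - 1);
}

/* Which shadow byte in that row the '^' marker points at. */
static unsigned shadow_column(uint64_t addr)
{
	return (unsigned)((addr - shadow_row_start(addr)) / KASAN_GRANULE);
}

/* Meaning of a shadow byte (only the values relevant here). */
static const char *shadow_meaning(uint8_t shadow)
{
	if (shadow == 0x00)
		return "accessible";
	if (shadow < KASAN_GRANULE)
		return "partially accessible";
	switch (shadow) {
	case KASAN_KMALLOC_FREE:
		return "freed slab object";
	case KASAN_KMALLOC_REDZONE:
		return "slab redzone";
	default:
		return "other";
	}
}
```

For example, offset_in_object(0xffff880367cc22c0, 0xffff880367cc2200) gives the "192 bytes inside" figure from the report.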

amdgpu_job_free_cb+0x13d/0x160 decodes to:

amd_sched_get_job_priority at .../drivers/gpu/drm/amd/amdgpu/../scheduler/gpu_scheduler.h:182

static inline enum amd_sched_priority
amd_sched_get_job_priority(struct amd_sched_job *job)
{
	return (job->s_entity->rq - job->sched->sched_rq); <===
}

 (inlined by) amdgpu_job_free_cb at .../drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:107

	amdgpu_ring_priority_put(job->ring, amd_sched_get_job_priority(s_job));
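amd_sched_get_job_priority() derives the priority by pointer arithmetic. The entity's rq points into the scheduler's sched_rq[] array, so the difference between the two pointers is the array index. As a result, the free callback has to dereference job->s_entity. Per the KASAN report, the memory it reads was allocated in amdgpu_driver_open_kms and freed in amdgpu_driver_postclose_kms, before the job's finish work ran.

The userspace sketch below models this with simplified, hypothetical structs; these are not the kernel's definitions. It also shows one hypothetical way to avoid the dereference: caching the priority in the job when the job is created.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical priority levels; the real enum amd_sched_priority differs. */
enum sched_priority { PRIO_LOW, PRIO_NORMAL, PRIO_HIGH, PRIO_KERNEL, PRIO_COUNT };

struct sched_rq { int placeholder; };

struct gpu_scheduler {
	struct sched_rq sched_rq[PRIO_COUNT];  /* one run queue per priority */
};

struct sched_entity {
	struct sched_rq *rq;                   /* points into sched->sched_rq[] */
};

struct sched_job {
	struct gpu_scheduler *sched;
	struct sched_entity *s_entity;         /* may be freed before the job */
	enum sched_priority s_priority;        /* hypothetical cached copy */
};

/* Mirrors amd_sched_get_job_priority(): the index of the entity's run
 * queue in the scheduler's array. Dereferences job->s_entity, so it is
 * only safe while the entity is still alive. */
static enum sched_priority job_priority_from_entity(const struct sched_job *job)
{
	return (enum sched_priority)(job->s_entity->rq - job->sched->sched_rq);
}

/* Hypothetical alternative: record the priority at job creation. */
static void job_init(struct sched_job *job, struct gpu_scheduler *sched,
		     struct sched_entity *entity)
{
	job->sched = sched;
	job->s_entity = entity;
	job->s_priority = job_priority_from_entity(job);
}

/* Safe to call from a free callback: never touches the entity. */
static enum sched_priority job_priority_cached(const struct sched_job *job)
{
	return job->s_priority;
}
```

In this model, freeing the entity and then calling job_priority_from_entity() is exactly the read KASAN flags. job_priority_cached() keeps working after the entity is gone.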


> and please look up at which line number amdgpu_vm_bo_invalidate+0x88
> is.

Looks like it's this line:

		if (evicted && bo->tbo.resv == vm->root.base.bo->tbo.resv) {

Maybe vm or vm->root.base.bo is NULL?
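The register dump of the later general protection fault is at least consistent with a NULL base pointer. With CONFIG_KASAN_INLINE, a shadow-byte check precedes each load: shadow = (addr >> 3) + 0xdffffc0000000000, which is the x86_64 KASAN offset and is also the value in R12. RDI is 0x220 and R09 is 0x44, so the checked address was 0x220, a small offset from NULL. Its shadow address is non-canonical, which is why this surfaces as a GPF rather than an ordinary NULL page fault.

The sketch below reproduces only that arithmetic. It assumes 4-level paging (48-bit canonical addresses). It does not by itself show which pointer was NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Generic KASAN shadow mapping on x86_64 (assumed; matches R12 above). */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Address of the shadow byte KASAN checks before accessing addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* With 4-level paging, bits 63..47 of a canonical address are all 0
 * or all 1; anything else raises #GP instead of a page fault. */
static bool is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;
	return top == 0 || top == 0x1ffff;
}
```

kasan_shadow_addr(0x220) yields 0xdffffc0000000044, which is non-canonical. A kernel stack address such as the RSP value ffff880274266810 is canonical.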


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
-------------- next part --------------
[   89.594368] ==================================================================
[   89.594440] BUG: KASAN: use-after-free in amdgpu_job_free_cb+0x13d/0x160 [amdgpu]
[   89.594444] Read of size 8 at addr ffff880367cc22c0 by task kworker/8:1/142

[   89.594449] CPU: 8 PID: 142 Comm: kworker/8:1 Tainted: G        W       4.13.0-rc5+ #29
[   89.594451] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
[   89.594516] Workqueue: events amd_sched_job_finish [amdgpu]
[   89.594517] Call Trace:
[   89.594522]  dump_stack+0xb8/0x152
[   89.594524]  ? dma_virt_map_sg+0x1fe/0x1fe
[   89.594527]  ? show_regs_print_info+0x62/0x62
[   89.594531]  print_address_description+0x6f/0x280
[   89.594533]  kasan_report+0x27a/0x370
[   89.594596]  ? amdgpu_job_free_cb+0x13d/0x160 [amdgpu]
[   89.594599]  __asan_report_load8_noabort+0x19/0x20
[   89.594662]  amdgpu_job_free_cb+0x13d/0x160 [amdgpu]
[   89.594726]  amd_sched_job_finish+0x36e/0x630 [amdgpu]
[   89.594790]  ? trace_event_raw_event_amd_sched_process_job+0x180/0x180 [amdgpu]
[   89.594792]  ? pick_next_task_fair+0x435/0x15c0
[   89.594795]  ? pwq_dec_nr_in_flight+0x1c2/0x4d0
[   89.594797]  ? cpu_load_update_active+0x330/0x330
[   89.594800]  ? __switch_to+0x685/0xda0
[   89.594801]  ? load_balance+0x3490/0x3490
[   89.594803]  process_one_work+0x8a5/0x1a30
[   89.594805]  ? wq_worker_sleeping+0x86/0x310
[   89.594808]  ? create_worker+0x590/0x590
[   89.594810]  ? __schedule+0x83b/0x1c80
[   89.594813]  ? schedule+0x10e/0x450
[   89.594815]  ? __schedule+0x1c80/0x1c80
[   89.594817]  ? alloc_worker+0x360/0x360
[   89.594819]  ? update_stack_state+0x402/0x780
[   89.594820]  ? update_stack_state+0x402/0x780
[   89.594822]  ? tsc_resume+0x10/0x10
[   89.594824]  worker_thread+0x21f/0x1920
[   89.594825]  ? sched_clock+0x9/0x10
[   89.594826]  ? sched_clock+0x9/0x10
[   89.594828]  ? sched_clock_local+0x43/0x130
[   89.594831]  ? process_one_work+0x1a30/0x1a30
[   89.594832]  ? pick_next_task_fair+0xcd3/0x15c0
[   89.594833]  ? cpu_load_update_active+0x330/0x330
[   89.594835]  ? __switch_to+0x685/0xda0
[   89.594836]  ? load_balance+0x3490/0x3490
[   89.594838]  ? compat_start_thread+0x80/0x80
[   89.594839]  ? sched_clock+0x9/0x10
[   89.594840]  ? sched_clock_local+0x43/0x130
[   89.594843]  ? set_rq_online.part.79+0x130/0x130
[   89.594844]  ? put_prev_entity+0x4e/0x370
[   89.594846]  ? __schedule+0x83b/0x1c80
[   89.594847]  ? kasan_kmalloc+0xad/0xe0
[   89.594849]  ? kmem_cache_alloc_trace+0xe9/0x1f0
[   89.594851]  ? firmware_map_remove+0x80/0x80
[   89.594852]  ? migrate_swap_stop+0x660/0x660
[   89.594855]  ? __schedule+0x1c80/0x1c80
[   89.594856]  ? default_wake_function+0x35/0x50
[   89.594858]  ? __wake_up_common+0xb9/0x150
[   89.594859]  ? print_dl_stats+0x80/0x80
[   89.594861]  kthread+0x310/0x3d0
[   89.594863]  ? process_one_work+0x1a30/0x1a30
[   89.594864]  ? kthread_create_on_node+0xc0/0xc0
[   89.594866]  ret_from_fork+0x25/0x30

[   89.594869] Allocated by task 1701:
[   89.594872]  save_stack_trace+0x1b/0x20
[   89.594873]  save_stack+0x43/0xd0
[   89.594874]  kasan_kmalloc+0xad/0xe0
[   89.594875]  kmem_cache_alloc_trace+0xe9/0x1f0
[   89.594913]  amdgpu_driver_open_kms+0xec/0x3f0 [amdgpu]
[   89.594925]  drm_open+0x7ea/0x13a0 [drm]
[   89.594936]  drm_stub_open+0x2a7/0x420 [drm]
[   89.594939]  chrdev_open+0x24d/0x6f0
[   89.594940]  do_dentry_open+0x5b1/0xd30
[   89.594941]  vfs_open+0xf1/0x260
[   89.594942]  path_openat+0x130a/0x5240
[   89.594944]  do_filp_open+0x23e/0x3c0
[   89.594945]  do_sys_open+0x47a/0x800
[   89.594946]  SyS_open+0x1e/0x20
[   89.594947]  entry_SYSCALL_64_fastpath+0x1e/0xa9

[   89.594949] Freed by task 1737:
[   89.594951]  save_stack_trace+0x1b/0x20
[   89.594952]  save_stack+0x43/0xd0
[   89.594953]  kasan_slab_free+0x72/0xc0
[   89.594954]  kfree+0x94/0x1a0
[   89.594992]  amdgpu_driver_postclose_kms+0x495/0x830 [amdgpu]
[   89.595002]  drm_release+0x9bf/0x1350 [drm]
[   89.595004]  __fput+0x306/0x900
[   89.595005]  ____fput+0xe/0x10
[   89.595006]  task_work_run+0x14d/0x230
[   89.595008]  exit_to_usermode_loop+0x1f5/0x230
[   89.595009]  syscall_return_slowpath+0x1d8/0x240
[   89.595011]  entry_SYSCALL_64_fastpath+0xa7/0xa9

[   89.595013] The buggy address belongs to the object at ffff880367cc2200
                which belongs to the cache kmalloc-2048 of size 2048
[   89.595015] The buggy address is located 192 bytes inside of
                2048-byte region [ffff880367cc2200, ffff880367cc2a00)
[   89.595017] The buggy address belongs to the page:
[   89.595019] page:ffffea000d9f3000 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   89.595022] flags: 0x17fffc000008100(slab|head)
[   89.595026] raw: 017fffc000008100 0000000000000000 0000000000000000 00000001800f000f
[   89.595028] raw: dead000000000100 dead000000000200 ffff88038e00ea00 0000000000000000
[   89.595029] page dumped because: kasan: bad access detected

[   89.595031] Memory state around the buggy address:
[   89.595032]  ffff880367cc2180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   89.595034]  ffff880367cc2200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.595036] >ffff880367cc2280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.595037]                                            ^
[   89.595038]  ffff880367cc2300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.595040]  ffff880367cc2380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   89.595041] ==================================================================
[   89.595042] Disabling lock debugging due to kernel taint
[  239.309507] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (4294967296, 2, 4096, -12)
[  239.340689] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (4294967296, 2, 4096, -12)
[  323.869197] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (2147483648, 2, 4096, -12)
[  323.869377] [drm:amdgpu_gem_object_create [amdgpu]] *ERROR* Failed to allocate GEM object (2147483648, 2, 4096, -12)
[  349.580299] kasan: CONFIG_KASAN_INLINE enabled
[  349.580305] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  349.580311] general protection fault: 0000 [#1] SMP KASAN
[  349.580313] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic cpufreq_powersave cpufreq_userspace cpufreq_conservative binfmt_misc nls_ascii nls_cp437 vfat fat edac_mce_amd kvm amdkfd irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel amdgpu pcbc snd_hda_codec_hdmi efi_pstore chash ttm snd_hda_intel snd_hda_codec drm_kms_helper snd_hda_core snd_hwdep wmi_bmof drm snd_pcm aesni_intel i2c_algo_bit snd_timer aes_x86_64 crypto_simd glue_helper sp5100_tco snd fb_sys_fops ccp syscopyarea r8169 ppdev sysfillrect cryptd pcspkr efivars sysimgblt i2c_piix4 mii sg rng_core mfd_core soundcore wmi parport_pc parport i2c_designware_platform i2c_designware_core button acpi_cpufreq tcp_bbr sch_fq nct6775 hwmon_vid sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto raid10
[  349.580366]  raid1 raid0 multipath linear md_mod dm_mod sd_mod evdev hid_generic usbhid hid ahci crc32c_intel libahci xhci_pci libata xhci_hcd usbcore scsi_mod shpchp gpio_amdpt gpio_generic
[  349.580385] CPU: 5 PID: 529 Comm: max-texture-siz Tainted: G    B   W       4.13.0-rc5+ #29
[  349.580388] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.80 09/13/2017
[  349.580392] task: ffff88036d258000 task.stack: ffff880274260000
[  349.580466] RIP: 0010:amdgpu_vm_bo_invalidate+0x277/0xd60 [amdgpu]
[  349.580469] RSP: 0018:ffff880274266810 EFLAGS: 00010216
[  349.580472] RAX: 0000000000000000 RBX: ffff8801d9ac2d00 RCX: 0000000000000000
[  349.580474] RDX: ffff880389c74408 RSI: ffff880389c73da8 RDI: 0000000000000220
[  349.580476] RBP: ffff880274266aa8 R08: 0000000000000010 R09: 0000000000000044
[  349.580479] R10: ffff880274266d88 R11: 0000000000000010 R12: dffffc0000000000
[  349.580481] R13: ffff8801d9ac2d20 R14: ffff880259586600 R15: ffff880389c74400
[  349.580484] FS:  00007ff180510300(0000) GS:ffff88038e540000(0000) knlGS:0000000000000000
[  349.580487] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  349.580489] CR2: 00007fe4e75c1000 CR3: 000000037f1bc000 CR4: 00000000003406e0
[  349.580491] Call Trace:
[  349.580502]  ? ttm_mem_global_alloc_zone.constprop.4+0x1c6/0x290 [ttm]
[  349.580509]  ? ttm_bo_init_reserved+0x9b6/0x1300 [ttm]
[  349.580570]  ? amdgpu_bo_do_create+0x594/0x1420 [amdgpu]
[  349.580634]  ? amdgpu_vm_bo_rmv+0xd60/0xd60 [amdgpu]
[  349.580641]  ? ttm_mem_global_alloc_page+0x9e/0xf0 [ttm]
[  349.580649]  ? ttm_pool_populate+0x46e/0xd10 [ttm]
[  349.580655]  ? kasan_kmalloc+0xad/0xe0
[  349.580664]  ? ttm_pool_unpopulate+0x290/0x290 [ttm]
[  349.580669]  ? kvmalloc_node+0x75/0x80
[  349.580676]  ? ttm_dma_tt_init+0x285/0x620 [ttm]
[  349.580679]  ? kasan_unpoison_shadow+0x35/0x50
[  349.580686]  ? ttm_tt_init+0x300/0x300 [ttm]
[  349.580691]  ? nommu_map_page+0x5c/0xa0
[  349.580754]  ? amdgpu_ttm_backend_bind+0x10f/0xc70 [amdgpu]
[  349.580817]  ? amdgpu_ttm_tt_create+0x132/0x2c0 [amdgpu]
[  349.580879]  amdgpu_bo_move_notify+0x10f/0x350 [amdgpu]
[  349.580942]  ? amdgpu_bo_get_metadata+0x200/0x200 [amdgpu]
[  349.580950]  ttm_bo_handle_move_mem+0x782/0x22f0 [ttm]
[  349.580955]  ? reservation_object_reserve_shared+0x167/0x200
[  349.580962]  ? ttm_bo_add_move_fence.isra.17+0x29/0x160 [ttm]
[  349.580969]  ? ttm_bo_mem_space+0x518/0xdf0 [ttm]
[  349.580974]  ? free_one_page+0x1560/0x1560
[  349.580983]  ttm_bo_evict+0x3ff/0xf70 [ttm]
[  349.580990]  ? ttm_bo_release_list+0x910/0x910 [ttm]
[  349.580998]  ? ttm_bo_handle_move_mem+0x22f0/0x22f0 [ttm]
[  349.581005]  ? ttm_bo_add_to_lru+0x46c/0x870 [ttm]
[  349.581009]  ? kasan_kmalloc_large+0x9c/0xd0
[  349.581017]  ? ttm_mem_global_alloc_zone.constprop.4+0x1c6/0x290 [ttm]
[  349.581020]  ? kmem_cache_alloc+0xb7/0x1c0
[  349.581081]  ? amdgpu_ttm_bo_eviction_valuable+0x199/0x2b0 [amdgpu]
[  349.581143]  ? amdgpu_vram_mgr_new+0x4e4/0x780 [amdgpu]
[  349.581152]  ttm_mem_evict_first+0x312/0x4a0 [ttm]
[  349.581215]  ? amdgpu_vram_mgr_new+0x4e4/0x780 [amdgpu]
[  349.581222]  ? ttm_bo_evict+0xf70/0xf70 [ttm]
[  349.581230]  ttm_bo_mem_space+0x84e/0xdf0 [ttm]
[  349.581233]  ? __shmem_file_setup+0x250/0x520
[  349.581241]  ttm_bo_validate+0x322/0x580 [ttm]
[  349.581244]  ? unwind_dump+0x4e0/0x4e0
[  349.581251]  ? ttm_bo_evict_mm+0xa0/0xa0 [ttm]
[  349.581255]  ? cpufreq_default_governor+0x20/0x20
[  349.581277]  ? drm_add_edid_modes+0x44a0/0x67e0 [drm]
[  349.581286]  ttm_bo_init_reserved+0x9b6/0x1300 [ttm]
[  349.581294]  ? ttm_bo_validate+0x580/0x580 [ttm]
[  349.581298]  ? dentry_path_raw+0x10/0x10
[  349.581302]  ? proc_nr_files+0x30/0x30
[  349.581305]  ? shmem_get_inode+0x668/0x8f0
[  349.581308]  ? shmem_fh_to_dentry+0x160/0x160
[  349.581313]  ? entry_SYSCALL_64_fastpath+0x1e/0xa9
[  349.581318]  ? _copy_to_user+0x90/0x90
[  349.581321]  ? alloc_file+0x16d/0x440
[  349.581324]  ? __shmem_file_setup+0x2e0/0x520
[  349.581327]  ? shmem_fill_super+0xa10/0xa10
[  349.581344]  ? drm_gem_private_object_init+0x189/0x300 [drm]
[  349.581347]  ? kasan_kmalloc+0xad/0xe0
[  349.581412]  amdgpu_bo_do_create+0x594/0x1420 [amdgpu]
[  349.581486]  ? amdgpu_fill_buffer+0xb80/0xb80 [amdgpu]
[  349.581489]  ? update_stack_state+0x402/0x780
[  349.581559]  ? amdgpu_ttm_placement_from_domain+0x8d0/0x8d0 [amdgpu]
[  349.581565]  ? show_initstate+0xb0/0xb0
[  349.581570]  ? bpf_prog_alloc+0x320/0x320
[  349.581574]  ? unwind_next_frame.part.5+0x1bb/0xc90
[  349.581582]  ? __free_insn_slot+0x6a0/0x6a0
[  349.581585]  ? unwind_dump+0x4e0/0x4e0
[  349.581590]  ? rb_erase+0x3540/0x3540
[  349.581595]  ? __mem_cgroup_threshold+0x7b0/0x7b0
[  349.581599]  ? memory_max_write+0x420/0x420
[  349.581605]  ? __kernel_text_address+0xbf/0xf0
[  349.581608]  ? unwind_get_return_address+0x66/0xb0
[  349.581612]  ? __save_stack_trace+0x92/0x100
[  349.581685]  amdgpu_bo_create+0xba/0xa00 [amdgpu]
[  349.581759]  ? amdgpu_bo_do_create+0x1420/0x1420 [amdgpu]
[  349.581763]  ? mem_cgroup_uncharge_swap+0xc0/0xc0
[  349.581767]  ? kmem_cache_alloc+0xb7/0x1c0
[  349.581771]  ? __anon_vma_prepare+0x37a/0x650
[  349.581775]  ? __handle_mm_fault+0x31ac/0x5070
[  349.581778]  ? handle_mm_fault+0x292/0x800
[  349.581782]  ? __do_page_fault+0x412/0xa00
[  349.581785]  ? do_page_fault+0x22/0x30
[  349.581788]  ? page_fault+0x28/0x30
[  349.581792]  ? memcg_oom_wake_function+0x6a0/0x6a0
[  349.581866]  amdgpu_gem_object_create+0x11f/0x240 [amdgpu]
[  349.581942]  ? amdgpu_gem_object_free+0x1d0/0x1d0 [amdgpu]
[  349.581945]  ? __alloc_pages_nodemask+0x3d8/0xe50
[  349.582021]  amdgpu_gem_create_ioctl+0x3bb/0xc10 [amdgpu]
[  349.582097]  ? amdgpu_gem_object_close+0x790/0x790 [amdgpu]
[  349.582101]  ? page_add_new_anon_rmap+0x203/0x3d0
[  349.582105]  ? __check_object_size+0x22e/0x560
[  349.582180]  ? amdgpu_gem_object_close+0x790/0x790 [amdgpu]
[  349.582201]  drm_ioctl_kernel+0x1ce/0x330 [drm]
[  349.582221]  ? drm_ioctl_permit+0x2c0/0x2c0 [drm]
[  349.582225]  ? kasan_check_write+0x14/0x20
[  349.582246]  drm_ioctl+0x79a/0xc30 [drm]
[  349.582321]  ? amdgpu_gem_object_close+0x790/0x790 [amdgpu]
[  349.582342]  ? drm_getstats+0x20/0x20 [drm]
[  349.582347]  ? do_mmap+0x641/0x10f0
[  349.582419]  amdgpu_drm_ioctl+0xd8/0x1b0 [amdgpu]
[  349.582424]  do_vfs_ioctl+0x197/0x1490
[  349.582428]  ? vm_mmap_pgoff+0x1fe/0x2c0
[  349.582431]  ? ioctl_preallocate+0x2c0/0x2c0
[  349.582435]  ? __fget_light+0x2be/0x410
[  349.582438]  ? iterate_fd+0x2e0/0x2e0
[  349.582441]  ? handle_mm_fault+0x292/0x800
[  349.582445]  ? __handle_mm_fault+0x5070/0x5070
[  349.582449]  ? __do_page_fault+0x43a/0xa00
[  349.582453]  SyS_ioctl+0x79/0x90
[  349.582457]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[  349.582461] RIP: 0033:0x7ff17d1a7dc7
[  349.582463] RSP: 002b:00007ffe7a4edbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  349.582467] RAX: ffffffffffffffda RBX: 00007ff17d45eb00 RCX: 00007ff17d1a7dc7
[  349.582470] RDX: 00007ffe7a4edc40 RSI: 00000000c0206440 RDI: 0000000000000006
[  349.582472] RBP: 0000000040000010 R08: 00007ff17d45ebc8 R09: 0000000000000060
[  349.582475] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000040001000
[  349.582477] R13: 00007ff17d45eb58 R14: 0000000000001000 R15: 00007ff17d45eb00
[  349.582480] Code: 49 8b b6 20 02 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 f7 05 00 00 49 8b 47 58 48 8d b8 20 02 00 00 49 89 f9 49 c1 e9 03 <43> 80 3c 21 00 0f 85 e8 08 00 00 48 3b b0 20 02 00 00 0f 84 d9 
[  349.582596] RIP: amdgpu_vm_bo_invalidate+0x277/0xd60 [amdgpu] RSP: ffff880274266810
[  349.582600] ---[ end trace cc9c171d2cdc0539 ]---
[  360.319061] amdgpu 0000:23:00.0: Disabling VM faults because of PRT request!

