Raciness with page table shadows being swapped out
Nicolai Hähnle
nhaehnle at gmail.com
Tue Dec 13 10:23:46 UTC 2016
On 13.12.2016 10:48, Christian König wrote:
>>>> The attached patch has fixed these crashes for me so far, but it's
>>>> very heavy-handed: it collects all page table shadows and the page
>>>> directory shadow and adds them all to the reservations for the callers
>>>> of amdgpu_vm_update_page_directory.
>>>
>>> That is most likely just a timing change, cause the shadows should end
>>> up in the duplicates list anyway. So the patch shouldn't have any
>>> effect.
>>
>> Okay, so the reason for the remaining crash is still unclear at least
>> for me.
>
> Yeah, that's a really good question. Can you share the call stack of the
> problem once more?

Attaching the dmesg again.

amdgpu_gtt_mgr_alloc+0x23 resolves to the check

	if (node->start != AMDGPU_BO_INVALID_OFFSET)

amdgpu_vm_update_page_directory+0x23f is

	r = amdgpu_ttm_bind(&pt_shadow->tbo,
			    &pt_shadow->tbo.mem);

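
A rough sketch of why the fault address comes out as 0x48, assuming node itself is NULL at that check (R12 is 0 in the dump, and the faulting instruction bytes decode to a compare against 0x48(%r12)). The struct below is a hypothetical stand-in that only models the offset of start; it is not the real drm_mm_node layout, and the helper name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the node type: only the offset of `start`
 * (0x48, read off the faulting instruction) is modelled. The real
 * struct drm_mm_node layout is NOT reproduced here. */
struct mock_node {
	unsigned char opaque[0x48];	/* fields before `start`, not modelled */
	uint64_t start;
};

/* Compile-time check that the mock really places `start` at 0x48. */
static_assert(offsetof(struct mock_node, start) == 0x48,
	      "mock layout must put start at offset 0x48");

/* Address the CPU would touch when loading node->start, given the
 * numeric value of the node pointer. Nothing is dereferenced. */
static inline uintptr_t start_field_address(uintptr_t node)
{
	return node + offsetof(struct mock_node, start);
}
```

With a NULL node, start_field_address(0) gives 0x48, which matches CR2 in the attached oops. That would point at the node pointer being NULL rather than at a bad offset stored inside a valid node.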
Nicolai
-------------- next part --------------
[ 545.477646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 545.477689] IP: [<ffffffffc0533ca3>] amdgpu_gtt_mgr_alloc+0x23/0x150 [amdgpu]
[ 545.477764] PGD 7e384a067
[ 545.477775] PUD 7f4a84067
[ 545.477786] PMD 0
[ 545.477797] Oops: 0000 [#1] SMP
[ 545.477810] Modules linked in: binfmt_misc nls_iso8859_1 eeepc_wmi asus_wmi video sparse_keymap mxm_wmi joydev input_leds edac_mce_amd edac_core kvm_amd kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel serio_raw snd_hda_codec snd_hda_core snd_hwdep fam15h_power k10temp snd_pcm snd_seq_midi i2c_piix4 snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer tpm_infineon snd soundcore wmi mac_hid shpchp parport_pc ppdev lp parport autofs4 algif_skcipher af_alg hid_generic usbhid hid dm_crypt amdkfd amd_iommu_v2 amdgpu crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel i2c_algo_bit aes_x86_64 drm_kms_helper glue_helper lrw syscopyarea gf128mul sysfillrect ablk_helper sysimgblt cryptd fb_sys_fops ttm psmouse drm ahci r8169 libahci mii fjes
[ 545.478165] CPU: 5 PID: 29619 Comm: glcts Not tainted 4.9.0-rc6-tip+drm-next-2 #104
[ 545.478191] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 LE R2.0, BIOS 2601 03/24/2015
[ 545.478225] task: ffff8be896f4d580 task.stack: ffffb7af4c3f4000
[ 545.478246] RIP: 0010:[<ffffffffc0533ca3>] [<ffffffffc0533ca3>] amdgpu_gtt_mgr_alloc+0x23/0x150 [amdgpu]
[ 545.478301] RSP: 0018:ffffb7af4c3f7a28 EFLAGS: 00010296
[ 545.478320] RAX: 7fffffffffffffff RBX: ffff8be8967e6180 RCX: ffff8be82806ec90
[ 545.478343] RDX: 0000000000000000 RSI: ffff8be82806ec58 RDI: ffff8be8957c9980
[ 545.478367] RBP: ffffb7af4c3f7a88 R08: ffff8be8bed5c540 R09: ffff8be89e003900
[ 545.478390] R10: ffff8be896af4cc0 R11: ffff8be8957c1900 R12: 0000000000000000
[ 545.478412] R13: 0000000000000000 R14: ffff8be8967e6228 R15: ffff8be82806fc00
[ 545.478437] FS: 00007ff4415f2740(0000) GS:ffff8be8bed40000(0000) knlGS:0000000000000000
[ 545.478462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 545.478481] CR2: 0000000000000048 CR3: 00000007c031a000 CR4: 00000000000406e0
[ 545.478506] Stack:
[ 545.478513] 0000000000000000 00000000d3451c09 ffff8be78308a9d8 0000000000000000
[ 545.478544] ffff8be8957c8000 0000000000001a00 ffff8be863b86000 ffff8be8967e6180
[ 545.478575] ffff8be82806ec90 0000000000000000 ffff8be8967e6228 ffff8be82806fc00
[ 545.478604] Call Trace:
[ 545.478632] [<ffffffffc0516bf1>] amdgpu_ttm_bind+0x61/0x160 [amdgpu]
[ 545.478672] [<ffffffffc052f58f>] amdgpu_vm_update_page_directory+0x23f/0x4c0 [amdgpu]
[ 545.478717] [<ffffffffc052124a>] amdgpu_cs_ioctl+0xd8a/0x1400 [amdgpu]
[ 545.478759] [<ffffffffc02f9e76>] drm_ioctl+0x1f6/0x4a0 [drm]
[ 545.478794] [<ffffffffc05204c0>] ? amdgpu_cs_find_mapping+0xa0/0xa0 [amdgpu]
[ 545.478823] [<ffffffff8b0b8255>] ? update_load_avg+0x75/0x390
[ 545.478858] [<ffffffffc050404c>] amdgpu_drm_ioctl+0x4c/0x80 [amdgpu]
[ 545.478882] [<ffffffff8b241e81>] do_vfs_ioctl+0xa1/0x5d0
[ 545.478902] [<ffffffff8b842e2a>] ? __schedule+0x23a/0x6f0
[ 545.478923] [<ffffffff8b242429>] SyS_ioctl+0x79/0x90
[ 545.478942] [<ffffffff8b848bfb>] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 545.478965] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 b8 ff ff ff ff ff ff ff 7f 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 38 4c 8b 21 <49> 39 44 24 48 74 11 31 c0 48 83 c4 38 5b 41 5c 41 5d 41 5e 41
[ 545.479133] RIP [<ffffffffc0533ca3>] amdgpu_gtt_mgr_alloc+0x23/0x150 [amdgpu]
[ 545.479179] RSP <ffffb7af4c3f7a28>
[ 545.479192] CR2: 0000000000000048
[ 545.485015] ---[ end trace 390c3d6250a76506 ]---