Raciness with page table shadows being swapped out

Nicolai Hähnle nhaehnle at gmail.com
Tue Dec 13 10:23:46 UTC 2016


On 13.12.2016 10:48, Christian König wrote:
>>>> The attached patch has fixed these crashes for me so far, but it's
>>>> very heavy-handed: it collects all page table shadows and the page
>>>> directory shadow and adds them all to the reservations for the callers
>>>> of amdgpu_vm_update_page_directory.
>>>
>>> That is most likely just a timing change, because the shadows should end
>>> up in the duplicates list anyway. So the patch shouldn't have any
>>> effect.
>>
>> Okay, so the reason for the remaining crash is still unclear at least
>> for me.
>
> Yeah, that's a really good question. Can you share the call stack of the
> problem once more?

Attaching the dmesg again.

amdgpu_gtt_mgr_alloc+0x23 resolves to the check:

    if (node->start != AMDGPU_BO_INVALID_OFFSET)

amdgpu_vm_update_page_directory+0x23f is the call:

	r = amdgpu_ttm_bind(&pt_shadow->tbo,
			    &pt_shadow->tbo.mem);
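
Reading the oops against that check: CR2 is 0x48 and R12 is 0, and the
faulting bytes "49 39 44 24 48" decode to cmp %rax,0x48(%r12). That looks
like the load of node->start through a NULL node pointer. The userspace
sketch below only illustrates that arithmetic. struct fake_node, its
padding and both helper functions are made up for illustration and are not
the kernel's definitions; the 0x48 offset is taken from the fault address,
not from the real struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the node type; only the offset of `start`
 * matters here. The 0x48 bytes of padding are chosen to match the fault
 * address in the oops, not copied from the kernel headers. */
struct fake_node {
	unsigned char other_fields[0x48];
	unsigned long start;
};

/* Address the CPU would touch when evaluating node->start for a given
 * node pointer value: base address plus the member offset. */
static uintptr_t start_field_address(uintptr_t node_base)
{
	return node_base + offsetof(struct fake_node, start);
}

/* With a NULL node (base 0) the access lands on the member offset
 * itself, which is the value that shows up in CR2. */
static uintptr_t fault_address_for_null_node(void)
{
	uintptr_t addr = start_field_address(0);

	assert(addr == 0x48);
	return addr;
}
```

If that reading is right, the pointer the check dereferences was NULL by
the time amdgpu_ttm_bind was called on the shadow, rather than the node
merely holding an invalid offset.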

Nicolai
-------------- next part --------------
[  545.477646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  545.477689] IP: [<ffffffffc0533ca3>] amdgpu_gtt_mgr_alloc+0x23/0x150 [amdgpu]
[  545.477764] PGD 7e384a067
[  545.477775] PUD 7f4a84067
[  545.477786] PMD 0

[  545.477797] Oops: 0000 [#1] SMP
[  545.477810] Modules linked in: binfmt_misc nls_iso8859_1 eeepc_wmi asus_wmi video sparse_keymap mxm_wmi joydev input_leds edac_mce_amd edac_core kvm_amd kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel serio_raw snd_hda_codec snd_hda_core snd_hwdep fam15h_power k10temp snd_pcm snd_seq_midi i2c_piix4 snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer tpm_infineon snd soundcore wmi mac_hid shpchp parport_pc ppdev lp parport autofs4 algif_skcipher af_alg hid_generic usbhid hid dm_crypt amdkfd amd_iommu_v2 amdgpu crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel i2c_algo_bit aes_x86_64 drm_kms_helper glue_helper lrw syscopyarea gf128mul sysfillrect ablk_helper sysimgblt cryptd fb_sys_fops ttm psmouse drm ahci r8169 libahci mii fjes
[  545.478165] CPU: 5 PID: 29619 Comm: glcts Not tainted 4.9.0-rc6-tip+drm-next-2 #104
[  545.478191] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 LE R2.0, BIOS 2601 03/24/2015
[  545.478225] task: ffff8be896f4d580 task.stack: ffffb7af4c3f4000
[  545.478246] RIP: 0010:[<ffffffffc0533ca3>]  [<ffffffffc0533ca3>] amdgpu_gtt_mgr_alloc+0x23/0x150 [amdgpu]
[  545.478301] RSP: 0018:ffffb7af4c3f7a28  EFLAGS: 00010296
[  545.478320] RAX: 7fffffffffffffff RBX: ffff8be8967e6180 RCX: ffff8be82806ec90
[  545.478343] RDX: 0000000000000000 RSI: ffff8be82806ec58 RDI: ffff8be8957c9980
[  545.478367] RBP: ffffb7af4c3f7a88 R08: ffff8be8bed5c540 R09: ffff8be89e003900
[  545.478390] R10: ffff8be896af4cc0 R11: ffff8be8957c1900 R12: 0000000000000000
[  545.478412] R13: 0000000000000000 R14: ffff8be8967e6228 R15: ffff8be82806fc00
[  545.478437] FS:  00007ff4415f2740(0000) GS:ffff8be8bed40000(0000) knlGS:0000000000000000
[  545.478462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  545.478481] CR2: 0000000000000048 CR3: 00000007c031a000 CR4: 00000000000406e0
[  545.478506] Stack:
[  545.478513]  0000000000000000 00000000d3451c09 ffff8be78308a9d8 0000000000000000
[  545.478544]  ffff8be8957c8000 0000000000001a00 ffff8be863b86000 ffff8be8967e6180
[  545.478575]  ffff8be82806ec90 0000000000000000 ffff8be8967e6228 ffff8be82806fc00
[  545.478604] Call Trace:
[  545.478632]  [<ffffffffc0516bf1>] amdgpu_ttm_bind+0x61/0x160 [amdgpu]
[  545.478672]  [<ffffffffc052f58f>] amdgpu_vm_update_page_directory+0x23f/0x4c0 [amdgpu]
[  545.478717]  [<ffffffffc052124a>] amdgpu_cs_ioctl+0xd8a/0x1400 [amdgpu]
[  545.478759]  [<ffffffffc02f9e76>] drm_ioctl+0x1f6/0x4a0 [drm]
[  545.478794]  [<ffffffffc05204c0>] ? amdgpu_cs_find_mapping+0xa0/0xa0 [amdgpu]
[  545.478823]  [<ffffffff8b0b8255>] ? update_load_avg+0x75/0x390
[  545.478858]  [<ffffffffc050404c>] amdgpu_drm_ioctl+0x4c/0x80 [amdgpu]
[  545.478882]  [<ffffffff8b241e81>] do_vfs_ioctl+0xa1/0x5d0
[  545.478902]  [<ffffffff8b842e2a>] ? __schedule+0x23a/0x6f0
[  545.478923]  [<ffffffff8b242429>] SyS_ioctl+0x79/0x90
[  545.478942]  [<ffffffff8b848bfb>] entry_SYSCALL_64_fastpath+0x1e/0xad
[  545.478965] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 b8 ff ff ff ff ff ff ff 7f 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 38 4c 8b 21 <49> 39 44 24 48 74 11 31 c0 48 83 c4 38 5b 41 5c 41 5d 41 5e 41
[  545.479133] RIP  [<ffffffffc0533ca3>] amdgpu_gtt_mgr_alloc+0x23/0x150 [amdgpu]
[  545.479179]  RSP <ffffb7af4c3f7a28>
[  545.479192] CR2: 0000000000000048
[  545.485015] ---[ end trace 390c3d6250a76506 ]---

