[PATCH] drm/amdgpu: fix to add buffer funcs check

Christian König deathsimple at vodafone.de
Tue Apr 11 10:44:53 UTC 2017


Am 11.04.2017 um 11:45 schrieb Huang Rui:
> On Tue, Apr 11, 2017 at 02:53:27PM +0800, Christian König wrote:
>> Am 11.04.2017 um 04:58 schrieb Huang Rui:
>>> This patch fixes the case when buffer funcs is empty and bo evict is
>>> executing. It must double check buffer funcs, otherwise, a NULL
>>> pointer dereference kernel panic will be encountered.
>>>
>>>    BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
>>>    IP: [<ffffffffa067b6cd>] amdgpu_evict_flags+0x3d/0xf0 [amdgpu]
>>>    PGD 0
>>>
>>>    Oops: 0000 [#1] SMP
>>>    Modules linked in: amdgpu(OE) ttm drm_kms_helper drm i2c_algo_bit
>> fb_sys_fops syscopyarea sysfillrect sysimgblt fmem(OE) physmem_drv(OE)
>> rpcsec_gss_krb5 nfsv4 nfs fscache intel_rapl x86_pkg_temp_thermal
>> intel_powerclamp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic
>> kvm_intel snd_hda_intel snd_hda_codec kvm snd_hda_core joydev eeepc_wmi
>> asus_wmi sparse_keymap snd_hwdep snd_pcm irqbypass crct10dif_pclmul
>> snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq crc32_pclmul snd_seq_device
>> ghash_clmulni_intel aesni_intel aes_x86_64 snd_timer lrw gf128mul mei_me snd
>> glue_helper ablk_helper cryptd tpm_infineon mei lpc_ich serio_raw soundcore
>> shpchp mac_hid nfsd auth_rpcgss nfs_acl lockd grace coretemp sunrpc parport_pc
>> ppdev lp parport autofs4 hid_generic mxm_wmi r8169 usbhid ahci
>>>     psmouse libahci nvme mii hid nvme_core wmi video
>>>    CPU: 3 PID: 1627 Comm: kworker/u8:17 Tainted: G           OE   4.9.0-custom
>> #1
>>>    Hardware name: ASUS All Series/Z87-A, BIOS 1802 01/28/2014
>>>    Workqueue: events_unbound async_run_entry_fn
>>>    task: ffff88021e7057c0 task.stack: ffffc9000262c000
>>>    RIP: 0010:[<ffffffffa067b6cd>]  [<ffffffffa067b6cd>]
>> amdgpu_evict_flags+0x3d/0xf0 [amdgpu]
>>>    RSP: 0018:ffffc9000262fb30  EFLAGS: 00010246
>>>    RAX: 0000000000000000 RBX: ffff88021e8a5858 RCX: 0000000000000000
>>>    RDX: 0000000000000001 RSI: ffffc9000262fb58 RDI: ffff88021e8a5800
>>>    RBP: ffffc9000262fb48 R08: 0000000000000000 R09: ffff88021e8a5814
>>>    R10: 000000001def8f01 R11: ffff88021def8c80 R12: ffffc9000262fb58
>>>    R13: ffff88021d2b1990 R14: 0000000000000000 R15: ffff88021e8a5858
>>>    FS:  0000000000000000(0000) GS:ffff88022ed80000(0000)
>> knlGS:0000000000000000
>>>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>    CR2: 00000000000001a4 CR3: 0000000001c07000 CR4: 00000000001406e0
>>>    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Can we have the full stack trace please?
>>
>> Eviction should never occur before the buffer funcs are initialized, so
>> that patch just papers over some kind of race condition on startup as
>> far as I can see.
>>
> If we set ip_block_mask=0xff, sdma ip won't enable it. So funcs_ring is
> NULL at that time. Though it is a corner case, but we don't also expect it
> hang with kernel panic. I met it when I was debugging S3.

Good point, you should mention that in the commit message.

With that fixed the patch is Reviewed-by: Christian König 
<christian.koenig at amd.com>.

But I think there are a couple of more cases like this you might need to 
fix to work with the SDMA disabled.

Regards,
Christian.

> Thanks,
> Rui
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx




More information about the amd-gfx mailing list