[PATCH] drm/amdgpu: fix to add buffer funcs check

Huang Rui ray.huang at amd.com
Tue Apr 11 09:45:12 UTC 2017


On Tue, Apr 11, 2017 at 02:53:27PM +0800, Christian König wrote:
> Am 11.04.2017 um 04:58 schrieb Huang Rui:
> > This patch fixes the case when buffer funcs is empty and bo evict is
> > executing. It must double check buffer funcs, otherwise, a NULL
> > pointer dereference kernel panic will be encountered.
> >
> >   BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
> >   IP: [<ffffffffa067b6cd>] amdgpu_evict_flags+0x3d/0xf0 [amdgpu]
> >   PGD 0
> >
> >   Oops: 0000 [#1] SMP
> >   Modules linked in: amdgpu(OE) ttm drm_kms_helper drm i2c_algo_bit
> >   fb_sys_fops syscopyarea sysfillrect sysimgblt fmem(OE) physmem_drv(OE)
> >   rpcsec_gss_krb5 nfsv4 nfs fscache intel_rapl x86_pkg_temp_thermal
> >   intel_powerclamp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic
> >   kvm_intel snd_hda_intel snd_hda_codec kvm snd_hda_core joydev eeepc_wmi
> >   asus_wmi sparse_keymap snd_hwdep snd_pcm irqbypass crct10dif_pclmul
> >   snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq crc32_pclmul snd_seq_device
> >   ghash_clmulni_intel aesni_intel aes_x86_64 snd_timer lrw gf128mul mei_me snd
> >   glue_helper ablk_helper cryptd tpm_infineon mei lpc_ich serio_raw soundcore
> >   shpchp mac_hid nfsd auth_rpcgss nfs_acl lockd grace coretemp sunrpc parport_pc
> >   ppdev lp parport autofs4 hid_generic mxm_wmi r8169 usbhid ahci
> >    psmouse libahci nvme mii hid nvme_core wmi video
> >   CPU: 3 PID: 1627 Comm: kworker/u8:17 Tainted: G           OE   4.9.0-custom #1
> >   Hardware name: ASUS All Series/Z87-A, BIOS 1802 01/28/2014
> >   Workqueue: events_unbound async_run_entry_fn
> >   task: ffff88021e7057c0 task.stack: ffffc9000262c000
> >   RIP: 0010:[<ffffffffa067b6cd>]  [<ffffffffa067b6cd>] amdgpu_evict_flags+0x3d/0xf0 [amdgpu]
> >   RSP: 0018:ffffc9000262fb30  EFLAGS: 00010246
> >   RAX: 0000000000000000 RBX: ffff88021e8a5858 RCX: 0000000000000000
> >   RDX: 0000000000000001 RSI: ffffc9000262fb58 RDI: ffff88021e8a5800
> >   RBP: ffffc9000262fb48 R08: 0000000000000000 R09: ffff88021e8a5814
> >   R10: 000000001def8f01 R11: ffff88021def8c80 R12: ffffc9000262fb58
> >   R13: ffff88021d2b1990 R14: 0000000000000000 R15: ffff88021e8a5858
> >   FS:  0000000000000000(0000) GS:ffff88022ed80000(0000) knlGS:0000000000000000
> >   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   CR2: 00000000000001a4 CR3: 0000000001c07000 CR4: 00000000001406e0
> >   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> Can we have the full stack trace please?
> 
> Eviction should never occur before the buffer funcs are initialized, so
> that patch just papers over some kind of race condition on startup as
> far as I can see.
> 

If we set ip_block_mask=0xff, the SDMA IP block is not enabled, so funcs_ring
is NULL at that time. Although it is a corner case, we still don't expect it
to hang with a kernel panic. I hit it while debugging S3.
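
As a rough illustration only (this is not the actual patch), the kind of
guard being discussed could be sketched in userspace C as below. All of the
mock_* types and functions are hypothetical stand-ins rather than the real
amdgpu definitions, and falling back to a CPU placement when no copy engine
is usable is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real amdgpu types. */
struct mock_ring {
	bool ready;
};

struct mock_mman {
	const void *buffer_funcs;     /* NULL when no copy engine is set up */
	struct mock_ring *funcs_ring; /* NULL when the SDMA block is disabled */
};

enum mock_domain { MOCK_DOMAIN_CPU, MOCK_DOMAIN_GTT };

/* True only if a copy engine exists and its ring is ready to take work. */
static bool mock_copy_engine_usable(const struct mock_mman *mman)
{
	/* Check both pointers before touching ring->ready, so a disabled
	 * SDMA block can never lead to a NULL dereference here. */
	return mman->buffer_funcs != NULL &&
	       mman->funcs_ring != NULL &&
	       mman->funcs_ring->ready;
}

/* Pick an eviction target; assumed fallback is CPU when no engine is usable. */
static enum mock_domain mock_evict_domain(const struct mock_mman *mman)
{
	return mock_copy_engine_usable(mman) ? MOCK_DOMAIN_GTT : MOCK_DOMAIN_CPU;
}
```

With both pointers NULL, as in the ip_block_mask case, mock_evict_domain()
returns MOCK_DOMAIN_CPU instead of faulting.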

Thanks,
Rui

