BUG: drm/mgag200 NULL pointer dereference at 0000000000000060

Wed Nov 18 06:55:39 PST 2015

Hi All,
Just found the following bug causing machine hang:

[  487.777538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[  487.777554] IP: [<ffffffff8158aaee>] _raw_spin_lock+0xe/0x30
[  487.777557] PGD 42e9f7067 PUD 42f2fa067 PMD 0
[  487.777560] Oops: 0002 [#1] SMP
...
[  487.777618] CPU: 21 PID: 3190 Comm: Xorg Tainted: G            E   4.4.0-rc1-3-default+ #6
[  487.777620] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[  487.777621] task: ffff880853ae4680 ti: ffff8808696d4000 task.ti: ffff8808696d4000
[  487.777625] RIP: 0010:[<ffffffff8158aaee>]  [<ffffffff8158aaee>] _raw_spin_lock+0xe/0x30
[  487.777627] RSP: 0018:ffff8808696d79c0  EFLAGS: 00010246
[  487.777628] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  487.777629] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000060
[  487.777630] RBP: ffff8808696d79e0 R08: 0000000000000000 R09: ffff88086924a780
[  487.777631] R10: 000000000001bb40 R11: 0000000000003246 R12: 0000000000000000
[  487.777632] R13: ffff880463a27360 R14: ffff88046ca50218 R15: 0000000000000080
[  487.777634] FS:  00007f3f81c5a8c0(0000) GS:ffff88086f060000(0000) knlGS:0000000000000000
[  487.777635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  487.777636] CR2: 0000000000000060 CR3: 000000042e678000 CR4: 00000000001406e0
[  487.777638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  487.777639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  487.777639] Stack:
[  487.777642]  ffffffffa00eb5fa ffff8808696d7b60 ffff88086b87d800 0000000000000000
[  487.777644]  ffff8808696d7ac8 ffffffffa01694b6 ffff8808696d7ae8 ffffffff8109c8d5
[  487.777647]  ffff880469158740 ffff880463a27000 ffff88086b87d800 ffff88086b87d800
[  487.777647] Call Trace:
[  487.777674]  [<ffffffffa00eb5fa>] ? drm_gem_object_lookup+0x1a/0xa0 [drm]
[  487.777681]  [<ffffffffa01694b6>] mga_crtc_cursor_set+0xc6/0xb60 [mgag200]
[  487.777691]  [<ffffffff8109c8d5>] ? find_busiest_group+0x35/0x4a0
[  487.777696]  [<ffffffff81086294>] ? __might_sleep+0x44/0x80
[  487.777699]  [<ffffffff815888c2>] ? __ww_mutex_lock+0x22/0x9c
[  487.777722]  [<ffffffffa0104f64>] ? drm_modeset_lock+0x34/0xf0 [drm]
[  487.777733]  [<ffffffffa0148d9e>] restore_fbdev_mode+0xee/0x2a0 [drm_kms_helper]
[  487.777742]  [<ffffffffa014afce>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper]
[  487.777748]  [<ffffffffa014b037>] drm_fb_helper_set_par+0x27/0x50 [drm_kms_helper]
[  487.777752]  [<ffffffff8134560c>] fb_set_var+0x18c/0x3f0
[  487.777777]  [<ffffffffa02a9b0a>] ? __ext4_handle_dirty_metadata+0x8a/0x210 [ext4]
[  487.777783]  [<ffffffff8133cb97>] fbcon_blank+0x1b7/0x2b0
[  487.777790]  [<ffffffff813be2a3>] do_unblank_screen+0xb3/0x1c0
[  487.777795]  [<ffffffff813b5aba>] vt_ioctl+0x118a/0x1210
[  487.777801]  [<ffffffff813a8fe0>] tty_ioctl+0x3f0/0xc90
[  487.777808]  [<ffffffff81172018>] ? kzfree+0x28/0x30
[  487.777813]  [<ffffffff811e053f>] ? mntput+0x1f/0x30
[  487.777817]  [<ffffffff811d3f5d>] do_vfs_ioctl+0x30d/0x570
[  487.777822]  [<ffffffff8107ed3a>] ? task_work_run+0x8a/0xa0
[  487.777825]  [<ffffffff811d4234>] SyS_ioctl+0x74/0x80
[  487.777829]  [<ffffffff8158aeae>] entry_SYSCALL_64_fastpath+0x12/0x71
[  487.777851] Code: 65 ff 0d ce 02 a8 7e 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 e8 b0 01 5d c3 0f 1f 00 65 ff 05 b1 02 a8 7e 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 4e f5 b1 ff 5d
[  487.777854] RIP  [<ffffffff8158aaee>] _raw_spin_lock+0xe/0x30
[  487.777855]  RSP <ffff8808696d79c0>
[  487.777856] CR2: 0000000000000060
[  487.777860] ---[ end trace 672a2cd555e0ebd3 ]---

Analysis:
The faulting instruction is at _raw_spin_lock+0xe/0x30, which is:

<_raw_spin_lock+14>: lock cmpxchg %edx,(%rdi)

It is because rdi is an invalid pointer:0000000000000060

The source code:
drm_gem_object_lookup(struct drm_device *dev, struct drm_file *filp,
                      u32 handle)
{
        struct drm_gem_object *obj;

        spin_lock(&filp->table_lock); <== faulting

In assembly:
<drm_gem_object_lookup>:     push   %rbp
<drm_gem_object_lookup+1>:   lea    0x60(%rsi),%rdi <== rdi=rsi+0x60, so rsi==NULL
<drm_gem_object_lookup+5>:   mov    %rsp,%rbp
<drm_gem_object_lookup+8>:   push   %r12
<drm_gem_object_lookup+10>:  mov    %edx,%r12d
<drm_gem_object_lookup+13>:  push   %rbx
<drm_gem_object_lookup+14>:  mov    %rsi,%rbx
<drm_gem_object_lookup+17>:  sub    $0x8,%rsp
<drm_gem_object_lookup+21>:  callq  0xffffffff8158aae0 <_raw_spin_lock>

conclusion:
%rsi, the second argument of drm_gem_object_lookup(), filp == NULL.

I'll send a patch in a separate Email. 

Regards,
Rui