[DRM] drm_get_connector_name internal static string buffer causes a race

Dmitry Malkin DMalkin at ptsecurity.com
Tue Mar 25 01:08:23 PDT 2014


Hello, Daniel,

Thank you for response. I've got a couple questions for you:

0. What do you think about making integration test with continuous reloading i915 driver and X server (with FLR between iteration)?
Simplified example for ubuntu (root required):

#!/bin/bash
service lightdm stop
rmmod snd_hda_intel
echo -n "0000:00:02.0" > /sys/bus/pci/devices/0000\:00\:02.0/driver/unbind
rmmod i915
echo 1 > /sys/bus/pci/devices/0000\:00\:02.0/reset
modprobe i915
service lightdm start

This will uncover next problems:
* sysfs add/remove i2c connectors (parent/child warning)
* drm static buffer races
* NX bit violation crash (see dump below)
* and bunch of DMAR problems


1. Could you point me out git/branch with FIXME comments?

2. About kfree() problem: what if put string buffer onto stack of caller for drm_get_connector_name and drm_get_encoder_name? 
It will be auto-removed and there is will be the patch about adding new param for these functions.
(yes the patch will be big and awful to read)

======================= NX bit violation crash ========================================
Mar 20 21:53:29 haswell01 kernel: [13447.100849] Console: switching to colour dummy device 80x25
Mar 20 21:53:29 haswell01 kernel: [13447.100950] drm_kms_helper: drm: unregistered panic notifier
Mar 20 21:53:29 haswell01 kernel: [13447.117785] waiting module removal not supported: please upgrade
Mar 20 21:53:29 haswell01 kernel: [13447.117880] [drm] Module unloaded
Mar 20 21:53:29 haswell01 kernel: [13447.118244] waiting module removal not supported: please upgrade
Mar 20 21:53:29 haswell01 kernel: [13447.118360] waiting module removal not supported: please upgradewaiting module removal not supported: please upgrade
Mar 20 21:53:39 haswell01 kernel: [13447.118590] waiting module removal not supported: please upgrade<6>[13457.263808] [drm] Initialized drm 1.1.0 20060810
Mar 20 21:53:39 haswell01 kernel: [13457.333992] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Mar 20 21:53:39 haswell01 kernel: [13457.333996] BUG: unable to handle kernel paging request at ffff8803eb27fcb0
Mar 20 21:53:39 haswell01 kernel: [13457.333998] IP: [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
Mar 20 21:53:39 haswell01 kernel: [13457.334001] PGD 2fd9067 PUD 3e68c3063 PMD 404902063 PTE 80000003eb27f163
Mar 20 21:53:39 haswell01 kernel: [13457.334004] Oops: 0011 [#1] SMP 
Mar 20 21:53:39 haswell01 kernel: [13457.334006] Modules linked in: i915(+) video drm_kms_helper drm i2c_algo_bit snd_hda_codec_hdmi rfcomm bnep bluetooth nls_iso8859_1 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq hid_generic crct10dif_pclmul snd_seq_device usbhid crc32_pclmul snd_timer ghash_clmulni_intel snd aesni_intel psmouse hid mei_me aes_x86_64 lrw ppdev gf128mul mei glue_helper parport_pc ablk_helper lp cryptd soundcore parport serio_raw lpc_ich mac_hid e1000e ahci ptp libahci pps_core [last unloaded: video]
Mar 20 21:53:39 haswell01 kernel: [13457.334031] CPU: 0 PID: 4974 Comm: modprobe Not tainted 3.13.6 #10
Mar 20 21:53:39 haswell01 kernel: [13457.334032] Hardware name:                  /DQ87PG, BIOS PGQ8710H.86A.0144.2014.0113.1604 01/13/2014
Mar 20 21:53:39 haswell01 kernel: [13457.334034] task: ffff88002e6c5fc0 ti: ffff880406394000 task.ti: ffff880406394000
Mar 20 21:53:39 haswell01 kernel: [13457.334035] RIP: 0010:[<ffff8803eb27fcb0>]  [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
Mar 20 21:53:39 haswell01 kernel: [13457.334037] RSP: 0018:ffff880406395af0  EFLAGS: 00010282
Mar 20 21:53:39 haswell01 kernel: [13457.334038] RAX: ffff8803eb27f4b0 RBX: ffff8803f6016000 RCX: ffffffffa0238630
Mar 20 21:53:39 haswell01 kernel: [13457.334039] RDX: ffff8803eb27fcb0 RSI: ffffffffa03ba3c4 RDI: ffff8803f6016000
Mar 20 21:53:39 haswell01 kernel: [13457.334040] RBP: ffff880406395b10 R08: 0000000000000000 R09: ffff88041ea172e0
Mar 20 21:53:39 haswell01 kernel: [13457.334042] R10: ffffea00101d7200 R11: 0000000000000000 R12: 0000000000000000
Mar 20 21:53:39 haswell01 kernel: [13457.334043] R13: ffffffffa03ba3c4 R14: ffffffffa03dd100 R15: 0000000000000000
Mar 20 21:53:39 haswell01 kernel: [13457.334044] FS:  00007f3f392d0740(0000) GS:ffff88041ea00000(0000) knlGS:0000000000000000
Mar 20 21:53:39 haswell01 kernel: [13457.334046] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 21:53:39 haswell01 kernel: [13457.334047] CR2: ffff8803eb27fcb0 CR3: 00000003e7358000 CR4: 00000000001407f0
Mar 20 21:53:39 haswell01 kernel: [13457.334048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 21:53:39 haswell01 kernel: [13457.334050] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 21:53:39 haswell01 kernel: [13457.334051] Stack:
Mar 20 21:53:39 haswell01 kernel: [13457.334052]  ffffffffa0213cb2 ffff8803f6016000 ffff880408199000 ffff880408199098
Mar 20 21:53:39 haswell01 kernel: [13457.334054]  ffff880406395b60 ffffffffa0215b92 0000000000000004 ffff880408199140
Mar 20 21:53:39 haswell01 kernel: [13457.334057]  ffffffffa03b9920 ffff880408199000 0000000000000000 ffffffffa03dd000
Mar 20 21:53:39 haswell01 kernel: [13457.334059] Call Trace:
Mar 20 21:53:39 haswell01 kernel: [13457.334067]  [<ffffffffa0213cb2>] ? drm_dev_register+0xa2/0x1e0 [drm]
Mar 20 21:53:39 haswell01 kernel: [13457.334073]  [<ffffffffa0215b92>] drm_get_pci_dev+0x92/0x140 [drm]
Mar 20 21:53:39 haswell01 kernel: [13457.334082]  [<ffffffffa033d67c>] i915_pci_probe+0x3c/0x90 [i915]
Mar 20 21:53:39 haswell01 kernel: [13457.334086]  [<ffffffff813982b5>] local_pci_probe+0x45/0xa0
Mar 20 21:53:39 haswell01 kernel: [13457.334088]  [<ffffffff81399555>] ? pci_match_device+0xc5/0xd0
Mar 20 21:53:39 haswell01 kernel: [13457.334090]  [<ffffffff81399679>] pci_device_probe+0xd9/0x130
Mar 20 21:53:39 haswell01 kernel: [13457.334093]  [<ffffffff81484795>] driver_probe_device+0x125/0x3b0
Mar 20 21:53:39 haswell01 kernel: [13457.334095]  [<ffffffff81484af3>] __driver_attach+0x93/0xa0
Mar 20 21:53:39 haswell01 kernel: [13457.334098]  [<ffffffff81484a60>] ? __device_attach+0x40/0x40
Mar 20 21:53:39 haswell01 kernel: [13457.334100]  [<ffffffff81482703>] bus_for_each_dev+0x63/0xa0
Mar 20 21:53:39 haswell01 kernel: [13457.334102]  [<ffffffff8148414e>] driver_attach+0x1e/0x20
Mar 20 21:53:39 haswell01 kernel: [13457.334104]  [<ffffffff81483d30>] bus_add_driver+0x180/0x250
Mar 20 21:53:39 haswell01 kernel: [13457.334106]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
Mar 20 21:53:39 haswell01 kernel: [13457.334109]  [<ffffffff81485174>] driver_register+0x64/0xf0
Mar 20 21:53:39 haswell01 kernel: [13457.334110]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
Mar 20 21:53:39 haswell01 kernel: [13457.334113]  [<ffffffff81397c4c>] __pci_register_driver+0x4c/0x50
Mar 20 21:53:39 haswell01 kernel: [13457.334117]  [<ffffffffa0215d5a>] drm_pci_init+0x11a/0x130 [drm]
Mar 20 21:53:39 haswell01 kernel: [13457.334119]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
Mar 20 21:53:39 haswell01 kernel: [13457.334127]  [<ffffffffa03fe066>] i915_init+0x66/0x68 [i915]
Mar 20 21:53:39 haswell01 kernel: [13457.334130]  [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
Mar 20 21:53:39 haswell01 kernel: [13457.334133]  [<ffffffff810589e3>] ? set_memory_nx+0x43/0x50
Mar 20 21:53:39 haswell01 kernel: [13457.334137]  [<ffffffff810e0630>] load_module+0x2050/0x26f0
Mar 20 21:53:39 haswell01 kernel: [13457.334139]  [<ffffffff810dbfa0>] ? store_uevent+0x40/0x40
Mar 20 21:53:39 haswell01 kernel: [13457.334141]  [<ffffffff810e0e46>] SyS_finit_module+0x86/0xb0
Mar 20 21:53:39 haswell01 kernel: [13457.334144]  [<ffffffff8171823f>] tracesys+0xe1/0xe6
Mar 20 21:53:39 haswell01 kernel: [13457.334145] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 <f0> b1 ca 81 ff ff ff ff b0 f4 27 eb 03 88 ff ff ff ff ff 7f 00 
Mar 20 21:53:39 haswell01 kernel: [13457.334164] RIP  [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
Mar 20 21:53:39 haswell01 kernel: [13457.334166]  RSP <ffff880406395af0>
Mar 20 21:53:39 haswell01 kernel: [13457.334167] CR2: ffff8803eb27fcb0
Mar 20 21:53:39 haswell01 kernel: [13457.334168] ---[ end trace e2598b1fc83a65fd ]---


________________________________________
From: Daniel Vetter <daniel.vetter at ffwll.ch> on behalf of Daniel Vetter <daniel at ffwll.ch>
Sent: Tuesday, March 25, 2014 12:31 AM
To: Dmitry Malkin
Cc: dri-devel at lists.freedesktop.org
Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes a race

On Mon, Mar 24, 2014 at 12:06:21PM +0000, Dmitry Malkin wrote:
>
> Hello guys,
>
> I've been playing with reloading intel gfx driver (i915) in a cycle, for a while,
> and at some point I've found a non-deterministic kernel crash with a highly-variable
> iteration dependency -- 2 to 200 driver reload iterations.
>
> The apparent race is over the shared internal string buffer in drm_get_connector_name().
> It is mostly harmless, due to its results being mostly used for log output, but in at least one
> case  -- drm_sysfs_connector_add() -- this leads to a more critical error.
>
> Race scenario:
>
> - drm_sysfs_connector_add()
>    - drm_get_connector_name()
> vs.
> - anything that generates log messages involving DRM connectors
>    - drm_get_connector_name()
>
>  (and many other from drm_crtc.c) shares with caller const char* to internal static char buffer.
> If something call it from other thread, while main thread strore and use returned pointer it may overwrite connector name.
>
> Here are we go: registering HDMI connecter  (drm_sysfs_connector_add store and use pointer from drm_get_connector_name)
> and the same time got VGA connector name down through the stack. (the second thread is upowerd who watch continuously sysfs)

Yeah, in my recent kerneldoc series I've noticed this too and added FIXME
comments. There's a lot more than just those you've pointed out. The
problem is that fixing these will be an awful lot of work since you need
to add proper cleanup code (calling kfree()) to all the required places.

So a full subsystem wide code audit for every single use-site of these.
-Daniel


More information about the dri-devel mailing list