[DRM] drm_get_connector_name internal static string buffer causes a race

Daniel Vetter daniel at ffwll.ch
Tue Mar 25 05:36:14 PDT 2014


On Tue, Mar 25, 2014 at 11:23:46AM +0000, Dmitry Malkin wrote:
> Hi,
> will you accept bash scripts for reloading driver/X/FLR for intel-gpu-tools to uncover exists and future bugs besides those patches?

i-g-t/tests/drv_module_reload we have already, so basic coverage is there.
It it's run in our nightly test runs on drm-intel-nightly.

But that scripts tries very hard not to cause a race so more fancy stuff
is still needed. But until we have a more solid driver I don't think
there's much point for crazier scripts, it'll just cause headaches.
-Daniel


> ________________________________________
> From: Daniel Vetter <daniel.vetter at ffwll.ch> on behalf of Daniel Vetter <daniel at ffwll.ch>
> Sent: Tuesday, March 25, 2014 2:43 PM
> To: Dmitry Malkin
> Cc: Daniel Vetter; dri-devel at lists.freedesktop.org
> Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes a race
> 
> On Tue, Mar 25, 2014 at 08:08:23AM +0000, Dmitry Malkin wrote:
> > Hello, Daniel,
> >
> > Thank you for response. I've got a couple questions for you:
> >
> > 0. What do you think about making integration test with continuous reloading i915 driver and X server (with FLR between iteration)?
> > Simplified example for ubuntu (root required):
> 
> Module reloading is know to be horribly racy atm. Don't do that in any
> kind of stress-test situation before fixing up piles of bugs ;-)
> 
> Will mean: You'll uncover at _lot_ more than just this. Patches for all
> this highly welcome though.
> 
> Thanks, Daniel
> 
> >
> > #!/bin/bash
> > service lightdm stop
> > rmmod snd_hda_intel
> > echo -n "0000:00:02.0" > /sys/bus/pci/devices/0000\:00\:02.0/driver/unbind
> > rmmod i915
> > echo 1 > /sys/bus/pci/devices/0000\:00\:02.0/reset
> > modprobe i915
> > service lightdm start
> >
> > This will uncover next problems:
> > * sysfs add/remove i2c connectors (parent/child warning)
> > * drm static buffer races
> > * NX bit violation crash (see dump below)
> > * and bunch of DMAR problems
> >
> >
> > 1. Could you point me out git/branch with FIXME comments?
> >
> > 2. About kfree() problem: what if put string buffer onto stack of caller for drm_get_connector_name and drm_get_encoder_name?
> > It will be auto-removed and there is will be the patch about adding new param for these functions.
> > (yes the patch will be big and awful to read)
> >
> > ======================= NX bit violation crash ========================================
> > Mar 20 21:53:29 haswell01 kernel: [13447.100849] Console: switching to colour dummy device 80x25
> > Mar 20 21:53:29 haswell01 kernel: [13447.100950] drm_kms_helper: drm: unregistered panic notifier
> > Mar 20 21:53:29 haswell01 kernel: [13447.117785] waiting module removal not supported: please upgrade
> > Mar 20 21:53:29 haswell01 kernel: [13447.117880] [drm] Module unloaded
> > Mar 20 21:53:29 haswell01 kernel: [13447.118244] waiting module removal not supported: please upgrade
> > Mar 20 21:53:29 haswell01 kernel: [13447.118360] waiting module removal not supported: please upgradewaiting module removal not supported: please upgrade
> > Mar 20 21:53:39 haswell01 kernel: [13447.118590] waiting module removal not supported: please upgrade<6>[13457.263808] [drm] Initialized drm 1.1.0 20060810
> > Mar 20 21:53:39 haswell01 kernel: [13457.333992] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> > Mar 20 21:53:39 haswell01 kernel: [13457.333996] BUG: unable to handle kernel paging request at ffff8803eb27fcb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.333998] IP: [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334001] PGD 2fd9067 PUD 3e68c3063 PMD 404902063 PTE 80000003eb27f163
> > Mar 20 21:53:39 haswell01 kernel: [13457.334004] Oops: 0011 [#1] SMP
> > Mar 20 21:53:39 haswell01 kernel: [13457.334006] Modules linked in: i915(+) video drm_kms_helper drm i2c_algo_bit snd_hda_codec_hdmi rfcomm bnep bluetooth nls_iso8859_1 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq hid_generic crct10dif_pclmul snd_seq_device usbhid crc32_pclmul snd_timer ghash_clmulni_intel snd aesni_intel psmouse hid mei_me aes_x86_64 lrw ppdev gf128mul mei glue_helper parport_pc ablk_helper lp cryptd soundcore parport serio_raw lpc_ich mac_hid e1000e ahci ptp libahci pps_core [last unloaded: video]
> > Mar 20 21:53:39 haswell01 kernel: [13457.334031] CPU: 0 PID: 4974 Comm: modprobe Not tainted 3.13.6 #10
> > Mar 20 21:53:39 haswell01 kernel: [13457.334032] Hardware name:                  /DQ87PG, BIOS PGQ8710H.86A.0144.2014.0113.1604 01/13/2014
> > Mar 20 21:53:39 haswell01 kernel: [13457.334034] task: ffff88002e6c5fc0 ti: ffff880406394000 task.ti: ffff880406394000
> > Mar 20 21:53:39 haswell01 kernel: [13457.334035] RIP: 0010:[<ffff8803eb27fcb0>]  [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334037] RSP: 0018:ffff880406395af0  EFLAGS: 00010282
> > Mar 20 21:53:39 haswell01 kernel: [13457.334038] RAX: ffff8803eb27f4b0 RBX: ffff8803f6016000 RCX: ffffffffa0238630
> > Mar 20 21:53:39 haswell01 kernel: [13457.334039] RDX: ffff8803eb27fcb0 RSI: ffffffffa03ba3c4 RDI: ffff8803f6016000
> > Mar 20 21:53:39 haswell01 kernel: [13457.334040] RBP: ffff880406395b10 R08: 0000000000000000 R09: ffff88041ea172e0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334042] R10: ffffea00101d7200 R11: 0000000000000000 R12: 0000000000000000
> > Mar 20 21:53:39 haswell01 kernel: [13457.334043] R13: ffffffffa03ba3c4 R14: ffffffffa03dd100 R15: 0000000000000000
> > Mar 20 21:53:39 haswell01 kernel: [13457.334044] FS:  00007f3f392d0740(0000) GS:ffff88041ea00000(0000) knlGS:0000000000000000
> > Mar 20 21:53:39 haswell01 kernel: [13457.334046] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Mar 20 21:53:39 haswell01 kernel: [13457.334047] CR2: ffff8803eb27fcb0 CR3: 00000003e7358000 CR4: 00000000001407f0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Mar 20 21:53:39 haswell01 kernel: [13457.334050] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Mar 20 21:53:39 haswell01 kernel: [13457.334051] Stack:
> > Mar 20 21:53:39 haswell01 kernel: [13457.334052]  ffffffffa0213cb2 ffff8803f6016000 ffff880408199000 ffff880408199098
> > Mar 20 21:53:39 haswell01 kernel: [13457.334054]  ffff880406395b60 ffffffffa0215b92 0000000000000004 ffff880408199140
> > Mar 20 21:53:39 haswell01 kernel: [13457.334057]  ffffffffa03b9920 ffff880408199000 0000000000000000 ffffffffa03dd000
> > Mar 20 21:53:39 haswell01 kernel: [13457.334059] Call Trace:
> > Mar 20 21:53:39 haswell01 kernel: [13457.334067]  [<ffffffffa0213cb2>] ? drm_dev_register+0xa2/0x1e0 [drm]
> > Mar 20 21:53:39 haswell01 kernel: [13457.334073]  [<ffffffffa0215b92>] drm_get_pci_dev+0x92/0x140 [drm]
> > Mar 20 21:53:39 haswell01 kernel: [13457.334082]  [<ffffffffa033d67c>] i915_pci_probe+0x3c/0x90 [i915]
> > Mar 20 21:53:39 haswell01 kernel: [13457.334086]  [<ffffffff813982b5>] local_pci_probe+0x45/0xa0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334088]  [<ffffffff81399555>] ? pci_match_device+0xc5/0xd0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334090]  [<ffffffff81399679>] pci_device_probe+0xd9/0x130
> > Mar 20 21:53:39 haswell01 kernel: [13457.334093]  [<ffffffff81484795>] driver_probe_device+0x125/0x3b0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334095]  [<ffffffff81484af3>] __driver_attach+0x93/0xa0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334098]  [<ffffffff81484a60>] ? __device_attach+0x40/0x40
> > Mar 20 21:53:39 haswell01 kernel: [13457.334100]  [<ffffffff81482703>] bus_for_each_dev+0x63/0xa0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334102]  [<ffffffff8148414e>] driver_attach+0x1e/0x20
> > Mar 20 21:53:39 haswell01 kernel: [13457.334104]  [<ffffffff81483d30>] bus_add_driver+0x180/0x250
> > Mar 20 21:53:39 haswell01 kernel: [13457.334106]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
> > Mar 20 21:53:39 haswell01 kernel: [13457.334109]  [<ffffffff81485174>] driver_register+0x64/0xf0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334110]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
> > Mar 20 21:53:39 haswell01 kernel: [13457.334113]  [<ffffffff81397c4c>] __pci_register_driver+0x4c/0x50
> > Mar 20 21:53:39 haswell01 kernel: [13457.334117]  [<ffffffffa0215d5a>] drm_pci_init+0x11a/0x130 [drm]
> > Mar 20 21:53:39 haswell01 kernel: [13457.334119]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
> > Mar 20 21:53:39 haswell01 kernel: [13457.334127]  [<ffffffffa03fe066>] i915_init+0x66/0x68 [i915]
> > Mar 20 21:53:39 haswell01 kernel: [13457.334130]  [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334133]  [<ffffffff810589e3>] ? set_memory_nx+0x43/0x50
> > Mar 20 21:53:39 haswell01 kernel: [13457.334137]  [<ffffffff810e0630>] load_module+0x2050/0x26f0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334139]  [<ffffffff810dbfa0>] ? store_uevent+0x40/0x40
> > Mar 20 21:53:39 haswell01 kernel: [13457.334141]  [<ffffffff810e0e46>] SyS_finit_module+0x86/0xb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334144]  [<ffffffff8171823f>] tracesys+0xe1/0xe6
> > Mar 20 21:53:39 haswell01 kernel: [13457.334145] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 <f0> b1 ca 81 ff ff ff ff b0 f4 27 eb 03 88 ff ff ff ff ff 7f 00
> > Mar 20 21:53:39 haswell01 kernel: [13457.334164] RIP  [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334166]  RSP <ffff880406395af0>
> > Mar 20 21:53:39 haswell01 kernel: [13457.334167] CR2: ffff8803eb27fcb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334168] ---[ end trace e2598b1fc83a65fd ]---
> >
> >
> > ________________________________________
> > From: Daniel Vetter <daniel.vetter at ffwll.ch> on behalf of Daniel Vetter <daniel at ffwll.ch>
> > Sent: Tuesday, March 25, 2014 12:31 AM
> > To: Dmitry Malkin
> > Cc: dri-devel at lists.freedesktop.org
> > Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes a race
> >
> > On Mon, Mar 24, 2014 at 12:06:21PM +0000, Dmitry Malkin wrote:
> > >
> > > Hello guys,
> > >
> > > I've been playing with reloading intel gfx driver (i915) in a cycle, for a while,
> > > and at some point I've found a non-deterministic kernel crash with a highly-variable
> > > iteration dependency -- 2 to 200 driver reload iterations.
> > >
> > > The apparent race is over the shared internal string buffer in drm_get_connector_name().
> > > It is mostly harmless, due to its results being mostly used for log output, but in at least one
> > > case  -- drm_sysfs_connector_add() -- this leads to a more critical error.
> > >
> > > Race scenario:
> > >
> > > - drm_sysfs_connector_add()
> > >    - drm_get_connector_name()
> > > vs.
> > > - anything that generates log messages involving DRM connectors
> > >    - drm_get_connector_name()
> > >
> > >  (and many other from drm_crtc.c) shares with caller const char* to internal static char buffer.
> > > If something call it from other thread, while main thread strore and use returned pointer it may overwrite connector name.
> > >
> > > Here are we go: registering HDMI connecter  (drm_sysfs_connector_add store and use pointer from drm_get_connector_name)
> > > and the same time got VGA connector name down through the stack. (the second thread is upowerd who watch continuously sysfs)
> >
> > Yeah, in my recent kerneldoc series I've noticed this too and added FIXME
> > comments. There's a lot more than just those you've pointed out. The
> > problem is that fixing these will be an awful lot of work since you need
> > to add proper cleanup code (calling kfree()) to all the required places.
> >
> > So a full subsystem wide code audit for every single use-site of these.
> > -Daniel
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the dri-devel mailing list