[Intel-gfx] 4.10-rc2 oops in DRM connector code

Dave Hansen dave.hansen at intel.com
Mon Jan 9 17:22:52 UTC 2017


On 01/09/2017 08:59 AM, Daniel Vetter wrote:
> On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen <dave.hansen at intel.com> wrote:
>> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen <dave.hansen at intel.com> wrote:
>>>> Well, now I found where the -2 comes from.
>>>> intel_dp_register_mst_connector() calls drm_connector_register(), which
>>>> fails to add the kobject (warning below).  But, it does zero error
>>>> checking on the drm_connector_register() call and leaves the
>>>> partially-constructed connector in place.
>>>>
>>>> The next time some poor, hapless code goes and tries to do anything with
>>>> that kdev, they oops.  I'm perplexed by this, though.  The
>>>> drm_dp_mst_topology_cbs->register_connector just returns void.  It seems
>>>> a bit goofy that it can't even _return_ failure.
>>>>
>>>> Is there some stable code to go back to here?  Or, is there something
>>>> about my configuration that's unique?  I really wonder why nobody else
>>>> is running into this.
>>>>
>>>> There's probably some other race going on here.  This warning doesn't
>>>> happen on every boot.
>>> This smells more like the root-cause: Something goes wrong on boot
>>> that prevents connectors from properly registering, then we fall over
>>> later on. And the register callback is intentionally void, assuming
>>> that any prep work has been done earlier and that therefore the
>>> register step can't fail. Can you pls check whether the oops later on
>>> only happens together with this warning at boot, or whether they're
>>> not correlated?
>>
>> Looking through my logs, I can't find any instance of the oops without
>> the warning at boot.  So I do think the later oops is entirely caused by
>> the issue warned about in early boot.
> 
> Hm, I guess then we'd need to fix that boot-up warning. Can you try to
> figure out why it's unhappy? On a hunch it could be that we call
> drm_connector_register from the mst probe worker before the main
> driver load thread has reached the drm_dev_register call. A few printk
> to decide whether that's the case (plus a few boot-up tests to gather
> the statistics, sorry about that) would be real great.
> 
> If that's inconclusive I'm again a bit low on ideas ...

I'll do that shortly.  For now, I can confirm that the -2 (-ENOENT) comes
from the !parent check in sysfs_create_dir_ns().
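
To illustrate what that check implies, here is a minimal userspace sketch of
the parent lookup.  It is a simplified model, not the kernel code: the struct
layouts are cut down to the fields that matter, and model_sysfs_create_dir()
and its new_sd argument are hypothetical names made up for the example.  The
assumption it encodes is that a kobject's sd pointer stays NULL until that
kobject has its own sysfs directory, so a child added before its parent gets
-ENOENT.

```c
/* Userspace model only; simplified stand-ins for the kernel structures. */
#include <errno.h>
#include <stddef.h>

struct kernfs_node { int unused; };

struct kobject {
	const char *name;
	struct kobject *parent;
	struct kernfs_node *sd;	/* NULL until this kobject has a sysfs dir */
};

/* Stand-in for the sysfs root node used when a kobject has no parent. */
static struct kernfs_node sysfs_root_storage;
static struct kernfs_node *sysfs_root_kn = &sysfs_root_storage;

/*
 * Hypothetical helper modelling the parent lookup: pick the parent's
 * directory node, and fail with -ENOENT if the parent has none yet.
 */
static int model_sysfs_create_dir(struct kobject *kobj,
				  struct kernfs_node *new_sd)
{
	struct kernfs_node *parent;

	if (kobj->parent)
		parent = kobj->parent->sd;
	else
		parent = sysfs_root_kn;

	if (!parent)
		return -ENOENT;	/* -2: parent not in sysfs (yet) */

	kobj->sd = new_sd;	/* directory "created" */
	return 0;
}
```

In this model, adding card0-DP-3 while card0 still has sd == NULL returns -2,
and the same call succeeds once card0 itself has been added.  That would be
consistent with the connector being registered before the card device is,
but the sketch does not prove that ordering is what happens here.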

I also can't reproduce this if I build i915 as a module.  It only
happens when i915 is built in.

> Jan  9 09:07:34 ray kernel: [    1.400547] sysfs_create_dir_ns()::53 error: -2
> Jan  9 09:07:34 ray kernel: [    1.400554] create_dir()::75 error: -2
> Jan  9 09:07:34 ray kernel: [    1.400558] ------------[ cut here ]------------
> Jan  9 09:07:34 ray kernel: [    1.400565] WARNING: CPU: 1 PID: 90 at lib/kobject.c:249 kobject_add_internal+0x273/0x320
> Jan  9 09:07:34 ray kernel: [    1.400569] kobject_add_internal failed for card0-DP-3 (error: -2 parent: card0)
> Jan  9 09:07:34 ray kernel: [    1.400572] Modules linked in:
> Jan  9 09:07:34 ray kernel: [    1.400577] CPU: 1 PID: 90 Comm: kworker/1:2 Not tainted 4.10.0-rc3-dirty #61
> Jan  9 09:07:34 ray kernel: [    1.400579] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
> Jan  9 09:07:34 ray kernel: [    1.400585] Workqueue: events_long drm_dp_mst_link_probe_work
> Jan  9 09:07:34 ray kernel: [    1.400588] Call Trace:
> Jan  9 09:07:34 ray kernel: [    1.400593]  dump_stack+0x67/0x99
> Jan  9 09:07:34 ray kernel: [    1.400598]  __warn+0xd1/0xf0
> Jan  9 09:07:34 ray kernel: [    1.400601]  warn_slowpath_fmt+0x4f/0x60
> Jan  9 09:07:34 ray kernel: [    1.400604]  kobject_add_internal+0x273/0x320
> Jan  9 09:07:34 ray kernel: [    1.400607]  kobject_add+0x65/0xb0
> Jan  9 09:07:34 ray kernel: [    1.400611]  ? klist_init+0x31/0x40
> Jan  9 09:07:34 ray kernel: [    1.400615]  device_add+0x102/0x5d0
> Jan  9 09:07:34 ray kernel: [    1.400619]  ? kfree_const+0x22/0x30
> Jan  9 09:07:34 ray kernel: [    1.400623]  device_create_groups_vargs+0xd8/0x100
> Jan  9 09:07:34 ray kernel: [    1.400626]  device_create_with_groups+0x36/0x40
> Jan  9 09:07:34 ray kernel: [    1.400631]  ? drm_fb_helper_add_one_connector+0x57/0xd0
> Jan  9 09:07:34 ray kernel: [    1.400636]  ? kmem_cache_alloc_trace+0x1d2/0x1f0
> Jan  9 09:07:34 ray kernel: [    1.400641]  drm_sysfs_connector_add+0x60/0xe0
> Jan  9 09:07:34 ray kernel: [    1.400645]  drm_connector_register+0x21/0xc0
> Jan  9 09:07:34 ray kernel: [    1.400649]  intel_dp_register_mst_connector+0x41/0x50
> Jan  9 09:07:34 ray kernel: [    1.400653]  drm_dp_add_port+0x350/0x450
> Jan  9 09:07:34 ray kernel: [    1.400657]  ? rcu_early_boot_tests+0x1/0x10
> Jan  9 09:07:34 ray kernel: [    1.400660]  ? schedule_timeout+0x1cd/0x390
> Jan  9 09:07:34 ray kernel: [    1.400664]  ? __might_sleep+0x4a/0x90
> Jan  9 09:07:34 ray kernel: [    1.400667]  ? mutex_lock+0x25/0x50
> Jan  9 09:07:34 ray kernel: [    1.400670]  ? drm_dp_mst_wait_tx_reply+0x118/0x1e0
> Jan  9 09:07:34 ray kernel: [    1.400673]  ? prepare_to_wait_event+0x120/0x120
> Jan  9 09:07:34 ray kernel: [    1.400675] drm_sysfs_connector_add() connector: ffff88040c778000 kdev: ffff88040ef15000
> Jan  9 09:07:34 ray kernel: [    1.400681]  ? drm_dp_check_mstb_guid+0x3d/0x120
> Jan  9 09:07:34 ray kernel: [    1.400684]  drm_dp_send_link_address+0x185/0x1f0
> Jan  9 09:07:34 ray kernel: [    1.400688]  drm_dp_check_and_send_link_address+0xad/0xc0
> Jan  9 09:07:34 ray kernel: [    1.400691]  drm_dp_mst_link_probe_work+0x57/0xa0
> Jan  9 09:07:34 ray kernel: [    1.400694]  process_one_work+0x14b/0x430
> Jan  9 09:07:34 ray kernel: [    1.400697]  worker_thread+0x12b/0x4a0
> Jan  9 09:07:34 ray kernel: [    1.400700]  kthread+0x10c/0x140
> Jan  9 09:07:34 ray kernel: [    1.400703]  ? process_one_work+0x430/0x430
> Jan  9 09:07:34 ray kernel: [    1.400706]  ? kthread_create_on_node+0x40/0x40
> Jan  9 09:07:34 ray kernel: [    1.400709]  ret_from_fork+0x27/0x40
> Jan  9 09:07:34 ray kernel: [    1.400714] ---[ end trace 0009c9dc7b253d9c ]---



