[PATCH v2] fbcon: Do not takeover the console from atomic context
Hans de Goede
hdegoede at redhat.com
Fri Aug 10 09:32:28 UTC 2018
HI,
On 10-08-18 10:50, Daniel Vetter wrote:
> On Fri, Aug 10, 2018 at 10:42 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>> Hi,
>>
>>
>> On 09-08-18 15:29, Daniel Vetter wrote:
>>>
>>> On Thu, Aug 9, 2018 at 1:42 PM, Hans de Goede <hdegoede at redhat.com> wrote:
>>>>
>>>> Taking over the console involves allocating mem with GFP_KERNEL, talking
>>>> to drm drivers, etc. So this should not be done from an atomic context.
>>>>
>>>> But the console-output trigger deferred console takeover may happen from
>>>> an
>>>> atomic context, which leads to "BUG: sleeping function called from
>>>> invalid
>>>> context" errors.
>>>>
>>>> This commit fixes these errors by doing the deferred takeover from a
>>>> workqueue when the notifier runs from an atomic context.
>>>>
>>>> Note this uses in_atomic, as checkpatch points out this should not be
>>>> done from driver code. But the console subsys is not really normal driver
>>>> code, specifically it plays some tricks where it disables some locking
>>>> when logging an oops, or when logging a lockdep bug when lockdep
>>>> debugging
>>>> is turned on, so we need to make an exception here.
>>>>
>>>> Signed-off-by: Hans de Goede <hdegoede at redhat.com>
>>>> ---
>>>> Changes in v2:
>>>> -Add a comment fbcon.c and the commit message about why we need to use
>>>> in_atomic here, no functional changes
>>>> ---
>>>> drivers/video/fbdev/core/fbcon.c | 29 +++++++++++++++++++++++++++--
>>>> 1 file changed, 27 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/video/fbdev/core/fbcon.c
>>>> b/drivers/video/fbdev/core/fbcon.c
>>>> index ef8b2d0b7071..31f518f8dde7 100644
>>>> --- a/drivers/video/fbdev/core/fbcon.c
>>>> +++ b/drivers/video/fbdev/core/fbcon.c
>>>> @@ -3592,7 +3592,20 @@ static int fbcon_init_device(void)
>>>> }
>>>>
>>>> #ifdef CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER
>>>> +static void fbcon_register_existing_fbs(struct work_struct *work)
>>>> +{
>>>> + int i;
>>>> +
>>>> + console_lock();
>>>> +
>>>> + for_each_registered_fb(i)
>>>> + fbcon_fb_registered(registered_fb[i]);
>>>> +
>>>> + console_unlock();
>>>> +}
>>>> +
>>>> static struct notifier_block fbcon_output_nb;
>>>> +static DECLARE_WORK(fbcon_deferred_takeover_work,
>>>> fbcon_register_existing_fbs);
>>>>
>>>> static int fbcon_output_notifier(struct notifier_block *nb,
>>>> unsigned long action, void *data)
>>>> @@ -3607,8 +3620,20 @@ static int fbcon_output_notifier(struct
>>>> notifier_block *nb,
>>>> deferred_takeover = false;
>>>> logo_shown = FBCON_LOGO_DONTSHOW;
>>>>
>>>> - for_each_registered_fb(i)
>>>> - fbcon_fb_registered(registered_fb[i]);
>>>> + /*
>>>> + * Normally all console output happens in a sleeping context, but
>>>> + * during oopses the kernel goes into a special mode where the
>>>> + * console code may not sleep. We check for this using in_atomic
>>>> + * as the note about in_atomic in preempt.h mentions, in_atomic
>>>> + * does not detect a spinlock being held when non-preemptible, so
>>>> + * we also check for irqs_disabled which covers this case.
>>>> + */
>>>
>>>
>>> If this is only for oopses, then opps_in_progress is what you're
>>> looking for.
>>
>>
>> Unfortunately that is not good enough as mentioned in the v2 commit
>> message, console output may also happen in atomic context when the
>> lockdep code has found an issue (from print_circular_bug).
>>
>> I've tried replacing:
>>
>> if (in_atomic() || irqs_disabled()) {
>>
>> With:
>>
>> if (oops_in_progress) {
>
> You can't use in_atomic() either, because it can't detect atomic
> regions on non-preemptible kernels. Unconditionally punting to a
> worker is the only solution here I think.
>
> See e.g the entire history around how we call the ->dirty callback in
> the drm fbdev emulation. The only generic approach that actually works
> is drm_fb_helper_dirty unconditionally offloading everything to a
> worker.
Ok, I don't see any downsides to doing the takeover in a worker
unconditionally, so I will prepare a v3 doing this.
Regards,
Hans
> -Daniel
>
>> On a system where I know the lockdep code will report a problem
>> wrt the bttv driver and here is what happened:
>>
>> [ 7.937690] fbcon: Taking over console
>> [ 7.937691] BUG: sleeping function called from invalid context at
>> mm/slab.h:421
>> [ 7.937691] in_atomic(): 1, irqs_disabled(): 1, pid: 561, name:
>> systemd-udevd
>> [ 7.937692] INFO: lockdep is turned off.
>> [ 7.937692] irq event stamp: 196513
>> [ 7.937692] hardirqs last enabled at (196513): [<ffffffffbba4314b>]
>> _raw_spin_unlock_irqrestore+0x4b/0x60
>> [ 7.937692] hardirqs last disabled at (196512): [<ffffffffbba43962>]
>> _raw_spin_lock_irqsave+0x22/0x81
>> [ 7.937693] softirqs last enabled at (196504): [<ffffffffbbe0038c>]
>> __do_softirq+0x38c/0x4f7
>> [ 7.937693] softirqs last disabled at (196401): [<ffffffffbb0c19be>]
>> irq_exit+0x10e/0x120
>> [ 7.937694] CPU: 1 PID: 561 Comm: systemd-udevd Not tainted
>> 4.18.0-0.rc8.git1.1.hdg1.fc29.x86_64 #1
>> [ 7.937694] Hardware name: To Be Filled By O.E.M. To Be Filled By
>> O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
>> [ 7.937694] Call Trace:
>> [ 7.937694] dump_stack+0x85/0xc0
>> [ 7.937695] ___might_sleep.cold.72+0xac/0xbc
>> [ 7.937695] kmem_cache_alloc_trace+0x202/0x2f0
>> [ 7.937695] ? fbcon_startup+0xae/0x300
>> [ 7.937695] fbcon_startup+0xae/0x300
>> [ 7.937696] do_take_over_console+0x6d/0x180
>> [ 7.937696] do_fbcon_takeover+0x58/0xb0
>> [ 7.937696] fbcon_output_notifier.cold.35+0x18/0x44
>> [ 7.937697] notifier_call_chain+0x39/0x90
>> [ 7.937697] vt_console_print+0x363/0x420
>> [ 7.937697] console_unlock+0x422/0x610
>> [ 7.937697] vprintk_emit+0x268/0x540
>> [ 7.937698] ? kernel_text_address+0xe5/0xf0
>> [ 7.937698] printk+0x58/0x6f
>> [ 7.937698] print_circular_bug_header.cold.56+0x17/0x9c
>> [ 7.937698] print_circular_bug.isra.38+0x7c/0xb0
>> [ 7.937699] __lock_acquire+0x139a/0x16c0
>> [ 7.937699] lock_acquire+0x9e/0x1b0
>> [ 7.937699] ? find_ref_lock+0x21/0x40 [videodev]
>> [ 7.937699] ? find_ref_lock+0x21/0x40 [videodev]
>> [ 7.937700] __mutex_lock+0x88/0xa10
>> [ 7.937700] ? find_ref_lock+0x21/0x40 [videodev]
>> [ 7.937700] ? bttv_gpio_bits+0x22/0x60 [bttv]
>> [ 7.937700] ? native_sched_clock+0x3e/0xa0
>> [ 7.937701] ? mark_held_locks+0x57/0x80
>> [ 7.937701] ? _raw_spin_unlock_irqrestore+0x4b/0x60
>> [ 7.937701] ? find_ref_lock+0x21/0x40 [videodev]
>> [ 7.937701] find_ref_lock+0x21/0x40 [videodev]
>> [ 7.937702] v4l2_ctrl_find+0xa/0x20 [videodev]
>> [ 7.937702] audio_mute+0x38/0x120 [bttv]
>> [ 7.937702] bttv_s_ctrl+0x2d5/0x3b0 [bttv]
>> [ 7.937702] __v4l2_ctrl_handler_setup+0xdf/0x130 [videodev]
>> [ 7.937703] v4l2_ctrl_handler_setup+0x27/0x40 [videodev]
>> [ 7.937703] bttv_probe.cold.45+0x8f4/0xca7 [bttv]
>> [ 7.937703] local_pci_probe+0x41/0x90
>> [ 7.937704] pci_device_probe+0x118/0x1a0
>> [ 7.937704] driver_probe_device+0x2da/0x450
>> [ 7.937704] __driver_attach+0xe1/0x110
>> [ 7.937704] ? driver_probe_device+0x450/0x450
>> [ 7.937705] ? driver_probe_device+0x450/0x450
>> [ 7.937705] bus_for_each_dev+0x79/0xc0
>> [ 7.937705] ? deferred_probe_initcall+0x30/0x30
>> [ 7.937705] bus_add_driver+0x155/0x230
>> [ 7.937706] ? 0xffffffffc07a4000
>> [ 7.937706] driver_register+0x6b/0xb0
>> [ 7.937706] ? 0xffffffffc07a4000
>> [ 7.937706] bttv_init_module+0xcd/0xe3 [bttv]
>> [ 7.937706] do_one_initcall+0x5d/0x362
>> [ 7.937707] ? rcu_read_lock_sched_held+0x6b/0x80
>> [ 7.937707] ? kmem_cache_alloc_trace+0x293/0x2f0
>> [ 7.937707] ? do_init_module+0x22/0x210
>> [ 7.937707] do_init_module+0x5a/0x210
>> [ 7.937708] load_module+0x21e6/0x2610
>> [ 7.937708] ? ima_post_read_file+0x10c/0x119
>> [ 7.937708] ? __do_sys_finit_module+0xad/0x110
>> [ 7.937708] __do_sys_finit_module+0xad/0x110
>> [ 7.937709] do_syscall_64+0x60/0x1f0
>> [ 7.937709] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Follow by a printing of the actual lockdep issue.
>>
>> This is what I wrote this patch for in the first place
>> and checking for oops_in_progress does not fix this case.
>>
>> So I believe we need to keep using the:
>>
>> if (in_atomic() || irqs_disabled()) {
>>
>> Check, Bartlomiej can you apply v2 please, or let me know if
>> you want any other changes?
>>
>>> At least that's what we've switched to in drm_fb_helper.c
>>> instead of a cargo-culted pile of in_atimc/irq/kgdb checks. And since
>>> the box will die right afterwards, you might as well just bail out
>>> directly and not bother with scheduling the worker?
>>
>>
>> Well with a lockdep bug the box won't die, depending on the actual
>> locking issue it may be quite harmless.
>>
>> Regards,
>>
>> Hans
>>
>>
>>
>>
>>> Again, that's what
>>> we've been doing in the drm fbdev emulation code.
>>> -Daniel
>>>
>>>> + if (in_atomic() || irqs_disabled()) {
>>>> + schedule_work(&fbcon_deferred_takeover_work);
>>>> + } else {
>>>> + for_each_registered_fb(i)
>>>> + fbcon_fb_registered(registered_fb[i]);
>>>> + }
>>>>
>>>> return NOTIFY_OK;
>>>> }
>>>> --
>>>> 2.18.0
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>>
>>>
>>>
>>
>
>
>
More information about the dri-devel
mailing list