[Intel-gfx] [PATCH] drm/i915: disable VGA mem when reenabling the power well
Daniel Vetter
daniel at ffwll.ch
Wed Dec 11 20:02:13 CET 2013
On Wed, Dec 11, 2013 at 7:50 PM, Paulo Zanoni <przanoni at gmail.com> wrote:
> 2013/12/11 Daniel Vetter <daniel at ffwll.ch>:
>> On Wed, Dec 11, 2013 at 6:44 PM, Paulo Zanoni <przanoni at gmail.com> wrote:
>>>
>>> After we re-enable the power well, every time we read/write the VGA
>>> address 0x3d5 we get an "unclaimed register" interrupt (ERR_INT) and
>>> print error messages. If we write anything to the VGA MSR register (it
>>> doesn't really matter which value you write to bit 0), any
>>> reads/writes to 0x3d5 _don't_ trigger the "unclaimed register" errors
>>> anymore (even if MSR bit 0 is zero). So what happens with the current
>>> code is that when we unbind i915 and bind vgacon, we call
>>> console_unlock(). Function console_unlock() is responsible for
>>> printing any messages that were supposed to be print when the console
>>> was locked, so it calls the TTY layer, which calls the console layer,
>>> which calls vgacon to print the messages. At this point, vgacon
>>> eventually calls vgacon_set_cursor_size(), which touches 0x3d5, which
>>> triggers unclaimed register interrupts. The problem is that when we
>>> get these interrupts, we print the error messages, so we add more work
>>> to console_unlock(), which will try to print it again, and then call
>>> vgacon again, trigger a new interrupt, which will put more stuff to
>>> the buffer, and then we'll be stuck at console_unlock() forever.
>>>
>>> If you patch intel_uncore.c to not print anything when we detect
>>> unclaimed registers, we won't get into the console_unlock() infinite
>>> loop and the driver unbind will work just fine. We will still be
>>> getting interrupts every time vgacon touches those registers, but we
>>> will survive. This is a valid experiment, but IMHO it's not the real
>>> fix: if we don't print any error messages we will still keep getting
>>> the interrupts, and if we disable ERR_INT we won't get the interrupt
>>> anymore, but we will also stop getting all the other error interrupts.
>>>
>>> I talked about this problem with the HW engineer and his
>>> recommendation is "So don't do any VGA I/O or memory access while the
>>> power well is disabled, and make to re-program MSR after enabling the
>>> power well and before using VGA I/O or memory accesses.".
>>
>> Impressive piece of debugging. And a great story ;-)
>>
>>> Based on this, I could write an even-smaller patch that just touches
>>> the MSR register to avoid the problem, also providing a nice comment
>>> and an improved commit message. Or I could also write another
>>> implementation that actually saves the value of MSR before we disable
>>> the power well, then restores the correct value after we re-enable it:
>>> but that wouldn't help vgacon too much. IMHO one of these new patches
>>> would be the appropriate way to solve the problem, but it seems that
>>> Daniel/Ville didn't like the previous patches, so I'm not sure they
>>> will be willing to accept the next version either. Since so far I
>>> haven't seen any alternative implementation/suggestion that actually
>>> fixes the problem, and since module_reload is not working on Haswell
>>> for months (possibly hiding other module_reload bugs), my vote is
>>> obviously to accept the next patch because at least it fixes the
>>> 6-months-old regression. But if I don't get an "ack" on this I won't
>>> spend my time working on this. What do you think?
>>
>> I'd go with re-disabling vga unconditionally through the msr. We don't
>> bother with restoring the vga plane anyway at module unload, so vga is
>> hosed no matter what. And it sounds like this is the more minimal fix.
>> So ack for the minimal fix to just poke the MSR at the right times.
>
> Just to be clear: by "at the right times" you mean "when re-enabling
> the power well", right?
Yup.
>>> Please also notice that this only fixes the case where the power well
>>> is already enabled when we unbind. The fix for when the power well is
>>> _disabled_ when we unbind depends on this, and IMHO we should discuss
>>> it later.
>>
>> I guess I need to resurrect my "kill vgacon" patch and figure out why
>> it doesn't work.
>
> See comment #45 of the bug report for an explanation on how it fails.
> I'm not sure if it's possible to do what you want.
>
>
>> Or could we just force-enable the power well when we
>> start our module unload sequence instead?
>
> As I mentioned before, driver unbind (a requirement for module unload)
> happens before we call any of the unload functions, so we never even
> get to run i915_driver_unload.
Meh, I always forget that. And thanks to the awesome ownership model
in fbcon we can't use anything existing to pluck that power well
enabling code in. I hate this code ... So I guess preventing vgacon
from ever reloading on our systems would be the right solution then
for the case where the power well is down at console unbind time. Or
just CONFIG_FB=n ;-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the Intel-gfx
mailing list