[Intel-gfx] [PATCH] drm/i915: disable VGA mem when reenabling the power well

Paulo Zanoni przanoni at gmail.com
Wed Dec 11 18:44:34 CET 2013


Hi

2013/11/29 Paulo Zanoni <przanoni at gmail.com>:
> From: Paulo Zanoni <paulo.r.zanoni at intel.com>
>
> Fixes regression introduced by:
>     commit bf51d5e2cda5d36d98e4b46ac7fca9461e512c41
>     Author: Paulo Zanoni <paulo.r.zanoni at intel.com>
>     Date:   Wed Jul 3 17:12:13 2013 -0300
>         drm/i915: switch disable_power_well default value to 1
>
> The bug I'm seeing can be reproduced with:
> - Boot your Haswell machine
> - Make sure the power well gets disabled, then enabled. You can check
>   this by seeing the messages print by hsw_set_power_well
> - Stop your display manager
> - echo 0 > /sys/class/vtconsole/vtcon1/bind
>
> I can easily reproduce this by blacklising snd_hda_intel and booting
> with eDP+HDMI.
>
> If you do this and then look at dmesg, you'll see we're printing
> infinite "Unclaimed register" messages. This is happening because
> we're stuck on an infinite loop inside console_unlock(), which is
> calling many functions from vgacon.c. And the code that's triggering
> the error messages is from vgacon_set_cursor_size().
>
> For some reason, what fixes the problem is the code that clears
> VGA_MSR_MEM_EN from VGA_MSR_WRITE. If you add some debug messages,
> you'll see that we're reading a value of 0 and writing 0 back to it. I
> really don't know why this fixes the problem, but I am sure that if
> you remove the code line that writes 0 back the bug won't stop. I
> suspect this is probably some problem with how the hardware gets
> reinitialized when we reenable the power well.

Due to even more questioning, I decided to investigate more.

After we re-enable the power well, every time we read/write the VGA
address 0x3d5 we get an "unclaimed register" interrupt (ERR_INT) and
print error messages. If we write anything to the VGA MSR register (it
doesn't really matter which value you write to bit 0), any
reads/writes to 0x3d5 _don't_ trigger the "unclaimed register" errors
anymore (even if MSR bit 0 is zero). So what happens with the current
code is that when we unbind i915 and bind vgacon, we call
console_unlock(). Function console_unlock() is responsible for
printing any messages that were supposed to be print when the console
was locked, so it calls the TTY layer, which calls the console layer,
which calls vgacon to print the messages. At this point, vgacon
eventually calls vgacon_set_cursor_size(), which touches 0x3d5, which
triggers unclaimed register interrupts. The problem is that when we
get these interrupts, we print the error messages, so we add more work
to console_unlock(), which will try to print it again, and then call
vgacon again, trigger a new interrupt, which will put more stuff to
the buffer, and then we'll be stuck at console_unlock() forever.

If you patch intel_uncore.c to not print anything when we detect
unclaimed registers, we won't get into the console_unlock() infinite
loop and the driver unbind will work just fine. We will still be
getting interrupts every time vgacon touches those registers, but we
will survive. This is a valid experiment, but IMHO it's not the real
fix: if we don't print any error messages we will still keep getting
the interrupts, and if we disable ERR_INT we won't get the interrupt
anymore, but we will also stop getting all the other error interrupts.

I talked about this problem with the HW engineer and his
recommendation is "So don't do any VGA I/O or memory access while the
power well is disabled, and make to re-program MSR after enabling the
power well and before using VGA I/O or memory accesses.".

Based on this, I could write an even-smaller patch that just touches
the MSR register to avoid the problem, also providing a nice comment
and an improved commit message. Or I could also write another
implementation that actually saves the value of MSR before we disable
the power well, then restores the correct value after we re-enable it:
but that wouldn't help vgacon too much. IMHO one of these new patches
would be the appropriate way to solve the problem, but it seems that
Daniel/Ville didn't like the previous patches, so I'm not sure they
will be willing to accept the next version either. Since so far I
haven't seen any alternative implementation/suggestion that actually
fixes the problem, and since module_reload is not working on Haswell
for months (possibly hiding other module_reload bugs), my vote is
obviously to accept the next patch because at least it fixes the
6-months-old regression. But if I don't get an "ack" on this I won't
spend my time working on this. What do you think?

Please also notice that this only fixes the case where the power well
is already enabled when we unbind. The fix for when the power well is
_disabled_ when we unbind depends on this, and IMHO we should discuss
it later.

Thanks,
Paulo


>
> Also notice that function i915_disable_vga_mem already existed and was
> removed on commit 6e1b4fdad5157bb9e88777d525704aba24389bee, I just
> brought it back. Notice that the intel_drv.h declaration was still
> there too.
>
> I know some people will probably request to deeply investigate why
> exactly vgacon is doing the wrong thing, and ask me to replace the
> "for (;;)" that exists inside console_unlock() with something else,
> but my goal here is to just fix the regression. I don't think it's
> worth spending that much time on vgacon code. And this is our
> regression anyway.
>
> Notice that this is just a partial fix to fd.o #67813. This fixes the
> case where the power well is already enabled when we unbind, not when
> it's disabled when we unbind.
>
> V2: - Rebase (first version was sent in September).
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67813
> Signed-off-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++
>  drivers/gpu/drm/i915/intel_pm.c      |  2 ++
>  2 files changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 45a87d1..bf22bea 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11064,6 +11064,18 @@ void i915_redisable_vga(struct drm_device *dev)
>         }
>  }
>
> +void i915_disable_vga_mem(struct drm_device *dev)
> +{
> +       if (HAS_PCH_SPLIT(dev)) {
> +               vga_get_uninterruptible(dev->pdev, VGA_RSRC_LEGACY_IO);
> +               outb(inb(VGA_MSR_READ) & ~VGA_MSR_MEM_EN, VGA_MSR_WRITE);
> +               vga_set_legacy_decoding(dev->pdev, VGA_RSRC_LEGACY_IO |
> +                                                  VGA_RSRC_NORMAL_IO |
> +                                                  VGA_RSRC_NORMAL_MEM);
> +               vga_put(dev->pdev, VGA_RSRC_LEGACY_IO);
> +       }
> +}
> +
>  static void intel_modeset_readout_hw_state(struct drm_device *dev)
>  {
>         struct drm_i915_private *dev_priv = dev->dev_private;
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index e6d98fe..7ebe1bd 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5703,6 +5703,8 @@ static void hsw_set_power_well(struct drm_device *dev,
>                         if (wait_for((I915_READ(HSW_PWR_WELL_DRIVER) &
>                                       HSW_PWR_WELL_STATE_ENABLED), 20))
>                                 DRM_ERROR("Timeout enabling power well\n");
> +
> +                       i915_disable_vga_mem(dev);
>                 }
>
>                 if (IS_BROADWELL(dev)) {
> --
> 1.8.3.1
>



-- 
Paulo Zanoni



More information about the Intel-gfx mailing list