[Intel-gfx] The mysterious case of IRQs, failed DP aux ch transactions, and Skylake

Lyude cpaul at redhat.com
Tue Mar 1 17:14:58 UTC 2016


Hi! Daniel Vetter referred me to you since you're a hardware guy, and
also suggested I include the whole intel-gfx list on this.

So as of late I've been testing the mainline kernel on some new
production Skylake machines. While it worked perfectly on most of them,
I've been stumped with a weird issue that arose with a Lenovo ThinkPad
T560. If I place the laptop in a dock, suspend it, and then resume it,
monitors connected to the dock don't come back up on resume. It should
be noted that these docks use DP MST for all of the monitor connections
they have, so each dock is basically just an MST hub. As far as I can
tell, this only occurs while using the dock. Using normal MST monitors
doesn't show this issue.

After doing some investigation, I managed to find where the problem
starts. So, the main functions of concern in the driver when it comes
to resume are i915_drm_resume_early() and i915_drm_resume(). The
problem starts in the latter function, where we reenable interrupts for
the GPU by calling intel_runtime_pm_enable_interrupts(). If we go down
a little further, the exact line where the problem starts is
drivers/gpu/drm/i915/i915_irq.c:3756:

	I915_WRITE(GEN8_MASTER_IRQ, DE_MASTER_IRQ_CONTROL);
	POSTING_READ(GEN8_MASTER_IRQ);

Simple explanation: this writes to the master IRQ control register and
sets bit 31, the bit that enables/disables all interrupts on the GPU.
So this is where things get weird: if we start resuming DP MST before
doing this single register write (by calling intel_dp_mst_resume()),
everything works perfectly and the screen turns back on. If we try
resuming DP MST after this register write, all of the DP aux
transactions time out according to the hardware:

[   23.928507] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7d40001f
[   23.938506] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7d40001f
[   23.948587] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7d40001f
[   24.006942] [drm:drm_dp_check_act_status] failed to get ACT bit 1 after 30 retries
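
For reference, here's a minimal sketch of how I read that status value.
The bit positions are written from memory of the DP_AUX_CH_CTL
definitions in i915_reg.h, so treat them as assumptions and check them
against the header. The struct, macro, and function names are made up
for illustration, and the low bits are left undecoded.

```c
#include <stdint.h>

/* Assumed DP_AUX_CH_CTL bit layout (from memory, not authoritative). */
#define AUX_SEND_BUSY       (1u << 31)
#define AUX_DONE            (1u << 30)
#define AUX_INTERRUPT       (1u << 29)
#define AUX_TIME_OUT_ERROR  (1u << 28)
#define AUX_TIME_OUT_SHIFT  26
#define AUX_TIME_OUT_MASK   (3u << AUX_TIME_OUT_SHIFT)
#define AUX_RECEIVE_ERROR   (1u << 25)
#define AUX_MSG_SIZE_SHIFT  20
#define AUX_MSG_SIZE_MASK   (0x1fu << AUX_MSG_SIZE_SHIFT)

/* Hypothetical container for the decoded fields. */
struct aux_status {
	int busy;            /* transaction still in flight */
	int done;            /* hardware finished the transaction */
	int irq;             /* interrupt-on-done bit */
	int timeout_err;     /* hardware gave up waiting for a reply */
	int rx_err;          /* reply arrived but was malformed */
	unsigned timeout_us; /* programmed timeout timer value */
	unsigned msg_size;   /* message size field, in bytes */
};

/* Split a raw status word into the fields above. */
static struct aux_status decode_aux_status(uint32_t v)
{
	/* Assumed encoding of the 2-bit timeout timer field. */
	static const unsigned timeout_us[] = { 400, 600, 800, 1600 };
	struct aux_status s;

	s.busy        = !!(v & AUX_SEND_BUSY);
	s.done        = !!(v & AUX_DONE);
	s.irq         = !!(v & AUX_INTERRUPT);
	s.timeout_err = !!(v & AUX_TIME_OUT_ERROR);
	s.rx_err      = !!(v & AUX_RECEIVE_ERROR);
	s.timeout_us  = timeout_us[(v & AUX_TIME_OUT_MASK) >> AUX_TIME_OUT_SHIFT];
	s.msg_size    = (v & AUX_MSG_SIZE_MASK) >> AUX_MSG_SIZE_SHIFT;
	return s;
}
```

Under those assumptions, decode_aux_status(0x7d40001f) comes out as
done with the timeout error bit set, no receive error, and not busy,
which matches the hardware reporting a plain timeout rather than a
garbled reply.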

It looks like what happens is that after doing this register write, we
become unable to successfully do any DP aux transactions for about
15-20 msec. After that time passes, everything goes back to normal and DP
aux works fine again. In fact, if we just wait for 15 msec before
trying to resume DP MST, the monitors come on perfectly. While I'd love
to just have a fix as simple as that, we'd still like to know what's
actually causing this to happen. What's strange is that it doesn't seem
like we actually get any interrupts from the GPU during that 15-20 msec
window where DP aux stops working, since I don't see our IRQ handler
for i915 getting called at all during that time. Daniel Vetter has
suggested it might be the DMC firmware doing aux transactions using the
PSR block, resulting in the bus being busy, but preventing the DMC
firmware from being loaded at all doesn't seem to make a difference.
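
To make the three orderings I tried easier to reason about without the
hardware, here's a toy userspace model of them. Everything in it is
hypothetical: fake_gpu and the fake_* helpers are invented names, the
clock is simulated, and the 15 msec dead window is just the number I
observed above, not anything from a spec. It doesn't explain the cause,
it only restates the observed behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASTER_IRQ_ENABLE   (1u << 31) /* bit 31, as described above */
#define AUX_DEAD_WINDOW_MS  15         /* observed, not documented */

/* Hypothetical stand-in for the bits of GPU state we care about. */
struct fake_gpu {
	uint32_t master_irq;        /* models the master IRQ register */
	unsigned now_ms;            /* simulated clock */
	unsigned irq_enabled_at_ms; /* when bit 31 was last set */
};

/* Models the register write that enables all GPU interrupts. */
static void fake_enable_master_irq(struct fake_gpu *gpu)
{
	gpu->master_irq |= MASTER_IRQ_ENABLE;
	gpu->irq_enabled_at_ms = gpu->now_ms;
}

/* Advance the simulated clock instead of really sleeping. */
static void fake_msleep(struct fake_gpu *gpu, unsigned ms)
{
	gpu->now_ms += ms;
}

/* Aux works unless we are inside the window after the IRQ enable. */
static bool fake_aux_transfer(const struct fake_gpu *gpu)
{
	if (!(gpu->master_irq & MASTER_IRQ_ENABLE))
		return true;
	return gpu->now_ms - gpu->irq_enabled_at_ms >= AUX_DEAD_WINDOW_MS;
}

enum resume_order {
	MST_BEFORE_IRQ,          /* resume MST, then enable interrupts */
	MST_AFTER_IRQ,           /* enable interrupts, then resume MST */
	MST_AFTER_IRQ_AND_DELAY, /* enable, wait 15 msec, resume MST */
};

/* Returns true if the modelled MST resume would succeed. */
static bool fake_resume(enum resume_order order)
{
	struct fake_gpu gpu = { 0 };
	bool mst_ok;

	switch (order) {
	case MST_BEFORE_IRQ:
		mst_ok = fake_aux_transfer(&gpu);
		fake_enable_master_irq(&gpu);
		return mst_ok;
	case MST_AFTER_IRQ:
		fake_enable_master_irq(&gpu);
		return fake_aux_transfer(&gpu);
	case MST_AFTER_IRQ_AND_DELAY:
		fake_enable_master_irq(&gpu);
		fake_msleep(&gpu, AUX_DEAD_WINDOW_MS);
		return fake_aux_transfer(&gpu);
	}
	return false;
}
```

In this model only MST_AFTER_IRQ fails, which is the same pattern I see
on the T560 with the dock.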

Hopefully as a hardware guy you might be able to give us some insight
as to what's going on. If anyone notices I've missed any important
details about this, feel free to reply and mention them. Thanks ahead
of time for the help.


-- 
Cheers,
	Lyude


