[Bug 105128] [CI] igt@* - dmesg-warn - *ERROR* dp aux hw did not signal timeout!

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Sep 18 20:18:55 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=105128

Matt Roper <matthew.d.roper at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matthew.d.roper at intel.com

--- Comment #8 from Matt Roper <matthew.d.roper at intel.com> ---
The "(has irq: 1)" part was removed by commit:

    commit 8a29c778fa1a50a25a3e66cf9589888758858d24
    Author: Lucas De Marchi <lucas.demarchi at intel.com>
    Date:   Wed May 23 11:04:35 2018 -0700

        drm/i915: remove check for aux irq

since all relevant platforms have the ability to utilize AUX interrupts these
days; removing it from the bug title.

When we perform an aux transfer, we ask the hardware to give us an interrupt
when the transfer completes, and we expect the 'busy' bit of the AUX control
register to be zero at that point.  We set a timeout of 10ms for this
completion to happen and print out the error message here if the control
register still has the 'busy' bit asserted at that time.  Given that the
hardware itself is programmed to time out after 1600us, we should definitely
have ended the transfer by this point, either through completion or through
timeout.
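
As a rough illustration of the wait described above (not the actual i915
code), here is a minimal userspace sketch.  The names aux_wait_done,
aux_read_fn, fake_aux and fake_read are hypothetical, the busy bit is assumed
to be bit 31, and each loop iteration stands in for "sleep until interrupt or
1ms":

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position: bit 31 = transfer still in progress ("busy"). */
#define AUX_CTL_SEND_BUSY (UINT32_C(1) << 31)

/* Hypothetical register accessor; a real driver would read MMIO here. */
typedef uint32_t (*aux_read_fn)(void *ctx);

/*
 * Re-read the control register until the busy bit clears or timeout_ms
 * simulated milliseconds have elapsed.  Returns true when the transfer
 * ended in time; *status_out always receives the last value read.
 */
static bool aux_wait_done(aux_read_fn read_ctl, void *ctx,
                          unsigned int timeout_ms, uint32_t *status_out)
{
    uint32_t status = read_ctl(ctx);
    unsigned int elapsed_ms = 0;

    while ((status & AUX_CTL_SEND_BUSY) && elapsed_ms < timeout_ms) {
        elapsed_ms++;               /* stand-in for waiting on the IRQ */
        status = read_ctl(ctx);
    }

    *status_out = status;
    return !(status & AUX_CTL_SEND_BUSY);
}

/* Fake hardware for experimenting: reports busy for `busy_reads` reads. */
struct fake_aux {
    unsigned int busy_reads;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_aux *hw = ctx;

    if (hw->busy_reads > 0) {
        hw->busy_reads--;
        return AUX_CTL_SEND_BUSY;
    }
    return 0;
}
```

With a fake_aux that stays busy for 3 reads and a 10ms budget the wait
succeeds; with one that stays busy for 1000 reads it fails with the busy bit
still set, which is the case that produces the error message in this bug.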

It sounds like the hardware isn't behaving as we expect here; the next question
is whether the hardware is sending us a completion interrupt, but failing to
clear the 'busy' bit, or whether it's doing neither.  Right now we use the
interrupt just as a notification to our workqueue to wake up and check the
register bit again; if we are still getting the interrupts even though the
control register bit isn't updated, we could avoid waiting for the timeout to
be declared.  The bspec (page 4301) does say "AUX Transaction complete
interrupt if set OR when DDI_AUX_CTL_*[31:30] = ‘01’".  Given the emphasis on
"OR," maybe it is valid in some cases for the hardware to not clear the bit
even though it notified us via interrupt.
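
To make the two cases concrete, here is a small sketch of how a timeout could
be classified once we know whether an interrupt arrived.  The irq_seen flag,
the enum and aux_classify are hypothetical (the driver does not currently
track this), and the busy bit is again assumed to be bit 31:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position: bit 31 = transfer still in progress ("busy"). */
#define AUX_CTL_SEND_BUSY (UINT32_C(1) << 31)

enum aux_timeout_kind {
    AUX_NOT_STUCK,        /* busy bit cleared: nothing to report */
    AUX_IRQ_BUT_BUSY,     /* interrupt arrived, busy bit never cleared */
    AUX_NO_IRQ_AND_BUSY,  /* no interrupt and busy bit never cleared */
};

/*
 * Classify the state seen when the software timeout expires.  irq_seen is
 * a hypothetical flag that an interrupt handler would set on completion.
 */
static enum aux_timeout_kind aux_classify(bool irq_seen, uint32_t status)
{
    if (!(status & AUX_CTL_SEND_BUSY))
        return AUX_NOT_STUCK;

    return irq_seen ? AUX_IRQ_BUT_BUSY : AUX_NO_IRQ_AND_BUSY;
}
```

Logging which of the two stuck cases we hit would answer the question above
without changing how the transfer itself is handled.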

We should probably also update this error message to print out the value of the
control register, so that we can see which error bits are set at the time of
the timeout and get a better idea of what state the hardware is really in.
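
A sketch of what such a message could look like, written as a standalone
formatter rather than a patch.  aux_format_timeout is a hypothetical helper;
bits 31 and 30 follow the [31:30] wording of the bspec quote, while the
positions of the two error bits are assumptions made for illustration:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bits 31:30 per the bspec quote; the error bit positions are assumed. */
#define AUX_CTL_SEND_BUSY      (UINT32_C(1) << 31)
#define AUX_CTL_DONE           (UINT32_C(1) << 30)
#define AUX_CTL_TIME_OUT_ERROR (UINT32_C(1) << 28)
#define AUX_CTL_RECEIVE_ERROR  (UINT32_C(1) << 25)

/*
 * Format the timeout message together with the raw control register value
 * and the names of the status bits that are set.  Returns what snprintf
 * returns: the length the full message needs, excluding the terminator.
 */
static int aux_format_timeout(char *buf, size_t len, uint32_t status)
{
    return snprintf(buf, len,
                    "dp aux hw did not signal timeout! "
                    "(status 0x%08" PRIx32 ":%s%s%s%s)",
                    status,
                    (status & AUX_CTL_SEND_BUSY) ? " busy" : "",
                    (status & AUX_CTL_DONE) ? " done" : "",
                    (status & AUX_CTL_TIME_OUT_ERROR) ? " timeout-error" : "",
                    (status & AUX_CTL_RECEIVE_ERROR) ? " receive-error" : "");
}
```

For a status of 0x90000000 this yields "dp aux hw did not signal timeout!
(status 0x90000000: busy timeout-error)", which would tell us directly that
the hardware flagged its own timeout but left the busy bit set.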

I don't believe this should have any impact on an end user (the hardware isn't
behaving as we expect, but it doesn't otherwise interfere with the system); the
main impact is on CI, since this behavior leads to random dmesg-warn results.
