[Intel-gfx] [RFC 0/3] drm/i915: expose fifo pipe underrun counts

Wed Jan 27 00:29:31 PST 2016

On Tue, Jan 26, 2016 at 01:05:28PM -0800, Joe Konno wrote:
> On 01/26/2016 12:51 PM, Daniel Vetter wrote:
> > On Tue, Jan 26, 2016 at 11:36:54AM -0800, Joe Konno wrote:
> >> From: Joe Konno <joe.konno at intel.com>
> >>
> >> In tracking down a watermark bug, I discovered the pch and cpu underrun
> >> interrupt handlers would disable themselves after initial reports to prevent an
> >> interrupt/dmesg storm. Storms are bad, but underrun interrupt handling should
> >> not cease. For my case, I need to be able to count pch and cpu underruns for
> >> each pipe or transcoder. Displaying this information in the 'i915_display_info'
> >> node seemed the best course of action.
> >>
> >> In order to do this, however, I had to revisit some long-standing behaviors in
> >> the underrun interrupt handlers. One problem became three. Thanks in advance
> >> for your review and feedback.
> >>
> >> Requesting comment on the following solutions I came up with (corresponding to
> >> each patch in the series):
> >>
> >>   1. provide simple 'getter' mechanisms for pch and cpu underrun reporting
> >>      ("is it enabled?")-- and base dmesg output on the answer to that question;
> >>
> >>   2. don't allow the interrupt handlers to disable or filter themselves (and
> >>      prevent accurate counting); and finally
> >>
> >>   3. atomically-incremented pch and cpu underrun counters, with those counters
> >>      displayed in debugfs i915_display_info per-pipe, per-transcoder
> >>
> >> For: https://bugs.freedesktop.org/show_bug.cgi?id=93865
> > 
> > It's more complicated than this, replied with the technicalities to patch
> > 2. But what I've forgotten to ask: What do you want to use this for? We
> > make sure that after a full modeset underrun reporting state is restored,
> > so you can retest for a given bug essentially forever, with no need to
> > reboot.
> 
> I see a correlation between pipe underruns and display flicker-- that's
> the particular case I'm working presently, so an underrun counter (in
> whatever form) is extremely useful. Assuming, of course, I'm barking up
> the right tree.
> 
> Such a counter seems a reliable metric for expressing "how bad"
> insert-display-issue-here is. Or, "how much better/worse" said issue is
> after hacking or applying patch(es).

Yeah, display underruns usually cause flicker (sometimes so small you
can't see it), and in bad cases also a complete loss of link sync with a
bad screen as the result. Any kind of fifo underrun is really a bug, and
CI treats them as such.

Imo the right approach here would be to periodically re-enable underrun
reporting. The problem with that is synchronizing with modesets:
enabling/disabling the display pipe is known/expected to cause underruns,
so the re-enable must not race with a modeset. And with atomic, modesets
can be async, which will make this all a pain. But I think even that
should be opt-in to avoid spamming dmesg too badly.
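
To make the re-arm idea concrete, here is a minimal userspace sketch in C11. It is not i915 code: the struct, the field names and the functions are all hypothetical, and a real driver would have to take the modeset locks rather than check a flag, since the check-then-store below is racy on its own.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-pipe state; none of these names exist in i915. */
struct pipe_underrun_state {
	atomic_bool reporting_enabled;   /* cleared after the first report */
	atomic_bool modeset_in_progress; /* set while the pipe is enabled/disabled */
};

static void pipe_underrun_init(struct pipe_underrun_state *p)
{
	atomic_init(&p->reporting_enabled, true);
	atomic_init(&p->modeset_in_progress, false);
}

/*
 * Interrupt side: report once, then go quiet until re-armed.
 * atomic_exchange() returns the previous value, so only the first
 * caller after a re-arm sees true.
 */
static bool underrun_irq(struct pipe_underrun_state *p)
{
	return atomic_exchange(&p->reporting_enabled, false);
}

/*
 * Periodic re-arm: skipped while a modeset is in flight, because
 * underruns are expected there. Returns true if reporting was re-armed.
 */
static bool underrun_rearm(struct pipe_underrun_state *p)
{
	if (atomic_load(&p->modeset_in_progress))
		return false;
	atomic_store(&p->reporting_enabled, true);
	return true;
}
```

With async atomic modesets the "in progress" window is exactly the hard part, so treat the flag as a stand-in for whatever synchronization the real code would need.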

Another option would be to add a debugfs interface to re-enable fifo
underrun reporting for every pipe. Would that work for your use-case of
experimenting around with watermark issues?

Or do you really want a measure of how badly you're underrunning
(continuously, or only sporadically), which your patches here give? If
once per vblank isn't good enough, or re-arming isn't good enough, then
we'd indeed need something like what your patch series does, but it must
be opt-in somehow. Also we need to make sure that we don't accidentally
ratelimit underruns right after modesets (where they tend to happen most
often). Ratelimiting there means tests fail less often rather than
consistently, which causes lots of trouble with unstable tests in CI.
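
Roughly what I mean by opt-in counting, again as a userspace C11 sketch with made-up names (nothing here is existing i915 API): the counter only advances when a debug knob is set, while the one-shot report is re-armed after every modeset, so the first underrun after a modeset is never filtered.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-pipe bookkeeping, for illustration only. */
struct underrun_counter {
	atomic_bool counting_opt_in;   /* assumed debug knob, off by default */
	atomic_bool reporting_enabled; /* one-shot report, re-armed by modeset */
	atomic_uint count;             /* underruns seen while opted in */
};

static void underrun_counter_init(struct underrun_counter *c, bool opt_in)
{
	atomic_init(&c->counting_opt_in, opt_in);
	atomic_init(&c->reporting_enabled, true);
	atomic_init(&c->count, 0);
}

/*
 * Called for every underrun event. Counts only when opted in and
 * returns true if the caller should log this underrun.
 */
static bool underrun_event(struct underrun_counter *c)
{
	if (atomic_load(&c->counting_opt_in))
		atomic_fetch_add(&c->count, 1);
	return atomic_exchange(&c->reporting_enabled, false);
}

/* Called once a modeset has completed: always re-arm the report. */
static void underrun_modeset_done(struct underrun_counter *c)
{
	atomic_store(&c->reporting_enabled, true);
}
```

The point of keeping the two flags separate is that the default behaviour (one report per modeset, no storm) stays as it is, and counting is purely additive for whoever turns it on.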
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch