[Intel-gfx] Regression of v4.6-rc vs. v4.5 bisected: a98ee79317b4 "drm/i915/fbc: enable FBC by default on HSW and BDW"

Thu May 5 23:55:56 UTC 2016

On Fri, 2016-05-06 at 00:54 +0200, Stefan Richter wrote:
> On May 05 Zanoni, Paulo R wrote:
> > 
> > On Thu, 2016-05-05 at 19:45 +0200, Stefan Richter wrote:
> > > 
> > >     Oh, and in case you - the person reading this commit message - found
> > >     this commit through git bisect, please do the following:
> > >      - Check your dmesg and see if there are error messages mentioning
> > >        underruns around the time your problem started happening.
> > > 
> > > Well, I always had the following lines in dmesg:
> > > [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
> > > [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun
> > Oh, well... I had a patch that would just disable FBC in case we saw a
> > FIFO underrun, but it was rejected. Maybe this is the time to think
> > about it again? Otherwise, I can't think of much besides disabling FBC
> > on HSW until all the underrun and watermark regressions are fixed
> > for good.
> Just to be clear though, I know that these messages are emitted when the
> monitor is switched on, and when sddm is being shut down --- but I do not
> know whether there is any sort of underrun when I get the FBC related
> freeze (since I just don't get any kernel messages at that point).

The fact that underruns have occurred earlier is enough to know that
something is wrong (most probably, bad watermarks): we stop reporting
underruns once we get the first one. In addition, we already know that
FBC has the tendency to amplify apparently-harmless FIFO underruns into
black screens, and I wouldn't be surprised to learn that it could also
cause full machine lockups.
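
Since reporting stops after the first underrun, any non-zero count since boot is already meaningful. A minimal sketch of a check, assuming the messages look like the ones you quoted (the function name count_underruns is made up for illustration; it is not part of the kernel or of intel-gpu-tools):

```shell
# Hypothetical helper: count the lines on stdin that mention a FIFO
# underrun, ignoring case so that both "fifo underrun" and
# "FIFO underrun" are caught.
count_underruns() {
    # grep -c prints the number of matching lines. It exits non-zero
    # when there are no matches, so "|| true" keeps the function
    # usable in scripts that run under "set -e".
    grep -c -i 'fifo underrun' || true
}
```

Something like "dmesg | count_underruns" would then print the number of underrun lines logged since boot. Reading dmesg may need root, depending on how the system is configured.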

> 
> Is there a chance that a serial console would fare better than
> netconsole?  This board and another PC in its vicinity have got onboard
> serial ports but I don't have cables at the moment.

In the past, for some specific cases not related to FBC, I had more
luck with serial console than with netconsole. But if this is really
caused by FBC and watermarks, I don't think you'll be able to grab any
specific message at the time of the machine hang. OTOH, if something
actually shows up, it could help invalidate our current assumption of
the relationship between the problem and FBC and underruns.

> 
> > 
> > > 
> > >      - Download intel-gpu-tools, compile it, and run:
> > >        $ sudo ./tests/kms_frontbuffer_tracking --run-subtest '*fbc-*' 2>&1 | tee fbc.txt
> > >        Then send us the fbc.txt file, especially if you get a failure.
> > >        This will really maximize your chances of getting the bug fixed
> > >        quickly.
> > > 
> > > Do you need this while FBC is enabled, or can I run it while FBC is
> > > disabled?
> > FBC enabled. Considering your description, my hope is that maybe some
> > specific subtest will be able to hang your machine, so testing this
> > again will require only running the specific subtest instead of waiting
> > 18 hours.
> The kms_frontbuffer_tracking runs from which I posted output two hours
> ago did not trigger a lockup.
> 
> (I ran them while X11 was shut down because otherwise
> kms_frontbuffer_tracking would skip all tests with "Can't become DRM
> master, please check if no other DRM client is running.")

Yes, that is the correct way to run it.

> 
> > 
> > > 
> > > PS:
> > > I am mentioning the following just in case it has any relationship
> > > with the FBC related kernel freezes.  Maybe it doesn't...  There is
> > > another recent regression on this PC, but I have not yet figured out
> > > whether it was introduced by any particular kernel version.  The
> > > regression is:  When switching from X11 to text console by
> > > [Ctrl][Alt][Fx] or by shutting down sddm, I often only get a blank
> > > screen.  I suspect that this regression was introduced when I replaced
> > > kdm by sddm, but I am not sure about that.
> > Maybe there is some relationship, since this operation involves a mode
> > change. You can also try checking dmesg to see if there are underruns
> > right when you do the change.
> Yes, this is accompanied by
> [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
> [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun