[BUG 4.17] etnaviv-gpu f1840000.gpu: recover hung GPU!

Tue Jun 19 15:56:12 UTC 2018

On Tue, Jun 19, 2018 at 02:28:46PM +0200, Lucas Stach wrote:
> Am Dienstag, den 19.06.2018, 12:42 +0100 schrieb Russell King - ARM Linux:
> > On Tue, Jun 19, 2018 at 01:11:29PM +0200, Lucas Stach wrote:
> > > Am Dienstag, den 19.06.2018, 12:00 +0100 schrieb Russell King - ARM Linux:
> > > > No, it's not "a really big job" - it's just that the Dove GC600 is not
> > > > fast enough to complete _two_ 1080p sized GPU operations within 500ms.
> > > > The preceding job contained two blits - one of them a non-alphablend
> > > > copy of:
> > > > 
> > > >                 00180000 04200780  0,24,1920,1056 -> 0,24,1920,1056
> > > > 
> > > > and one an alpha blended copy of:
> > > > 
> > > >                 00000000 04380780  0,0,1920,1080 -> 0,0,1920,1080
> > > > 
> > > > This is (iirc) something I already fixed with the addition of the
> > > > progress detection back before etnaviv was merged into the mainline
> > > > kernel.
> > > 
> > > I hadn't expected it to be this slow. I see that we might need to bring
> > > back the progress detection to fix the userspace regression, but I'm
> > > not fond of this, as it might lead to really bad QoS.
> > 
> > Well, the choices are that or worse overall performance through having
> > to ignore the GPU entirely.
> > 
> > > I would prefer userspace tracking the size of the blits and flushing
> > > the cmdstream at an appropriate time, so we don't end up with really
> > > long running jobs, but I'm not sure if this would be acceptable to
> > > you...
> > 
> > The question becomes how to split up two operations.  Yes, we could
> > submit them individually, but if they're together taking in excess of
> > 500ms, then it's likely that individually, each operation will take in
> > excess of 250ms which is still a long time.
> > 
> > In any case, I think we need to fix this for 4.17-stable and then try
> > > to work out (a) which operations are taking a long time, and (b) how to
> > solve this issue.
> 
> Agreed. I'll look into bringing back the progress detection for 4.17
> stable.
> 
> I'm still curious why the GC600 on the Dove is that slow. With
> performance like this moving a big(ish) window on the screen must be a
> horrible user experience.

I _think_ it's down to the blend being slow on GC600.  One of the
problems of running modern "desktops" on the Dove is that with Xorg
and a compositing window manager (eg, modern metacity), dragging
windows around is indeed very slow because of the multiple GPU
operations required - even dragging a small window results in almost
the entire screen being re-blended.
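
For scale, here is a rough back-of-envelope sketch of what the two blits
quoted earlier in the thread amount to.  It assumes each 32-bit word packs
x in the low 16 bits and y in the high 16 bits (which matches the decoded
numbers shown next to them), and that the two words are the top-left and
bottom-right corners of the rectangle.  The 500ms figure is the limit
mentioned above; the helper names are made up for illustration.

```python
def unpack_xy(word):
    """Split a 32-bit word into (x, y): x in the low 16 bits, y in the high 16."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def rect_pixels(top_left, bottom_right):
    """Pixel count of a rectangle given as two packed corner words (assumed layout)."""
    x1, y1 = unpack_xy(top_left)
    x2, y2 = unpack_xy(bottom_right)
    return (x2 - x1) * (y2 - y1)


# The two rectangles quoted in the thread, as (top-left, bottom-right) words.
BLITS = [
    (0x00180000, 0x04200780),  # non-alphablend copy: 0,24,1920,1056
    (0x00000000, 0x04380780),  # alpha blended copy:  0,0,1920,1080
]
TIMEOUT_S = 0.5  # the 500ms limit discussed in the thread

total_pixels = sum(rect_pixels(tl, br) for tl, br in BLITS)

# If the pair does not finish within the timeout, the sustained rate is
# below this many destination pixels per second.
rate_ceiling = total_pixels / TIMEOUT_S

print("destination pixels:", total_pixels)
print("rate ceiling (pixels/s):", rate_ceiling)
```

This only counts destination pixels; it says nothing about how the time
splits between the plain copy and the blend.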

I don't think it's fair to blame that on the Dove though - that's just
total inefficiency on the Xorg/compositing side to basically redraw
the _entire_ screen for small changes.

The compositing window manager brings with it other issues as well,
in particular with colour-keyed overlay and detecting whether anything
obscures the overlay.  For example, if, as a memory bandwidth
optimisation, you detect that the overlay window is unobscured in
the Xvideo extension, and disable the primary plane and colourkeying,
this works fine with non-compositing managers.  However, with a
compositing manager, you can end up with there being some graphics
that is blended _on top_ of the Xvideo window which is unknown to the
Xvideo backend... which results in the graphics not being displayed.

The blending also has a detrimental effect on the colourkeying when
the graphics is displayed - because of the blending, the colourkey is
no longer the expected RGB value around objects, so you get the
colourkey bleeding through around (eg) a menu.

I've now disabled compositing in metacity which makes things a whole
lot nicer (I've actually been meaning to track down the "slow window
dragging" problem for a good few months now) and solves the overlay
issue too.

> > Do we have any way to track how long each submitted job has actually
> > taken on the GPU?  (Eg, by recording the times that we receive the
> > events?)  It wouldn't be very accurate for small jobs, but given this
> > operation is taking so long, it would give an indication of how long
> > this operation is actually taking.  etnaviv doesn't appear to have
> > any tracepoints, which would've been ideal for that.  Maybe this is
> > a reason to add some? ;)
> 
> See attached patch (which I apparently forgot to send out). The DRM GPU
> scheduler has some tracepoints, which might be helpful. The attached
> patch adds a drm_sched_job_run tracepoint when a job is queued in the
> hardware ring. Together with the existing drm_sched_process_job, this
> should give you an idea of how long a job takes to process. Note that at
> any time up to 4 jobs are allowed in the hardware queue, so you need to
> match up the end times.

Thanks, I'll try to get some data in the next week or so.
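
A minimal sketch of the "match up the end times" step, assuming the two
tracepoints have already been parsed into (timestamp, job id) pairs.  The
parsing itself, the job ids and the function name are hypothetical and not
taken from the patch.  It also assumes the ring executes jobs in order, so
a job cannot start before the previous one has finished.

```python
def estimate_gpu_times(run_events, done_events):
    """Estimate per-job GPU time from queue and completion timestamps.

    run_events:  (time_us, job_id) pairs, one per job queued to the ring
                 (the drm_sched_job_run side).
    done_events: (time_us, job_id) pairs, one per completed job
                 (the drm_sched_process_job side).
    Returns a dict mapping job_id to estimated microseconds on the GPU.
    """
    queued_at = {job: t for t, job in run_events}
    durations = {}
    prev_done = None
    for done_t, job in sorted(done_events):
        start = queued_at[job]
        # With several jobs in the ring, a job only starts running once
        # the one ahead of it has completed.
        if prev_done is not None and prev_done > start:
            start = prev_done
        durations[job] = done_t - start
        prev_done = done_t
    return durations


# Made-up timestamps in microseconds: three jobs queued back to back.
run = [(0, 1), (10_000, 2), (20_000, 3)]
done = [(300_000, 1), (550_000, 2), (600_000, 3)]
times = estimate_gpu_times(run, done)
for job, us in times.items():
    print("job", job, "took about", us / 1000, "ms")
```

As noted, this won't be very accurate for small jobs, since interrupt and
scheduling latency end up in the numbers.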

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/