[BUG 4.17] etnaviv-gpu f1840000.gpu: recover hung GPU!
Russell King - ARM Linux
linux at armlinux.org.uk
Tue Jun 19 15:56:12 UTC 2018
On Tue, Jun 19, 2018 at 02:28:46PM +0200, Lucas Stach wrote:
> Am Dienstag, den 19.06.2018, 12:42 +0100 schrieb Russell King - ARM Linux:
> > On Tue, Jun 19, 2018 at 01:11:29PM +0200, Lucas Stach wrote:
> > > Am Dienstag, den 19.06.2018, 12:00 +0100 schrieb Russell King - ARM Linux:
> > > > No, it's not "a really big job" - it's just that the Dove GC600 is not
> > > > fast enough to complete _two_ 1080p sized GPU operations within 500ms.
> > > > The preceding job contained two blits - one of them a non-alphablend
> > > > copy of:
> > > >
> > > > 00180000 04200780 0,24,1920,1056 -> 0,24,1920,1056
> > > >
> > > > and one an alpha blended copy of:
> > > >
> > > > 00000000 04380780 0,0,1920,1080 -> 0,0,1920,1080
> > > >
> > > > This is (iirc) something I already fixed with the addition of the
> > > > progress detection back before etnaviv was merged into the mainline
> > > > kernel.
> > >
> > > I hadn't expected it to be this slow. I see that we might need to bring
> > > back the progress detection to fix the userspace regression, but I'm
> > > not fond of this, as it might lead to really bad QoS.
> >
> > Well, the choices are that or worse overall performance through having
> > to ignore the GPU entirely.
> >
> > > I would prefer userspace tracking the size of the blits and flushing
> > > the cmdstream at an appropriate time, so we don't end up with really
> > > long running jobs, but I'm not sure if this would be acceptable to
> > > you...
> >
> > The question becomes how to split up two operations. Yes, we could
> > submit them individually, but if they're together taking in excess of
> > 500ms, then it's likely that individually, each operation will take in
> > excess of 250ms which is still a long time.
> >
> > In any case, I think we need to fix this for 4.17-stable and then try
> > to work out (a) which operations are taking a long time, and (b) how to
> > solve this issue.
>
> Agreed. I'll look into bringing back the progress detection for 4.17
> stable.
>
> I'm still curious why the GC600 on the Dove is that slow. With
> performance like this moving a big(ish) window on the screen must be a
> horrible user experience.

I _think_ it's down to the blend being slow on GC600 - one of the
problems of running modern "desktops" on the Dove is that with Xorg
and a compositing window manager (eg, modern metacity), dragging
windows around is very slow because of the multiple GPU operations
required - even dragging a small window results in almost the entire
screen being re-blended.

I don't think it's fair to blame that on the Dove though - it's just
sheer inefficiency on the Xorg/compositing side, basically redrawing
the _entire_ screen for small changes.

The compositing window manager brings other issues with it as well,
in particular with the colour-keyed overlay and detecting whether
anything obscures the overlay. For example, if, as a memory bandwidth
optimisation, the Xvideo extension detects that the overlay window is
unobscured and disables the primary plane and colourkeying, that
works fine with non-compositing managers. With a compositing manager,
however, you can end up with graphics blended _on top_ of the Xvideo
window that the Xvideo backend knows nothing about... which results
in those graphics not being displayed.

The blending also has a detrimental effect on the colourkeying when
the graphics is displayed - because of the blending, the colourkey is
no longer the expected RGB value around objects, so you get the
colourkey bleeding through around (eg) a menu.

I've now disabled compositing in metacity, which makes things a whole
lot nicer (I've actually been meaning to track down the "slow window
dragging" problem for a good few months now) and solves the overlay
issue too.
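
Coming back to the progress detection itself: the idea boils down to
checking, when the timeout fires, whether the GPU front-end has
advanced since the last check, and only recovering the GPU if it
hasn't. Very roughly - this is only an illustrative sketch, not the
old code, and the hangcheck_dma_addr field name is made up here:

/* Illustrative sketch only - not the actual etnaviv code. */
static bool etnaviv_gpu_fe_progress(struct etnaviv_gpu *gpu)
{
        /* Current front-end DMA address. */
        u32 dma_addr = gpu_read(gpu, VIVS_FE_DMA_ADDRESS);

        if (dma_addr != gpu->hangcheck_dma_addr) {
                /*
                 * The FE has moved since the last timeout check:
                 * treat the job as still running and give it more
                 * time instead of recovering.
                 */
                gpu->hangcheck_dma_addr = dma_addr;
                return true;
        }

        return false;
}

If something like that returns true in the timeout handler, the
timeout gets pushed out rather than triggering a recovery.
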
> > Do we have any way to track how long each submitted job has actually
> > taken on the GPU? (Eg, by recording the times that we receive the
> > events?) It wouldn't be very accurate for small jobs, but given this
> > operation is taking so long, it would give an indication of how long
> > this operation is actually taking. etnaviv doesn't appear to have
> > any tracepoints, which would've been ideal for that. Maybe this is
> > a reason to add some? ;)
>
> See attached patch (which I apparently forgot to send out). The DRM GPU
> scheduler has some tracepoints, which might be helpful. The attached
> patch adds a drm_sched_job_run tracepoint when a job is queued in the
> hardware ring. Together with the existing drm_sched_process_job, this
> should give you an idea of how long a job takes to process. Note that at
> any time up to 4 jobs are allowed in the hardware queue, so you need to
> match up the end times.
Thanks, I'll try to get some data in the next week or so.
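
For my own notes, I'd guess the new event ends up looking roughly
like the sketch below - purely illustrative, the fields here are my
guess rather than what's in the attached patch:

TRACE_EVENT(drm_sched_job_run,
        TP_PROTO(struct drm_sched_job *sched_job),
        TP_ARGS(sched_job),
        TP_STRUCT__entry(
                __field(u64, id)
                __string(name, sched_job->sched->name)
        ),
        TP_fast_assign(
                __entry->id = sched_job->id;
                __assign_str(name, sched_job->sched->name);
        ),
        TP_printk("id=%llu ring=%s",
                  __entry->id, __get_str(name))
);

The delta between a job's drm_sched_job_run timestamp and its
matching drm_sched_process_job timestamp should then give the
per-job GPU time, once the events are paired up to account for the
jobs in flight.
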
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up