[Mesa-dev] Performance regression on Tegra/GK20A since commit 363b53f00069

Thu Nov 20 03:41:48 PST 2014

On Thu, 20 Nov 2014 18:24:34 +0900
Alexandre Courbot <acourbot at nvidia.com> wrote:

> Hi Pekka,
> 
> On 11/19/2014 04:34 PM, Pekka Paalanen wrote:
> > On Wed, 19 Nov 2014 15:32:38 +0900
> > Alexandre Courbot <acourbot at nvidia.com> wrote:
> >
> >> Some more information: CPU usage of the EGL app (glmark2 here) is much
> >> higher when this patch is applied, which I presume is what triggers the
> >> frame skips.
> >>
> >> On 11/19/2014 03:05 PM, Alexandre Courbot wrote:
> >>> Hi guys,
> >>>
> >>> I am seeing a severe performance regression (lots of frame drops when
> >>> running EGL apps in Weston) on Tegra/GK20A since the following commit:
> >>>
> >>> commit 363b53f00069af718f64cf047f19ad5681a8bf6d
> >>> Author: Marek Olšák <marek.olsak at amd.com>
> >>> Date:   Sat Nov 1 14:31:09 2014 +0100
> >>>
> >>>       egl: remove egl_gallium from the loader
> >>>
> >>> Reverting said commit on top of master brings the expected performance
> >>> back. I am not knowledgeable enough about Mesa to speculate about the
> >>> reason, but could we try to investigate why this happens and how we
> >>> could fix this?
> >
> > Hi,
> >
> > that sounds like you used to get egl_gallium as the EGL driver, and
> > after that patch you get egl_dri2. These two have separate Wayland
> > platform implementations (and probably all other platforms as well?). I
> > think that might be a lead for investigation. EGL debug environment
> > variable (EGL_LOG_LEVEL=debug) should confirm which EGL driver gets
> > loaded. You can force the EGL driver with e.g. EGL_DRIVER=egl_dri2.
> 
> You are spot on, EGL_LOG_LEVEL revealed that I was using the egl_gallium 
> driver while this patch makes Wayland applications use egl_dri2. If I 
> set EGL_DRIVER=egl_gallium things go back to the expected behavior.
> 
> >
> > Note, that there are two different EGL platforms in play: DRM/GBM for
> > Weston, and Wayland for the app. Have you confirmed the problem is in
> > the app side? That is, does Weston itself run smoothly?
> 
> Weston seems to be fine, and since setting EGL_DRIVER=egl_gallium after 
> starting it brings things back to the previous behavior I believe we can 
> consider it is not part of this problem.

Agreed.

> > You say "frame drops", how do you see that? Only on screen, or also in
> > the app performance profile? How's the average framerate for the app?
> 
> Looking at it again this is actually quite interesting. The misbehaving 
> app is glmark2, and what happens is that despite working nicely 
> otherwise, a given frame sometimes remains displayed for up to half a 
> second. Now looking at the framerates reported by glmark2, I noticed 
> that while they are capped at 60fps when using the gallium driver, the 
> numbers are much higher when using dri2 (which is nice!). Except when 
> these "stuck frames" happen, then the reported framerate drops 
> dramatically, indicating that the app itself is also blocked by this.

I have a hunch (wl_buffer.release not delivered in time, and client
side EGL running out of available buffers), but confirming that would
require a Wayland protocol dump up to such a hiccup. You could try to
get one by setting the environment variable WAYLAND_DEBUG=client for
glmark2. It will be a flood, especially if glmark2 succeeds in running
at uncapped framerates. The trace will come to stderr, so you want to
redirect that to a file. You need to find out where in the trace the
hiccup happened. The timestamps are in milliseconds. I could then take
a look (will need the whole trace).
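
To narrow down where in a large trace a hiccup happened, something like
the following Python sketch could list the places where consecutive
trace lines are far apart in time. It assumes each trace line starts
with a bracketed millisecond timestamp such as "[1234567.890]"; the
function name find_gaps and the 200 ms default threshold are only
illustrative choices, not part of any Wayland tool.

```python
import re

# Assumed line format: "[<milliseconds>.<fraction>] <message>", possibly with
# padding spaces after the opening bracket. Lines that do not match are skipped.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_gaps(lines, threshold_ms=200.0):
    """Return (gap_ms, line_before, line_after) for each pair of consecutive
    timestamped lines that are at least threshold_ms apart."""
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue
        ts = float(match.group(1))
        if prev_ts is not None and ts - prev_ts >= threshold_ms:
            gaps.append((ts - prev_ts, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    return gaps
```

With the stderr output saved to a file (say a hypothetical trace.log),
calling find_gaps(open("trace.log")) would give the candidate spots to
look at first. The whole trace is still needed to understand what led
up to each gap.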

At this point, I think it would be best to open a bug report against
Mesa, and continue there. Such freezes obviously should not happen on
either EGL driver. Please add me to CC on the bug.

> Interestingly, if I run weston-simple-egl with dri2, the framerate is 
> again capped at 60fps, so this may be something specific to glmark2.

weston-simple-egl does not even try to exceed the monitor framerate. I'd
expect glmark2 OTOH to set eglSwapInterval to 0 (unlimited), which
means it should be limited only by Wayland roundtrips, when it replaces
wl_surface's buffer and eventually needs to wait for one of the buffers
to come back for re-use, so that it can continue rendering.
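
As a rough illustration of that waiting, here is a toy Python model of a
client-side buffer pool. It is not Mesa or Weston code: the class
BufferPool, the function submitted_frames, and the pool size are all
made up for the example. The client takes a free buffer for each frame
and gets one back only when a release arrives; if releases stop
arriving, it runs out of buffers and would have to block.

```python
from collections import deque


class BufferPool:
    """Toy model of a client-side buffer pool (hypothetical, not Mesa code)."""

    def __init__(self, size):
        self.free = deque(range(size))  # buffers the client may render into
        self.held = deque()             # buffers submitted and not yet released

    def swap(self):
        """Submit a frame. Return the buffer used, or None if the client
        has no free buffer and would have to wait."""
        if not self.free:
            return None
        buf = self.free.popleft()
        self.held.append(buf)
        return buf

    def release_oldest(self):
        """Model a release event for the oldest submitted buffer."""
        if self.held:
            self.free.append(self.held.popleft())


def submitted_frames(pool_size, frames, releases_arrive):
    """Count how many of `frames` frames get submitted before a stall."""
    pool = BufferPool(pool_size)
    count = 0
    for _ in range(frames):
        if pool.swap() is None:
            break  # out of buffers: a real client would block here
        count += 1
        # Once a newer buffer has replaced it, the older one can come back.
        if releases_arrive and len(pool.held) > 1:
            pool.release_oldest()
    return count
```

With releases arriving, submitted_frames(3, 10, True) gets through all
ten frames; without them, submitted_frames(3, 10, False) stops after
three, which is the kind of stall a late wl_buffer.release could cause.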

> Also, and I cannot explain why, but if there is other activity happening 
> in Weston (e.g. another egl application, even another instance of 
> glmark2 itself), the issue seems to not manifest itself.

That's probably even more proof that my hunch might be right. But it
would also mean the bug is in Weston rather than Mesa. Or in the
combination of the two, plus a race. We can easily move the bug
report from Mesa to Weston if that's true, anyway.

> Just for my education, is the egl_gallium driver going to be removed? 

That was recently discussed in the thread
http://lists.freedesktop.org/archives/mesa-dev/2014-November/thread.html
I'm not sure.

> What are egl_gallium and egl_dri2 doing differently?

I don't know exactly, but egl_gallium is a Gallium3D state tracker and
can support only Gallium3D drivers, while egl_dri2 can load all kinds
of Mesa hardware drivers, AFAIU. I'm fairly sure the differences are in
the internal interfaces.

Thanks,
pq