[PATCH 1/2] glamor: Add GLAMOR_ACCESS_WO

Michel Dänzer michel at daenzer.net
Wed Aug 24 03:27:30 UTC 2016

Adding Marek and Nicolai, maybe they have some feedback from a GL
(driver) perspective.

On 23/08/16 10:41 AM, Dave Airlie wrote:
> From: Michel Dänzer <michel.daenzer at amd.com>
> [airlied: rebased onto master -
> I left WO alone as it's more like the GL interface
> review suggested changing it to bits.]

After realizing that patch 2 can only affect the !ZPixmap case, I tested
this series with

 x11perf -putimagexy{10,100,500} -shmputxy{10,100,500}

and to my surprise, all of the numbers went down by around an order of
magnitude (using radeonsi on Kaveri). I looked into what's going on:

> @@ -86,7 +86,10 @@ glamor_prep_pixmap_box(PixmapPtr pixmap, glamor_access_t access, BoxPtr box)
>              if (priv->pbo == 0)
>                  glGenBuffers(1, &priv->pbo);
> -            gl_usage = GL_STREAM_READ;
> +            if (access == GLAMOR_ACCESS_WO)
> +                gl_usage = GL_STREAM_DRAW;
> +            else
> +                gl_usage = GL_STREAM_READ;
>              glBindBuffer(GL_PIXEL_PACK_BUFFER, priv->pbo);
>              glBufferData(GL_PIXEL_PACK_BUFFER,

This change results in write-combining for the PBO CPU mapping.
Apparently, fbPutXYImage ends up either reading from the PBO, or at
least writing to it in a WC-unfriendly manner, causing a big slowdown.
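
To illustrate what "WC-unfriendly" means here, below is a minimal sketch
in plain C. It is not the fb code: compose_rmw, compose_write_once and
the plane layout are hypothetical, and I have not confirmed that
fbPutXYImage follows the first pattern. It only shows the difference
between merging bit planes with read-modify-write on the destination and
storing each destination pixel exactly once, in order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: planes[p * n + i] != 0 means bit p of pixel i is set. */

/* WC-unfriendly: every |= reads dst back before writing it, and dst is
 * walked once per plane. On a write-combined mapping the reads are
 * uncached and the repeated partial updates defeat write combining. */
void compose_rmw(uint32_t *dst, const uint8_t *planes, int nplanes, size_t n)
{
    memset(dst, 0, n * sizeof *dst);
    for (int p = 0; p < nplanes; p++)
        for (size_t i = 0; i < n; i++)
            if (planes[(size_t)p * n + i])
                dst[i] |= 1u << p;
}

/* WC-friendly: build each pixel in a register, then store it once.
 * dst is never read and is written sequentially. */
void compose_write_once(uint32_t *dst, const uint8_t *planes, int nplanes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (int p = 0; p < nplanes; p++)
            if (planes[(size_t)p * n + i])
                v |= 1u << p;
        dst[i] = v;
    }
}
```

Both functions produce the same pixels; only the access pattern on dst
differs, and that pattern is what matters for a write-combined mapping.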

Reverting this hunk makes performance match or exceed the level before
this series, except for -putimagexy10, which remains at ~50% of that
level. Another issue there is the PBO allocation overhead[0]. Changing
glamor_prep_pixmap_box to use the non-PBO path for GLAMOR_ACCESS_WO[1]
makes all numbers match or exceed (by several times for the 10x10 tests)
those before this series.

[0] The PBO must be large enough to hold the destination pixmap, which
is the screen pixmap in the non-composited case, so it's several MB.
This probably blows up the userspace BO cache, and TTM clears all pages
of the BOs allocated from the kernel, even though most of them end up
completely unused.
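
For a rough sense of scale, here is the arithmetic as a small C helper.
The 1920x1080 screen, 4 bytes per pixel and an unpadded stride are
assumptions for illustration, not measurements from my setup, and
pixel_bytes is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a width x height area, assuming no stride padding. */
size_t pixel_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return (size_t)width * height * bytes_per_pixel;
}
```

Under those assumptions, pixel_bytes(1920, 1080, 4) is 8294400 bytes
(about 7.9 MiB) for the PBO, while a 10x10 box needs only
pixel_bytes(10, 10, 4) = 400 bytes.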

[1] A better heuristic for not using a PBO might be based on the
relative size (and maybe also the absolute sizes) of the box and the pixmap.
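
A sketch of what such a heuristic could look like follows. The function
name and both thresholds are made up for illustration and are not taken
from glamor; real values would need to be tuned with x11perf.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds, not taken from glamor. */
#define MIN_PBO_BOX_PIXELS  (64u * 64u)  /* absolute: skip the PBO for tiny boxes */
#define MIN_BOX_FRACTION_DIV 4u          /* relative: box must cover >= 1/4 of pixmap */

/* Return true if a write-only access to a box_w x box_h region of a
 * pix_w x pix_h pixmap looks large enough to justify a pixmap-sized PBO. */
bool want_pbo_for_write(uint32_t box_w, uint32_t box_h,
                        uint32_t pix_w, uint32_t pix_h)
{
    uint64_t box_area = (uint64_t)box_w * box_h;
    uint64_t pix_area = (uint64_t)pix_w * pix_h;

    if (box_area == 0 || pix_area == 0)
        return false;
    if (box_area < MIN_PBO_BOX_PIXELS)
        return false;
    return box_area * MIN_BOX_FRACTION_DIV >= pix_area;
}
```

With these made-up thresholds, a 10x10 or 500x500 box on a 1920x1080
pixmap would take the non-PBO path, while a full-pixmap upload would
still use the PBO.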

Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
