Putting a pixmap into a window every frame

Carsten Haitzler raster at rasterman.com
Mon Aug 23 11:11:25 UTC 2021


On Sun, 22 Aug 2021 17:13:26 +0100 "Andrew Bainbridge" <andy at deadfrog.co.uk>
said:

There are so many "it depends" in the answer. I'll try and break it down. I
will gloss over a few things and be a bit rough - so what follows can be
nitpicked, but it's the broad strokes of what is going on. Some of this is
confusing because X11 is from the 80's and has been added to and extended
over the years, so older mechanisms that still exist, still work and are
still used sit alongside more modern extensions.

xeyes is an old-school X app, so it just renders directly to its window.
Without any compositing this involves xeyes sending draw commands to the
Xserver. The Xserver then draws into a single front framebuffer that
everything shares, clipping to just the region xeyes occupies. If the window
is hidden/behind other windows, the Xserver throws away the rendering
commands entirely. It is also possible for X clients to know whether their
window is unobscured, partially obscured or fully obscured, and so avoid
sending any rendering commands at all in some cases.
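
For example, a client can ask for VisibilityNotify events and skip its
render loop while fully obscured. A minimal sketch (the window size and the
printfs are just placeholders):

#include <X11/Xlib.h>
#include <stdio.h>

/* Create a window, listen for VisibilityNotify events and report when the
 * window becomes fully obscured or visible again, so a render loop could
 * skip frames while nothing it draws can be seen. */
int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) return 1;

    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 320, 240, 0, 0,
                                     WhitePixel(dpy, DefaultScreen(dpy)));
    XSelectInput(dpy, win, VisibilityChangeMask | StructureNotifyMask);
    XMapWindow(dpy, win);

    for (;;) {
        XEvent ev;
        XNextEvent(dpy, &ev);
        if (ev.type == VisibilityNotify) {
            if (ev.xvisibility.state == VisibilityFullyObscured)
                printf("fully obscured - could stop drawing\n");
            else
                printf("visible (or partially) - keep drawing\n");
        }
    }
}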

Now ... compositing. This has the compositor tell the Xserver: "All drawing
that normally goes to this window - please allocate a pixmap and redirect
the drawing to go to that pixmap instead". The compositor can get access to
this pixmap to use as a source to render from. So there will be one pixmap
(buffer of pixels). Where this lives depends entirely on the driver. It may
live in system memory - maybe on the GPU. It may even move around. For most
modern drivers with GPUs that have their own video memory, it will live in
actual video memory (until you run out of it or something forces it to
migrate around). Again - this will vary from GPU to GPU and driver to
driver. Integrated GPUs have no dedicated "video memory" and share it with
the system - though here video memory would be the memory MAPPED to the GPU
(e.g. it may not be cacheable on the CPU side).
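
On the compositor side, that redirection is done with the Composite
extension. A rough sketch, assuming client_win is a top-level window the
compositor already knows about (error handling and damage tracking left
out):

#include <X11/Xlib.h>
#include <X11/extensions/Xcomposite.h>

/* Redirect a window's rendering into an offscreen pixmap and get a handle
 * on that pixmap to use as a source when compositing the screen. */
static Pixmap redirect_window(Display *dpy, Window client_win)
{
    int event_base, error_base;

    if (!XCompositeQueryExtension(dpy, &event_base, &error_base))
        return None; /* no Composite extension on this server */

    /* CompositeRedirectManual: the server stops painting this window to
     * the screen itself; the compositor is now responsible for that. */
    XCompositeRedirectWindow(dpy, client_win, CompositeRedirectManual);

    /* Name the backing pixmap so it can be used as a render source. This
     * has to be re-fetched whenever the window is resized or remapped. */
    return XCompositeNameWindowPixmap(dpy, client_win);
}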

Almost every single traditional 2D X app will just render to the window
directly - either with basic XFillRectangle and friends, with
XPutImage/XShmPutImage (which is relevant to you - you will want to be using
this to basically blast/upload a blob of pixels to your window; XShm will
only work on a local Xserver and not over a network, so you need code to
detect this and use it only if it works, but it is much faster than
XPutImage when you can use it), with XRender (more advanced rendering with
alpha channels, ARGB etc., but if you custom-render pixels on the CPU you
still need to get them to XRender via X(Shm)PutImage), or perhaps the more
adventurous apps will be using OpenGL (or, if really bleeding-edge, Vulkan).
OpenGL/Vulkan will use the X11 DRI2/DRI3 extension protocol behind the
scenes to swap buffers (tell the Xserver to show/present some buffer that
was once a backbuffer that the client allocated - these buffers will not be
pixmaps, but they are pretty much the same thing: a blob of pixels
accessible to the GPU).
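
A common shape for that XShm detection is the sketch below: check for the
MIT-SHM extension, try to create and attach a shared-memory XImage, and fall
back to a plain XImage + XPutImage if any step fails (a real version should
also trap the X error XShmAttach can raise when the display is remote):

#include <X11/Xlib.h>
#include <X11/extensions/XShm.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Try to set up a shared-memory XImage; return NULL to tell the caller to
 * fall back to a plain XImage + XPutImage path. */
static XImage *try_shm_image(Display *dpy, int screen, int w, int h,
                             XShmSegmentInfo *shminfo)
{
    if (!XShmQueryExtension(dpy))
        return NULL; /* no MIT-SHM (e.g. remote display) */

    XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, screen),
                                  DefaultDepth(dpy, screen), ZPixmap,
                                  NULL, shminfo, w, h);
    if (!img)
        return NULL;

    shminfo->shmid = shmget(IPC_PRIVATE, img->bytes_per_line * img->height,
                            IPC_CREAT | 0600);
    if (shminfo->shmid < 0) {
        XDestroyImage(img);
        return NULL;
    }

    char *addr = shmat(shminfo->shmid, NULL, 0);
    if (addr == (char *)-1) {
        shmctl(shminfo->shmid, IPC_RMID, NULL);
        XDestroyImage(img);
        return NULL;
    }
    shminfo->shmaddr = img->data = addr;
    shminfo->readOnly = False;

    XShmAttach(dpy, shminfo);
    XSync(dpy, False);

    /* Mark the segment for removal once both sides have detached. */
    shmctl(shminfo->shmid, IPC_RMID, NULL);
    return img;
}

/* Per frame: render your pixels into img->data, then push them with
 *   XShmPutImage(dpy, win, gc, img, 0, 0, 0, 0, w, h, False);
 * or, on the fallback path,
 *   XPutImage(dpy, win, gc, img, 0, 0, 0, 0, w, h);
 */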

There is an old extension called the double buffer extension (XDBE) for
rendering to a backbuffer and then swapping. I don't actually ever remember
seeing this used in the wild, but I did try it out many years ago and it
provided no benefits vs. X(Shm)PutImage - though this was pre-compositing. I
suspect it's pretty much ignored and no optimizations have been made due to
zero usage, so know that it exists, but ... ignore it.

XPresent is newer. It is kind of like XDBE, but it allows timestamps for
when to show a pixmap and is more likely to actually allow for buffer SWAPS.
Now here comes the catch: the way most WMs work, swapping actually ends up
degrading to a copy anyway. So let me get stuck into that.

Client applications tend to have their window not use CSD (client side
decorations); the window manager provides them instead. The way it is almost
always done is by taking the client window (your app) and placing it as a
child inside a larger parent WM frame window. Your window sits at an offset,
and the WM draws titlebars and borders in the extra space around your window
in its parent window. To move your window, the WM just moves its frame
window and yours follows with it. Resizing is more involved, but the WM
resizes its window, resizes your window and redraws the frame too. A
compositor will request to redirect the WM's parent frame window, not your
client window. This means the whole frame including titlebars is redirected
to a pixmap, with the normal X clipping from the basic xeyes case I first
mentioned being applied within this sub-tree. Thus your swaps of any buffers
end up having to COPY the pixels from your window to the redirected pixmap
(at an offset etc.). Yes - in theory you could do the inverse: swap buffers,
then just copy the frame regions from the previous buffer to the new one,
thus reducing the amount copied. I do not think this is actually done (I
could be wrong). If you have a separate compositor vs. WM client then this
case absolutely applies. I actually believe it applies to most cases.
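
The reparenting itself looks roughly like the sketch below when the WM sees
a new client window. The frame sizes are made up, and real WMs also manage
the save-set, select events on the frame, honour size hints and so on:

#include <X11/Xlib.h>

/* Hypothetical frame geometry - a titlebar on top, thin borders elsewhere. */
enum { TITLE_H = 24, BORDER = 4 };

/* Wrap a client window in a larger WM frame window and offset the client
 * inside it; the WM then draws its decorations into the frame. */
static Window frame_client(Display *dpy, Window root, Window client)
{
    XWindowAttributes attr;
    XGetWindowAttributes(dpy, client, &attr);

    Window frame = XCreateSimpleWindow(dpy, root, attr.x, attr.y,
                                       attr.width + 2 * BORDER,
                                       attr.height + TITLE_H + BORDER,
                                       0,
                                       BlackPixel(dpy, DefaultScreen(dpy)),
                                       WhitePixel(dpy, DefaultScreen(dpy)));

    XReparentWindow(dpy, client, frame, BORDER, TITLE_H);
    XMapWindow(dpy, frame);
    XMapWindow(dpy, client);
    return frame;
}

Moving the frame moves the client with it, which is why the compositor ends
up redirecting the frame (the top-level window) rather than your client
window.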

Some WMs have both compositor and WM rolled into one. I can now only talk
about mine, and it does NOT do the above. There is a parent frame window,
but it is identical in size to the client - it's just used for control. The
frame/border is drawn inside the compositor itself, not with ye-olde 2D
rendering into the parent frame (and some magic with the shape extension is
done to calculate input regions, to pretend there is a frame window there
while directing input events for that area to the WM). The compositor runs a
full scene graph, and borders are just more scene graph objects drawn with
everything else (so with software rendering just like you, or with OpenGL to
accelerate it all, with everything being textures, triangles etc.)
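
For reference, the shape-extension side of that boils down to replacing a
window's input region with an explicit set of rectangles. A rough, generic
sketch - the rectangles and which window it is applied to are assumptions,
not a description of how any particular WM wires it up:

#include <X11/Xlib.h>
#include <X11/extensions/shape.h>

/* Replace a window's input region with the given rectangles, so input
 * outside those rectangles falls through to whatever handles that area
 * (e.g. the WM treating it as the "frame"). */
static void set_input_rects(Display *dpy, Window win,
                            XRectangle *rects, int n_rects)
{
    int event_base, error_base;

    if (!XShapeQueryExtension(dpy, &event_base, &error_base))
        return; /* no Shape extension - leave the input region alone */

    XShapeCombineRectangles(dpy, win, ShapeInput, 0, 0,
                            rects, n_rects, ShapeSet, Unsorted);
}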

This means your client window == redirected pixmap in size. It's an exact
match. This means it's a very simple (server-side pseudocode):

if ((x == 0) && (y == 0) &&
    (buffer_width == pixmap_width) && (buffer_height == pixmap_height)) {
  /* do a buffer swap: the pixmap ID now points to the newly
     swapped-in buffer, avoiding a copy */
} else {
  /* ye olde copy */
}

This has to be done on the Xserver side. I know the Xserver drivers already
optimize this case for GL apps using the DRI protocol when the window is
fullscreen, dropping from a copy to a buffer exchange to cut costs, so it is
a very minimal extension of that logic to do it for a composited pixmap.

So given this - even with XPresent, you will be doing a copy from your
X(Shm)PutImage to the pixmap, THEN presenting that pixmap (which may be
another copy - details above). So in the best case you have just as many
copies as going directly to the window; at worst it may be 2x the copies.
Admittedly the copies here will probably be on-GPU, as opposed to the
PutImage which will be a CPU -> GPU copy. So there still may be a copy, and
tearing may still happen.
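
For completeness, presenting a pixmap with the Present extension looks
roughly like this sketch - no fences, regions or target timing, everything
left at its defaults, and win/pix/serial are assumed to already exist with
the pixmap matching the window size:

#include <X11/Xlib.h>
#include <X11/extensions/Xpresent.h>
#include <stdint.h>

/* Ask the server to present pix as the contents of win at the next
 * opportunity. Whether this becomes a flip or a copy is up to the
 * server/compositor, as discussed above. */
static void present_frame(Display *dpy, Window win, Pixmap pix,
                          uint32_t serial)
{
    XPresentPixmap(dpy, win, pix, serial,
                   None, None,   /* valid/update regions: whole pixmap */
                   0, 0,         /* x/y offset */
                   None,         /* target CRTC: let the server pick */
                   None, None,   /* wait/idle fences: none */
                   PresentOptionNone,
                   0, 0, 0,      /* target_msc/divisor/remainder: ASAP */
                   NULL, 0);     /* no notifies */
}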

In theory you could allocate your own DMABUFs and use the DRI3 protocol -
software render into the mmapped dmabuf, then tell the Xserver to show it,
much like OpenGL does.
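
The X side of that is roughly the sketch below, using xcb's DRI3 binding. It
assumes you have already allocated a dmabuf somewhere else (GBM, a DRM dumb
buffer, etc.), mmapped it for CPU rendering, and have its fd, size, stride
and depth at hand:

#include <xcb/xcb.h>
#include <xcb/dri3.h>
#include <stdint.h>

/* Wrap an existing dmabuf in an X pixmap. The fd is handed over with the
 * request (the connection owns it afterwards). */
static xcb_pixmap_t pixmap_from_dmabuf(xcb_connection_t *conn,
                                       xcb_window_t win, int dmabuf_fd,
                                       uint32_t size, uint16_t width,
                                       uint16_t height, uint16_t stride,
                                       uint8_t depth)
{
    xcb_pixmap_t pix = xcb_generate_id(conn);

    xcb_dri3_pixmap_from_buffer(conn, pix, win, size, width, height,
                                stride, depth, 32 /* bpp */, dmabuf_fd);

    /* The resulting pixmap can then be presented to win, e.g. with the
     * Present extension as in the earlier sketch. */
    return pix;
}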

As for waiting for the compositor to be ready - you can't do that. You
don't know when the compositor will consume your pixmap and updates, or even
if it will consume them at all. It may choose not to update/render your
window (it's hidden, or the compositor may have dropped down to only
rendering every 4th frame or something). The best thing for you to do is
either render with a fixed timer (e.g. at 60Hz) on your side, open
/dev/dri/card0 and try to get vblank events (use libdrm to do this), or -
probably a bit better - use XPresent (XPresentNotifyMSC() to request events
for screen refreshes).
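
The XPresent route looks roughly like the sketch below: select for Present
complete events on your window, ask to be notified at the next MSC (vblank
counter) value, render when the event arrives, and re-arm. Extension and
error checking are kept minimal, and dpy/win are assumed to already exist:

#include <X11/Xlib.h>
#include <X11/extensions/Xpresent.h>
#include <stdint.h>

static int present_opcode, present_event_base, present_error_base;

/* Ask for PresentCompleteNotify events and arm the first MSC request. */
static void start_refresh_events(Display *dpy, Window win)
{
    if (!XPresentQueryExtension(dpy, &present_opcode,
                                &present_event_base, &present_error_base))
        return; /* no Present extension - fall back to a fixed timer */

    XPresentSelectInput(dpy, win, PresentCompleteNotifyMask);
    /* serial 0, target_msc 0, divisor 1, remainder 0:
     * tell us about the next screen refresh. */
    XPresentNotifyMSC(dpy, win, 0, 0, 1, 0);
}

/* In the event loop: Present events arrive as GenericEvents. */
static void handle_generic_event(Display *dpy, Window win, XEvent *ev)
{
    XGenericEventCookie *cookie = &ev->xcookie;

    if (cookie->extension != present_opcode || !XGetEventData(dpy, cookie))
        return;

    if (cookie->evtype == PresentCompleteNotify) {
        /* A refresh happened: render and present the next frame here,
         * then ask for the next notification. */
        XPresentNotifyMSC(dpy, win, 0, 0, 1, 0);
    }
    XFreeEventData(dpy, cookie);
}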

> Hi
> 
> I'm a reformed Windows programmer trying to understand the big-picture of how
> X11 manages frame buffers. With a typical compositing manager and, say, the
> xeyes app, how many frame buffers are there? How many are in system memory
> and how many in GPU memory? Is flipping employed?
> 
> Is this kind of thing documented anywhere?
> 
> I have a software rendering 2D library and various apps that depend on it
> (*). I'm porting it from Windows to Linux. The apps do smooth animation by
> drawing to a window-sized pixmap (bitmap in Windows speak) in system memory,
> sending it to the compositor every frame and then waiting until the
> compositor is ready for another. 
> 
> Is the Present extension the best way to do that on X11 today?
> 
> * If curious, see https://www.youtube.com/watch?v=-xVune0NEsA for an example
> app.
> 
> Thanks,
> Andy


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
Carsten Haitzler - raster at rasterman.com


