[Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland

Thu Nov 7 08:13:35 PST 2013

These patches make it possible to use DRI_PRIME to render on a
different card than the one used by the compositor (with render-nodes).

At the time of writing, the Mesa Wayland EGL backend doesn't
support render-nodes, because it uses the dri2 backend, which
requires GEM names (render-nodes aren't allowed to use GEM
names). But I'm confident that this week or next, the __DRIimage
replacement will be ready, thanks to Keith Packard, Kristian Høgsberg
and Christopher James Halse Rogers.
That's why I'm publishing these patches now, so that there is time
for them to be reviewed.

Initially, I wanted to use driconf too, as a complement to DRI_PRIME,
but driconf doesn't support string parameters yet, so that will come later.

To choose a specific device, the user has to specify the id_path_tag of
the device they want to use. We get the id_path_tag from udev. Currently
systemd does not fill in this field for render-nodes, so it has to be set
by an additional udev rule. David Herrmann has sent a patch to systemd for
that, but I don't know whether it has been pushed yet.

The choice of id_path_tag comes from the fact that the id_path is stable,
and that it also describes non-PCI graphics devices (USB devices, etc.).

An alternative way to choose the device is to set DRI_PRIME to "1",
which means "choose any card other than the one used by the compositor".

If Mesa doesn't find the device requested by the user, it will use the
same card as the Wayland compositor.
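
The selection rules above can be sketched as follows. This is only an
illustration, not the code from the patches: struct gpu_device,
choose_device() and the example id_path_tag values are hypothetical, and
the real implementation enumerates devices through udev.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a device found during enumeration. */
struct gpu_device {
    const char *id_path_tag; /* may be NULL if udev did not set it */
};

/*
 * Return the index of the device to render on.
 *   dri_prime       value of the DRI_PRIME environment variable, or NULL
 *   devs, count     enumerated devices
 *   compositor_idx  index of the card used by the Wayland compositor
 */
static size_t
choose_device(const char *dri_prime, const struct gpu_device *devs,
              size_t count, size_t compositor_idx)
{
    size_t i;

    /* DRI_PRIME unset or empty: stay on the compositor's card. */
    if (dri_prime == NULL || dri_prime[0] == '\0')
        return compositor_idx;

    /* DRI_PRIME=1: any card other than the compositor's. */
    if (strcmp(dri_prime, "1") == 0) {
        for (i = 0; i < count; i++)
            if (i != compositor_idx)
                return i;
        return compositor_idx;
    }

    /* Otherwise DRI_PRIME is treated as an id_path_tag to match. */
    for (i = 0; i < count; i++)
        if (devs[i].id_path_tag != NULL &&
            strcmp(devs[i].id_path_tag, dri_prime) == 0)
            return i;

    /* Requested device not found: fall back to the compositor's card. */
    return compositor_idx;
}
```

In this sketch, a device whose id_path_tag is missing can still be picked
with DRI_PRIME=1, but never by name.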

The Wayland Prime support implemented with these patches is different
from X Prime support.

A client using a different card than the compositor will allocate
untiled buffers to render to, and share them with the compositor. On X,
by contrast, it would render to a tiled buffer that is not shared with
the other card, and a copy mechanism gives the main card an untiled buffer.
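
As a rough sketch of the Wayland-side allocation decision described above:
the flag names and back_buffer_use_flags() are made up for illustration and
are not the identifiers used in the patches or in Mesa.

```c
#include <stdbool.h>

/* Hypothetical buffer usage flags, for illustration only. */
enum buffer_use {
    BUFFER_USE_SHARE  = 1u << 0, /* buffer is shared with the compositor */
    BUFFER_USE_LINEAR = 1u << 1  /* buffer must be untiled (linear) */
};

/*
 * Pick the usage flags for a client's back buffer.
 * The buffer is always shared with the compositor. It is forced to be
 * untiled only when the client renders on a different card, since the
 * compositor's card cannot be assumed to understand the other card's
 * tiling layout.
 */
static unsigned
back_buffer_use_flags(bool client_on_other_card)
{
    unsigned flags = BUFFER_USE_SHARE;

    if (client_on_other_card)
        flags |= BUFFER_USE_LINEAR;

    return flags;
}
```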

That means these (Wayland) clients will perform slowly compared to
running without Prime.
In fact, this is not how the user is supposed to run a game, for example,
on their dedicated card.

Using a shared, untiled buffer while avoiding any copy is better for
applications that don't do much rendering.

An example of such an application is an embedded Wayland compositor.

To run a heavy application, the user is supposed to launch an
embedded Wayland compositor on the dedicated card, and run the game
inside it. The embedded compositor will render into the shared, untiled
buffer, copying the contents of the game's buffers into it.

Note that the game knows it is using the same card as its compositor,
which is why it enables tiling.

I'm planning to write a Weston shell designed to run embedded fullscreen
games, which would make Weston resize to the game's size and close when the
game closes.

Pros:
.If you launch a fullscreen Wayland compositor on the dedicated card,
inside a compositor supporting composite bypass, you'll render the whole
desktop on the dedicated card. The integrated card would only display
the buffer generated, without doing any copy.
.More flexibility

Cons:
.The user has to use a script to launch a game on the dedicated card.

Pros over X dri2 Prime support:
.Vsync works, whichever card the client uses
.You can easily understand how Prime support works

As a last note, this Prime support also suffers from the
lack of dma-buf fences (glitches occur when the client is still writing
to the buffer while the compositor's card reads it).
Using an embedded compositor suppresses all the glitches as long as
it takes less than (1/refresh_rate) seconds for it to render a frame,
that is, when you don't have input lag.

Axel Davy (3):
  Move the code to open the graphic device.     Support for
    render-nodes.
  Create untiled buffers in get_back_bo when needed.
  Implement choosing the device to use with DRI_PRIME

 src/egl/drivers/dri2/egl_dri2.h         |   1 +
 src/egl/drivers/dri2/platform_wayland.c | 262 +++++++++++++++++++++++++++-----
 2 files changed, 226 insertions(+), 37 deletions(-)

-- 
1.8.1.2