Remote display with 3D acceleration using Wayland/Weston

DRC dcommander at users.sourceforge.net
Thu Dec 15 15:55:44 UTC 2016


On 12/15/16 3:01 AM, Pekka Paalanen wrote:
> I assure you, this is a limitation of the RDP-backend itself. Nothing
> outside of Weston creates this restriction.
> 
> The current RDP-backend is written to set up and use only the Pixman
> renderer. Pixman renderer is a software renderer, and will not
> initialize EGL in the compositor. Therefore no support for hardware
> accelerated OpenGL gets advertised to clients, and clients fall back to
> software GL.
> 
> You can fix this purely by modifying libweston/compositor-rdp.c file,
> writing the support for initializing the GL-renderer. Then you get
> hardware accelerated GL support for all Wayland clients without any
> other modifications anywhere.
> 
> Why that has not been done already is that it was thought that clients
> using hardware OpenGL while the compositor does not could not be
> performant enough to justify the effort. Also, it pulls in a dependency
> on the EGL and GL libs, which are huge. Obviously your use case is
> different and this rationale does not apply.

Like many things, it depends on the application.  GLXgears may not
perform better in a hardware-accelerated remote 3D environment vs. using
software OpenGL, but real-world applications with larger geometries
certainly will.  In a VirtualGL environment, the overhead is per-frame
rather than per-primitive, so geometric throughput is essentially as
fast as it would be in the local case (the OpenGL applications are still
using direct rendering.)  The main performance limiters are pixel
readback and transmission.

Modern GPUs have pretty fast readback-- 800-1000 Mpixels/sec in the case
of a mid-range Quadro, for instance, if you use synchronous readback.
VirtualGL uses PBO readback, which is a bit slower than synchronous
readback but which uses practically zero CPU cycles and does not block
at the driver level (this is what enables many users to share the same
GPU without conflict.)  VGL also uses a frame queueing/spoiling system
to send the 3D frames from the rendering thread into another thread for
transmission and/or display, so it can be displaying or transmitting the
last frame while the application renders the next frame.

TurboVNC (and most other X proxies that people use with VGL) is based on
libjpeg-turbo, which can compress JPEG images at hundreds of Mpixels/sec
on modern CPUs.  In total, you can pretty easily push 60+ Mpixels/sec
with perceptually lossless image quality to clients on even a 100
Megabit network, and 20 Mpixels/sec across a 10 Megabit network (with
reduced quality.)

Our biggest success stories are large companies that have replaced their
3D workstation infrastructure with 8 or 10 beefy servers running
VirtualGL+TurboVNC, with laptop clients running the TurboVNC Viewer.  In
most cases, they claim that the perceived performance is as good as or
better than their old workstations.
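
To make the PBO readback above concrete, here is a minimal sketch of the
general ping-pong pattern: glReadPixels() targets a buffer object, so it
returns immediately, and the copy to system memory lags rendering by one
frame.  This is only an illustration, not VirtualGL's actual code, and
the function names are invented for the example.  It assumes a current
desktop OpenGL >= 2.1 context and a W x H BGRA framebuffer:

/* Minimal sketch of ping-pong PBO readback-- not VirtualGL's actual
 * code; init_readback()/readback_frame() are names made up for the
 * example. */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

#define NUM_PBOS 2  /* two buffers, so readback overlaps rendering */

static GLuint pbos[NUM_PBOS];
static int frame = 0;

void init_readback(int w, int h)
{
    glGenBuffers(NUM_PBOS, pbos);
    for (int i = 0; i < NUM_PBOS; i++) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, (GLsizeiptr)w * h * 4, NULL,
                     GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call after rendering each frame.  Kicks off an asynchronous readback
 * of the current frame into one PBO and maps the PBO filled on the
 * previous call, so the CPU-side copy lags rendering by one frame but
 * never stalls the pipeline the way a plain glReadPixels() into client
 * memory does. */
void readback_frame(int w, int h, unsigned char *dst)
{
    int cur = frame % NUM_PBOS, prev = (frame + 1) % NUM_PBOS;

    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[cur]);
    glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, (void *)0);

    if (frame > 0) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[prev]);
        void *src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
        if (src) {
            memcpy(dst, src, (size_t)w * h * 4);
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    frame++;
}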

To put some numbers on this, our GLXspheres benchmark uses a geometry
size that is relatively small (~60,000 polygons) but still a lot more
realistic than GLXgears (which has a polygon count only in the hundreds,
if I recall correctly.)  When running on a 1920x1200 remote display
session (TurboVNC), this benchmark will perform at about 14 Hz with
llvmpipe but 43 Hz with VirtualGL.  So software OpenGL definitely does
slow things down, even with a relatively modest geometry size and in an
environment where there is a lot of per-frame overhead.
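
For what it's worth, the libjpeg-turbo compression step I mentioned
above is only a couple of calls with the TurboJPEG API.  A rough
whole-frame sketch follows-- purely illustrative, not TurboVNC's actual
code (TurboVNC encodes per rectangle and picks subsampling/quality per
connection), and compress_frame() is just a name I made up:

/* Rough sketch of a whole-frame JPEG encode with the TurboJPEG API.
 * Returns the JPEG size in bytes, or 0 on error; *jpeg is allocated by
 * TurboJPEG and should be freed with tjFree(). */
#include <stdio.h>
#include <turbojpeg.h>

unsigned long compress_frame(const unsigned char *fb, int w, int h,
                             unsigned char **jpeg)
{
    tjhandle tj = tjInitCompress();
    unsigned long jpegSize = 0;

    if (!tj)
        return 0;
    *jpeg = NULL;  /* let TurboJPEG allocate the output buffer */
    if (tjCompress2(tj, fb, w, 0 /* pitch: w * 4 */, h, TJPF_BGRA,
                    jpeg, &jpegSize, TJSAMP_420, 85 /* quality */,
                    TJFLAG_FASTDCT) < 0) {
        fprintf(stderr, "%s\n", tjGetErrorStr());
        jpegSize = 0;
    }
    tjDestroy(tj);
    return jpegSize;
}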


> The hardest part in adding the support to the RDP-backend is
> implementing the buffer content access efficiently. RDP requires pixel
> data in system memory so the CPU can read it, but GL-renderer has all
> pixel data in graphics memory which often cannot be directly read by
> the CPU. Accessing that pixel data requires a copy (glReadPixels), and
> there is nowadays a helper: weston_surface_copy_content(), however the
> function is not efficient and is so far meant only for debugging and
> testing.

I could probably reuse some of the VirtualGL code for this, since it
already does a good job of buffer management.
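
To sketch the kind of thing I mean-- this is only a toy illustration of
the frame queueing/spoiling idea described earlier, not actual VGL code,
and the names are made up for the example.  The render/readback thread
posts the newest frame and never blocks; the encode/transmit thread
always picks up the latest frame, and anything it missed is dropped:

/* Toy sketch of a one-deep frame queue with spoiling (pthreads). */
#include <pthread.h>
#include <stdlib.h>

struct frame_queue {
    pthread_mutex_t lock;
    pthread_cond_t ready;
    unsigned char *pending;  /* newest completed frame, or NULL */
    size_t size;
    int shutdown;
};

void queue_init(struct frame_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->ready, NULL);
    q->pending = NULL;
    q->size = 0;
    q->shutdown = 0;
}

/* Producer: hand off a completed frame (takes ownership of 'frame').
 * Never blocks waiting for the consumer. */
void queue_post(struct frame_queue *q, unsigned char *frame, size_t size)
{
    pthread_mutex_lock(&q->lock);
    free(q->pending);  /* spoil any frame the consumer didn't get to */
    q->pending = frame;
    q->size = size;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer: wait for the newest frame and take ownership of it.
 * Returns NULL once q->shutdown has been set (and q->ready signaled). */
unsigned char *queue_take(struct frame_queue *q, size_t *size)
{
    unsigned char *frame;

    pthread_mutex_lock(&q->lock);
    while (!q->pending && !q->shutdown)
        pthread_cond_wait(&q->ready, &q->lock);
    frame = q->pending;
    q->pending = NULL;
    if (size)
        *size = q->size;
    pthread_mutex_unlock(&q->lock);
    return frame;
}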

Thanks so much for all of the helpful info.  I guess I have my work cut
out for me.  :|

