External DMA to gpu

Thu Oct 4 08:52:03 UTC 2018

On Thu, 4 Oct 2018 10:15:52 +0200
Dirk Eibach <dirk.eibach at googlemail.com> wrote:

> Hi Pekka,
> 
> > > > I suppose that means you still do a copy from the gbm_bo/dmabuf into a
> > > > window surface? If you used zwp_linux_dmabuf manually from your Wayland
> > > > client, you could avoid even that copy. It has the same caveat as below
> > > > though.  
> > >
> > > I don't think so. The grabber does direct DMA to the VRAM, making the
> > > texture should be zero copy. Or am I missing something?  
> >
> > Below you say you use glEGLImageTargetTexture2DOES(). That gets you a
> > GL texture. To actually get that GL texture on screen, you have to do a
> > GL drawing command to copy the pixels into an EGLSurface created from a
> > wl_surface. That's the copy I'm referring to and which would be
> > avoidable if you don't have to e.g. convert the color format in the app.
> >
> > Or are you using some other tricks?
> >
> > Once the pixels are on a wl_surface, the compositor will do one more
> > copy to get those into a framebuffer, unless the requirements for
> > scanning out directly from the client buffer are met. But I would guess
> > it is more important to optimize the grabber-to-VRAM path than the
> > wl_surface-to-scanout path which is likely just VRAM-to-VRAM so pretty
> > good already.  
> 
> If I have to use a shader for colorspace conversion I cannot use this
> approach, right?

Sorry, which "this"?

If you use a shader in your app, you are making a copy, and you need to
use glEGLImageTargetTexture2DOES() to make the grabbed buffer available
to the shader. So yes, in that case you don't use zwp_linux_dmabuf
directly but you rely on EGL instead to send your final image to the
compositor.

If instead you're asking about what the compositor does, then the
scanout-ability on the client side is determined by how the EGL
implementation chooses to allocate the buffer it will send to the
compositor. Often it is scanout-able, so you don't lose that
opportunity.

> > > > > My only problem left is that glEGLImageTargetTexture2DOES() does only
> > > > > accept ARGB8888 and not RGB888, which means I have to waste a lot of
> > > > > PCIe bandwidth. Any ideas how to get around this? Or what would be a
> > > > > more appropriate place to post this question?  
> > > >
> > > > Yeah, I suppose support for true 24-bit-storage formats is rare
> > > > nowadays.
> > > >
> > > > The format list advertised via zwp_linux_dmabuf, visible via e.g.
> > > > weston-info, can tell you what you could use directly. After all, a
> > > > Wayland compositor does the same EGLImage import as you do in the
> > > > simple case.
> > > >
> > > > You could probably use the GPU to convert from 24-bit to 32-bit format
> > > > though, by importing the image as R8 format instead of RGB888 and
> > > > pretend the width is 3x. Then you could use a fragment shader to sample
> > > > the real R, G and B separately and write out a 32-bit format image for
> > > > display.  
> > >
> > > Is there any example code for a gl noob? I already did some research but
> > > didn't find anything useful.  
> >
> > Nothing much comes to mind. Weston uses similar tricks to convert YUV
> > data to RGB by lying to EGL and GL that the incoming buffer is R8 or
> > RG88 and using a fragment shader to compute the proper RGB values. It
> > is really just about lying to EGL when you import the dmabuf: instead of
> > the actual pixel format, you use R8 and adjust the width/height/stride
> > to match so that you can sample each byte correctly. Then in the
> > fragment shader, you compute the correct texture coordinates to read
> > each of R, G and B values for an output pixel and then combine those
> > into an output color.
> >
> > Reading YUV is more tricky than reading 24-bit RGB, because YUV is
> > usually arranged in multiple planes, some of which are sub-sampled,
> > e.g. half resolution.  
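
To make the "lying to EGL" part concrete, here is a minimal sketch of how
the import parameters could be derived for a tightly packed 24-bit buffer.
The struct r8_import and the function describe_rgb888_as_r8() are
hypothetical names made up for illustration. The fourcc value is the one
drm_fourcc.h defines for DRM_FORMAT_R8, repeated here so the snippet
compiles on its own. Whether a given driver accepts an R8 dmabuf import
is an assumption you would have to check on your hardware.

```c
#include <stdint.h>

/* Same packing as fourcc_code() in drm_fourcc.h, repeated so that this
 * sketch builds without the libdrm headers. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define FORMAT_R8 FOURCC('R', '8', ' ', ' ')   /* DRM_FORMAT_R8 */

/* Hypothetical container for the values handed to the dmabuf import. */
struct r8_import {
    uint32_t fourcc;   /* for EGL_LINUX_DRM_FOURCC_EXT */
    int32_t  width;    /* for EGL_WIDTH, in R8 texels */
    int32_t  height;   /* for EGL_HEIGHT */
    uint32_t stride;   /* for EGL_DMA_BUF_PLANE0_PITCH_EXT, in bytes */
};

/* Describe a 24-bit RGB buffer as if it were an R8 image: every byte
 * becomes one texel, so the width triples while the height and the
 * byte stride stay exactly as the grabber wrote them. */
static struct r8_import
describe_rgb888_as_r8(int32_t width, int32_t height, uint32_t stride_bytes)
{
    struct r8_import d;

    d.fourcc = FORMAT_R8;
    d.width  = width * 3;
    d.height = height;
    d.stride = stride_bytes;
    return d;
}
```

The resulting values would then go into the attribute list for
eglCreateImageKHR() with the EGL_LINUX_DMA_BUF_EXT target, in place of
the real format and width.
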
> 
> Thanks, that was very helpful, as always. This is what we came up
> with, and it works nicely:
>   float x_int = floor(3840.0 * vTexCoord.x) * 3.0;
>   float r = texture2D(uTexture, vec2((x_int + 0.0) / (3840.0 * 3.0),
>                                      vTexCoord.y)).r;
>   float g = texture2D(uTexture, vec2((x_int + 1.0) / (3840.0 * 3.0),
>                                      vTexCoord.y)).r;
>   float b = texture2D(uTexture, vec2((x_int + 2.0) / (3840.0 * 3.0),
>                                      vTexCoord.y)).r;
>   gl_FragColor = vec4(r, g, b, 1.0);  // add alpha component
> 
> We have to pass the horizontal resolution to the shader, I suppose
> there is no way around this, right?

Correct.

I wonder if you should add 0.5 to x_int to hit the middle of the
texel, just to be sure NEAREST interpolation gives you the right texel.
I'd have to draw it on paper to see if that formula is exactly right,
so can't say off-hand.
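
Instead of paper, the question can be checked roughly on the CPU. The
following is a minimal sketch, not a statement about what any GPU does:
it assumes NEAREST selects texel floor(u * width), that the quad maps
output pixels 1:1 so vTexCoord.x sits at pixel centres, and that 32-bit
floats are a fair stand-in for shader precision. The function names
are made up for this example.

```c
/* Floor for non-negative values; truncation is enough here and avoids
 * needing libm. */
static float floor_nonneg(float x)
{
    return (float)(int)x;
}

/* Texel index that NEAREST filtering selects for normalized coordinate
 * u in a texture tex_width texels wide, clamped to the valid range. */
static int nearest_texel(float u, int tex_width)
{
    int i = (int)(u * (float)tex_width);

    if (i < 0)
        i = 0;
    if (i >= tex_width)
        i = tex_width - 1;
    return i;
}

/* Count the byte fetches that land on the wrong texel when one row of
 * out_width pixels is evaluated with `offset` added to x_int. */
static int count_wrong_fetches(int out_width, float offset)
{
    const int tex_width = out_width * 3;  /* R8 view: one texel per byte */
    int wrong = 0;

    for (int p = 0; p < out_width; p++) {
        /* vTexCoord.x at the centre of output pixel p */
        float vx = ((float)p + 0.5f) / (float)out_width;
        float x_int = floor_nonneg((float)out_width * vx) * 3.0f;

        for (int c = 0; c < 3; c++) {     /* c = 0, 1, 2 for R, G, B */
            float u = (x_int + (float)c + offset) / (float)tex_width;

            if (nearest_texel(u, tex_width) != 3 * p + c)
                wrong++;
        }
    }
    return wrong;
}
```

With offset 0.0 every coordinate sits exactly on the boundary between
two texels, so whether count_wrong_fetches(3840, 0.0f) comes out as
zero depends purely on rounding. With offset 0.5 the coordinate is in
the middle of the texel and count_wrong_fetches(3840, 0.5f) returns 0.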

> I was afraid that the unaligned access in the shader would have some
> performance penalty. But in fact performance is better than the 32-bit
> version. Thumbs up!

Nice!

Thanks,
pq