[Mesa-dev] GBM YUV planar support

Thu Jun 2 21:51:48 UTC 2016

On Thu, Jun 2, 2016 at 2:41 PM, Kristian Høgsberg <krh at bitplanet.net> wrote:
> On Thu, Jun 2, 2016 at 11:52 AM, Rob Herring <robh at kernel.org> wrote:
>> On Thu, Jun 2, 2016 at 1:31 PM, Rob Clark <robdclark at gmail.com> wrote:
>>> On Thu, Jun 2, 2016 at 1:51 PM, Rob Herring <robh at kernel.org> wrote:
>>>> As discussed on irc yesterday, I've been looking at adding YUV planar
>>>> (YV12) to GBM as it is a requirement for Android gralloc. It is
>>>> possible to disable the requirement as that is what the Android
>>>> emulator and android-x86 do. But that results in un-optimized s/w CSC.
>>>>
>>>> Given most/all Android targeted h/w can support YUV overlays (except
>>>> virgl?), I only need to allocate, map, and import (to GBM only)
>>>> buffers. The outputting of YUV would remain the responsibility of HWC
>>>> (also a missing feature in drm_hwcomposer), and the gpu never touches
>>>> the YUV buffers.
>>>>
>>>> With that, I see a couple of options:
>>>>
>>>> For allocation, at some level we need to translate to a single buffer
>>>> perhaps using R8 format. This could be done in gralloc, GBM, gallium
>>>> ST, or individual drivers. Also, somewhere we'd have to adjust stride
>>>> or height. I don't know whether assumptions such as the U or V
>>>> stride being half the Y stride are acceptable. Trying to propagate
>>>> per-plane stride and offsets all the way down to the drivers looks
>>>> difficult.
>>>>
>>>> Then for importing, we can translate the planes to R8/GR88 and use the
>>>> import support Stanimir is doing[1]. Again, the question is at what
>>>> level to do this: either gralloc or GBM? The complicating factor here
>>>> is I don't think we want to end up with 2 GBM BOs. So maybe GBM BOs
>>>> need to support multiple DRIImages? However, it seems that we're
>>>> creating 2 ways to import planar buffers: either as a single DRIimage
>>>> with planes (as i965 does) or as a DRIimage per plane.
>>>
>>> hmm, 2 gbm bo's (and 2 DRIImages) seems kind of ideal if you actually
>>> did want to use them w/ gl (as two textures, doing CSC in shader)..
>>> although I'm not sure to what extent that breaks the android world (if
>>> it was exposed in GL as two textures).
>>
>> But for allocation, you can't just allocate 2 buffers as the planes do
>> have defined ordering and offsets. So somewhere you would need to
>> create a 2nd BO and attach it to the buffer. Perhaps dupImage() is for
>> this purpose?
>
> We added dupImage for the cases where you import some resource
> (wl_buffer or EGLImage) that is also backed by a __DRIimage to a
> gbm_bo. We wanted the lifetime of the gbm_bo to be independent of the
> imported resource lifetime.
>
>>> AFAIU 99% of the time, in practice, android puts YUV on the screen via
>>> overlay, but I'd be curious to see, for example, what the shaders used
>>> for window transitions look like.  Depending on to what extent it's
>>> possible to change android to support the R8+RG88 + frag shader that
>>> does CSC, we might end up with no choice but to add direct support for
>>> YUV..
>>>
>>>> Another option is to make gralloc open both render and card nodes,
>>>> using the card GBM device to just allocate dumb buffers for YUV buffers.
>>>> This complicates gralloc a bit and the answer is always don't use dumb
>>>> buffers. :) However, the assumption here is the buffers are just
>>>> scanout buffers.
>>>
>>> Note fwiw, what we were doing on linux was actually allocating from
>>> the v4l viddec device (and importing in to mesa).  In fact I remember
>>> there were some problems going in the other direction.. some specific
>>> pitch requirements for the UV plane, or something like that.  And also
>>> some really strange corruption (not sure if v4l dmabuf export is
>>> broken when the device has an iommu??  maybe it was giving the dmabuf
>>> importer device page addresses instead of physical addresses??)
>>>
>>> Possibly that is an argument for actually allocating video decode
>>> buffers from the v4l device directly?
>>
>> Yes, I'd expect that to be the case for h/w decoders. However, I'm
>> using s/w decoders ATM.
>
> I agree with Rob for HW decoders, that's how libva also works for
> Intel. It seems like the problem left to solve is how to allocate a
> gbm_bo that can be mapped and written to by a SW decoder and then
> (potentially after being exported to another process) used as a
> texture by GL.

The gbm allocation flags are supposed to solve this, aren't they? But
yes, especially on intel where overlays are YUYV while the decoder
does some other YUV variant, that can be an issue -- which one are you
prioritizing: overlay use or performance when falling back to GL?

> My feeling is that we just want to teach gbm_bo_create() about the YUV
> formats and then allow gbm_bo_map to map separate planes (maybe
> through new flags). Whether to use multiple BOs can then be a
> decision the driver makes.

This is what we did for minigbm, we have entry points to get the bo
and the offset for each plane. You might end up with a single bo +
multiple offsets inside of it, or multiple bos with (usually) offset
== 0, but that's abstracted by the API.
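
For the single bo + multiple offsets case, a minimal sketch of how
the per-plane strides and offsets could be computed for YV12 is
below. This is not minigbm or GBM code. The struct and function
names (yv12_layout, yv12_layout_compute) are hypothetical, and the
alignment rules assume the layout Android documents for
HAL_PIXEL_FORMAT_YV12: Y stride aligned to 16 bytes, chroma stride
equal to half the Y stride rounded up to 16, V (Cr) plane before
U (Cb), even width and height.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(v, a) (((v) + (a) - 1) & ~((uint32_t)(a) - 1))

/* YV12 plane order: Y first, then V (Cr), then U (Cb). */
enum { PLANE_Y = 0, PLANE_V = 1, PLANE_U = 2, NUM_PLANES = 3 };

/* Hypothetical description of three planes packed into one buffer. */
struct yv12_layout {
    uint32_t stride[NUM_PLANES]; /* bytes per row of each plane */
    uint32_t offset[NUM_PLANES]; /* byte offset of each plane in the buffer */
    uint32_t total_size;         /* bytes to allocate for the whole buffer */
};

/* Assumes the Android YV12 rules described above; other consumers
 * (display, video decoder) may need different alignments. */
static struct yv12_layout yv12_layout_compute(uint32_t width, uint32_t height)
{
    struct yv12_layout l;

    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);

    uint32_t y_stride = ALIGN_UP(width, 16);
    uint32_t c_stride = ALIGN_UP(y_stride / 2, 16);
    uint32_t y_size = y_stride * height;
    uint32_t c_size = c_stride * (height / 2);

    l.stride[PLANE_Y] = y_stride;
    l.stride[PLANE_V] = c_stride;
    l.stride[PLANE_U] = c_stride;

    l.offset[PLANE_Y] = 0;
    l.offset[PLANE_V] = y_size;
    l.offset[PLANE_U] = y_size + c_size;

    l.total_size = y_size + 2 * c_size;
    return l;
}
```

With these rules a 100x50 buffer gets a Y stride of 112 but a chroma
stride of 64, not 56, so "chroma stride is half the Y stride" does
not hold in general and the per-plane values have to be carried
along with the buffer.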

Stéphane

> As long as the planes have proper
> alignments, I don't see a problem just using one BO and creating
> multiple textures from that.
>
> Kristian