Unix Device Memory Allocation project

Wed Jan 4 16:26:58 UTC 2017

On 04.01.2017 at 17:16, Rob Clark wrote:
> On Wed, Jan 4, 2017 at 11:02 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> On 04.01.2017 at 16:47, Rob Clark wrote:
>>> On Wed, Jan 4, 2017 at 9:54 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
>>>> On Wed, Jan 04, 2017 at 08:06:24AM -0500, Rob Clark wrote:
>>>>> On Wed, Jan 4, 2017 at 7:03 AM, Daniel Stone <daniel at fooishbar.org>
>>>>> wrote:
>>>>>>> Speaking of compression for display, especially the separate
>>>>>>> compression buffer: That should be fully contained in the main DMABUF
>>>>>>> and described by the per-BO metadata. Some other drivers want to use a
>>>>>>> separate DMABUF for the compression buffer - while that may sound good
>>>>>>> in theory, it's not economical for the reason described above.
>>>>>> 'Some other drivers want to use a separate DMABUF', or 'some other
>>>>>> hardware demands the data be separate'. Same with luma/chroma plane
>>>>>> separation. Anyway, it doesn't really matter unless you're sharing
>>>>>> render-compression formats across vendors, and AFBC is the only case
>>>>>> of that I know of currently.
>>>>>
>>>>> jfwiw, UBWC on newer snapdragons too.. seems like we can share these
>>>>> not just between gpu (render to and sample from) and display, but also
>>>>> v4l2 decoder/encoder (and maybe camera?)
>>>>>
>>>>> I *think* we probably can treat the metadata buffers as a separate
>>>>> plane.. at least we can for render target and blit src/dst, but not
>>>>> 100% sure about sampling from a UBWC buffer.. that might force us to
>>>>> have them in a single buffer.
>>>> Conceptually treating them as two planes, and everywhere requiring that
>>>> they're allocated from the same BO are orthogonal things. At least that's
>>>> our plan with intel render compression last time I understood the current
>>>> state ;-)
>>> If the position of the different parts of the buffer are somewhere
>>> required to be a function of w/h/bpp/etc then I'm not sure if there is
>>> a strong advantage to treating them as separate BOs.. although I
>>> suppose it doesn't preclude it either.  As far as plumbing it through
>>> mesa/st, it seems convenient to have a single buffer.  (We have kind
>>> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
>>> that.. but I've not thought through those details so much yet.)
>>
>> Well I don't want to ruin your day, but there are different requirements
>> from different hardware.
>>
>> For example the UVD engine found in all AMD graphics cards since r600 must
>> have both planes in a single BO because the memory controller can only
>> handle a rather small offset between the planes.
>>
>> On the other hand I know of embedded MPEG2/H264 decoders where the different
>> planes must be on different memory channels. In this case I can imagine that
>> you want one BO for each plane, because otherwise the device must stitch
>> together one buffer object from two different memory regions (of course
>> possible, but rather ugly).
> true, but for a vendor specific compression/metadata plane, I think I
> can ignore oddball settop box SoC constraints and care more about just
> other devices that support the same compression.
>
>> So if we want to cover everything we essentially need to support all
>> variants of one plane per BO as well as all planes in one BO with DMA-Buf. A
>> bit tricky isn't it?
> Just to make sure we are on same page, I was only really talking about
> whether to have color+meta in same bo or treat it similar to two plane
> yuv (ie. pair of fd+offset tuples).  Not generic/vanilla (untiled,
> uncompressed, etc) multiplanar YUV.

Oops, sorry. I didn't realize that.

Nah, putting the metadata into the BO is probably only a good idea if
the metadata is evaluated by the device and not by the CPU as well.

>
> It probably isn't even important that different vendors'
> compression schemes are handled the same way.  Maybe on intel it is
> easier to treat it as two planes everywhere, but qcom easier to treat
> as one.  Application just sees it as one or more fd+offset tuples
> (when it queries EGL img) and passes those blindly through to addfb2.
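
To make the "one or more tuples" idea concrete, here is a minimal sketch
in C. It is not taken from any driver: struct plane_ref, struct
fb_layout and layout_is_single_bo() are hypothetical names, loosely
modelled on the per-plane handle/offset/pitch arrays that addfb2 takes.
It only shows that "color+meta in one BO" and "color+meta in two BOs"
can both be described as two planes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PLANES 4

/* Hypothetical per-plane descriptor (illustration only). */
struct plane_ref {
    uint32_t bo;      /* buffer object handle; the values are made up */
    uint32_t offset;  /* byte offset of the plane inside that BO */
    uint32_t pitch;   /* bytes per row of the plane */
};

/* Hypothetical framebuffer description: one entry per plane. */
struct fb_layout {
    unsigned int num_planes;
    struct plane_ref planes[MAX_PLANES];
};

/* Returns true if every plane of the layout lives in the same BO. */
static bool layout_is_single_bo(const struct fb_layout *fb)
{
    assert(fb->num_planes >= 1 && fb->num_planes <= MAX_PLANES);

    for (unsigned int i = 1; i < fb->num_planes; i++) {
        if (fb->planes[i].bo != fb->planes[0].bo)
            return false;
    }
    return true;
}
```

A consumer that passes the tuples through blindly never needs to call
something like layout_is_single_bo(); only hardware with a same-BO
requirement would have to check it.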

Yeah, I mean that's the real core of the problem.

On the one hand we want devices from different vendors to understand
each other, and there are certain cases where even completely different
devices can work with the same data.

On the other hand each vendor has extremely specialized data formats for 
certain use cases and it is unlikely that somebody else can handle those.

> Oh, and for some extra fun, I think video decoder can hand me
> compressed NV12 where both Y and UV have their own meta buffer.  So if
> we treat as separate planes, that becomes four planes.  (Hopefully no
> compressed I420, or that becomes 6 planes! :-P)

Well, talking about extra fun: we additionally have this neat interlaced
NV12 format that both NVidia and AMD use for their video decoding.

I.e. one Y plane top field, one UV plane top field, one Y plane bottom
field and one UV plane bottom field.

That makes 4 planes, where planes 1 & 3 and planes 2 & 4 must have the
same stride but are otherwise unrelated to each other and can have
separate metadata.
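
As a rough sketch of that constraint (the struct, enum and function
names below are made up for illustration, and the plane order is simply
the one listed above), a validity check could look like this in C:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Plane order as listed above: planes 1..4 map to indices 0..3. */
enum {
    Y_TOP,      /* plane 1 */
    UV_TOP,     /* plane 2 */
    Y_BOTTOM,   /* plane 3 */
    UV_BOTTOM,  /* plane 4 */
    NUM_FIELD_PLANES
};

/* Hypothetical per-plane description (illustration only). */
struct field_plane {
    uint32_t offset;  /* byte offset of the plane */
    uint32_t stride;  /* bytes per row */
};

/*
 * Checks only the stride rule: planes 1 & 3 (both Y fields) and
 * planes 2 & 4 (both UV fields) must have the same stride. Offsets
 * are deliberately not compared, since the planes are otherwise
 * unrelated to each other.
 */
static bool interlaced_nv12_strides_ok(const struct field_plane *p)
{
    assert(p != NULL);

    return p[Y_TOP].stride == p[Y_BOTTOM].stride &&
           p[UV_TOP].stride == p[UV_BOTTOM].stride;
}
```

The Y and UV strides are allowed to differ from each other here; only
the top/bottom pairs are tied together.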

Regards,
Christian.

>
> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> BR,
>>> -R
>>>
>>>> -Daniel
>>>> --
>>>> Daniel Vetter
>>>> Software Engineer, Intel Corporation
>>>> http://blog.ffwll.ch