Unix Device Memory Allocation project

Mon Jan 16 22:54:14 UTC 2017

Thanks for all the feedback. Things are much clearer now.

Yeah, we can use the BO modifiers for simple 2D images / planes if
that's the general direction. I think we can even stuff the
compression data buffer offset into those 64 bits, since it's not
very large (e.g. it is below 4GB and its low bits are unused due to
alignment).
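
To make the bit budget concrete, here is a rough sketch of how such a
packing could look. The field layout, the 256-byte alignment and all
ex_* names are assumptions made up for illustration, not an existing or
proposed modifier definition. The only borrowed convention is that the
vendor ID sits in the top 8 bits, as in the kernel's fourcc_mod_code()
macro. An offset below 4GB that is 256-byte aligned needs only 24 bits
once the always-zero low bits are dropped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 *   bits 63..56  vendor ID (top 8 bits, as in fourcc_mod_code())
 *   bits 55..32  unused in this sketch
 *   bits 31..8   compression data offset, in units of 256 bytes
 *   bits  7..0   tiling mode
 */
#define EX_VENDOR_SHIFT 56
#define EX_OFFSET_SHIFT 8
#define EX_OFFSET_ALIGN 256u        /* assumed alignment of the offset */
#define EX_OFFSET_MASK  0xffffffull /* 24 bits */
#define EX_TILING_MASK  0xffull

/* Pack a tiling mode and an aligned sub-4GB offset into one 64-bit value. */
static inline uint64_t ex_pack_modifier(uint8_t vendor, uint8_t tiling,
                                        uint32_t comp_offset)
{
    /* The low bits must be zero, otherwise they would be lost. */
    assert(comp_offset % EX_OFFSET_ALIGN == 0);

    /* A 32-bit offset divided by 256 always fits in 24 bits. */
    uint64_t units = comp_offset / EX_OFFSET_ALIGN;

    return ((uint64_t)vendor << EX_VENDOR_SHIFT) |
           (units << EX_OFFSET_SHIFT) |
           (uint64_t)tiling;
}

/* Recover the byte offset of the compression data buffer. */
static inline uint32_t ex_modifier_comp_offset(uint64_t mod)
{
    return (uint32_t)(((mod >> EX_OFFSET_SHIFT) & EX_OFFSET_MASK) *
                      EX_OFFSET_ALIGN);
}

/* Recover the tiling mode. */
static inline uint8_t ex_modifier_tiling(uint64_t mod)
{
    return (uint8_t)(mod & EX_TILING_MASK);
}
```

With this layout 24 bits below the vendor field are still free, so a
larger offset range or a second offset would also fit if needed.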

For OpenCL at least, we have to keep using the 256-byte per-BO
metadata to describe more complex allocations.

Marek

On Wed, Jan 4, 2017 at 5:59 PM, Daniel Stone <daniel at fooishbar.org> wrote:
> Hi Christian,
>
> On 4 January 2017 at 16:02, Christian König <deathsimple at vodafone.de> wrote:
>> On 04.01.2017 at 16:47, Rob Clark wrote:
>>> If the position of the different parts of the buffer are somewhere
>>> required to be a function of w/h/bpp/etc then I'm not sure if there is
>>> a strong advantage to treating them as separate BOs.. although I
>>> suppose it doesn't preclude it either.  As far as plumbing it through
>>> mesa/st, it seems convenient to have a single buffer.  (We have kind
>>> of a hack to deal w/ multi-planar yuv, but I'd rather not propagate
>>> that.. but I've not thought through those details so much yet.)
>>
>> Well I don't want to ruin your day, but there are different requirements
>> from different hardware.
>>
>> For example the UVD engine found in all AMD graphics cards since r600 must
>> have both planes in a single BO because the memory controller can only
>> handle a rather small offset between the planes.
>
> This is, to a large extent, also true of Intel.
>
>> On the other hand I know of embedded MPEG2/H264 decoders where the different
>> planes must be on different memory channels. In this case I can imagine
>> you want one BO for each plane, because otherwise the device must stitch
>> together one buffer object from two different memory regions (of course
>> possible, but rather ugly).
>
> Not just embedded, but quite a few platforms where the ratio of
> required to available memory bandwidth is ... somewhat different to
> larger discrete systems. Striping allocations such that luma and
> chroma live on different memory channels isn't uncommon.
>
> But I think this is all orthogonal. If you keep auxiliary planes in
> separate BOs to metadata, you can still handle both cases. How to
> place buffers is purely an _allocation_ concern, where single vs.
> multiple BO is purely about addressing them. So your allocator API may
> become a little more complex - something which only device-specific
> userspace will ever address - whilst keeping a unified
> addressing/handle system for the generic parts of userspace which
> shouldn't have to care about whether the underlying hardware demands a
> small offset or a completely separate allocation.
>
> Having API pegged to the single-underlying-BO concept has been a giant
> pain for those who can't use single BOs. I don't see anything good
> coming of the idea for cross-device/cross-vendor sharing either, since
> it encodes yet more magic implicit detail into buffer sharing. Since
> that detail ultimately has to be resolved _somewhere_, it's a problem
> avoided rather than a problem solved.
>
> Cheers,
> Daniel