Unix Device Memory Allocation project

Tue Jan 3 23:38:32 UTC 2017

On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>> We've had per buffer metadata in Radeon since KMS, which I believe first
>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>>> accepted solution back then and Red Hat was the main developer. So yeah,
>>>> pretty much all people except Intel were collaborating on "sneaking" this
>>>> in in 2009. I think radeon driver developers deserve an apology for that
>>>> language.
>>>>
>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for use
>>>> by userspace drivers only. The kernel driver isn't supposed to read it or
>>>> parse it. The format is negotiated between userspace driver developers for
>>>> sharing of more complex allocations than 2D displayable surfaces.
>>>
>>> Metadata needed for kms (what Christian also pointed out) is what everyone
>>> did (intel included) and I think that's perfectly reasonable. And I was
>>> aware of that radeon is doing that since the dawn of ages since forever.
>>>
>>> What I think is not really ok is opaque metadata blobs that the kernel
>>> never ever inspect, but just carries around. That essentially means you're
>>> reimplementing some bad form of IPC, and I dont think that's something the
>>> drm subsystem (or dma-buf) really should be doing. Because you still have
>>> that real protocol in userspace (dri2/3, wayland, whatever), but now with
>>> a side channel with no documented ordering and synchronization. It gets
>>> the job done for single-vendor buffer metadata transport, but as soon as
>>> there's more than one vendor, or as soon as you need to reallocate buffers
>>> dynamically because the usage changes it gets bad imo (and I've seen what
>>
>> The metadata is immutable after allocation, so it's not a
>> communication channel. There is no synchronization or ordering needed
>> for immutable metadata. That implies that a shared buffer can't be
>> reused for an entirely different purpose. It can only be used as-is or
>> freed.
>>
>> For suballocated memory, the idea is to reallocate it as a separate
>> buffer on the first "handle" export, so that shared suballocated
>> buffers don't exist.
>
> Yeah, once it becomes mutable the fun starts imo. I didn't realize
> that you're treating it strictly immutable since at least the kernel
> ioctl has both set and get (and that's the thing I looked at).
> Immutable stuff shouldn't be any problem (except that of course it
> won't work cross-driver in any fashion)
>
>>> that looks like on android in various forms). And that consensus (at least
>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>>> meeting we've had over 5 years ago. Not sure we're gaining anything with a
>>> "who's older" competition.
>>>
>>> Anyways it's there and it's uabi so will never disappear. Just wanted to
>>> make sure it's clear that for dma-buf we've discussed this years ago, and
>>> decided it wasn't a great idea. And I think that's still correct.
>>
>> The arguments against blob metadata sound reasonable to me. I'm pretty
>> sceptic that window system protocols will make driver-specific
>> metadata blobs redundant anytime soon though. It seems the protocols
>> don't get much attention nowadays and there is no incentive to do
>> things differently in that area. At least that's how it appears to me,
>> but I'm not involved in that.
>
> Folks are working on protocols again, at least I think the plan is to
> make all that shared buffer allocation dance also work over
> compositor/client situation (would be a bit pointless without that).
> And agreed there'll always be driver-specific stuff which is opaque to
> everyone else, but I hope at least in the future that all gets
> shuffled around through protocol extensions. And not in the way every
> Android gfx stack seems to work, where everyone has their own
> vendor-private ipc-over-dma-buf thing. Wayland definitely got this
> right, both protocol versioning and being able to add any kind of
> new/vendor-private protocol endpoints to any wayland protocol. X is a
> lot more pain, but since it finally looks like the world is switching
> away from it we might get away with  a simpler protocol there. At
> least all the tricky reallocation dances seem to matter a lot more on
> mobile/tablets/phones, and there Wayland starts to rule.

I've been thinking about it, and it looks like we're gonna continue
using immutable per-BO metadata (buffer layout, tiling description,
compression flags). The reasons are that everything else is less
economical, and the current "modifier" work done in EGL/GBM is
insufficient for our hardware - we need approx. 96 bytes of metadata
for proper buffer sharing (not just for display, but also 3D interop -
MSAA, mipmapping, compression), while EGL modifiers only support 8
bytes of metadata. However, that doesn't matter, because:

These are the components that need to work with the BO metadata:
- Mesa driver backend
- AMDGPU kernel driver

These are the components that should never know about the BO metadata:
- Any Mesa shared code
- EGL
- GBM
- Window system protocols
- Display servers
- DDXs

The more components you need to change when the requirements change,
the less economical the whole thing is, and the more painful the
deployment is.

Interop with other vendors would be trivial - the kernel drivers can
exchange buffer layouts, and DRM can have an interface for it.
Userspace doesn't have to know about any of that. (It also seems kinda
dangerous to use userspace as a middle man for passing the
metadata/modifiers around)

Speaking of compression for display, especially the separate
compression buffer: That should be fully contained in the main DMABUF
and described by the per-BO metadata. Some other drivers want to use a
separate DMABUF for the compression buffer - while that may sound good
in theory, it's not economical for the reason described above.

Marek