Unix Device Memory Allocation project

James Jones jajones at nvidia.com
Wed Oct 19 00:08:49 UTC 2016


Thanks for the detailed writeup, and it was good to meet you at XDC.  Comments inline below:

On 10/18/2016 04:40 PM, Marek Olšák wrote:
> Hi,
>
> The text below describes how open source AMDGPU buffer sharing works.
> I hope you'll find some useful bits in it.
>
>
> Producer = allocates a buffer (or texture), and exports its handle
> (DMABUF, etc.), and can use the buffer in various ways
>
> Consumer = imports the handle, and can use the buffer in various ways
>
>
> *** Producer-consumer interaction. ***
>
> 1) On handle export, the producer receives these flags:
>
> - READ, WRITE, READ+WRITE: Describe the expected usage in the consumer.
>   * The producer decides if it needs to disable compression based on
> those flags.
>
> - EXPLICIT_FLUSH flag: The producer will receive an explicit
> "flush_resource" call before the consumer starts using the buffer.
> This is a hint that the producer doesn't have to keep track of when
> to do decompression when sharing the buffer with the consumer.
>
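
If I'm reading section 1 correctly, the export-time interface on your side
boils down to something like the following.  Names here are mine and purely
illustrative, not your actual API:

    /* Hypothetical names, purely illustrative -- not amdgpu's actual
     * interface.  The producer looks at these hints at export time and
     * decides whether compression has to be disabled up front or can be
     * resolved lazily when flush_resource is called. */

    struct buffer;                            /* opaque producer object */

    enum export_usage {
        EXPORT_USAGE_READ           = 1 << 0, /* consumer will read  */
        EXPORT_USAGE_WRITE          = 1 << 1, /* consumer will write */
        EXPORT_USAGE_EXPLICIT_FLUSH = 1 << 2  /* flush_resource call
                                                 precedes consumer use */
    };

    int export_dmabuf_fd(struct buffer *buf, unsigned usage_flags);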
>
> 2) Passing metadata (tiling, pixel ordering, format, layout) info
> between the producer and consumer:
>
> - All AMDGPU buffer/texture allocations have 256 bytes (64 dwords) of
> internal per-allocation metadata storage that lives in kernel space.
> There are amdgpu-specific ioctls that can "set" and "get" the
> metadata. Any process that has a buffer handle can do that.
>   * The producer writes the metadata; the consumer reads it.
>
> - The producer-consumer interop API doesn't know about the metadata.
> All you need to pass around is a buffer handle (KMS, DMABUF, etc.).
>   * There was a note during the talk that DMABUF doesn't have any
> metadata. Well, I just told you that it does, but it's private to
> amdgpu and possibly accessible to other kernel drivers too.

OK.  I believe someone pointed this out during my talk or afterwards as 
well.  Some drivers are using this method, but there seems to be some 
debate over whether this is the preferred general design.  Others have 
told me this isn't the right mechanism to store this sort of metadata, 
but I'm not familiar with the specific counter arguments.
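
For anyone following along who hasn't seen it, I believe the mechanism
being described is the one wrapped by libdrm's amdgpu library, used
roughly like this (my reading of amdgpu.h; corrections welcome):

    #include <stdint.h>
    #include <string.h>
    #include <amdgpu.h>   /* libdrm_amdgpu */

    /* Producer: stash up to 256 bytes (64 dwords) of UMD-specific layout
     * data alongside the BO in the kernel. */
    static int stash_metadata(amdgpu_bo_handle bo,
                              const uint32_t *umd_data, uint32_t size_bytes)
    {
        struct amdgpu_bo_metadata md = {0};

        md.size_metadata = size_bytes;
        memcpy(md.umd_metadata, umd_data, size_bytes);

        return amdgpu_bo_set_metadata(bo, &md);
    }

    /* Consumer: read it back after importing the DMABUF as an amdgpu BO. */
    static int read_metadata(amdgpu_bo_handle bo,
                             struct amdgpu_bo_metadata *out)
    {
        struct amdgpu_bo_info info = {0};
        int r = amdgpu_bo_query_info(bo, &info);

        if (r == 0)
            *out = info.metadata;
        return r;
    }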

>   * We can build upon this idea. I think the worst thing to do would
> be to add metadata handling to driver-agnostic userspace APIs. Really,
> driver-agnostic APIs shouldn't know about that, because they can't
> understand all the hw-specific information encoded in the metadata.
> Also, when you want to change the metadata format, you only have to
> update the affected drivers, not userspace APIs.

How does this kernel-side metadata interact with userspace driver 
suballocation, or application-managed suballocation in APIs such as Vulkan?
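
To make the question concrete, here is the kind of scenario I have in
mind (Vulkan, abridged; I'm assuming the VkDeviceMemory below maps to a
single amdgpu BO):

    #include <vulkan/vulkan.h>

    /* Two images with potentially different tiling/compression layouts,
     * bound at different offsets into one VkDeviceMemory -- i.e., on
     * amdgpu, presumably one kernel BO with one 256-byte metadata slot. */
    static void bind_two_images(VkDevice dev, VkDeviceMemory mem,
                                VkImage img_a, VkImage img_b)
    {
        vkBindImageMemory(dev, img_a, mem, 0);        /* offset 0     */
        vkBindImageMemory(dev, img_b, mem, 1 << 20);  /* offset 1 MiB */

        /* If this memory is later exported as a DMABUF, whose layout
         * does the per-BO metadata describe? */
    }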

Thanks,
-James

> 3) Internal AMDGPU metadata storage format
> - The header contains: Vendor ID, PCI ID, and version number.
> - The header is followed by PCI-ID-specific data. The PCI ID and the
> version number define the format.
> - If the consumer runs on a different device, it must read the header
> and parse the metadata accordingly. This implies that the
> driver-specific consumer code needs to know about all potential
> producer devices.
>
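
Just to confirm I've understood the layout: something along these lines,
with the payload format keyed on the (PCI ID, version) pair?  Field
names are mine:

    #include <stdint.h>

    /* Hypothetical view of the header at the start of the 256-byte
     * metadata block; illustrative only. */
    struct umd_metadata_header {
        uint16_t vendor_id;  /* PCI vendor ID                        */
        uint16_t device_id;  /* PCI device ID                        */
        uint32_t version;    /* version of the payload format below  */
        /* Device-specific payload follows; its format is defined by
         * the (device_id, version) pair above. */
    };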
>
> Bottom line: DMABUF handles alone are fully sufficient for sharing
> buffers/textures between devices and processes from the AMDGPU point
> of view.
>
> HW driver implementation: The driver doesn't know anything about the
> users of exported or imported buffers. It only acts based on the few
> flags described in section 1. So far that's all we've needed.
>
>
> *** Use cases ***
>
> 1) DRI (producer: application; consumer: X server)
> - The producer receives these flags: READ, EXPLICIT_FLUSH. The X
> server will treat the shared "texture" as read-only. EXPLICIT_FLUSH
> ensures the texture can be compressed, and "flush_resource" will be
> called as part of SwapBuffers and of glFlush when drawing to GL_FRONT.
> - The X server can run on a different device. In that case, the window
> system API passes the "LINEAR" flag to the driver during allocation.
> That's suboptimal and fixable.
>
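
So, if I follow, the producer-side logic at flush time is roughly the
following (hypothetical names; the real decompression step would be
whatever in-place resolve the hardware needs):

    #include <stdbool.h>

    struct shared_buf {
        bool exported_with_explicit_flush;
        bool currently_compressed;
    };

    /* Stand-in for the real in-place decompression/resolve blit. */
    static void decompress_in_place(struct shared_buf *buf)
    {
        buf->currently_compressed = false;
    }

    /* Called by the window-system code as part of SwapBuffers, or of
     * glFlush while drawing to GL_FRONT, before the consumer (the X
     * server here) samples from the shared buffer. */
    static void flush_resource(struct shared_buf *buf)
    {
        if (buf->exported_with_explicit_flush && buf->currently_compressed)
            decompress_in_place(buf);
    }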
>
> 2) OpenGL-OpenCL interop (OpenGL always exports handles, OpenCL always
> imports handles)
> - Possible flags: READ, WRITE, READ+WRITE
> - OpenCL doesn't give us any other flags, so we are stuck with those.
> - Inter-device sharing is possible if the consumer understands the
> producer's metadata and tiling layouts.
>
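
Presumably the READ / WRITE / READ+WRITE hint is derived from the
cl_mem_flags the application passes at import time, along the lines of
the sketch below (OpenCL 1.2-style interop, illustrative only):

    #include <GL/gl.h>
    #include <CL/cl.h>
    #include <CL/cl_gl.h>

    /* The cl_mem_flags argument is the only usage information the CL
     * side ever receives, which I assume is what gets forwarded to the
     * producer as READ / WRITE / READ+WRITE. */
    static cl_mem import_gl_texture(cl_context ctx, cl_GLuint tex,
                                    cl_int *err)
    {
        return clCreateFromGLTexture(ctx,
                                     CL_MEM_READ_WRITE, /* -> READ+WRITE  */
                                     GL_TEXTURE_2D,     /* texture target */
                                     0,                 /* miplevel       */
                                     tex, err);
    }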
> (amdgpu actually stores 2 different metadata blocks per allocation,
> but the simpler one is too limited and has only 8 bytes)
>
> Marek
>
>
> On Wed, Oct 5, 2016 at 1:47 AM, James Jones <jajones at nvidia.com> wrote:
>> Hello everyone,
>>
>> As many are aware, we took up the issue of surface/memory allocation at XDC
>> this year.  The outcome of that discussion was the beginnings of a design
>> proposal for a library that would serve as a cross-device, cross-process
>> surface allocator.  In the past week I've started to condense some of my
>> notes from that discussion down to code & a design document.  I've posted
>> the first pieces to a github repository here:
>>
>>   https://github.com/cubanismo/allocator
>>
>> This isn't anything close to usable code yet.  Just headers and docs, and
>> incomplete ones at that.  However, feel free to check it out if you're
>> interested in discussing the design.
>>
>> Thanks,
>> -James
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel

