[v2,2/2] doc: uapi: Add document describing dma-buf semantics
suijingfeng
suijingfeng at loongson.cn
Fri Aug 18 15:37:17 UTC 2023
Hi,
On 2023/8/3 23:47, Daniel Stone wrote:
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
Perhaps "best practices" (plural)?
> using buffers when crossing context/process/device/subsystem boundaries.
>
> This ties up all of dma-buf, formats and modifiers, and their usage.
>
> Signed-off-by: Daniel Stone <daniels at collabora.com>
> ---
> Documentation/driver-api/dma-buf.rst | 8 +
> Documentation/gpu/drm-uapi.rst | 7 +
> .../userspace-api/dma-buf-alloc-exchange.rst | 384 ++++++++++++++++++
> Documentation/userspace-api/index.rst | 1 +
> 4 files changed, 400 insertions(+)
> create mode 100644 Documentation/userspace-api/dma-buf-alloc-exchange.rst
>
> v2:
> - Moved to general uAPI section, cross-referenced from dma-buf/DRM
> - Added Pekka's suggested glossary with some small changes
> - Cleanups and clarifications from Simon and James
>
> diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
> index 862dbc2759d0..0c153d79ccc4 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -22,6 +22,14 @@ interact with the three main primitives offered by dma-buf:
> allowing implicit (kernel-ordered) synchronization of work to
> preserve the illusion of coherent access
>
> +
> +Userspace API principles and use
> +--------------------------------
> +
> +For more details on how to design your subsystem's API for dma-buf use, please
> +see Documentation/userspace-api/dma-buf-alloc-exchange.rst.
> +
> +
> Shared DMA Buffers
> ------------------
>
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index 65fb3036a580..eef5fd19bc92 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -486,3 +486,10 @@ and the CRTC index is its position in this array.
>
> .. kernel-doc:: include/uapi/drm/drm_mode.h
> :internal:
> +
> +
> +dma-buf interoperability
> +========================
> +
> +Please see Documentation/userspace-api/dma-buf-alloc-exchange.rst for
> +information on how dma-buf is integrated and exposed within DRM.
> diff --git a/Documentation/userspace-api/dma-buf-alloc-exchange.rst b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
> new file mode 100644
> index 000000000000..090453d2ad78
> --- /dev/null
> +++ b/Documentation/userspace-api/dma-buf-alloc-exchange.rst
> @@ -0,0 +1,384 @@
> +.. Copyright 2021-2023 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Glossary of terms
> +=================
> +
> +.. glossary::
> +
> + image:
> + Conceptually a two-dimensional array of pixels. The pixels may be stored
> + in one or more memory buffers. Has width and height in pixels, pixel
> + format and modifier (implicit or explicit).
> +
> + row:
> + A span along a single y-axis value, e.g. from co-ordinates (0,100) to
> + (200,100).
> +
> + scanline:
> + Synonym for row.
> +
> + column:
> + A span along a single x-axis value, e.g. from co-ordinates (100,0) to
> + (100,100).
> +
> + memory buffer:
> + A piece of memory for storing (parts of) pixel data. Has stride and size
> + in bytes and at least one handle in some API. May contain one or more
> + planes.
> +
> + plane:
> + A two-dimensional array of some or all of an image's color and alpha
> + channel values.
> +
> + pixel:
> + A picture element. Has a single color value which is defined by one or
> + more color channels values, e.g. R, G and B, or Y, Cb and Cr. May also
> + have an alpha value as an additional channel.
> +
> + pixel data:
> + Bytes or bits that represent some or all of the color/alpha channel values
> + of a pixel or an image. The data for one pixel may be spread over several
> + planes or memory buffers depending on format and modifier.
> +
> + color value:
> + A tuple of numbers, representing a color. Each element in the tuple is a
> + color channel value.
> +
> + color channel:
> + One of the dimensions in a color model. For example, RGB model has
> + channels R, G, and B. Alpha channel is sometimes counted as a color
> + channel as well.
> +
> + pixel format:
> + A description of how pixel data represents the pixel's color and alpha
> + values.
> +
> + modifier:
> + A description of how pixel data is laid out in memory buffers.
> +
> + alpha:
> + A value that denotes the color coverage in a pixel. Sometimes used for
> + translucency instead.
> +
> + stride:
> + A value that denotes the relationship between pixel-location co-ordinates
> + and byte-offset values. Typically used as the byte offset between two
> + pixels at the start of vertically-consecutive tiling blocks. For linear
> + layouts, the byte offset between two vertically-adjacent pixels.
> +
> + pitch:
> + Synonym for stride.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the color
> +values provided for each pixel. Although each subsystem has its own format
> +descriptions (e.g. V4L2 and fbdev), the ``DRM_FORMAT_*`` tokens should be reused
> +wherever possible, as they are the standard descriptions used for interchange.
> +These tokens are described in the ``drm_fourcc.h`` file, which is a part of
> +DRM's uAPI.
> +
> +Each ``DRM_FORMAT_*`` token describes the translation between a pixel
> +co-ordinate in an image, and the color values for that pixel contained within
> +its memory buffers. The number and type of color channels are described:
> +whether they are RGB or YUV, integer or floating-point, the size of each channel
> +and their locations within the pixel memory, and the relationship between color
> +planes.
> +
> +For example, ``DRM_FORMAT_ARGB8888`` describes a format in which each pixel has
> +a single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-bit precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. ``DRM_FORMAT_*`` is not
> +affected by either CPU or device endianness; the byte pattern in memory is
> +always as described in the format definition, which is usually little-endian.
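For other readers, here is how I understand the ARGB8888 paragraph, as a minimal Python sketch. The helper names `fourcc_code` and `pack_argb8888` are my own for illustration (the former mirrors the macro of the same name in ``drm_fourcc.h``); this is not code from the patch.

```python
import struct


def fourcc_code(a, b, c, d):
    """Pack four ASCII characters into a 32-bit code, first character in the low byte."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


# The 'AR24' code used for DRM_FORMAT_ARGB8888 in drm_fourcc.h.
DRM_FORMAT_ARGB8888 = fourcc_code("A", "R", "2", "4")


def pack_argb8888(a, r, g, b):
    """Return the four bytes stored in memory for one ARGB8888 pixel.

    Alpha occupies the most significant bits of the 32-bit value and blue the
    least significant; the value is then stored little-endian regardless of
    the CPU's native byte order.
    """
    value = (a << 24) | (r << 16) | (g << 8) | b
    return struct.pack("<I", value)


pixel = pack_argb8888(0xFF, 0x11, 0x22, 0x33)
print(pixel.hex())
```

So the bytes appear in memory in the order B, G, R, A, which is the part that usually confuses newcomers like me.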
> +
> +As a more complex example, ``DRM_FORMAT_NV12`` describes a format in which luma
> +and chroma YUV samples are stored in separate planes, where the chroma plane is
> +stored at half the resolution in both dimensions (i.e. one U/V chroma
> +sample is stored for each 2x2 pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is ``DRM_FORMAT_MOD_LINEAR``, describing a scheme in which each plane
> +is laid out row-sequentially, from the top-left to the bottom-right corner.
> +This is considered the baseline interchange format, and most convenient for CPU
> +access.
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +``DRM_FORMAT_MOD_VIVANTE_TILED`` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +a plane stores pixels (0,0) to (3,3) inclusive, and the second tile in a plane
> +stores pixels (4,0) to (7,3) inclusive.
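If I read the tiling example correctly, locating the tile that holds a given pixel could be sketched as below. This only covers the row-major ordering of whole tiles described here; the order of pixels inside a tile is not stated in this paragraph, so I leave it out. `tile_index` is a hypothetical helper of mine.

```python
TILE_W = 4  # tile width in pixels, per the 4x4 blocks described above
TILE_H = 4  # tile height in pixels


def tile_index(x, y, width):
    """Return the row-major index of the 4x4 tile containing pixel (x, y).

    `width` is the plane width in pixels; partial tiles at the right edge
    are assumed to be rounded up to a whole tile.
    """
    tiles_per_row = (width + TILE_W - 1) // TILE_W
    return (y // TILE_H) * tiles_per_row + (x // TILE_W)


# Pixels (0,0)..(3,3) fall in the first tile, (4,0)..(7,3) in the second.
print(tile_index(3, 3, 16), tile_index(4, 0, 16))
```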
> +
> +Some modifiers may modify the number of planes required for an image; for
> +example, the ``I915_FORMAT_MOD_Y_TILED_CCS`` modifier adds a second plane to RGB
> +formats in which it stores data about the status of every tile, notably
> +including whether the tile is fully populated with pixel data, or can be
> +expanded from a single solid color.
> +
> +These extended layouts are highly vendor-specific, and even specific to
> +particular generations or configurations of devices per-vendor. For this reason,
> +support of modifiers must be explicitly enumerated and negotiated by all users
> +in order to ensure a compatible and optimal pipeline, as discussed below.
> +
> +
> +Dimensions and size
> +===================
> +
> +Each pixel buffer must be accompanied by logical pixel dimensions. This refers
> +to the number of unique samples which can be extracted from, or stored to, the
> +underlying memory storage. For example, even though a 1920x1080
> +``DRM_FORMAT_NV12`` buffer has a luma plane containing 1920x1080 samples for the Y
> +component, and 960x540 samples for the U and V components, the overall buffer is
> +still described as having dimensions of 1920x1080.
> +
> +The in-memory storage of a buffer is not guaranteed to begin immediately at the
> +base address of the underlying memory, nor is it guaranteed that the memory
> +storage is tightly clipped to either dimension.
> +
> +Each plane must therefore be described with an ``offset`` in bytes, which will be
> +added to the base address of the memory storage before performing any per-pixel
> +calculations. This may be used to combine multiple planes into a single memory
> +buffer; for example, ``DRM_FORMAT_NV12`` may be stored in a single memory buffer
> +where the luma plane's storage begins immediately at the start of the buffer
> +with an offset of 0, and the chroma plane's storage follows within the same buffer
> +beginning from the byte offset for that plane.
> +
> +Each plane must also have a ``stride`` in bytes, expressing the offset in memory
> +between two contiguous row. For example, a ``DRM_FORMAT_MOD_LINEAR`` buffer
> +with dimensions of 1000x1000 may have been allocated as if it were 1024x1000, in
> +order to allow for aligned access patterns. In this case, the buffer will still
> +be described with a width of 1000, however the stride will be ``1024 * bpp``,
> +indicating that there are 24 pixels at the positive extreme of the x axis whose
> +values are not significant.
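To check my understanding of offset and stride for the linear case, a small sketch. I am assuming that "bpp" above means bytes per pixel, and I use 4 bytes per pixel purely as an example; `linear_byte_offset` is my own name.

```python
def linear_byte_offset(x, y, plane_offset, stride, bytes_per_pixel):
    """Byte position of pixel (x, y) within a DRM_FORMAT_MOD_LINEAR plane.

    plane_offset: start of the plane within the memory buffer, in bytes
    stride:       distance between the starts of two consecutive rows, in bytes
    """
    return plane_offset + y * stride + x * bytes_per_pixel


# A 1000x1000 image allocated as if it were 1024 pixels wide, 4 bytes per pixel.
bytes_per_pixel = 4
stride = 1024 * bytes_per_pixel

last_visible = linear_byte_offset(999, 0, 0, stride, bytes_per_pixel)
next_row = linear_byte_offset(0, 1, 0, stride, bytes_per_pixel)
print(last_visible, next_row)
```

The gap between the end of the last visible pixel of a row and the start of the next row is the 24 pixels of padding mentioned above.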
> +
> +Buffers may also be padded further in the y dimension, simply by allocating a
> +larger area than would ordinarily be required. For example, many media decoders
> +are not able to natively output buffers of height 1080, but instead require an
> +effective height of 1088 pixels. In this case, the buffer continues to be
> +described as having a height of 1080, with the memory allocation for each buffer
> +being increased to account for the extra padding.
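Putting offset, stride and height padding together for the NV12 case, this is roughly how I picture a single-buffer layout. The alignment values (64 bytes for stride, 16 rows for height) are assumptions for the sake of the example, not requirements of any particular device, and `nv12_layout` is a hypothetical helper.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nv12_layout(width, height, stride_align=64, height_align=16):
    """Describe a linear NV12 image stored in one memory buffer.

    The luma plane has one byte per pixel. The chroma plane has half the
    rows, and each row holds width/2 interleaved two-byte Cb/Cr pairs, so it
    can share the luma stride.
    """
    stride = align_up(width, stride_align)
    alloc_height = align_up(height, height_align)
    luma_size = stride * alloc_height
    chroma_size = stride * (alloc_height // 2)
    return {
        "width": width,    # logical dimensions are unchanged by padding
        "height": height,
        "planes": [
            {"offset": 0, "stride": stride},
            {"offset": luma_size, "stride": stride},
        ],
        "size": luma_size + chroma_size,
    }


layout = nv12_layout(1920, 1080)
print(layout)
```

With these assumed alignments the buffer is allocated as if it were 1088 rows tall, yet it is still described as 1920x1080.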
> +
> +
> +Enumeration
> +===========
> +
> +Every user of pixel buffers must be able to enumerate a set of supported formats
> +and modifiers, described together. Within KMS, this is achieved with the
> +``IN_FORMATS`` property on each DRM plane, listing the supported DRM formats, and
> +the modifiers supported for each format. In userspace, this is supported through
> +the `EGL_EXT_image_dma_buf_import_modifiers`_ extension entrypoints for EGL, the
> +`VK_EXT_image_drm_format_modifier`_ extension for Vulkan, and the
> +`zwp_linux_dmabuf_v1`_ extension for Wayland.
> +
> +Each of these interfaces allows users to query a set of supported
> +format+modifier combinations.
> +
> +
> +Negotiation
> +===========
> +
> +It is the responsibility of userspace to negotiate an acceptable format+modifier
> +combination for its usage. This is performed through a simple intersection of
> +lists. For example, if a user wants to use Vulkan to render an image to be
> +displayed on a KMS plane, it must:
> +
> + - query KMS for the ``IN_FORMATS`` property for the given plane
> + - query Vulkan for the supported formats for its physical device, making sure
> + to pass the ``VkImageUsageFlagBits`` and ``VkImageCreateFlagBits``
> + corresponding to the intended rendering use
> + - intersect these formats to determine the most appropriate one
> + - for this format, intersect the lists of supported modifiers for both KMS and
> + Vulkan, to obtain a final list of acceptable modifiers for that format
> +
> +This intersection must be performed for all usages. For example, if the user
> +also wishes to encode the image to a video stream, it must query the media API
> +it intends to use for encoding for the set of modifiers it supports, and
> +additionally intersect against this list.
> +
> +If the intersection of all lists is an empty list, it is not possible to share
> +buffers in this way, and an alternate strategy must be considered (e.g. using
> +CPU access routines to copy data between the different uses, with the
> +corresponding performance cost).
> +
> +The resulting modifier list is unsorted; the order is not significant.
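The "intersection of lists" could be sketched like this, assuming each user's capabilities have already been queried into a mapping from format to a set of modifiers. The function name, the format strings and `HYPOTHETICAL_TILED` are made up for illustration; only the value of ``DRM_FORMAT_MOD_LINEAR`` is real.

```python
DRM_FORMAT_MOD_LINEAR = 0
HYPOTHETICAL_TILED = 0x1234  # stand-in for some vendor modifier, not a real token


def negotiate(*users):
    """Intersect format -> modifier-set mappings from every user of the buffer.

    Returns only the formats every user supports, each with the set of
    modifiers every user supports for it. Formats whose modifier
    intersection is empty are dropped. The result is unordered.
    """
    common_formats = set(users[0])
    for user in users[1:]:
        common_formats &= set(user)

    result = {}
    for fmt in common_formats:
        modifiers = set(users[0][fmt])
        for user in users[1:]:
            modifiers &= set(user[fmt])
        if modifiers:
            result[fmt] = modifiers
    return result


kms = {"XRGB8888": {DRM_FORMAT_MOD_LINEAR, HYPOTHETICAL_TILED}, "NV12": {DRM_FORMAT_MOD_LINEAR}}
vulkan = {"XRGB8888": {HYPOTHETICAL_TILED}, "ARGB8888": {DRM_FORMAT_MOD_LINEAR}}
encoder = {"XRGB8888": {DRM_FORMAT_MOD_LINEAR}}

print(negotiate(kms, vulkan))           # display + render only
print(negotiate(kms, vulkan, encoder))  # adding the encoder empties the result
```

An empty result corresponds to the "alternate strategy" case described above.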
> +
> +
> +Allocation
> +==========
> +
> +Once userspace has determined an appropriate format, and corresponding list of
> +acceptable modifiers, it must allocate the buffer. As there is no universal
> +buffer-allocation interface available at either kernel or userspace level, the
> +client makes an arbitrary choice of allocation interface such as Vulkan, GBM, or
> +a media API.
> +
> +Each allocation request must take, at a minimum: the pixel format, a list of
> +acceptable modifiers, and the buffer's width and height. Each API may extend
> +this set of properties in different ways, such as allowing allocation in more
> +than two dimensions, intended usage patterns, etc.
> +
> +The component which allocates the buffer will make an arbitrary choice of what
> +it considers the 'best' modifier within the acceptable list for the requested
> +allocation, any padding required, and further properties of the underlying
> +memory buffers such as whether they are stored in system or device-specific
> +memory, whether or not they are physically contiguous, and their cache mode.
> +These properties of the memory buffer are not visible to userspace, however the
> +``dma-heaps`` API is an effort to address this.
> +
> +After allocation, the client must query the allocator to determine the actual
> +modifier selected for the buffer, as well as the per-plane offset and stride.
> +Allocators are not permitted to vary the format in use, to select a modifier not
> +provided within the acceptable list, nor to vary the pixel dimensions other than
> +the padding expressed through offset, stride, and size.
> +
> +Communicating additional constraints, such as alignment of stride or offset,
> +placement within a particular memory area, etc, is out of scope of dma-buf,
> +and is not solved by format and modifier tokens.
> +
> +
> +Import
> +======
> +
> +To use a buffer within a different context, device, or subsystem, the user
> +passes these parameters (format, modifier, width, height, and per-plane offset
> +and stride) to an importing API.
> +
> +Each memory buffer is referred to by a buffer handle, which may be unique or
> +duplicated within an image. For example, a ``DRM_FORMAT_NV12`` buffer may have
> +the luma and chroma buffers combined into a single memory buffer by use of the
> +per-plane offset parameters, or they may be completely separate allocations in
> +memory. For this reason, each import and allocation API must provide a separate
> +handle for each plane.
> +
> +Each kernel subsystem has its own types and interfaces for buffer management.
> +DRM uses GEM buffer objects (BOs), V4L2 has its own references, etc. These types
> +are not portable between contexts, processes, devices, or subsystems.
> +
> +To address this, ``dma-buf`` handles are used as the universal interchange for
> +buffers. Subsystem-specific operations are used to export native buffer handles
> +to a ``dma-buf`` file descriptor, and to import those file descriptors into a
> +native buffer handle. dma-buf file descriptors can be transferred between
> +contexts, processes, devices, and subsystems.
> +
> +For example, a Wayland media player may use V4L2 to decode a video frame into a
> +``DRM_FORMAT_NV12`` buffer. This will result in two memory planes (luma and
> +chroma) being dequeued by the user from V4L2. These planes are then exported to
> +one dma-buf file descriptor per plane, these descriptors are then sent along
> +with the metadata (format, modifier, width, height, per-plane offset and stride)
> +to the Wayland server. The Wayland server will then import these file
> +descriptors as an EGLImage for use through EGL/OpenGL (ES), a VkImage for use
> +through Vulkan, or a KMS framebuffer object; each of these import operations
> +will take the same metadata and convert the dma-buf file descriptors into their
> +native buffer handles.
> +
> +Having a non-empty intersection of supported modifiers does not guarantee that
> +import will succeed into all consumers; they may have constraints beyond those
> +impliied by modifiers which must be satisfied.
s/impliied/implied
> +
> +
> +Implicit modifiers
> +==================
> +
> +The concept of modifiers post-dates all of the subsystems mentioned above. As
> +such, it has been retrofitted into all of these APIs, and in order to ensure
> +backwards compatibility, support is needed for drivers and userspace which do
> +not (yet) support modifiers.
> +
> +As an example, GBM is used to allocate buffers to be shared between EGL for
> +rendering and KMS for display. It has two entrypoints for allocating buffers:
> +``gbm_bo_create`` which only takes the format, width, height, and a usage token,
> +and ``gbm_bo_create_with_modifiers`` which extends this with a list of modifiers.
> +
> +In the latter case, the allocation is as discussed above, being provided with a
> +list of acceptable modifiers that the implementation can choose from (or fail if
> +it is not possible to allocate within those constraints). In the former case
> +where modifiers are not provided, the GBM implementation must make its own
> +choice as to what is likely to be the 'best' layout. Such a choice is entirely
> +implementation-specific: some will internally use tiled layouts which are not
> +CPU-accessible if the implementation decides that is a good idea through
> +whatever heuristic. It is the implementation's responsibility to ensure that
> +this choice is appropriate.
> +
> +To support this case where the layout is not known because there is no awareness
> +of modifiers, a special ``DRM_FORMAT_MOD_INVALID`` token has been defined. This
> +pseudo-modifier declares that the layout is not known, and that the driver
> +should use its own logic to determine what the underlying layout may be.
> +
> +.. note::
> +
> + ``DRM_FORMAT_MOD_INVALID`` is a non-zero value. The modifier value zero is
> + ``DRM_FORMAT_MOD_LINEAR``, which is an explicit guarantee that the image
> + has the linear layout. Care and attention should be taken to ensure that
> + zero as a default uninitialized value signals no modifier.
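As I read this note, the pitfall is that a zero-initialized modifier field silently claims a linear layout. A minimal sketch of a safer default; the `BufferDesc` type and `has_explicit_modifier` are hypothetical, while the two constant values are the ones from ``drm_fourcc.h``.

```python
from dataclasses import dataclass

DRM_FORMAT_MOD_LINEAR = 0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF


@dataclass
class BufferDesc:
    width: int
    height: int
    fourcc: int
    # Default to "layout unknown" rather than 0, because 0 is an explicit
    # promise that the buffer is linear.
    modifier: int = DRM_FORMAT_MOD_INVALID


def has_explicit_modifier(desc):
    """True when the descriptor carries a real layout description."""
    return desc.modifier != DRM_FORMAT_MOD_INVALID


implicit = BufferDesc(1920, 1080, fourcc=0x34325241)
linear = BufferDesc(1920, 1080, fourcc=0x34325241, modifier=DRM_FORMAT_MOD_LINEAR)
print(has_explicit_modifier(implicit), has_explicit_modifier(linear))
```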
> +
> +There are four cases where this token may be used:
> + - during enumeration, an interface may return ``DRM_FORMAT_MOD_INVALID``, either
> + as the sole member of a modifier list to declare that explicit modifiers are
> + not supported, or as part of a larger list to declare that implicit modifiers
> + may be used
> + - during allocation, a user may supply ``DRM_FORMAT_MOD_INVALID``, either as the
> + sole member of a modifier list (equivalent to not supplying a modifier list
> + at all) to declare that explicit modifiers are not supported and must not be
> + used, or as part of a larger list to declare that an allocation using implicit
> + modifiers is acceptable
> + - in a post-allocation query, an implementation may return
> + ``DRM_FORMAT_MOD_INVALID`` as the modifier of the allocated buffer to declare
> + that the underlying layout is implementation-defined and that an explicit
> + modifier description is not available; per the above rules, this may only be
> + returned when the user has included ``DRM_FORMAT_MOD_INVALID`` as part of the
> + list of acceptable modifiers, or not provided a list
> + - when importing a buffer, the user may supply ``DRM_FORMAT_MOD_INVALID`` as the
> + buffer modifier (or not supply a modifier) to indicate that the modifier is
> + unknown for whatever reason; this is only acceptable when the buffer has
> + not been allocated with an explicit modifier
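The post-allocation rule in the third item could be checked along these lines. This is only my reading of the text: when no list was given, the sketch accepts whatever the implementation reports, since the layout choice was left to it. `returned_modifier_is_valid` and `HYPOTHETICAL_TILED` are names I made up.

```python
DRM_FORMAT_MOD_LINEAR = 0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF
HYPOTHETICAL_TILED = 0x1234  # stand-in for a vendor modifier, not a real token


def returned_modifier_is_valid(acceptable, returned):
    """Check the modifier reported by an allocator against the request.

    acceptable: modifiers passed at allocation time, or None / empty if the
                caller supplied no list.
    returned:   modifier reported by the post-allocation query.
    """
    if not acceptable:
        # No list supplied: the implementation picked the layout on its own.
        return True
    # With a list, the allocator may only report a member of that list; this
    # also means DRM_FORMAT_MOD_INVALID is allowed only if it was offered.
    return returned in acceptable


print(returned_modifier_is_valid([DRM_FORMAT_MOD_LINEAR], DRM_FORMAT_MOD_INVALID))
```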
> +
> +It follows from this that for any single buffer, the complete chain of operations
> +formed by the producer and all the consumers must be either fully implicit or fully
> +explicit. For example, if a user wishes to allocate a buffer for use between
> +GPU, display, and media, but the media API does not support modifiers, then the
> +user **must not** allocate the buffer with explicit modifiers and attempt to
> +import the buffer into the media API with no modifier, but either perform the
> +allocation using implicit modifiers, or allocate the buffer for media use
> +separately and copy between the two buffers.
> +
> +As one exception to the above, allocations may be 'upgraded' from implicit
> +to explicit modifiers. For example, if the buffer is allocated with
> +``gbm_bo_create`` (taking no modifiers), the user may then query the modifier with
> +``gbm_bo_get_modifier`` and then use this modifier as an explicit modifier token
> +if a valid modifier is returned.
> +
> +When allocating buffers for exchange between different users and modifiers are
> +not available, implementations are strongly encouraged to use
> +``DRM_FORMAT_MOD_LINEAR`` for their allocation, as this is the universal baseline
> +for exchange. However, it is not guaranteed that this will result in the correct
> +interpretation of buffer content, as implicit modifier operation may still be
> +subject to driver-specific heuristics.
> +
> +Any new users - userspace programs and protocols, kernel subsystems, etc -
> +wishing to exchange buffers must offer interoperability through dma-buf file
> +descriptors for memory planes, DRM format tokens to describe the format, DRM
> +format modifiers to describe the layout in memory, at least width and height for
> +dimensions, and at least offset and stride for each memory plane.
> +
> +.. _zwp_linux_dmabuf_v1: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/blob/main/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
> +.. _VK_EXT_image_drm_format_modifier: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_image_drm_format_modifier.html
> +.. _EGL_EXT_image_dma_buf_import_modifiers: https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import_modifiers.txt
> \ No newline at end of file
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 72a65db0c498..031df47a7c19 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -22,6 +22,7 @@ place where this information is gathered.
> unshare
> spec_ctrl
> accelerators/ocxl
> + dma-buf-alloc-exchange
> ebpf/index
> ELF
> ioctl/index
This document contains a wealth of knowledge; thanks for writing it.
I believe it will help educate many newcomers, including me.
However, I can only vouch for part of the content being correct,
so it may need a more experienced programmer to review it.
Anyway, I hope this elegant document can be merged.
Reviewed-by: Sui Jingfeng <suijingfeng at loongson.cn>