DMA-heap driver hints
James Jones
jajones at nvidia.com
Tue Jan 24 03:56:23 UTC 2023
On 1/23/23 08:58, Laurent Pinchart wrote:
> Hi Christian,
>
> On Mon, Jan 23, 2023 at 05:29:18PM +0100, Christian König wrote:
>> Am 23.01.23 um 14:55 schrieb Laurent Pinchart:
>>> Hi Christian,
>>>
>>> CC'ing James as I think this is related to his work on the unix device
>>> memory allocator ([1]).
Thank you for including me.
>>> [1] https://lore.kernel.org/dri-devel/8b555674-1c5b-c791-4547-2ea7c16aee6c@nvidia.com/
>>>
>>> On Mon, Jan 23, 2023 at 01:37:54PM +0100, Christian König wrote:
>>>> Hi guys,
>>>>
>>>> this is just an RFC! The last time we discussed the DMA-buf coherency
>>>> problem [1] we concluded that DMA-heap first needs a better way to
>>>> communicate to userspace which heap to use for a certain device.
>>>>
>>>> As far as I know userspace currently just hard codes that information
>>>> which is certainly not desirable considering that we should have this
>>>> inside the kernel as well.
>>>>
>>>> So what these two patches do is first add
>>>> dma_heap_create_device_link() and dma_heap_remove_device_link()
>>>> functions and then demonstrate the functionality with the uvcvideo
>>>> driver.
>>>>
>>>> The preferred DMA-heap is represented with a symlink in sysfs between
>>>> the device and the virtual DMA-heap device node.
>>>
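For illustration only: assuming the patches place the link in the device's sysfs directory under a name such as dma_heap (the thread does not give the actual link name, so both the name and the example path below are hypothetical), userspace could find the preferred heap by reading the link target and taking its last path component.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve a hypothetical sysfs link such as
 *   /sys/class/video4linux/video0/device/dma_heap
 * and copy the name of the heap it points to (the last component of the
 * link target, e.g. "system") into 'name'.
 * Returns 0 on success, -1 on failure.
 */
static int preferred_heap_name(const char *link_path, char *name, size_t len)
{
    char target[PATH_MAX];
    ssize_t n = readlink(link_path, target, sizeof(target) - 1);

    if (n < 0)
        return -1;
    target[n] = '\0'; /* readlink() does not NUL-terminate */

    const char *base = strrchr(target, '/');
    base = base ? base + 1 : target;

    if (strlen(base) + 1 > len)
        return -1;
    strcpy(name, base);
    return 0;
}
```

The resulting name would then be opened as /dev/dma_heap/<name>.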
>>> I'll start with a few high-level comments/questions:
>>>
>>> - Instead of tying drivers to heaps, have you considered a system where
>>> a driver would expose constraints, and a heap would then be selected
>>> based on those constraints ? A tight coupling between heaps and
>>> drivers means downstream patches to drivers in order to use
>>> vendor-specific heaps, that sounds painful.
>>
>> I was wondering the same thing as well, but came to the conclusion that
>> just the other way around is the less painful approach.
>
> From a kernel point of view, sure, it's simpler and thus less painful.
> From the point of view of solving the whole issue, I'm not sure :-)
>
>> The problem is that there are so many driver-specific constraints that I
>> don't even know where to start from.
>
> That's where I was hoping James would have some feedback for us, based
> on the work he did on the Unix device memory allocator. If that's not
> the case, we can brainstorm this from scratch.
Simon Ser's and my presentation from XDC 2020 focused entirely on this.
The idea was not to try to enumerate every constraint up front, but
rather to develop an extensible mechanism that would be flexible enough
to encapsulate many disparate types of constraints and perform set
operations on them (merging sets was the only operation we tried to
solve). Simon wrote a prototype header-only library implementing the
mechanism:
https://gitlab.freedesktop.org/emersion/drm-constraints
Links to the slides and the talk recording are below, along with notes
from the follow-up workshop.
https://lpc.events/event/9/contributions/615/attachments/704/1301/XDC_2020__Allocation_Constraints.pdf
https://www.youtube.com/watch?v=HZEClOP5TIk
https://paste.sr.ht/~emersion/c43b30be08bab1882f1b107402074462bba3b64a
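To make the "merge two constraint sets" operation concrete, here is a minimal sketch. The struct and its fields are hypothetical and are not the types used by the drm-constraints library; it only shows the general shape: alignments combine by least common multiple, a contiguity requirement from either side wins, and the acceptable heaps are intersected, with an empty intersection meaning no shared allocation is possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constraint set; NOT the drm-constraints library's types. */
struct alloc_constraints {
    uint64_t base_align;   /* required base address alignment in bytes (>= 1) */
    uint64_t pitch_align;  /* required row pitch alignment in bytes (>= 1) */
    bool     contiguous;   /* device cannot do scatter-gather DMA */
    uint32_t heap_mask;    /* bit i set => heap with (hypothetical) ID i is acceptable */
};

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple: the smallest value satisfying both alignments. */
static uint64_t lcm_u64(uint64_t a, uint64_t b)
{
    return a / gcd_u64(a, b) * b;
}

/*
 * Merge the constraints of two devices that will share one buffer.
 * 'out' is always filled in; returns false if no heap suits both devices.
 */
static bool merge_constraints(const struct alloc_constraints *a,
                              const struct alloc_constraints *b,
                              struct alloc_constraints *out)
{
    out->base_align  = lcm_u64(a->base_align, b->base_align);
    out->pitch_align = lcm_u64(a->pitch_align, b->pitch_align);
    out->contiguous  = a->contiguous || b->contiguous;
    out->heap_mask   = a->heap_mask & b->heap_mask;
    return out->heap_mask != 0;
}
```

Because every rule here is associative, merging a pipeline of N devices is just repeated pairwise merging.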
Note that one of the hard parts of this was figuring out how to express a
device or heap within the constraint structs. One of the better ideas
proposed back then was something like heap IDs, where dma heaps would
each have one, and devices could register their own heaps (or even just
themselves?) with the heap subsystem and be assigned a locally-unique ID
that userspace could pass around. This sounds similar to what you're
proposing. Perhaps a reasonable identifier is a device (major, minor)
pair. Such a constraint could be expressed as a symlink for easy
visualization/discoverability from userspace, but might be easier to
serialize over the wire as the (major, minor) pair. I'm also not sure
which direction is better for expressing this: a link from heap->device,
or from device->heap.
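A minimal sketch of the (major, minor) idea, assuming a heap is identified by its character device node (e.g. /dev/dma_heap/system). The helper names are made up; stat(), major(), minor() and makedev() are the standard glibc interfaces, and "major:minor" is the same form the kernel uses for the entries under /sys/dev/char/.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Identify a heap by the (major, minor) pair of its character device node. */
static int heap_devt(const char *node_path, dev_t *out)
{
    struct stat st;

    if (stat(node_path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;
    *out = st.st_rdev;
    return 0;
}

/* Serialize as "major:minor" for passing between processes. */
static int devt_to_string(dev_t dev, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%u:%u", major(dev), minor(dev));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse "major:minor" back into a dev_t on the receiving side. */
static int string_to_devt(const char *s, dev_t *out)
{
    unsigned int maj, min;

    if (sscanf(s, "%u:%u", &maj, &min) != 2)
        return -1;
    *out = makedev(maj, min);
    return 0;
}
```

The receiving side can map the pair back to a device via /sys/dev/char/<major>:<minor>, which is also where a symlink-based representation would naturally show up for visualization.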
>>> A constraint-based system would also, I think, be easier to extend
>>> with additional constraints in the future.
>>>
>>> - I assume some drivers will be able to support multiple heaps. How do
>>> you envision this being implemented ?
>>
>> I don't really see a use case for this.
One use case I know of here is same-vendor GPU local memory on different
GPUs. NVIDIA GPUs have certain things they can only do on local memory,
certain things they can do on all memory, and certain things they can
only do on memory local to another NVIDIA GPU, especially when there
exists an NVLink interface between the two. So they'd ideally express
different constraints for the heaps representing each of those.
The same thing is often true of memory on remote devices that are at
various points in a PCIe topology. We've had situations where we could
only get enough bandwidth between two PCIe devices when they were less
than some number of hops away on the PCI tree. We hard-coded logic to
detect that in our userspace drivers, but we could instead expose it as
a constraint on heaps that would express which devices can accomplish
certain operations as pairs.
Similarly to the last one, I would expect (though I haven't yet run into
this in my personal experience) that similar limitations arise with a NUMA
memory configuration: if you had a heap representing each NUMA node,
some might have more coherency than others, or might have different
bandwidth limitations that you could express through something like
device tree. This is more speculative, but it seems like a
generalization of the above two cases.
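The PCIe hop-count case described above could be expressed as a pairwise constraint. The sketch below assumes a hypothetical in-memory model of the PCIe tree (each node stores its upstream parent, all nodes share a single root) and a hypothetical hop limit; it counts the links on the path between two devices through their lowest common ancestor.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a PCIe topology: each node (root port, switch
 * port, endpoint) stores the index of its upstream parent; the single
 * root has parent -1.
 */
struct pci_node {
    int parent;
};

static int depth_of(const struct pci_node *tree, int n)
{
    int d = 0;
    while (tree[n].parent >= 0) {
        n = tree[n].parent;
        d++;
    }
    return d;
}

/* Number of links between devices a and b, via their common ancestor. */
static int pci_hops(const struct pci_node *tree, int a, int b)
{
    int da = depth_of(tree, a), db = depth_of(tree, b);
    int hops = 0;

    /* Walk the deeper node up until both are at the same depth. */
    while (da > db) { a = tree[a].parent; da--; hops++; }
    while (db > da) { b = tree[b].parent; db--; hops++; }
    /* Then walk both up in lockstep until they meet. */
    while (a != b) {
        a = tree[a].parent;
        b = tree[b].parent;
        hops += 2;
    }
    return hops;
}

/* Pairwise constraint: peer-to-peer is acceptable within max_hops links. */
static bool p2p_ok(const struct pci_node *tree, int a, int b, int max_hops)
{
    return pci_hops(tree, a, b) <= max_hops;
}
```

In a constraint system, the result of p2p_ok() for a device pair would become part of the heap's advertised constraints rather than being hard-coded in a userspace driver.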
>> We do have some drivers which say: for this use case you can use
>> whatever you want, but for that use case you need to use specific memory
>> (scan out on GPUs for example works like this).
>>
>> But those specific use cases are exactly that, very specific. And
>> exposing all the constraints for them inside a kernel UAPI is a futile
>> effort (at least for the GPU scan out case). In those situations it's
>> just better to have the allocator in userspace deal with device specific
>> stuff.
>
> While the exact constraints will certainly be device-specific, is that
> also true of the type of constraints, or the existence of constraints in
> the first place ? To give an example, with a video decoder producing
> frames that are then rendered by a GPU, the tiling format that would
> produce the best result is device-specific, but the fact that the
> decoder can produce a tiled format that would work better for the GPU,
> or a non-tiled format for other consumers, is a very common constraint.
> I don't think we'll be able to do away completely with the
> device-specific code in userspace, but I think we should be able to
> expose constraints in a generic-enough way that many simple use cases
> will be covered by generic code.
Yes, agreed, the design we proposed took pains to allow vendor-specific
constraints via a general mechanism. We supported both vendor-specific
types of constraints, and vendor-specific values for general
constraints. Some code repository would act as the central registry of
constraint types, similar to the Linux kernel's drm_fourcc.h for
modifiers, or the Khronos GitHub repository for Vulkan vendor IDs. If
the definition needs to be used by the kernel, the kernel is the logical
repository for that role IMHO.
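As a sketch of what a central registry of constraint types might look like, the following borrows the scheme drm_fourcc.h uses for format modifiers, where a vendor ID occupies the top 8 bits of a 64-bit value. All names and the vendor ID below are hypothetical; only the bit layout is taken from drm_fourcc.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical constraint-type registry, modeled on how drm_fourcc.h packs
 * a vendor ID into the top 8 bits of a 64-bit format modifier.
 * Vendor 0 is reserved here for generic (cross-vendor) constraint types.
 */
#define CONSTRAINT_VENDOR_GENERIC 0x00ULL
#define CONSTRAINT_VENDOR_EXAMPLE 0x7fULL   /* made-up vendor ID */

#define CONSTRAINT_CODE(vendor, val) \
    (((uint64_t)(vendor) << 56) | ((uint64_t)(val) & 0x00ffffffffffffffULL))

/* Generic types that every participant is expected to understand. */
#define CONSTRAINT_BASE_ALIGN   CONSTRAINT_CODE(CONSTRAINT_VENDOR_GENERIC, 1)
#define CONSTRAINT_PITCH_ALIGN  CONSTRAINT_CODE(CONSTRAINT_VENDOR_GENERIC, 2)
/* A vendor-specific type; only that vendor's code interprets its value. */
#define CONSTRAINT_EXAMPLE_TILE CONSTRAINT_CODE(CONSTRAINT_VENDOR_EXAMPLE, 1)

static inline uint8_t constraint_vendor(uint64_t type)
{
    return (uint8_t)(type >> 56);
}

/* True if generic code is expected to interpret this type itself. */
static inline int constraint_is_generic(uint64_t type)
{
    return constraint_vendor(type) == CONSTRAINT_VENDOR_GENERIC;
}
```

Reserving a vendor namespace this way lets vendors add types without coordinating every value centrally, while the registry only needs to hand out vendor IDs.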
In our 2020 discussion, there was some debate over whether the kernel
should expose and/or consume constraints directly, or whether it's
sufficient to expose lower-level mechanisms from the kernel and keep the
translation of constraints to the correct mechanism in userspace. There
are pros/cons to both. I don't know that we bottomed out on that part of
the discussion, and it could be that the right answer is some combination
of the two, as suggested below.
>> What I want to do is to separate the problem. The kernel only provides
>> the information where to allocate from, figuring out the details like
>> how many bytes, which format, plane layout etc. is still the job of
>> userspace.
>
> Even with UVC, where to allocate memory from will depend on the use
> case. If the consumer is a device that doesn't support non-contiguous
> DMA, the system heap won't work.
>
> Actually, could you explain why UVC works better with the system heap ?
> I'm looking at videobuf2 as an importer, and it doesn't call the dmabuf
> as far as I can tell, so cache management provided by the exporter seems
> to be bypassed in any case.
>
>> What we do have is compatibility between heaps. E.g. a CMA heap is
>> usually compatible with the system heap or might even be a subset of
>> another CMA heap. But I wanted to add that as next step to the heaps
>> framework itself.
>>
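The heap compatibility relation mentioned above (a CMA heap usable where the system heap is expected, or one CMA heap being a subset of another), combined with the point that a consumer without scatter-gather support rules out the system heap, can be sketched as follows. The struct, the heap names, and the first-match selection policy are all hypothetical; the code assumes the subset_of links form a forest with no cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical heap compatibility graph: a heap may declare that every
 * buffer it hands out is also valid for a wider heap ("subset_of").
 * -1 means no wider heap. The links must not form a cycle.
 */
struct heap_desc {
    const char *name;
    int subset_of;
    bool contiguous;   /* buffers are physically contiguous */
};

/* True if a buffer from heap 'have' may be used where heap 'want' is required. */
static bool heap_satisfies(const struct heap_desc *heaps, int have, int want)
{
    for (int h = have; h >= 0; h = heaps[h].subset_of)
        if (h == want)
            return true;
    return false;
}

/*
 * Return the first heap that satisfies every device's preferred heap and
 * is contiguous if any device needs that, or -1 if none exists.
 */
static int pick_heap(const struct heap_desc *heaps, size_t nheaps,
                     const int *wanted, size_t ndev, bool need_contig)
{
    for (size_t h = 0; h < nheaps; h++) {
        bool ok = !need_contig || heaps[h].contiguous;
        for (size_t d = 0; ok && d < ndev; d++)
            ok = heap_satisfies(heaps, (int)h, wanted[d]);
        if (ok)
            return (int)h;
    }
    return -1;
}
```

For example, with a system heap, a CMA heap declared as its subset, and a camera CMA region declared as a subset of that, a pipeline that prefers both "system" and "camera" resolves to the camera region.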
>>> - Devices could have different constraints based on particular
>>> configurations. For instance, a device may require specific memory
>>> layout for multi-planar YUV formats only (as in allocating the Y and C
>>> planes of NV12 from different memory banks). A dynamic API may thus be
>>> needed (but may also be very painful to use from userspace).
>>
>> Uff, good to know. But I'm not sure how to expose stuff like that.
>
> Let's see if James has anything to share with us :-) With a bit of luck
> we won't have to start from scratch.
Well, this is the hard example we keep using as a measure of success for
whatever we come up with. I don't know whether anyone ever sat down and
tried to express this in the mechanism Simon and I proposed in 2020, but
allowing the expression of something that complex was certainly our
goal. Resolving it down to an actual allocation mechanism was, I believe,
further than we got, but we weren't that well versed in DMA heaps back
then, or at least I wasn't.
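To illustrate why the NV12 case needs a per-plane answer rather than a single heap, here is a sketch that splits an NV12 frame into two allocation requests, each tagged with its own (hypothetical) heap ID. The NV12 layout itself is standard: a full-resolution Y plane with one byte per pixel, followed by a half-height plane of interleaved Cb/Cr samples. The struct, the function name, and the pitch alignment parameter are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be non-zero). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) / align * align;
}

/* Hypothetical per-plane allocation request. */
struct plane_request {
    uint64_t pitch;   /* bytes per row */
    uint64_t size;    /* bytes for the whole plane */
    int heap_id;      /* hypothetical heap (memory bank) identifier */
};

static void nv12_planes(uint32_t width, uint32_t height, uint64_t pitch_align,
                        int y_heap, int c_heap, struct plane_request out[2])
{
    /* Y: one byte per pixel, full height. */
    out[0].pitch = align_up(width, pitch_align);
    out[0].size = out[0].pitch * height;
    out[0].heap_id = y_heap;

    /* CbCr: one Cb/Cr byte pair per 2 pixels, half the rows (rounded up). */
    out[1].pitch = align_up((uint64_t)((width + 1) / 2) * 2, pitch_align);
    out[1].size = out[1].pitch * ((height + 1) / 2);
    out[1].heap_id = c_heap;
}
```

A device that needs the Y and CbCr planes in different banks would pass two different heap IDs, turning one buffer into two allocations; a device without that requirement passes the same ID twice.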
>>>> What's still missing is certainly matching userspace for this since I
>>>> wanted to discuss the initial kernel approach first.
>>>
>>> https://git.libcamera.org/libcamera/libcamera.git/ would be a good place
>>> to prototype userspace support :-)
>>
>> Thanks for the pointer and the review,
>
> By the way, side question, does anyone know what the status of dma heaps
> support is in major distributions ? On my Gentoo box,
> /dev/dma_heap/system is 0600 root:root. That's easy to change for a
> developer, but not friendly to end-users. If we want to move forward
> with dma heaps as standard multimedia allocators (and I would really
> like to see that happening), we have to make sure they can be used.
We seem to have reached a world where display (primary nodes) are
carefully guarded, and some mildly trusted group of folks (video) can
access render nodes, but then there's some separate group generally for
camera/video/v4l and whatnot from what I've seen (I don't survey this
stuff that often. I live in my developer bubble). I'm curious whether
the right direction is a broader group that encompasses all of render
nodes, multimedia, and heaps, or if a more segmented design is right.
The latter is probably the better design from first principles, but
might lead to headaches if the permissions diverge.
Thanks,
-James
>>>> Please take a look and comment.
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> [1] https://lore.kernel.org/all/11a6f97c-e45f-f24b-8a73-48d5a388a2cc@gmail.com/T/
>