<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 9/25/24 19:31, Christian König
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:04caa788-19a6-4336-985c-4eb191c24438@amd.com">
Am 25.09.24 um 14:51 schrieb Dmitry Baryshkov:<br>
<blockquote type="cite"
cite="mid:lk7a2xuqrctyywuanjwseh5lkcz3soatc2zf3kn3uwc43pdyic@edm3hcd2koas">
<pre class="moz-quote-pre" wrap="">On Wed, Sep 25, 2024 at 10:51:15AM GMT, Christian König wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Am 25.09.24 um 01:05 schrieb Dmitry Baryshkov:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Tue, Sep 24, 2024 at 01:13:18PM GMT, Andrew Davis wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 9/23/24 1:33 AM, Dmitry Baryshkov wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi,
On Fri, Aug 30, 2024 at 09:03:47AM GMT, Jens Wiklander wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi,
This patch set is based on top of Yong Wu's restricted heap patch set [1].
It's also a continuation of Olivier's "Add dma-buf secure-heap" patch set [2].
The Linaro restricted heap uses genalloc in the kernel to manage the heap
carveout. This differs from the MediaTek restricted heap, which
relies on the secure world to manage the carveout.
I've tried to address the comments on [2], but [1] introduces changes, so I'm
afraid I've had to skip some comments.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">I know I have raised the same question during LPC (in connection to
Qualcomm's dma-heap implementation). Is there any reason why we are
using generic heaps instead of allocating the dma-bufs on the device
side?
In your case you already have a TEE device; you can use it to allocate and
export dma-bufs, which then get imported by the V4L and DRM drivers.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">This goes to the heart of why we have dma-heaps in the first place.
We don't want to burden userspace with having to figure out the right
place to get a dma-buf for a given use-case on a given hardware.
That would be very non-portable, and fail at the core purpose of
a kernel: to abstract hardware specifics away.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Unfortunately all proposals to use dma-buf heaps were moving in the
described direction: let the app select (somehow) from a platform- and
vendor-specific list of dma-buf heaps. In the kernel we at least know
the platform on which the system is running. Userspace generally doesn't
(and shouldn't). As such, it seems better to me to keep the knowledge in
the kernel and allow userspace to do its job by calling into existing
device drivers.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">The idea of letting the kernel fully abstract away the complexity of inter-device
data exchange is a completely failed design. There has been plenty of
evidence for that over the years.
Because of this in DMA-buf it's an intentional design decision that
userspace and *not* the kernel decides where and what to allocate from.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Hmm, ok.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">What the kernel should provide is the necessary information about what type of
memory a device can work with and whether certain memory is accessible or not.
This is the part which is unfortunately still not well defined nor
implemented at the moment.
Apart from that there are a whole bunch of intentional design decisions which
should prevent developers from moving allocation decisions into the kernel. For
example DMA-buf doesn't know what the content of the buffer is (except for
its total size) and which use cases a buffer will be used with.
So the question of whether memory should be exposed through DMA-heaps or a
driver-specific allocator is not a question of abstraction, but rather one of the
physical location and accessibility of the memory.
If the memory is attached to any physical device, e.g. local memory on a
dGPU, FPGA PCIe BAR, RDMA, camera internal memory etc, then expose the
memory through a device-specific allocator.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">So, for embedded systems with unified memory all buffers (maybe except
PCIe BARs) should come from DMA-BUF heaps, correct?</pre>
</blockquote>
<br>
From what I know that is correct, yes. Question is really if that
will stay this way.<br>
<br>
Neural accelerators look a lot like stripped-down FPGAs these days, and
the benefit of local memory for GPUs has been known for decades.<br>
<br>
It could be that designs with specialized local memory see a revival
at any time, who knows.<br>
<br>
<blockquote type="cite"
cite="mid:lk7a2xuqrctyywuanjwseh5lkcz3soatc2zf3kn3uwc43pdyic@edm3hcd2koas">
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">If the memory is not physically attached to any device, but rather just
memory attached to the CPU or a system wide memory controller then expose
the memory as DMA-heap with specific requirements (e.g. certain sized pages,
contiguous, restricted, encrypted, ...).
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Is encrypted / protected a part of the allocation contract or should it
be enforced separately via a call to TEE / SCM / anything else?
</pre>
</blockquote>
<br>
Well, that is a really good question that I can't fully answer either.
From what I know now, I would say it depends on the design.<br>
<br>
</blockquote>
<p>IMHO, Dmitry's proposal to allow the TEE device to act as the
allocator and exporter of DMA-bufs for restricted memory makes
sense, since it's really the TEE implementation (OP-TEE, AMD-TEE,
TS-TEE or a future QTEE) which sets up the restrictions on a
particular piece of allocated memory. AFAIK, that happens after
the DMA-buf gets allocated, when user-space calls into the TEE to
set up which media pipeline is going to access that particular
DMA-buf. It can also be a static contract, depending on the
particular platform design.<br>
</p>
<p>As Jens noted in the other thread, we already manage shared
memory allocations (from a static carve-out or dynamically mapped)
for communication between Linux and the TEE. Those were based on
DMA-bufs earlier, but since we didn't require them to be shared
with other devices, we switched to anonymous memory instead.<br>
</p>
<p>From the user-space perspective, it's cleaner to use TEE device
IOCTLs for DMA-buf allocations, since the TEE device already knows
which underlying TEE implementation it's communicating with, rather
than first figuring out which DMA heap to use for the allocation and
then communicating with the TEE implementation.<br>
</p>
<p>-Sumit<br>
</p>
</body>
</html>