<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Uff, good question. DMA-buf certainly supports that use case, but I
have no idea whether it is actually used anywhere.<br>
<br>
Daniel, do you know of any such case?<br>
<br>
Christian.<br>
<br>
<div class="moz-cite-prefix">On 27.04.21 at 15:26, Marek
Olšák wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAAxE2A4FwZ11_opL++TPUViTOD6ZpV5b3MR+rTDUPvzqYz-oeQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="auto">Ok. So that would only break the following
use cases for now:
<div dir="auto">- amd render -> external gpu</div>
<div dir="auto">- amd video encode -> network device</div>
<div dir="auto"><br>
</div>
<div dir="auto">What about the case when we get a buffer from an
external device and we're supposed to make it "busy" when we
are using it, and the external device wants to wait until we
stop using it? Is it something that can happen, thus turning
"external -> amd" into "external <-> amd"?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Marek</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue., Apr. 27, 2021, 08:50
Christian König, <<a
href="mailto:ckoenig.leichtzumerken@gmail.com"
moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div> Only amd -> external.<br>
<br>
We can easily install something in a user queue which waits
for a dma_fence in the kernel.<br>
<br>
But we can't easily wait for a user queue as a dependency
of a dma_fence.<br>
<br>
The good thing is we already have this wait-before-signal
case with Vulkan timeline semaphores, which has the same
problem in the kernel.<br>
<br>
The good news is I think we can relatively easily convert
i915 and older amdgpu devices to something which is
compatible with user fences.<br>
<br>
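The user-fence model under discussion can be sketched in a few
lines of C. This is a minimal illustration under assumed names,
not any real driver API: signaling is an atomic store of a
sequence number, and checking is a load-and-compare, which is why
a user queue can trivially consume it while the kernel cannot
safely turn it into a dma_fence (nothing bounds when, or whether,
the value ever arrives):<br>

```c
/* Minimal sketch of a "user fence": a 64-bit sequence number in
 * memory shared between CPU and GPU. All names are illustrative,
 * not a real kernel or amdgpu API. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct user_fence {
    _Atomic uint64_t *addr; /* location the GPU or userspace writes */
    uint64_t value;         /* satisfied once *addr >= value */
};

/* Signal: publish the new sequence number. */
static void user_fence_signal(struct user_fence *f, uint64_t seqno)
{
    atomic_store_explicit(f->addr, seqno, memory_order_release);
}

/* Check: trivially easy from a user queue, but the kernel cannot
 * make a dma_fence depend on this, since nothing guarantees the
 * value is ever written. */
static bool user_fence_signaled(const struct user_fence *f)
{
    return atomic_load_explicit(f->addr, memory_order_acquire) >= f->value;
}
```
<br>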
So yes, getting that fixed case by case should work.<br>
<br>
Christian.<br>
<br>
<div>On 27.04.21 at 14:46, Marek Olšák wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div>I'll defer to Christian and Alex to decide whether
dropping sync with non-amd devices (GPUs, cameras
etc.) is acceptable.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Rewriting those drivers to this new sync
model could be done on a case by case basis.</div>
<div dir="auto"><br>
</div>
<div dir="auto">For now, would we only lose the "amd
-> external" dependency? Or the "external ->
amd" dependency too?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Marek</div>
<div dir="auto"><br>
<div class="gmail_quote" dir="auto">
<div dir="ltr" class="gmail_attr">On Tue., Apr. 27,
2021, 08:15 Daniel Vetter, <<a
href="mailto:daniel@ffwll.ch" rel="noreferrer
noreferrer noreferrer" target="_blank"
moz-do-not-send="true">daniel@ffwll.ch</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">On
Tue, Apr 27, 2021 at 2:11 PM Marek Olšák <<a
href="mailto:maraeo@gmail.com" rel="noreferrer
noreferrer noreferrer noreferrer"
target="_blank" moz-do-not-send="true">maraeo@gmail.com</a>>
wrote:<br>
> Ok. I'll interpret this as "yes, it will
work, let's do it".<br>
<br>
It works if all you care about is drm/amdgpu. I'm
not sure that's a<br>
reasonable approach for upstream, but it
definitely is an approach :-)<br>
<br>
We've already gone somewhat through the pain of
drm/amdgpu redefining<br>
how implicit sync works without sufficiently
talking with other<br>
people, maybe we should avoid a repeat of this ...<br>
-Daniel<br>
<br>
><br>
> Marek<br>
><br>
> On Tue., Apr. 27, 2021, 08:06 Christian
König, <<a
href="mailto:ckoenig.leichtzumerken@gmail.com"
rel="noreferrer noreferrer noreferrer
noreferrer" target="_blank"
moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>>
wrote:<br>
>><br>
>> Correct, we wouldn't have synchronization
between devices with and without user queues any
more.<br>
>><br>
>> That could only be a problem for A+I
laptops.<br>
>><br>
>> Memory management will just work with
preemption fences which pause the user queues of a
process before evicting something. That will be a
dma_fence, but also a well known approach.<br>
>><br>
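The preemption-fence approach Christian describes above could be
sketched roughly as follows. Every struct and function here is a
hypothetical stand-in, not real amdgpu/amdkfd API; the point is
only the ordering: the kernel forces the queues idle first, so
eviction never waits on a fence that userspace controls:<br>

```c
/* Sketch of memory management via queue preemption instead of
 * dma_fence waits: suspend all user queues of a process, move the
 * buffer while nothing can execute, then resume. All names are
 * hypothetical stand-ins. */
#include <stdbool.h>

struct process_queues { bool mapped; }; /* all user queues of one process */
struct buffer { bool resident; };       /* a BO that may be evicted */

/* Stand-in: unmap doorbells and wait for the hardware to go idle. */
static bool queues_suspend(struct process_queues *q)
{
    q->mapped = false;
    return true; /* a timeout here would mean resetting the process */
}

/* Stand-in: remap doorbells so userspace can submit again. */
static void queues_resume(struct process_queues *q)
{
    q->mapped = true;
}

/* Stand-in: the actual copy out of VRAM. */
static void buffer_evict(struct buffer *bo)
{
    bo->resident = false;
}

/* The kernel never waits on anything userspace controls: it forces
 * the queues idle before touching the buffer, so no BO fences or
 * page faults are needed. */
static bool evict_buffer(struct process_queues *q, struct buffer *bo)
{
    if (!queues_suspend(q))
        return false;
    buffer_evict(bo);
    queues_resume(q);
    return true;
}
```
<br>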
>> Christian.<br>
>><br>
>> On 27.04.21 at 13:49, Marek Olšák wrote:<br>
>><br>
>> If we don't use future fences for DMA
fences at all, e.g. we don't use them for memory
management, it can work, right? Memory management
can suspend user queues anytime. It doesn't need
to use DMA fences. There might be something that
I'm missing here.<br>
>><br>
>> What would we lose without DMA fences?
Just inter-device synchronization? I think that
might be acceptable.<br>
>><br>
>> The only case when the kernel will wait
on a future fence is before a page flip.
Everything today already depends on userspace not
hanging the GPU, which makes everything a future
fence.<br>
>><br>
>> Marek<br>
>><br>
>> On Tue., Apr. 27, 2021, 04:02 Daniel
Vetter, <<a href="mailto:daniel@ffwll.ch"
rel="noreferrer noreferrer noreferrer
noreferrer" target="_blank"
moz-do-not-send="true">daniel@ffwll.ch</a>>
wrote:<br>
>>><br>
>>> On Mon, Apr 26, 2021 at 04:59:28PM
-0400, Marek Olšák wrote:<br>
>>> > Thanks everybody. The initial
proposal is dead. Here are some thoughts on<br>
>>> > how to do it differently.<br>
>>> ><br>
>>> > I think we can have direct
command submission from userspace via<br>
>>> > memory-mapped queues ("user
queues") without changing window systems.<br>
>>> ><br>
>>> > The memory management doesn't
have to use GPU page faults like HMM.<br>
>>> > Instead, it can wait for user
queues of a specific process to go idle and<br>
>>> > then unmap the queues, so that
userspace can't submit anything. Buffer<br>
>>> > evictions, pinning, etc. can be
executed when all queues are unmapped<br>
>>> > (suspended). Thus, no BO fences
and page faults are needed.<br>
>>> ><br>
>>> > Inter-process synchronization
can use timeline semaphores. Userspace will<br>
>>> > query the wait and signal value
for a shared buffer from the kernel. The<br>
>>> > kernel will keep a history of
those queries to know which process is<br>
>>> > responsible for signalling which
buffer. There is only the wait-timeout<br>
>>> > issue and how to identify the
culprit. One of the solutions is to have the<br>
>>> > GPU send all GPU signal commands
and all timed out wait commands via an<br>
>>> > interrupt to the kernel driver
to monitor and validate userspace behavior.<br>
>>> > With that, it can be identified
whether the culprit is the waiting process<br>
>>> > or the signalling process and
which one. Invalid signal/wait parameters can<br>
>>> > also be detected. The kernel can
force-signal only the semaphores that time<br>
>>> > out, and punish the processes
which caused the timeout or used invalid<br>
>>> > signal/wait parameters.<br>
>>> ><br>
>>> > The question is whether this
synchronization solution is robust enough for<br>
>>> > dma_fence and whatever the
kernel and window systems need.<br>
>>><br>
>>> The proper model here is the
preempt-ctx dma_fence that amdkfd uses<br>
>>> (without page faults). That means
dma_fence for synchronization is DOA, at<br>
>>> least as-is, and we're back to
figuring out the winsys problem.<br>
>>><br>
>>> "We'll solve it with timeouts" is
very tempting, but doesn't work. It's<br>
>>> akin to saying that we're solving
deadlock issues in a locking design by<br>
>>> doing a global
s/mutex_lock/mutex_lock_timeout/ in the kernel.
Sure it<br>
>>> avoids having to reach the reset
button, but that's about it.<br>
>>><br>
>>> And the fundamental problem is that
once you throw in userspace command<br>
>>> submission (and syncing, at least
within the userspace driver, otherwise<br>
>>> there's kinda no point if you still
need the kernel for cross-engine sync)<br>
>>> you get deadlocks if you still
use dma_fence for sync under<br>
>>> perfectly legit use-cases. We've
discussed that one ad nauseam last summer:<br>
>>><br>
>>> <a
href="https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences"
rel="noreferrer noreferrer noreferrer noreferrer
noreferrer" target="_blank"
moz-do-not-send="true">https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences</a><br>
>>><br>
>>> See the silly diagram at the bottom.<br>
>>><br>
>>> Now I think all isn't lost, because
imo the first step to getting to this<br>
>>> brave new world is rebuilding the
driver on top of userspace fences, and<br>
>>> with the adjusted cmd submit model.
You probably don't want to use amdkfd,<br>
>>> but port that as a context flag or
similar to render nodes for gl/vk. Of<br>
>>> course that means you can only use
this mode in headless, without<br>
>>> glx/wayland winsys support, but it's
a start.<br>
>>> -Daniel<br>
>>><br>
>>> ><br>
>>> > Marek<br>
>>> ><br>
>>> > On Tue, Apr 20, 2021 at 4:34 PM
Daniel Stone <<a
href="mailto:daniel@fooishbar.org"
rel="noreferrer noreferrer noreferrer
noreferrer" target="_blank"
moz-do-not-send="true">daniel@fooishbar.org</a>>
wrote:<br>
>>> ><br>
>>> > > Hi,<br>
>>> > ><br>
>>> > > On Tue, 20 Apr 2021 at
20:30, Daniel Vetter <<a
href="mailto:daniel@ffwll.ch" rel="noreferrer
noreferrer noreferrer noreferrer"
target="_blank" moz-do-not-send="true">daniel@ffwll.ch</a>>
wrote:<br>
>>> > ><br>
>>> > >> The thing is, you can't
do this in drm/scheduler. At least not without<br>
>>> > >> splitting up the
dma_fence in the kernel into separate memory
fences<br>
>>> > >> and sync fences<br>
>>> > ><br>
>>> > ><br>
>>> > > I'm starting to think this
thread needs its own glossary ...<br>
>>> > ><br>
>>> > > I propose we use 'residency
fence' for execution fences which enact<br>
>>> > > memory-residency
operations, e.g. faulting in a page ultimately
depending<br>
>>> > > on GPU work retiring.<br>
>>> > ><br>
>>> > > And 'value fence' for the
pure-userspace model suggested by timeline<br>
>>> > > semaphores, i.e. fences
being (*addr == val) rather than being able to
look<br>
>>> > > at ctx seqno.<br>
>>> > ><br>
>>> > > Cheers,<br>
>>> > > Daniel<br>
>>> > >
_______________________________________________<br>
>>> > > mesa-dev mailing list<br>
>>> > > <a
href="mailto:mesa-dev@lists.freedesktop.org"
rel="noreferrer noreferrer noreferrer
noreferrer" target="_blank"
moz-do-not-send="true">mesa-dev@lists.freedesktop.org</a><br>
>>> > > <a
href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev"
rel="noreferrer noreferrer noreferrer noreferrer
noreferrer" target="_blank"
moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
>>> > ><br>
>>><br>
>>> --<br>
>>> Daniel Vetter<br>
>>> Software Engineer, Intel Corporation<br>
>>> <a href="http://blog.ffwll.ch"
rel="noreferrer noreferrer noreferrer noreferrer
noreferrer" target="_blank"
moz-do-not-send="true">http://blog.ffwll.ch</a><br>
>><br>
>><br>
>><br>
<br>
<br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>