<div dir="auto">Ok. So that would only make the following use cases broken for now:<div dir="auto">- amd render -> external gpu</div><div dir="auto">- amd video encode -> network device</div><div dir="auto"><br></div><div dir="auto">What about the case when we get a buffer from an external device and we're supposed to make it "busy" when we are using it, and the external device wants to wait until we stop using it? Is it something that can happen, thus turning "external -> amd" into "external <-> amd"?</div><div dir="auto"><br></div><div dir="auto">Marek</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue., Apr. 27, 2021, 08:50 Christian König, <<a href="mailto:ckoenig.leichtzumerken@gmail.com">ckoenig.leichtzumerken@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
Only amd -> external.<br>
<br>
We can easily install something in a user queue which waits for a
dma_fence in the kernel.<br>
<br>
But we can't easily wait for a user queue as a dependency of a
dma_fence.<br>
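To make the asymmetry concrete, here is a minimal sketch in C of the dependency rule described above. It assumes a hypothetical two-kind model (`enum producer` and `dependency_allowed()` are illustrative names, not kernel APIs). A user queue may wait on a kernel dma_fence, which covers "external -> amd"; a dma_fence must not depend on a user queue, which is exactly the "amd -> external" case.

```c
#include <stdbool.h>

/* Hypothetical toy model, not a kernel API: who signals a fence. */
enum producer {
    KERNEL_DMA_FENCE, /* signalled by the kernel, must complete in finite time */
    USER_QUEUE        /* signalled by commands userspace wrote into its queue */
};

/* Returns true if `waiter` may safely depend on a fence from `signaler`. */
bool dependency_allowed(enum producer waiter, enum producer signaler)
{
    /* A user queue may block on anything: at worst that queue stalls. */
    if (waiter == USER_QUEUE)
        return true;
    /* A dma_fence may only depend on other dma_fences; depending on a user
     * queue would make its completion hinge on userspace behaving. */
    return signaler == KERNEL_DMA_FENCE;
}
```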
<br>
The good thing is that we already have this wait-before-signal case with Vulkan
timeline semaphores, which have the same problem in the kernel.<br>
<br>
The good news is that I think we can relatively easily convert i915 and
older amdgpu devices to something which is compatible with user
fences.<br>
<br>
So yes, getting that fixed case by case should work.<br>
<br>
Christian<br>
<br>
<div>On 27.04.21 at 14:46, Marek
Olšák wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div>I'll defer to Christian and Alex to decide whether dropping
sync with non-amd devices (GPUs, cameras etc.) is acceptable.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Rewriting those drivers to this new sync model
could be done on a case by case basis.</div>
<div dir="auto"><br>
</div>
<div dir="auto">For now, would we only lose the "amd ->
external" dependency? Or the "external -> amd" dependency
too?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Marek</div>
<div dir="auto"><br>
<div class="gmail_quote" dir="auto">
<div dir="ltr" class="gmail_attr">On Tue., Apr. 27, 2021,
08:15 Daniel Vetter, <<a href="mailto:daniel@ffwll.ch" rel="noreferrer noreferrer noreferrer" target="_blank">daniel@ffwll.ch</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue,
Apr 27, 2021 at 2:11 PM Marek Olšák <<a href="mailto:maraeo@gmail.com" rel="noreferrer
noreferrer noreferrer noreferrer" target="_blank">maraeo@gmail.com</a>> wrote:<br>
> Ok. I'll interpret this as "yes, it will work, let's
do it".<br>
<br>
It works if all you care about is drm/amdgpu. I'm not sure
that's a<br>
reasonable approach for upstream, but it definitely is an
approach :-)<br>
<br>
We've already gone somewhat through the pain of drm/amdgpu
redefining<br>
how implicit sync works without sufficiently talking with
other<br>
people, maybe we should avoid a repeat of this ...<br>
-Daniel<br>
<br>
><br>
> Marek<br>
><br>
> On Tue., Apr. 27, 2021, 08:06 Christian König, <<a href="mailto:ckoenig.leichtzumerken@gmail.com" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">ckoenig.leichtzumerken@gmail.com</a>>
wrote:<br>
>><br>
>> Correct, we wouldn't have synchronization between
devices with and without user queues anymore.<br>
>><br>
>> That could only be a problem for A+I (AMD + Intel) laptops.<br>
>><br>
>> Memory management will just work with preemption
fences, which pause the user queues of a process before
evicting something. That will be a dma_fence, but it is also a
well-known approach.<br>
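A minimal sketch of the preemption-fence idea, assuming a hypothetical per-process queue table (`struct process`, `preempt_all_queues()` and the other names are illustrative, not amdgpu or amdkfd code). What it shows is that the fence guarding an eviction depends only on the kernel preempting the queues, never on userspace finishing its work.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 4

/* Hypothetical model of one process's user queues. */
struct process {
    bool queue_mapped[MAX_QUEUES];
    size_t nr_queues;
};

/* The "preemption fence" counts as signalled once no queue of the process
 * is mapped, i.e. the GPU can no longer touch the process's buffers. */
bool preempt_fence_signaled(const struct process *p)
{
    for (size_t i = 0; i < p->nr_queues; i++)
        if (p->queue_mapped[i])
            return false;
    return true;
}

/* Unmap every queue. In a real driver this is asynchronous and the fence
 * signals when the hardware acknowledges the preemption. */
void preempt_all_queues(struct process *p)
{
    for (size_t i = 0; i < p->nr_queues; i++)
        p->queue_mapped[i] = false;
}

/* Eviction waits on the preemption fence only, so it cannot be blocked
 * by userspace never signalling anything. */
bool evict_buffer(struct process *p)
{
    preempt_all_queues(p);
    return preempt_fence_signaled(p); /* safe to move memory now */
}

/* After the eviction the queues are remapped and resume. */
void resume_all_queues(struct process *p)
{
    for (size_t i = 0; i < p->nr_queues; i++)
        p->queue_mapped[i] = true;
}
```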
>><br>
>> Christian.<br>
>><br>
>> On 27.04.21 at 13:49, Marek Olšák wrote:<br>
>><br>
>> If we don't use future fences for DMA fences at
all, e.g. we don't use them for memory management, it can
work, right? Memory management can suspend user queues
anytime. It doesn't need to use DMA fences. There might be
something that I'm missing here.<br>
>><br>
>> What would we lose without DMA fences? Just
inter-device synchronization? I think that might be
acceptable.<br>
>><br>
>> The only case when the kernel will wait on a
future fence is before a page flip. Everything today
>> already depends on userspace not hanging the GPU, which
makes everything a future fence.<br>
>><br>
>> Marek<br>
>><br>
>> On Tue., Apr. 27, 2021, 04:02 Daniel Vetter, <<a href="mailto:daniel@ffwll.ch" rel="noreferrer noreferrer
noreferrer noreferrer" target="_blank">daniel@ffwll.ch</a>>
wrote:<br>
>>><br>
>>> On Mon, Apr 26, 2021 at 04:59:28PM -0400,
Marek Olšák wrote:<br>
>>> > Thanks everybody. The initial proposal
is dead. Here are some thoughts on<br>
>>> > how to do it differently.<br>
>>> ><br>
>>> > I think we can have direct command
submission from userspace via<br>
>>> > memory-mapped queues ("user queues")
without changing window systems.<br>
>>> ><br>
>>> > The memory management doesn't have to
use GPU page faults like HMM.<br>
>>> > Instead, it can wait for user queues of
a specific process to go idle and<br>
>>> > then unmap the queues, so that userspace
can't submit anything. Buffer<br>
>>> > evictions, pinning, etc. can be executed
when all queues are unmapped<br>
>>> > (suspended). Thus, no BO fences and page
faults are needed.<br>
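The suspend-then-evict sequence described above could look roughly like this, assuming each user queue is a ring with a GPU read pointer and a userspace write pointer (`struct user_queue`, `try_suspend_queue()` and `can_move_buffers()` are hypothetical names). This single-threaded toy ignores the race between the idle check and a concurrent submission, and it still has to wait for userspace-submitted work to drain before a queue can be unmapped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring-buffer user queue: userspace bumps wptr to submit,
 * the GPU bumps rptr as it consumes packets. */
struct user_queue {
    uint64_t rptr;  /* GPU read pointer */
    uint64_t wptr;  /* userspace write pointer */
    bool mapped;    /* false once the kernel has unmapped the queue */
};

bool queue_idle(const struct user_queue *q)
{
    return q->rptr == q->wptr;
}

/* Step 1: once the queue is idle, unmap it so userspace can't submit.
 * Returns false if work is still pending. */
bool try_suspend_queue(struct user_queue *q)
{
    if (!queue_idle(q))
        return false;
    q->mapped = false;
    return true;
}

/* Step 2: evictions and pinning are allowed only when every queue of the
 * process is suspended; no per-BO fences are consulted. */
bool can_move_buffers(const struct user_queue *qs, int n)
{
    for (int i = 0; i < n; i++)
        if (qs[i].mapped)
            return false;
    return true;
}
```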
>>> ><br>
>>> > Inter-process synchronization can use
timeline semaphores. Userspace will<br>
>>> > query the wait and signal value for a
shared buffer from the kernel. The<br>
>>> > kernel will keep a history of those
queries to know which process is<br>
>>> > responsible for signalling which buffer.
There is only the wait-timeout<br>
>>> > issue and how to identify the culprit.
One of the solutions is to have the<br>
>>> > GPU send all GPU signal commands and all
timed out wait commands via an<br>
>>> > interrupt to the kernel driver to
monitor and validate userspace behavior.<br>
>>> > With that, it can be identified whether
the culprit is the waiting process<br>
>>> > or the signalling process and which one.
Invalid signal/wait parameters can<br>
>>> > also be detected. The kernel can
force-signal only the semaphores that time<br>
>>> > out, and punish the processes which
caused the timeout or used invalid<br>
>>> > signal/wait parameters.<br>
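A rough sketch of the kernel-side bookkeeping proposed above, assuming one timeline per shared buffer and strictly increasing signal points (`struct timeline`, `timeline_reserve_signal()` and `timeline_on_wait_timeout()` are hypothetical). On a timeout it blames the owner of the oldest point the GPU has not reached yet, blames the waiter if its wait value was never handed out, and force-signals only the timed-out timeline.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDS   16
#define BLAME_NONE    0   /* point already signalled: spurious timeout */
#define BLAME_WAITER (-1) /* wait value never handed out: invalid wait */

/* Hypothetical record: the kernel told process `pid` to signal `value`. */
struct signal_record {
    uint64_t value;
    int pid; /* assumed > 0 */
};

/* Hypothetical per-buffer timeline plus the kernel's query history. */
struct timeline {
    uint64_t current; /* highest value written so far */
    struct signal_record hist[MAX_RECORDS];
    size_t nr;
};

/* Userspace asks which value it should signal next; the kernel records
 * the answer so it later knows who was responsible. Returns 0 if full. */
uint64_t timeline_reserve_signal(struct timeline *t, int pid)
{
    uint64_t last = t->nr ? t->hist[t->nr - 1].value : t->current;

    if (t->nr == MAX_RECORDS)
        return 0; /* a real design would prune already-signalled entries */
    t->hist[t->nr].value = last + 1;
    t->hist[t->nr].pid = pid;
    t->nr++;
    return last + 1;
}

/* Called when a wait for `wait_value` times out. Returns the pid to
 * punish, BLAME_WAITER or BLAME_NONE. */
int timeline_on_wait_timeout(struct timeline *t, uint64_t wait_value)
{
    int blame = BLAME_NONE;

    if (wait_value <= t->current)
        return BLAME_NONE;
    if (t->nr == 0 || wait_value > t->hist[t->nr - 1].value)
        return BLAME_WAITER;

    /* Oldest reserved point the GPU has not reached yet. */
    for (size_t i = 0; i < t->nr; i++) {
        if (t->hist[i].value > t->current) {
            blame = t->hist[i].pid;
            break;
        }
    }
    /* Force-signal only this timeline so the waiter can make progress. */
    t->current = wait_value;
    return blame;
}
```

Blaming the oldest unreached point assumes each signaller first waits for the previous point, as in a producer/consumer chain over a shared buffer; a different protocol would need a different rule.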
>>> ><br>
>>> > The question is whether this
synchronization solution is robust enough for<br>
>>> > dma_fence and whatever the kernel and
window systems need.<br>
>>><br>
>>> The proper model here is the preempt-ctx
dma_fence that amdkfd uses<br>
>>> (without page faults). That means dma_fence
for synchronization is DOA, at<br>
>>> least as-is, and we're back to figuring out
the winsys problem.<br>
>>><br>
>>> "We'll solve it with timeouts" is very
tempting, but doesn't work. It's<br>
>>> akin to saying that we're solving deadlock
issues in a locking design by<br>
>>> doing a global
s/mutex_lock/mutex_lock_timeout/ in the kernel. Sure it<br>
>>> avoids having to reach the reset button, but
that's about it.<br>
>>><br>
>>> And the fundamental problem is that once you
throw in userspace command<br>
>>> submission (and syncing, at least within the
userspace driver, otherwise<br>
>>> there's kinda no point if you still need the
kernel for cross-engine sync),<br>
>>> you get deadlocks if you still use
dma_fence for sync under<br>
>>> perfectly legit use-cases. We've discussed
that one ad nauseam last summer:<br>
>>><br>
>>> <a href="https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences" rel="noreferrer noreferrer noreferrer noreferrer noreferrer" target="_blank">https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences</a><br>
>>><br>
>>> See silly diagram at the bottom.<br>
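The deadlock can be pictured as a cycle in a wait-for graph. A minimal sketch, assuming an illustrative three-node scenario rather than the exact one in the linked documentation: an eviction's dma_fence (node 0) waits on a userspace-signalled fence (node 1), the user queue behind it is blocked on a page becoming resident (node 2), and making that page resident needs the same eviction to finish. `struct wait_graph` and `has_deadlock()` are hypothetical names; the check is a plain depth-first search for a back edge.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical wait-for graph: waits_on[a][b] means "a cannot complete
 * until b completes". */
struct wait_graph {
    int n;
    bool waits_on[MAX_NODES][MAX_NODES];
};

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored */
static bool dfs(const struct wait_graph *g, int v, int *state)
{
    state[v] = 1;
    for (int w = 0; w < g->n; w++) {
        if (!g->waits_on[v][w])
            continue;
        if (state[w] == 1)
            return true; /* back edge: circular wait */
        if (state[w] == 0 && dfs(g, w, state))
            return true;
    }
    state[v] = 2;
    return false;
}

bool has_deadlock(const struct wait_graph *g)
{
    int state[MAX_NODES] = { 0 };

    for (int v = 0; v < g->n; v++)
        if (state[v] == 0 && dfs(g, v, state))
            return true;
    return false;
}
```

Swapping every wait for a timed wait, as in the mutex_lock_timeout analogy, removes none of these edges; it only bounds how long everyone stays stuck.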
>>><br>
>>> Now I think all isn't lost, because imo the
first step to getting to this<br>
>>> brave new world is rebuilding the driver on
top of userspace fences, and<br>
>>> with the adjusted cmd submit model. You
probably don't want to use amdkfd,<br>
>>> but port that as a context flag or similar to
render nodes for gl/vk. Of<br>
>>> course that means you can only use this mode
in headless, without<br>
>>> glx/wayland winsys support, but it's a start.<br>
>>> -Daniel<br>
>>><br>
>>> ><br>
>>> > Marek<br>
>>> ><br>
>>> > On Tue, Apr 20, 2021 at 4:34 PM Daniel
Stone <<a href="mailto:daniel@fooishbar.org" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">daniel@fooishbar.org</a>>
wrote:<br>
>>> ><br>
>>> > > Hi,<br>
>>> > ><br>
>>> > > On Tue, 20 Apr 2021 at 20:30,
Daniel Vetter <<a href="mailto:daniel@ffwll.ch" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">daniel@ffwll.ch</a>> wrote:<br>
>>> > ><br>
>>> > >> The thing is, you can't do this
in drm/scheduler. At least not without<br>
>>> > >> splitting up the dma_fence in
the kernel into separate memory fences<br>
>>> > >> and sync fences<br>
>>> > ><br>
>>> > ><br>
>>> > > I'm starting to think this thread
needs its own glossary ...<br>
>>> > ><br>
>>> > > I propose we use 'residency fence'
for execution fences which enact<br>
>>> > > memory-residency operations, e.g.
faulting in a page ultimately depending<br>
>>> > > on GPU work retiring.<br>
>>> > ><br>
>>> > > And 'value fence' for the
pure-userspace model suggested by timeline<br>
>>> > > semaphores, i.e. fences being
(*addr == val) rather than being able to look<br>
>>> > > at ctx seqno.<br>
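A minimal sketch of a "value fence" in C11, assuming the fence is nothing more than a pointer to a 64-bit counter and a target value (`struct value_fence` and the helper names are hypothetical). Besides the literal (*addr == val) check from the glossary, it includes a >= variant, since Vulkan timeline semaphores treat a wait as satisfied once the counter reaches or passes the wait value.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* A memory location plus a value, with no kernel-side context or
 * sequence number involved. */
struct value_fence {
    _Atomic uint64_t *addr;
    uint64_t val;
};

/* Literal (*addr == val) semantics from the proposed glossary. */
bool value_fence_signaled_eq(const struct value_fence *f)
{
    return atomic_load_explicit(f->addr, memory_order_acquire) == f->val;
}

/* Timeline-semaphore style: the counter only grows, so any value at or
 * past the target also counts as signalled. */
bool value_fence_signaled_ge(const struct value_fence *f)
{
    return atomic_load_explicit(f->addr, memory_order_acquire) >= f->val;
}

/* The signaller (GPU or CPU) publishes progress with a release store. */
void value_fence_signal(_Atomic uint64_t *addr, uint64_t val)
{
    atomic_store_explicit(addr, val, memory_order_release);
}
```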
>>> > ><br>
>>> > > Cheers,<br>
>>> > > Daniel<br>
>>> > >
_______________________________________________<br>
>>> > > mesa-dev mailing list<br>
>>> > > <a href="mailto:mesa-dev@lists.freedesktop.org" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">mesa-dev@lists.freedesktop.org</a><br>
>>> > > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer noreferrer noreferrer noreferrer noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
>>> > ><br>
>>><br>
>>> --<br>
>>> Daniel Vetter<br>
>>> Software Engineer, Intel Corporation<br>
>>> <a href="http://blog.ffwll.ch" rel="noreferrer noreferrer noreferrer noreferrer noreferrer" target="_blank">http://blog.ffwll.ch</a><br>
>><br>
>><br>
>><br>
>><br>
<br>
<br>
-- <br>
Daniel Vetter<br>
Software Engineer, Intel Corporation<br>
<a href="http://blog.ffwll.ch" rel="noreferrer noreferrer
noreferrer noreferrer noreferrer" target="_blank">http://blog.ffwll.ch</a><br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote></div>