<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 2019年08月02日 17:41, Lionel Landwerlin
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:3b9f5b4c-827b-dfec-e7c8-a784b0573e85@intel.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div class="moz-cite-prefix">Hey David,</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 02/08/2019 12:11, zhoucm1 wrote:<br>
</div>
<blockquote type="cite"
cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
<p>Hi Lionel,</p>
<p>For binary semaphore, I guess every one will think
application will guarantee wait is behind the signal, whenever
the semaphore is shared or used in internal-process. <br>
</p>
<p>I think below two options can fix your problem:<br>
</p>
<p>a. Can we extend vkWaitForFence so that it can be able to
wait on fence-available? If fence is available, then it's safe
to do semaphore wait in vkQueueSubmit.</p>
</blockquote>
<p><br>
</p>
<p>I'm sorry, but I don't understand what vkWaitForFence() has to
do with this problem.</p>
<p>They test case we're struggling with doesn't use that API.</p>
<p><br>
</p>
<p>Can you maybe explain a bit more how it relates?<br>
</p>
</blockquote>
vkQueueSubmit()<br>
vkWaitForFenceAvailable()<br>
vkQueueSubmit()<br>
vkWaitForFenceAvailable()<br>
vkQueueSubmit()<br>
vkWaitForFenceAvailable()<br>
<br>
Sorry, that could lead application program very ugly.<br>
<blockquote type="cite"
cite="mid:3b9f5b4c-827b-dfec-e7c8-a784b0573e85@intel.com">
<p> </p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
<p>b. Make waitBeforeSignal is valid for binary semaphore as
well, as that way, It is reasonable to add wait/signal
counting for binary syncobj.<br>
</p>
</blockquote>
<p><br>
</p>
<p>Yeah essentially the change we're proposing internally makes
binary semaphores use syncobj timelines.</p>
</blockquote>
Will you raise up a MR to add the language of waitBeforeSignal
support of binary semaphore to vulkan spec?<br>
<br>
-David<br>
<blockquote type="cite"
cite="mid:3b9f5b4c-827b-dfec-e7c8-a784b0573e85@intel.com">
<p>There is just another u64 associated with them.<br>
</p>
<p><br>
</p>
<p>-Lionel<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
<p> </p>
<p><br>
</p>
<p>-David<br>
</p>
<br>
<div class="moz-cite-prefix">On 2019年08月02日 14:27, Lionel
Landwerlin wrote:<br>
</div>
<blockquote type="cite"
cite="mid:9bd985bb-1dfb-b28d-e1da-efa5b41464c8@intel.com">
<div class="moz-cite-prefix">On 02/08/2019 09:10, Koenig,
Christian wrote:<br>
</div>
<blockquote type="cite"
cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
<div dir="auto">
<div><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">Am 02.08.2019 07:38 schrieb
Lionel Landwerlin <a class="moz-txt-link-rfc2396E"
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a>:<br
type="attribution">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div>On 02/08/2019 08:21, Koenig, Christian
wrote:<br>
</div>
<blockquote>
<div dir="auto">
<div><br>
<div><br>
<div class="elided-text">Am 02.08.2019
07:17 schrieb Lionel Landwerlin <a
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a>:<br
type="attribution">
<blockquote style="margin:0 0 0
0.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div>
<div>On 02/08/2019 08:08, Koenig,
Christian wrote:<br>
</div>
<blockquote>
<div dir="auto">Hi Lionel,
<div dir="auto"><br>
</div>
<div dir="auto">Well that
looks more like your test
case is buggy.</div>
<div dir="auto"><br>
</div>
<div dir="auto">According to
the code the ctx1 queue
always waits for sem1 and
ctx2 queue always waits for
sem2.</div>
</div>
</blockquote>
<p><br>
</p>
<p>That's supposed to be the same
underlying syncobj because it's
exported from one VkDevice as
opaque FD from sem1 and imported
into sem2.<br>
</p>
</div>
</blockquote>
</div>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Well than that's still buggy
and won't synchronize at all.</div>
<div dir="auto"><br>
</div>
<div dir="auto">When ctx1 waits for a
semaphore and then signals the same
semaphore there is no guarantee that ctx2
will run in between jobs.</div>
<div dir="auto"><br>
</div>
<div dir="auto">It's perfectly valid in this
case to first run all jobs from ctx1 and
then all jobs from ctx2.</div>
</div>
</blockquote>
<p><br>
</p>
<p>That's not really how I see the semaphores
working.</p>
<p>The spec describe VkSemaphore as an interface
to an internal payload opaque to the
application.</p>
<p><br>
</p>
<p>When ctx1 waits on the semaphore, it waits on
the payload put there by the previous
iteration.</p>
</div>
</blockquote>
</div>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">And who says that it's not waiting for
it's own previous payload?</div>
</div>
</blockquote>
<p><br>
</p>
<p>That's was I understood from you previous comment : "there
is no guarantee that ctx2 will run in between jobs"</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">See if the payload is a counter this won't
work either. Keep in mind that this has the semantic of
a semaphore. Whoever grabs the semaphore first wins and
can run, everybody else has to wait.</div>
</div>
</blockquote>
<p><br>
</p>
<p>What performs the "grab" here?</p>
<p>I thought that would be vkQueueSubmit().</p>
<p>Since that occuring from a single application thread, that
should then be ordered in execution of ctx1,ctx2,ctx1,...<br>
</p>
<p><br>
</p>
<p>Thanks for your time on this,</p>
<p><br>
</p>
<p>-Lionel<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<p>Then it proceeds to signal it by replacing
the internal payload.</p>
</div>
</blockquote>
</div>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">That's an implementation detail of our
sync objects, but I don't think that this behavior is
part of the Vulkan specification.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Regards,</div>
<div dir="auto">Christian.</div>
<div dir="auto">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<p><br>
</p>
<p>ctx2 then waits on that and replaces the
payload again with the new internal
synchronization object.</p>
<p><br>
</p>
<p>The internal payload is a dma fence in our
case and signaling just replaces a dma fence
by another or puts one where there was none
before.</p>
<p>So we should have created a dependecy link
between all the submissions and then should be
executed in the order of QueueSubmit() calls.<br>
</p>
<p><br>
</p>
<p>-Lionel<br>
</p>
<p><br>
</p>
<blockquote>
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">It only prevents running
both at the same time and as far as I can
see that still works even with threaded
submission.</div>
<div dir="auto"><br>
</div>
<div dir="auto">You need at least two
semaphores for a tandem submission.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Regards,</div>
<div dir="auto">Christian.</div>
<div dir="auto">
<div>
<div class="elided-text">
<blockquote style="margin:0 0 0
0.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div>
<p><br>
</p>
<blockquote>
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">This way there
can't be any Synchronisation
between the two.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Regards,</div>
<div dir="auto">Christian.</div>
</div>
<div><br>
<div class="elided-text">Am
02.08.2019 06:55 schrieb
Lionel Landwerlin <a
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a>:<br type="attribution">
</div>
</div>
<div>
<div>Hey Christian,</div>
<div><br>
</div>
<div>The problem boils down to
the fact that we don't
immediately create dma
fences when calling
vkQueueSubmit().</div>
<div>This is delayed to a
thread.</div>
<div><br>
</div>
<div>From a single application
thread, you can
QueueSubmit() to 2 queues
from 2 different devices.</div>
<div>Each QueueSubmit to one
queue has a dependency on
the previous QueueSubmit on
the other queue through an
exported/imported semaphore.</div>
<div><br>
</div>
<div>From the API point of
view the state of the
semaphore should be changed
after each QueueSubmit().</div>
<div>The problem is that it's
not because of the thread
and because you might have
those 2 submission threads
tied to different
VkDevice/VkInstance or even
different applications
(synchronizing themselves
outside the vulkan API).</div>
<div><br>
</div>
<div>Hope that makes sense.</div>
<div>It's not really easy to
explain by mail, the best
explanation is probably
reading the test : <a
href="https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788"
moz-do-not-send="true">
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788</a></div>
<div><br>
</div>
<div>Like David mentioned
you're not running into that
issue right now, because you
only dispatch to the thread
under specific conditions.</div>
<div>But I could build a case
to force that and likely run
into the same issue.<br>
</div>
<div><br>
</div>
<div>-Lionel<br>
</div>
<div><br>
</div>
<div>On 02/08/2019 07:33,
Koenig, Christian wrote:<br>
</div>
<blockquote>
<div>
<div dir="auto">Hi Lionel,
<div dir="auto"><br>
</div>
<div dir="auto">Well
could you describe
once more what the
problem is?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Cause I
don't fully understand
why a rather normal
tandem submission with
two semaphores should
fail in any way.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Regards,</div>
<div dir="auto">Christian.</div>
</div>
<div><br>
<div>Am 02.08.2019 06:28
schrieb Lionel
Landwerlin <a
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a>:<br type="attribution">
</div>
</div>
</div>
<font size="2"><span
style="font-size:11pt">
<div>There aren't CTS
tests covering the
issue I was
mentioning.<br>
But we could add them.<br>
<br>
I don't have all the
details regarding your
implementation but
even with <br>
the "semaphore
thread", I could see
it running into the
same issues.<br>
What if a mix of
binary & timeline
semaphores are handed
to vkQueueSubmit()?<br>
<br>
For example with
queueA & queueB
from 2 different
VkDevice :<br>
vkQueueSubmit(queueA,
signal semA);<br>
vkQueueSubmit(queueA,
wait on [semA,
timelineSemB]); with <br>
timelineSemB
triggering a wait
before signal.<br>
vkQueueSubmit(queueB,
signal semA);<br>
<br>
<br>
-Lionel<br>
<br>
On 02/08/2019 06:18,
Zhou, David(ChunMing)
wrote:<br>
> Hi Lionel,<br>
><br>
> By the Queue
thread is a heavy
thread, which is
always resident in
driver during
application running,
our guys don't like
that. So we switch to
Semaphore Thread, only
when waitBeforeSignal
of timeline happens,
we spawn a thread to
handle that wait. So
we don't have your
this issue.<br>
> By the way, I
already pass all your
CTS cases for now. I
suggest you to switch
to Semaphore Thread
instead of Queue
Thread as well. It
works very well.<br>
><br>
> -David<br>
><br>
> -----Original
Message-----<br>
> From: Lionel
Landwerlin <a
href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a><br>
> Sent: Friday,
August 2, 2019 4:52 AM<br>
> To: dri-devel <a
href="mailto:dri-devel@lists.freedesktop.org" moz-do-not-send="true"><dri-devel@lists.freedesktop.org></a>;
Koenig, Christian <a
href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
Zhou, David(ChunMing)
<a
href="mailto:David1.Zhou@amd.com"
moz-do-not-send="true"><David1.Zhou@amd.com></a>; Jason Ekstrand <a
href="mailto:jason@jlekstrand.net" moz-do-not-send="true">
<jason@jlekstrand.net></a><br>
> Subject: Threaded
submission &
semaphore sharing<br>
><br>
> Hi Christian,
David,<br>
><br>
> Sorry to report
this so late in the
process, but I think
we found an issue not
directly related to
syncobj timelines
themselves but with a
side effect of the
threaded submissions.<br>
><br>
> Essentially we're
failing a test in
crucible :<br>
>
func.sync.semaphore-fd.opaque-fd<br>
> This test create
a single binary
semaphore, shares it
between 2
VkDevice/VkQueue.<br>
> Then in a loop it
proceeds to submit
workload alternating
between the 2 VkQueue
with one submit
depending on the
other.<br>
> It does so by
waiting on the
VkSemaphore signaled
in the previous
iteration and
resignaling it.<br>
><br>
> The problem for
us is that once things
are dispatched to the
submission thread, the
ordering of the
submission is lost.<br>
> Because we have 2
devices and they both
have their own
submission thread.<br>
><br>
> Jason suggested
that we reestablish
the ordering by having
semaphores/syncobjs
carry an additional
uint64_t payload.<br>
> This 64bit
integer would
represent be an
identifier that
submission threads
will
WAIT_FOR_AVAILABLE on.<br>
><br>
> The scenario
would look like this :<br>
> -
vkQueueSubmit(queueA,
signal on semA);<br>
> - in
the caller thread,
this would increment
the syncobj additional
u64 payload and return
it to userspace.<br>
> - at
some point the
submission thread of
queueA submits the
workload and signal
the syncobj of semA
with value returned in
the caller thread of
vkQueueSubmit().<br>
> -
vkQueueSubmit(queueB,
wait on semA);<br>
> - in
the caller thread,
this would read the
syncobj additional<br>
> u64 payload<br>
> - at
some point the
submission thread of
queueB will try to
submit the work, but
first it will
WAIT_FOR_AVAILABLE the
u64 value returned in
the step above<br>
><br>
> Because we want
the binary semaphores
to be shared across
processes and would
like this to remain a
single FD, the
simplest location to
store this additional
u64 payload would be
the DRM syncobj.<br>
> It would need an
additional ioctl to
read & increment
the value.<br>
><br>
> What do you
think?<br>
><br>
> -Lionel<br>
<br>
<br>
</div>
</span></font></blockquote>
<p><br>
</p>
</div>
</blockquote>
<p><br>
</p>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</blockquote>
<p><br>
</p>
</blockquote>
<br>
</body>
</html>