Threaded submission & semaphore sharing
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Fri Aug 2 06:27:04 UTC 2019
On 02/08/2019 09:10, Koenig, Christian wrote:
>
>
> On 02.08.2019 07:38, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>
> On 02/08/2019 08:21, Koenig, Christian wrote:
>
>
>
> On 02.08.2019 07:17, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>
> On 02/08/2019 08:08, Koenig, Christian wrote:
>
> Hi Lionel,
>
> Well that looks more like your test case is buggy.
>
> According to the code the ctx1 queue always waits for
> sem1 and ctx2 queue always waits for sem2.
>
>
> That's supposed to be the same underlying syncobj, because
> it's exported as an opaque FD from sem1 on one VkDevice and
> imported into sem2 on the other.
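>
> For reference, the sharing step looks roughly like this
> (VK_KHR_external_semaphore_fd; a condensed sketch, error handling
> omitted):
>
>     /* Device 1: export sem1's underlying syncobj as an opaque FD. */
>     VkSemaphoreGetFdInfoKHR get_info = {
>         .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
>         .semaphore = sem1,
>         .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
>     };
>     int fd;
>     vkGetSemaphoreFdKHR(device1, &get_info, &fd);
>
>     /* Device 2: import that FD into sem2, so both semaphores now
>        reference the same kernel object. */
>     VkImportSemaphoreFdInfoKHR import_info = {
>         .sType = VK_STRUCTURE_TYPE_IMPORT_SEMAPHORE_FD_INFO_KHR,
>         .semaphore = sem2,
>         .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
>         .fd = fd,
>     };
>     vkImportSemaphoreFdKHR(device2, &import_info);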
>
>
> Well, then that's still buggy and won't synchronize at all.
>
> When ctx1 waits for a semaphore and then signals the same
> semaphore there is no guarantee that ctx2 will run in between
> jobs.
>
> It's perfectly valid in this case to first run all jobs from
> ctx1 and then all jobs from ctx2.
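>
> To illustrate with jobs A1..A3 on ctx1 and B1..B3 on ctx2, where
> each job waits on the semaphore and then signals it again, both of
> these schedules are legal:
>
>     A1 B1 A2 B2 A3 B3   <- the interleaving the test expects
>     A1 A2 A3 B1 B2 B3   <- equally valid: nothing forces a handover
>                            to ctx2 between two ctx1 jobs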
>
>
> That's not really how I see the semaphores working.
>
> The spec describes VkSemaphore as an interface to an internal
> payload that is opaque to the application.
>
>
> When ctx1 waits on the semaphore, it waits on the payload put
> there by the previous iteration.
>
>
> And who says that it's not waiting for its own previous payload?
That's what I understood from your previous comment: "there is no
guarantee that ctx2 will run in between jobs".
>
> See, if the payload is a counter this won't work either. Keep in mind
> that this has the semantics of a semaphore: whoever grabs the semaphore
> first wins and can run; everybody else has to wait.
What performs the "grab" here?
I thought that would be vkQueueSubmit().
Since that occurs from a single application thread, execution should
then be ordered ctx1, ctx2, ctx1, ...
Thanks for your time on this,
-Lionel
>
> Then it proceeds to signal it by replacing the internal payload.
>
>
> That's an implementation detail of our sync objects, but I don't think
> that this behavior is part of the Vulkan specification.
>
> Regards,
> Christian.
>
>
> ctx2 then waits on that and replaces the payload again with the
> new internal synchronization object.
>
>
> The internal payload is a dma fence in our case and signaling just
> replaces a dma fence by another or puts one where there was none
> before.
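>
> In kernel terms (simplified, and as far as I understand the current
> drm_syncobj code) that is just a fence pointer swap:
>
>     /* Signal: install the new fence, dropping whatever was there. */
>     drm_syncobj_replace_fence(syncobj, new_fence);
>
>     /* Wait: sample whichever fence is installed at that moment. */
>     fence = drm_syncobj_fence_get(syncobj);
>     dma_fence_wait(fence, true);
>
> A wait therefore only orders against the fence present at the
> moment it samples the syncobj.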
>
> So we should have created a dependency link between all the
> submissions, and they should then be executed in the order of
> the QueueSubmit() calls.
>
>
> -Lionel
>
>
>
> It only prevents running both at the same time and as far as I
> can see that still works even with threaded submission.
>
> You need at least two semaphores for a tandem submission.
>
> Regards,
> Christian.
>
>
>
> This way there can't be any synchronization between
> the two.
>
> Regards,
> Christian.
>
> On 02.08.2019 06:55, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
> Hey Christian,
>
> The problem boils down to the fact that we don't
> immediately create dma fences when calling
> vkQueueSubmit().
> This is deferred to a thread.
>
> From a single application thread, you can
> QueueSubmit() to 2 queues from 2 different devices.
> Each QueueSubmit to one queue has a dependency on the
> previous QueueSubmit on the other queue through an
> exported/imported semaphore.
>
> From the API point of view the state of the semaphore
> should be changed after each QueueSubmit().
> The problem is that it isn't, both because of the thread and
> because you might have those 2 submission threads tied
> to different VkDevice/VkInstance or even different
> applications (synchronizing themselves outside the
> Vulkan API).
>
> Hope that makes sense.
> It's not really easy to explain by mail; the best
> explanation is probably reading the test:
> https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>
> Like David mentioned, you're not running into that
> issue right now, because you only dispatch to the
> thread under specific conditions.
> But I could build a case to force that and would
> likely run into the same issue.
>
> -Lionel
>
> On 02/08/2019 07:33, Koenig, Christian wrote:
>
> Hi Lionel,
>
> Well could you describe once more what the problem is?
>
> Because I don't fully understand why a rather normal
> tandem submission with two semaphores should fail
> in any way.
>
> Regards,
> Christian.
>
> On 02.08.2019 06:28, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
> There aren't CTS tests covering the issue I was
> mentioning.
> But we could add them.
>
> I don't have all the details regarding your
> implementation, but even with
> the "semaphore thread" I could see it running
> into the same issues.
> What if a mix of binary & timeline semaphores is
> handed to vkQueueSubmit()?
>
> For example, with queueA & queueB from 2 different
> VkDevices:
> vkQueueSubmit(queueA, signal semA);
> vkQueueSubmit(queueA, wait on [semA, timelineSemB]);
>     with timelineSemB triggering a wait before signal.
> vkQueueSubmit(queueB, signal semA);
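>
> Annotated (my reading of the hazard, assuming submits are deferred
> to a thread whenever a wait-before-signal is detected):
>
>     submit 1: can execute immediately and signal semA.
>     submit 2: wait-before-signal on timelineSemB, so it is handed
>               to the semaphore thread; semA's payload is sampled
>               at an unpredictable time.
>     submit 3: may replace semA's payload before or after the
>               deferred submit 2 samples it.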
>
>
> -Lionel
>
> On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> > Hi Lionel,
> >
> > The Queue thread is a heavy thread which is
> > always resident in the driver while the application
> > is running, and our guys don't like that. So we
> > switched to a Semaphore Thread: only when a
> > waitBeforeSignal of a timeline happens do we spawn
> > a thread to handle that wait. So we don't have
> > this issue.
> > By the way, I already pass all your CTS cases
> > for now. I suggest you switch to a Semaphore
> > Thread instead of a Queue Thread as well. It works
> > very well.
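> >
> > In pseudo code (names made up for illustration), the idea is:
> >
> >     /* In the vkQueueSubmit() caller thread: */
> >     if (has_wait_before_signal(submit)) {
> >         /* Rare case: spawn a short-lived thread that blocks until
> >            the timeline waits become signalable, then submits. */
> >         spawn_thread(deferred_submit, submit);
> >     } else {
> >         /* Common case: submit directly; no resident queue thread. */
> >         do_submit(submit);
> >     }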
> >
> > -David
> >
> > -----Original Message-----
> > From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> > Sent: Friday, August 2, 2019 4:52 AM
> > To: dri-devel <dri-devel at lists.freedesktop.org>; Koenig, Christian
> > <Christian.Koenig at amd.com>; Zhou, David(ChunMing)
> > <David1.Zhou at amd.com>; Jason Ekstrand <jason at jlekstrand.net>
> > Subject: Threaded submission & semaphore sharing
> >
> > Hi Christian, David,
> >
> > Sorry to report this so late in the process, but
> > I think we found an issue not directly related to
> > syncobj timelines themselves but caused by a side
> > effect of the threaded submissions.
> >
> > Essentially we're failing a test in crucible:
> > func.sync.semaphore-fd.opaque-fd
> > This test creates a single binary semaphore and
> > shares it between 2 VkDevices/VkQueues.
> > Then, in a loop, it proceeds to submit workloads
> > alternating between the 2 VkQueues, with one submit
> > depending on the other.
> > It does so by waiting on the VkSemaphore
> > signaled in the previous iteration and re-signaling it.
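> >
> > In pseudo code, each iteration looks roughly like this (condensed;
> > command buffers and stage masks omitted):
> >
> >     vkQueueSubmit(queue1, signal sem1);              /* prime */
> >     for (i = 0; i < N; i++) {
> >         /* sem2 wraps the same syncobj as sem1. */
> >         vkQueueSubmit(queue2, wait sem2, signal sem2);
> >         vkQueueSubmit(queue1, wait sem1, signal sem1);
> >     }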
> >
> > The problem for us is that once things are
> > dispatched to the submission thread, the ordering
> > of the submissions is lost, because we have 2
> > devices and they each have their own submission
> > thread.
> >
> > Jason suggested that we reestablish the ordering
> > by having semaphores/syncobjs carry an additional
> > uint64_t payload.
> > This 64-bit integer would be an identifier that
> > submission threads will WAIT_FOR_AVAILABLE on.
> >
> > The scenario would look like this:
> > - vkQueueSubmit(queueA, signal on semA);
> >     - in the caller thread, this would increment the
> >       syncobj's additional u64 payload and return it
> >       to userspace.
> >     - at some point the submission thread of queueA
> >       submits the workload and signals the syncobj of
> >       semA with the value returned in the caller
> >       thread of vkQueueSubmit().
> > - vkQueueSubmit(queueB, wait on semA);
> >     - in the caller thread, this would read the
> >       syncobj's additional u64 payload.
> >     - at some point the submission thread of queueB
> >       will try to submit the work, but first it will
> >       WAIT_FOR_AVAILABLE the u64 value returned in
> >       the step above.
> >
> > Because we want the binary semaphores to be
> > shared across processes and would like this to
> > remain a single FD, the simplest location to store
> > this additional u64 payload would be the DRM syncobj.
> > It would need an additional ioctl to read &
> > increment the value.
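> >
> > To make that concrete, the new interface could look something like
> > this (entirely hypothetical, just a sketch of the proposal):
> >
> >     /* Atomically increment the syncobj's extra u64 payload and
> >        return the post-increment value to userspace. */
> >     struct drm_syncobj_u64_payload {
> >         __u32 handle;   /* in: syncobj handle */
> >         __u32 pad;
> >         __u64 value;    /* out: new payload value */
> >     };
> >
> > The submission thread would then block until the payload reaches
> > the value handed out at vkQueueSubmit() time (WAIT_FOR_AVAILABLE
> > semantics) before doing the actual submission.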
> >
> > What do you think?
> >
> > -Lionel