Threaded submission & semaphore sharing
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Fri Aug 2 06:27:04 UTC 2019
On 02/08/2019 09:10, Koenig, Christian wrote:
>
>
> On 02.08.2019 07:38, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>
> On 02/08/2019 08:21, Koenig, Christian wrote:
>
>
>
> On 02.08.2019 07:17, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>
> On 02/08/2019 08:08, Koenig, Christian wrote:
>
> Hi Lionel,
>
> Well that looks more like your test case is buggy.
>
> According to the code the ctx1 queue always waits for
> sem1 and ctx2 queue always waits for sem2.
>
>
> That's supposed to be the same underlying syncobj, because
> it's exported as an opaque FD from sem1 on one VkDevice and
> imported into sem2 on the other.
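>
> For reference, the sharing step looks roughly like this
> (VK_KHR_external_semaphore_fd; a condensed sketch, error handling
> omitted):
>
>     /* Device 1: export sem1's underlying syncobj as an opaque FD. */
>     VkSemaphoreGetFdInfoKHR get_info = {
>         .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
>         .semaphore = sem1,
>         .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
>     };
>     int fd;
>     vkGetSemaphoreFdKHR(device1, &get_info, &fd);
>
>     /* Device 2: import that FD into sem2, so both semaphores now
>        reference the same kernel object. */
>     VkImportSemaphoreFdInfoKHR import_info = {
>         .sType = VK_STRUCTURE_TYPE_IMPORT_SEMAPHORE_FD_INFO_KHR,
>         .semaphore = sem2,
>         .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
>         .fd = fd,
>     };
>     vkImportSemaphoreFdKHR(device2, &import_info);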
>
>
> Well, then that's still buggy and won't synchronize at all.
>
> When ctx1 waits for a semaphore and then signals the same
> semaphore there is no guarantee that ctx2 will run in between
> jobs.
>
> It's perfectly valid in this case to first run all jobs from
> ctx1 and then all jobs from ctx2.
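>
> To illustrate with jobs A1..A3 on ctx1 and B1..B3 on ctx2, where
> each job waits on the semaphore and then signals it again, both of
> these schedules are legal:
>
>     A1 B1 A2 B2 A3 B3   <- the interleaving the test expects
>     A1 A2 A3 B1 B2 B3   <- equally valid: nothing forces a handover
>                            to ctx2 between two ctx1 jobs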
>
>
> That's not really how I see the semaphores working.
>
> The spec describes VkSemaphore as an interface to an internal
> payload that is opaque to the application.
>
>
> When ctx1 waits on the semaphore, it waits on the payload put
> there by the previous iteration.
>
>
> And who says that it's not waiting for its own previous payload?
That's what I understood from your previous comment: "there is no
guarantee that ctx2 will run in between jobs".
>
> See, if the payload is a counter this won't work either. Keep in mind
> that this has the semantics of a semaphore: whoever grabs the semaphore
> first wins and can run; everybody else has to wait.
What performs the "grab" here?
I thought that would be vkQueueSubmit().
Since that occurs from a single application thread, execution should
then be ordered ctx1, ctx2, ctx1, ...
Thanks for your time on this,
-Lionel
>
> Then it proceeds to signal it by replacing the internal payload.
>
>
> That's an implementation detail of our sync objects, but I don't think
> that this behavior is part of the Vulkan specification.
>
> Regards,
> Christian.
>
>
> ctx2 then waits on that and replaces the payload again with the
> new internal synchronization object.
>
>
> The internal payload is a dma fence in our case and signaling just
> replaces a dma fence by another or puts one where there was none
> before.
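>
> In kernel terms (simplified, and as far as I understand the current
> drm_syncobj code) that is just a fence pointer swap:
>
>     /* Signal: install the new fence, dropping whatever was there. */
>     drm_syncobj_replace_fence(syncobj, new_fence);
>
>     /* Wait: sample whichever fence is installed at that moment. */
>     fence = drm_syncobj_fence_get(syncobj);
>     dma_fence_wait(fence, true);
>
> A wait therefore only orders against the fence present at the
> moment it samples the syncobj.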
>
> So we should have created a dependency link between all the
> submissions, and they should then be executed in the order of
> the QueueSubmit() calls.
>
>
> -Lionel
>
>
>
> It only prevents running both at the same time and as far as I
> can see that still works even with threaded submission.
>
> You need at least two semaphores for a tandem submission.
>
> Regards,
> Christian.
>
>
>
> This way there can't be any synchronization between
> the two.
>
> Regards,
> Christian.
>
> On 02.08.2019 06:55, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
> Hey Christian,
>
> The problem boils down to the fact that we don't
> immediately create dma fences when calling
> vkQueueSubmit().
> This is deferred to a thread.
>
> From a single application thread, you can
> QueueSubmit() to 2 queues from 2 different devices.
> Each QueueSubmit to one queue has a dependency on the
> previous QueueSubmit on the other queue through an
> exported/imported semaphore.
>
> From the API point of view the state of the semaphore
> should be changed after each QueueSubmit().
> The problem is that it isn't, both because of the thread and
> because you might have those 2 submission threads tied
> to different VkDevice/VkInstance or even different
> applications (synchronizing themselves outside the
> Vulkan API).
>
> Hope that makes sense.
> It's not really easy to explain by mail; the best
> explanation is probably reading the test:
> https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>
> Like David mentioned, you're not running into that
> issue right now, because you only dispatch to the
> thread under specific conditions.
> But I could build a case to force that and would
> likely run into the same issue.
>
> -Lionel
>
> On 02/08/2019 07:33, Koenig, Christian wrote:
>
> Hi Lionel,
>
> Well could you describe once more what the problem is?
>
> Because I don't fully understand why a rather normal
> tandem submission with two semaphores should fail
> in any way.
>
> Regards,
> Christian.
>
> On 02.08.2019 06:28, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
> There aren't CTS tests covering the issue I was
> mentioning.
> But we could add them.
>
> I don't have all the details regarding your
> implementation, but even with
> the "semaphore thread" I could see it running
> into the same issues.
> What if a mix of binary & timeline semaphores is
> handed to vkQueueSubmit()?
>
> For example, with queueA & queueB from 2 different
> VkDevices:
> vkQueueSubmit(queueA, signal semA);
> vkQueueSubmit(queueA, wait on [semA, timelineSemB]);
>     with timelineSemB triggering a wait before signal.
> vkQueueSubmit(queueB, signal semA);
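>
> Annotated (my reading of the hazard, assuming submits are deferred
> to a thread whenever a wait-before-signal is detected):
>
>     submit 1: can execute immediately and signal semA.
>     submit 2: wait-before-signal on timelineSemB, so it is handed
>               to the semaphore thread; semA's payload is sampled
>               at an unpredictable time.
>     submit 3: may replace semA's payload before or after the
>               deferred submit 2 samples it.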
>
>
> -Lionel
>
> On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
> > Hi Lionel,
> >
> > The Queue thread is a heavy thread which is
> > always resident in the driver while the application
> > is running, and our guys don't like that. So we
> > switched to a Semaphore Thread: only when a
> > waitBeforeSignal of a timeline happens do we spawn
> > a thread to handle that wait. So we don't have
> > this issue.
> > By the way, I already pass all your CTS cases
> > for now. I suggest you switch to a Semaphore
> > Thread instead of a Queue Thread as well. It works
> > very well.
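> >
> > In pseudo code (names made up for illustration), the idea is:
> >
> >     /* In the vkQueueSubmit() caller thread: */
> >     if (has_wait_before_signal(submit)) {
> >         /* Rare case: spawn a short-lived thread that blocks until
> >            the timeline waits become signalable, then submits. */
> >         spawn_thread(deferred_submit, submit);
> >     } else {
> >         /* Common case: submit directly; no resident queue thread. */
> >         do_submit(submit);
> >     }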
> >
> > -David
> >
> > -----Original Message-----
> > From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> > Sent: Friday, August 2, 2019 4:52 AM
> > To: dri-devel <dri-devel at lists.freedesktop.org>; Koenig, Christian
> > <Christian.Koenig at amd.com>; Zhou, David(ChunMing)
> > <David1.Zhou at amd.com>; Jason Ekstrand <jason at jlekstrand.net>
> > Subject: Threaded submission & semaphore sharing
> >
> > Hi Christian, David,
> >
> > Sorry to report this so late in the process, but
> > I think we found an issue not directly related to
> > syncobj timelines themselves but caused by a side
> > effect of the threaded submissions.
> >
> > Essentially we're failing a test in crucible:
> > func.sync.semaphore-fd.opaque-fd
> > This test creates a single binary semaphore and
> > shares it between 2 VkDevices/VkQueues.
> > Then, in a loop, it proceeds to submit workloads
> > alternating between the 2 VkQueues, with one submit
> > depending on the other.
> > It does so by waiting on the VkSemaphore
> > signaled in the previous iteration and re-signaling it.
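> >
> > In pseudo code, each iteration looks roughly like this (condensed;
> > command buffers and stage masks omitted):
> >
> >     vkQueueSubmit(queue1, signal sem1);              /* prime */
> >     for (i = 0; i < N; i++) {
> >         /* sem2 wraps the same syncobj as sem1. */
> >         vkQueueSubmit(queue2, wait sem2, signal sem2);
> >         vkQueueSubmit(queue1, wait sem1, signal sem1);
> >     }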
> >
> > The problem for us is that once things are
> > dispatched to the submission thread, the ordering
> > of the submissions is lost, because we have 2
> > devices and they each have their own submission
> > thread.
> >
> > Jason suggested that we reestablish the ordering
> > by having semaphores/syncobjs carry an additional
> > uint64_t payload.
> > This 64-bit integer would be an identifier that
> > submission threads will WAIT_FOR_AVAILABLE on.
> >
> > The scenario would look like this:
> > - vkQueueSubmit(queueA, signal on semA);
> >     - in the caller thread, this would increment the
> >       syncobj's additional u64 payload and return it
> >       to userspace.
> >     - at some point the submission thread of queueA
> >       submits the workload and signals the syncobj of
> >       semA with the value returned in the caller
> >       thread of vkQueueSubmit().
> > - vkQueueSubmit(queueB, wait on semA);
> >     - in the caller thread, this would read the
> >       syncobj's additional u64 payload.
> >     - at some point the submission thread of queueB
> >       will try to submit the work, but first it will
> >       WAIT_FOR_AVAILABLE the u64 value returned in
> >       the step above.
> >
> > Because we want the binary semaphores to be
> > shared across processes and would like this to
> > remain a single FD, the simplest location to store
> > this additional u64 payload would be the DRM syncobj.
> > It would need an additional ioctl to read &
> > increment the value.
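> >
> > To make that concrete, the new interface could look something like
> > this (entirely hypothetical, just a sketch of the proposal):
> >
> >     /* Atomically increment the syncobj's extra u64 payload and
> >        return the post-increment value to userspace. */
> >     struct drm_syncobj_u64_payload {
> >         __u32 handle;   /* in: syncobj handle */
> >         __u32 pad;
> >         __u64 value;    /* out: new payload value */
> >     };
> >
> > The submission thread would then block until the payload reaches
> > the value handed out at vkQueueSubmit() time (WAIT_FOR_AVAILABLE
> > semantics) before doing the actual submission.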
> >
> > What do you think?
> >
> > -Lionel