Threaded submission & semaphore sharing

zhoucm1 zhoucm1 at amd.com
Fri Aug 2 10:01:24 UTC 2019



On 2019/08/02 17:41, Lionel Landwerlin wrote:
> Hey David,
>
> On 02/08/2019 12:11, zhoucm1 wrote:
>>
>> Hi Lionel,
>>
>> For a binary semaphore, I guess everyone assumes that the application 
>> will guarantee the wait happens after the signal, whether the semaphore 
>> is shared across processes or used within a single process.
>>
>> I think the two options below could fix your problem:
>>
>> a. Can we extend vkWaitForFences() so that it is able to wait for a 
>> fence to become available? If the fence is available, then it's safe 
>> to do the semaphore wait in vkQueueSubmit().
>>
>
> I'm sorry, but I don't understand what vkWaitForFences() has to do with 
> this problem.
>
> The test case we're struggling with doesn't use that API.
>
>
> Can you maybe explain a bit more how it relates?
>
vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()

Sorry, that pattern would make the application code very ugly.
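
Spelled out, every submit has to be followed by a wait before the 
application may touch the other queue. A sketch only, assuming the 
extension existed (vkWaitForFenceAvailable() stands in for the proposed 
extension of vkWaitForFences(); it does not exist in the API today):

    for (unsigned i = 0; i < iterations; i++) {
        /* submissions alternate between the two queues */
        vkQueueSubmit(queues[i % 2], 1, &submits[i % 2], fences[i % 2]);
        /* the application must block until the fence's payload exists
         * before the dependent submit on the other queue may wait on
         * the shared semaphore */
        vkWaitForFenceAvailable(devices[i % 2], 1, &fences[i % 2],
                                UINT64_MAX);
    }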
>
>
>> b. Make waitBeforeSignal valid for binary semaphores as well; that 
>> way, it is reasonable to add wait/signal counting for binary 
>> syncobjs.
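>>
>> Conceptually, just the counting idea might look like this (a 
>> hypothetical sketch, not driver code):
>>
>>     struct binary_syncobj_counts {
>>             uint64_t signal_count; /* bumped for every queued signal */
>>             uint64_t wait_count;   /* bumped for every queued wait */
>>     };
>>
>>     /* a wait whose count is ahead of signal_count is a
>>      * waitBeforeSignal and is deferred until the matching signal
>>      * arrives, just like a timeline point */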
>>
>
> Yeah, essentially the change we're proposing internally makes binary 
> semaphores use syncobj timelines.
>
Will you raise an MR to add language for waitBeforeSignal support of 
binary semaphores to the Vulkan spec?

-David
>
> There is just another u64 associated with them.
>
>
> -Lionel
>
>
>>
>> -David
>>
>>
>> On 2019/08/02 14:27, Lionel Landwerlin wrote:
>>> On 02/08/2019 09:10, Koenig, Christian wrote:
>>>>
>>>>
>>>> On 02.08.2019 07:38, Lionel Landwerlin 
>>>> <lionel.g.landwerlin at intel.com> wrote:
>>>>
>>>>     On 02/08/2019 08:21, Koenig, Christian wrote:
>>>>
>>>>
>>>>
>>>>         On 02.08.2019 07:17, Lionel Landwerlin
>>>>         <lionel.g.landwerlin at intel.com> wrote:
>>>>
>>>>             On 02/08/2019 08:08, Koenig, Christian wrote:
>>>>
>>>>                 Hi Lionel,
>>>>
>>>>                 Well that looks more like your test case is buggy.
>>>>
>>>>                 According to the code, the ctx1 queue always waits
>>>>                 for sem1 and the ctx2 queue always waits for sem2.
>>>>
>>>>
>>>>             That's supposed to be the same underlying syncobj
>>>>             because it's exported from one VkDevice as an opaque FD
>>>>             from sem1 and imported into sem2.
>>>>
>>>>
>>>>         Well, then that's still buggy and won't synchronize at all.
>>>>
>>>>         When ctx1 waits for a semaphore and then signals the same
>>>>         semaphore there is no guarantee that ctx2 will run in
>>>>         between jobs.
>>>>
>>>>         It's perfectly valid in this case to first run all jobs
>>>>         from ctx1 and then all jobs from ctx2.
>>>>
>>>>
>>>>     That's not really how I see the semaphores working.
>>>>
>>>>     The spec describes VkSemaphore as an interface to an internal
>>>>     payload that is opaque to the application.
>>>>
>>>>
>>>>     When ctx1 waits on the semaphore, it waits on the payload put
>>>>     there by the previous iteration.
>>>>
>>>>
>>>> And who says that it's not waiting for its own previous payload?
>>>
>>>
>>> That's what I understood from your previous comment: "there is no 
>>> guarantee that ctx2 will run in between jobs"
>>>
>>>
>>>>
>>>> See, if the payload is a counter, this won't work either. Keep in 
>>>> mind that this has the semantics of a semaphore: whoever grabs the 
>>>> semaphore first wins and can run, everybody else has to wait.
>>>
>>>
>>> What performs the "grab" here?
>>>
>>> I thought that would be vkQueueSubmit().
>>>
>>> Since that is occurring from a single application thread, execution 
>>> should then be ordered ctx1, ctx2, ctx1, ...
>>>
>>>
>>> Thanks for your time on this,
>>>
>>>
>>> -Lionel
>>>
>>>
>>>>
>>>>     Then it proceeds to signal it by replacing the internal payload.
>>>>
>>>>
>>>> That's an implementation detail of our sync objects, but I don't 
>>>> think that this behavior is part of the Vulkan specification.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>     ctx2 then waits on that and replaces the payload again with the
>>>>     new internal synchronization object.
>>>>
>>>>
>>>>     The internal payload is a dma fence in our case, and signaling
>>>>     just replaces one dma fence with another, or puts one where
>>>>     there was none before.
>>>>
>>>>     So we should have created a dependency link between all the
>>>>     submissions, and they should then be executed in the order of
>>>>     the QueueSubmit() calls.
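>>>>
>>>>     A minimal userspace model of the semantics I have in mind
>>>>     ("fence" stands in for a dma fence; none of this is actual
>>>>     driver code):
>>>>
>>>>         #include <stdbool.h>
>>>>
>>>>         struct fence { bool signaled; };
>>>>         struct semaphore { struct fence *payload; };
>>>>
>>>>         /* signaling replaces the current payload, or installs one
>>>>          * where there was none before */
>>>>         static void semaphore_signal(struct semaphore *sem,
>>>>                                      struct fence *f)
>>>>         {
>>>>             sem->payload = f;
>>>>         }
>>>>
>>>>         /* a waiter grabs whatever payload the previous submission
>>>>          * left there */
>>>>         static struct fence *semaphore_wait(struct semaphore *sem)
>>>>         {
>>>>             return sem->payload;
>>>>         }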
>>>>
>>>>
>>>>     -Lionel
>>>>
>>>>
>>>>
>>>>         It only prevents running both at the same time and as far
>>>>         as I can see that still works even with threaded submission.
>>>>
>>>>         You need at least two semaphores for a tandem submission.
>>>>
>>>>         Regards,
>>>>         Christian.
>>>>
>>>>
>>>>
>>>>                 This way there can't be any synchronization between
>>>>                 the two.
>>>>
>>>>                 Regards,
>>>>                 Christian.
>>>>
>>>>                 On 02.08.2019 06:55, Lionel Landwerlin
>>>>                 <lionel.g.landwerlin at intel.com> wrote:
>>>>                 Hey Christian,
>>>>
>>>>                 The problem boils down to the fact that we don't
>>>>                 immediately create dma fences when calling
>>>>                 vkQueueSubmit().
>>>>                 This is delayed to a thread.
>>>>
>>>>                 From a single application thread, you can
>>>>                 QueueSubmit() to 2 queues from 2 different devices.
>>>>                 Each QueueSubmit to one queue has a dependency on
>>>>                 the previous QueueSubmit on the other queue through
>>>>                 an exported/imported semaphore.
>>>>
>>>>                 From the API point of view, the state of the
>>>>                 semaphore should be changed after each QueueSubmit().
>>>>                 The problem is that it's not, because of the thread,
>>>>                 and because you might have those 2 submission
>>>>                 threads tied to different VkDevice/VkInstance or
>>>>                 even different applications (synchronizing
>>>>                 themselves outside the Vulkan API).
>>>>
>>>>                 Hope that makes sense.
>>>>                 It's not really easy to explain by mail; the best
>>>>                 explanation is probably reading the test:
>>>>                 https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>>>>
>>>>                 Like David mentioned, you're not running into that
>>>>                 issue right now, because you only dispatch to the
>>>>                 thread under specific conditions.
>>>>                 But I could build a case to force that and likely
>>>>                 run into the same issue.
>>>>
>>>>                 -Lionel
>>>>
>>>>                 On 02/08/2019 07:33, Koenig, Christian wrote:
>>>>
>>>>                     Hi Lionel,
>>>>
>>>>                     Well could you describe once more what the
>>>>                     problem is?
>>>>
>>>>                     Cause I don't fully understand why a rather
>>>>                     normal tandem submission with two semaphores
>>>>                     should fail in any way.
>>>>
>>>>                     Regards,
>>>>                     Christian.
>>>>
>>>>                     On 02.08.2019 06:28, Lionel Landwerlin
>>>>                     <lionel.g.landwerlin at intel.com> wrote:
>>>>                     There aren't CTS tests covering the issue I was
>>>>                     mentioning.
>>>>                     But we could add them.
>>>>
>>>>                     I don't have all the details regarding your
>>>>                     implementation, but even with the "semaphore
>>>>                     thread" I could see it running into the same
>>>>                     issues. What if a mix of binary & timeline
>>>>                     semaphores is handed to vkQueueSubmit()?
>>>>
>>>>                     For example, with queueA & queueB from 2
>>>>                     different VkDevices:
>>>>                     vkQueueSubmit(queueA, signal semA);
>>>>                     vkQueueSubmit(queueA, wait on [semA,
>>>>                     timelineSemB]); with
>>>>                     timelineSemB triggering a wait before signal.
>>>>                     vkQueueSubmit(queueB, signal semA);
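>>>>
>>>>                     In API terms, the second submit would look
>>>>                     something like this sketch (using the
>>>>                     VK_KHR_timeline_semaphore structs; the values
>>>>                     and variables are made up):
>>>>
>>>>                         uint64_t waitValues[2] = {
>>>>                             0, /* ignored for binary semA */
>>>>                             5, /* pending point on timelineSemB */
>>>>                         };
>>>>                         VkTimelineSemaphoreSubmitInfoKHR timelineInfo = {
>>>>                             .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO_KHR,
>>>>                             .waitSemaphoreValueCount = 2,
>>>>                             .pWaitSemaphoreValues = waitValues,
>>>>                         };
>>>>                         VkSemaphore waitSems[2] = { semA, timelineSemB };
>>>>                         VkPipelineStageFlags stages[2] = {
>>>>                             VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
>>>>                             VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
>>>>                         };
>>>>                         VkSubmitInfo submit = {
>>>>                             .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
>>>>                             .pNext = &timelineInfo,
>>>>                             .waitSemaphoreCount = 2,
>>>>                             .pWaitSemaphores = waitSems,
>>>>                             .pWaitDstStageMask = stages,
>>>>                         };
>>>>                         vkQueueSubmit(queueA, 1, &submit, VK_NULL_HANDLE);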
>>>>
>>>>
>>>>                     -Lionel
>>>>
>>>>                     On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>>>>                     > Hi Lionel,
>>>>                     >
>>>>                     > The queue thread is a heavy thread, which is
>>>>                     > always resident in the driver while the
>>>>                     > application is running, and our guys don't
>>>>                     > like that. So we switched to a semaphore
>>>>                     > thread: only when a waitBeforeSignal on a
>>>>                     > timeline happens do we spawn a thread to
>>>>                     > handle that wait. So we don't have this issue
>>>>                     > of yours.
>>>>                     > By the way, I already pass all your CTS cases
>>>>                     > now. I suggest you switch to a semaphore
>>>>                     > thread instead of a queue thread as well. It
>>>>                     > works very well.
>>>>                     >
>>>>                     > -David
>>>>                     >
>>>>                     > -----Original Message-----
>>>>                     > From: Lionel Landwerlin
>>>>                     > <lionel.g.landwerlin at intel.com>
>>>>                     > Sent: Friday, August 2, 2019 4:52 AM
>>>>                     > To: dri-devel
>>>>                     > <dri-devel at lists.freedesktop.org>;
>>>>                     > Koenig, Christian <Christian.Koenig at amd.com>;
>>>>                     > Zhou, David(ChunMing) <David1.Zhou at amd.com>;
>>>>                     > Jason Ekstrand <jason at jlekstrand.net>
>>>>                     > Subject: Threaded submission & semaphore sharing
>>>>                     >
>>>>                     > Hi Christian, David,
>>>>                     >
>>>>                     > Sorry to report this so late in the process,
>>>>                     > but I think we found an issue, not directly
>>>>                     > related to syncobj timelines themselves, but
>>>>                     > a side effect of the threaded submissions.
>>>>                     >
>>>>                     > Essentially we're failing a test in crucible:
>>>>                     > func.sync.semaphore-fd.opaque-fd
>>>>                     > This test creates a single binary semaphore
>>>>                     > and shares it between 2 VkDevice/VkQueue.
>>>>                     > Then in a loop it proceeds to submit
>>>>                     > workloads, alternating between the 2 VkQueues,
>>>>                     > with each submit depending on the previous
>>>>                     > one.
>>>>                     > It does so by waiting on the VkSemaphore
>>>>                     > signaled in the previous iteration and
>>>>                     > re-signaling it.
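>>>>                     >
>>>>                     > Roughly, the loop looks like this sketch
>>>>                     > (simplified from the test; setup, command
>>>>                     > buffers and error handling omitted; semA/semB
>>>>                     > are imported views of the same underlying
>>>>                     > syncobj):
>>>>                     >
>>>>                     >     VkPipelineStageFlags stage =
>>>>                     >         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
>>>>                     >     for (unsigned i = 0; i < N; i++) {
>>>>                     >         VkQueue q     = (i & 1) ? queueB : queueA;
>>>>                     >         VkSemaphore s = (i & 1) ? semB : semA;
>>>>                     >         VkSubmitInfo submit = {
>>>>                     >             .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
>>>>                     >             /* wait on the payload left by the
>>>>                     >              * previous iteration (none on i == 0) */
>>>>                     >             .waitSemaphoreCount = i ? 1 : 0,
>>>>                     >             .pWaitSemaphores = &s,
>>>>                     >             .pWaitDstStageMask = &stage,
>>>>                     >             .commandBufferCount = 1,
>>>>                     >             .pCommandBuffers = (i & 1) ? &cmdB : &cmdA,
>>>>                     >             /* re-signal the same semaphore */
>>>>                     >             .signalSemaphoreCount = 1,
>>>>                     >             .pSignalSemaphores = &s,
>>>>                     >         };
>>>>                     >         vkQueueSubmit(q, 1, &submit, VK_NULL_HANDLE);
>>>>                     >     }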
>>>>                     >
>>>>                     > The problem for us is that once things are
>>>>                     > dispatched to the submission threads, the
>>>>                     > ordering of the submissions is lost, because
>>>>                     > we have 2 devices and they both have their
>>>>                     > own submission thread.
>>>>                     >
>>>>                     > Jason suggested that we reestablish the
>>>>                     > ordering by having semaphores/syncobjs carry
>>>>                     > an additional uint64_t payload. This 64-bit
>>>>                     > integer would be an identifier that the
>>>>                     > submission threads will WAIT_FOR_AVAILABLE
>>>>                     > on.
>>>>                     >
>>>>                     > The scenario would look like this:
>>>>                     >   - vkQueueSubmit(queueA, signal on semA);
>>>>                     >       - in the caller thread, this would
>>>>                     >         increment the syncobj's additional
>>>>                     >         u64 payload and return it to
>>>>                     >         userspace.
>>>>                     >       - at some point the submission thread
>>>>                     >         of queueA submits the workload and
>>>>                     >         signals the syncobj of semA with the
>>>>                     >         value returned in the caller thread
>>>>                     >         of vkQueueSubmit().
>>>>                     >   - vkQueueSubmit(queueB, wait on semA);
>>>>                     >       - in the caller thread, this would
>>>>                     >         read the syncobj's additional u64
>>>>                     >         payload.
>>>>                     >       - at some point the submission thread
>>>>                     >         of queueB will try to submit the
>>>>                     >         work, but first it will
>>>>                     >         WAIT_FOR_AVAILABLE the u64 value
>>>>                     >         returned in the step above.
>>>>                     >
>>>>                     > Because we want the binary semaphores to be
>>>>                     > shared across processes and would like this
>>>>                     > to remain a single FD, the simplest location
>>>>                     > to store this additional u64 payload would
>>>>                     > be the DRM syncobj.
>>>>                     > It would need an additional ioctl to read &
>>>>                     > increment the value.
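>>>>                     >
>>>>                     > A hypothetical uapi sketch (the name and
>>>>                     > layout are made up, just to make the idea
>>>>                     > concrete):
>>>>                     >
>>>>                     >     /* read the syncobj's extra u64 payload
>>>>                     >      * and atomically increment it; the
>>>>                     >      * returned point is what a submission
>>>>                     >      * thread would later WAIT_FOR_AVAILABLE
>>>>                     >      * on */
>>>>                     >     struct drm_syncobj_binary_advance {
>>>>                     >             __u32 handle; /* in: syncobj handle */
>>>>                     >             __u32 pad;
>>>>                     >             __u64 point;  /* out: new payload value */
>>>>                     >     };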
>>>>                     >
>>>>                     > What do you think?
>>>>                     >
>>>>                     > -Lionel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
