Threaded submission & semaphore sharing
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Fri Aug 2 09:41:08 UTC 2019
Hey David,
On 02/08/2019 12:11, zhoucm1 wrote:
>
> Hi Lionel,
>
> For a binary semaphore, I guess everyone assumes the application will
> guarantee that the wait comes after the signal, whether the semaphore
> is shared or used within a single process.
>
> I think either of the two options below can fix your problem:
>
> a. Can we extend vkWaitForFences() so that it is able to wait for a
> fence to become available? Once the fence is available, it is safe to
> do the semaphore wait in vkQueueSubmit().
>
I'm sorry, but I don't understand what vkWaitForFences() has to do with
this problem.
The test case we're struggling with doesn't use that API.
Can you maybe explain a bit more how it relates?
> b. Make waitBeforeSignal valid for binary semaphores as well; that
> way, it is reasonable to add wait/signal counting for binary syncobjs.
>
Yeah, essentially the change we're proposing internally makes binary
semaphores use syncobj timelines.
There is just an additional u64 associated with them.
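
Roughly, a sketch of the idea (hypothetical names, not the actual
patch):

    #include <stdint.h>

    /* A binary VkSemaphore backed by a DRM syncobj used in timeline
     * mode, plus a u64 tracking the last point handed out from the
     * vkQueueSubmit() caller thread. */
    struct hypothetical_binary_semaphore {
        uint32_t syncobj;        /* DRM syncobj handle */
        uint64_t timeline_value; /* last point reserved at submit time */
    };

    /* Called in the vkQueueSubmit() caller thread: reserving the
     * point here fixes the ordering before any submission thread
     * runs. */
    static uint64_t
    reserve_signal_point(struct hypothetical_binary_semaphore *sem)
    {
        return ++sem->timeline_value;
    }
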
-Lionel
>
> -David
>
>
> On 02/08/2019 14:27, Lionel Landwerlin wrote:
>> On 02/08/2019 09:10, Koenig, Christian wrote:
>>>
>>>
>>> On 02.08.2019 07:38, Lionel Landwerlin
>>> <lionel.g.landwerlin at intel.com> wrote:
>>>
>>> On 02/08/2019 08:21, Koenig, Christian wrote:
>>>
>>>
>>>
>>> On 02.08.2019 07:17, Lionel Landwerlin
>>> <lionel.g.landwerlin at intel.com> wrote:
>>>
>>> On 02/08/2019 08:08, Koenig, Christian wrote:
>>>
>>> Hi Lionel,
>>>
>>> Well that looks more like your test case is buggy.
>>>
>>> According to the code, the ctx1 queue always waits
>>> for sem1 and the ctx2 queue always waits for sem2.
>>>
>>>
>>> They're supposed to be the same underlying syncobj,
>>> because it's exported as an opaque FD from sem1 on
>>> one VkDevice and imported into sem2 on the other.
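>>>
>>> i.e. with VK_KHR_external_semaphore_fd, sketched
>>> roughly (error handling omitted):
>>>
>>>     int fd = -1;
>>>     VkSemaphoreGetFdInfoKHR get_info = {
>>>         .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
>>>         .semaphore = sem1,
>>>         .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
>>>     };
>>>     vkGetSemaphoreFdKHR(device1, &get_info, &fd);
>>>
>>>     VkImportSemaphoreFdInfoKHR import_info = {
>>>         .sType = VK_STRUCTURE_TYPE_IMPORT_SEMAPHORE_FD_INFO_KHR,
>>>         .semaphore = sem2,
>>>         .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
>>>         .fd = fd,
>>>     };
>>>     vkImportSemaphoreFdKHR(device2, &import_info);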
>>>
>>>
>>> Well then that's still buggy and won't synchronize at all.
>>>
>>> When ctx1 waits for a semaphore and then signals the same
>>> semaphore, there is no guarantee that ctx2 will run in
>>> between jobs.
>>>
>>> It's perfectly valid in this case to first run all jobs from
>>> ctx1 and then all jobs from ctx2.
>>>
>>>
>>> That's not really how I see the semaphores working.
>>>
>>> The spec describes VkSemaphore as an interface to an internal
>>> payload opaque to the application.
>>>
>>>
>>> When ctx1 waits on the semaphore, it waits on the payload put
>>> there by the previous iteration.
>>>
>>>
>>> And who says that it's not waiting for its own previous payload?
>>
>>
>> That's what I understood from your previous comment: "there is no
>> guarantee that ctx2 will run in between jobs".
>>
>>
>>>
>>> See, if the payload is a counter, this won't work either. Keep in
>>> mind that this has the semantics of a semaphore: whoever grabs the
>>> semaphore first wins and can run; everybody else has to wait.
>>
>>
>> What performs the "grab" here?
>>
>> I thought that would be vkQueueSubmit().
>>
>> Since that occurs from a single application thread, execution should
>> then be ordered ctx1, ctx2, ctx1, ...
>>
>>
>> Thanks for your time on this,
>>
>>
>> -Lionel
>>
>>
>>>
>>> Then it proceeds to signal it by replacing the internal payload.
>>>
>>>
>>> That's an implementation detail of our sync objects, but I don't
>>> think that this behavior is part of the Vulkan specification.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>> ctx2 then waits on that and replaces the payload again with the
>>> new internal synchronization object.
>>>
>>>
>>> The internal payload is a dma fence in our case, and signaling
>>> just replaces one dma fence with another, or puts one where there
>>> was none before.
>>>
>>> So we should have created a dependency link between all the
>>> submissions, and they should then be executed in the order of the
>>> QueueSubmit() calls.
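>>>
>>> In kernel terms that's roughly (a simplified sketch
>>> of the semantics, not the exact code paths):
>>>
>>>     /* waiting: sample whatever fence is currently
>>>      * installed in the syncobj */
>>>     struct dma_fence *fence = drm_syncobj_fence_get(syncobj);
>>>
>>>     /* signaling: replace the payload with the new
>>>      * submission's fence (or install one where there
>>>      * was none before) */
>>>     drm_syncobj_replace_fence(syncobj, new_fence);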
>>>
>>>
>>> -Lionel
>>>
>>>
>>>
>>> It only prevents running both at the same time, and as far as
>>> I can see that still works even with threaded submission.
>>>
>>> You need at least two semaphores for a tandem submission.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>
>>> This way there can't be any synchronization between
>>> the two.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 02.08.2019 06:55, Lionel Landwerlin
>>> <lionel.g.landwerlin at intel.com> wrote:
>>> Hey Christian,
>>>
>>> The problem boils down to the fact that we don't
>>> immediately create dma fences when calling
>>> vkQueueSubmit().
>>> That work is deferred to a submission thread.
>>>
>>> From a single application thread, you can
>>> QueueSubmit() to 2 queues from 2 different devices.
>>> Each QueueSubmit to one queue has a dependency on
>>> the previous QueueSubmit on the other queue through
>>> an exported/imported semaphore.
>>>
>>> From the API point of view, the state of the
>>> semaphore should be changed after each QueueSubmit().
>>> The problem is that it isn't, because of the thread,
>>> and because you might have those 2 submission
>>> threads tied to different VkDevice/VkInstance or
>>> even different applications (synchronizing
>>> themselves outside the Vulkan API).
>>>
>>> Hope that makes sense.
>>> It's not really easy to explain by mail; the best
>>> explanation is probably reading the test:
>>> https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>>>
>>> Like David mentioned, you're not running into that
>>> issue right now, because you only dispatch to the
>>> thread under specific conditions.
>>> But I could build a case to force that and likely
>>> run into the same issue.
>>>
>>> -Lionel
>>>
>>> On 02/08/2019 07:33, Koenig, Christian wrote:
>>>
>>> Hi Lionel,
>>>
>>> Well could you describe once more what the
>>> problem is?
>>>
>>> Because I don't fully understand why a rather
>>> normal tandem submission with two semaphores
>>> should fail in any way.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 02.08.2019 06:28, Lionel Landwerlin
>>> <lionel.g.landwerlin at intel.com> wrote:
>>> There aren't CTS tests covering the issue I was
>>> mentioning.
>>> But we could add them.
>>>
>>> I don't have all the details regarding your
>>> implementation, but even with the "semaphore
>>> thread" I could see it running into the same
>>> issues.
>>> What if a mix of binary & timeline semaphores is
>>> handed to vkQueueSubmit()?
>>>
>>> For example, with queueA & queueB from 2
>>> different VkDevices:
>>> vkQueueSubmit(queueA, signal semA);
>>> vkQueueSubmit(queueA, wait on [semA,
>>> timelineSemB]); with
>>> timelineSemB triggering a wait before signal.
>>> vkQueueSubmit(queueB, signal semA);
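>>>
>>> Spelled out with the VK_KHR_timeline_semaphore structs,
>>> the second submit would look roughly like this (a sketch;
>>> the timeline point is made up):
>>>
>>>     // Wait value for semA is ignored (binary semaphore);
>>>     // point 1 on timelineSemB hasn't been signaled yet.
>>>     uint64_t wait_values[2] = { 0, 1 };
>>>     VkTimelineSemaphoreSubmitInfoKHR timeline_info = {
>>>         .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO_KHR,
>>>         .waitSemaphoreValueCount = 2,
>>>         .pWaitSemaphoreValues = wait_values,
>>>     };
>>>     VkSemaphore wait_sems[2] = { semA, timelineSemB };
>>>     VkPipelineStageFlags wait_stages[2] = {
>>>         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
>>>         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
>>>     };
>>>     VkSubmitInfo submit = {
>>>         .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
>>>         .pNext = &timeline_info,
>>>         .waitSemaphoreCount = 2,
>>>         .pWaitSemaphores = wait_sems,
>>>         .pWaitDstStageMask = wait_stages,
>>>     };
>>>     vkQueueSubmit(queueA, 1, &submit, VK_NULL_HANDLE);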
>>>
>>>
>>> -Lionel
>>>
>>> On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>>> > Hi Lionel,
>>> >
>>> > Because the queue thread is a heavy thread that
>>> > is always resident in the driver while the
>>> > application is running, our guys don't like it.
>>> > So we switched to a semaphore thread: only when
>>> > a waitBeforeSignal on a timeline happens do we
>>> > spawn a thread to handle that wait. So we don't
>>> > have this issue of yours.
>>> > By the way, I already pass all your CTS tests
>>> > now. I suggest you switch to a semaphore thread
>>> > instead of a queue thread as well. It works very
>>> > well.
>>> >
>>> > -David
>>> >
>>> > -----Original Message-----
>>> > From: Lionel Landwerlin
>>> > <lionel.g.landwerlin at intel.com>
>>> > Sent: Friday, August 2, 2019 4:52 AM
>>> > To: dri-devel <dri-devel at lists.freedesktop.org>;
>>> > Koenig, Christian <Christian.Koenig at amd.com>;
>>> > Zhou, David(ChunMing) <David1.Zhou at amd.com>;
>>> > Jason Ekstrand <jason at jlekstrand.net>
>>> > Subject: Threaded submission & semaphore sharing
>>> >
>>> > Hi Christian, David,
>>> >
>>> > Sorry to report this so late in the process,
>>> > but I think we found an issue not directly
>>> > related to syncobj timelines themselves, but
>>> > caused by a side effect of threaded submission.
>>> >
>>> > Essentially we're failing a test in crucible:
>>> > func.sync.semaphore-fd.opaque-fd
>>> > This test creates a single binary semaphore and
>>> > shares it between 2 VkDevice/VkQueue.
>>> > Then, in a loop, it proceeds to submit workloads
>>> > alternating between the 2 VkQueues, with one
>>> > submit depending on the other.
>>> > It does so by waiting on the VkSemaphore
>>> > signaled in the previous iteration and
>>> > re-signaling it.
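>>> >
>>> > In rough pseudo-C, the loop looks like this
>>> > (simplified from the crucible test; submit()
>>> > stands for a vkQueueSubmit() with one wait and
>>> > one signal):
>>> >
>>> >     /* sem1 & sem2 wrap the same underlying syncobj,
>>> >      * shared between the 2 devices through an opaque
>>> >      * FD export/import. */
>>> >     for (unsigned i = 0; i < N; i++) {
>>> >         submit(queue1, /* wait */ sem1, /* signal */ sem1);
>>> >         submit(queue2, /* wait */ sem2, /* signal */ sem2);
>>> >     }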
>>> >
>>> > The problem for us is that once things are
>>> > dispatched to the submission threads, the
>>> > ordering of the submissions is lost, because we
>>> > have 2 devices and they each have their own
>>> > submission thread.
>>> >
>>> > Jason suggested that we reestablish the
>>> > ordering by having semaphores/syncobjs carry an
>>> > additional uint64_t payload.
>>> > This 64-bit integer would be an identifier that
>>> > submission threads WAIT_FOR_AVAILABLE on.
>>> >
>>> > The scenario would look like this:
>>> > - vkQueueSubmit(queueA, signal on semA);
>>> >         - in the caller thread, this would
>>> >           increment the syncobj's additional u64
>>> >           payload and return it to userspace.
>>> >         - at some point the submission thread of
>>> >           queueA submits the workload and signals
>>> >           the syncobj of semA with the value
>>> >           returned in the caller thread of
>>> >           vkQueueSubmit().
>>> > - vkQueueSubmit(queueB, wait on semA);
>>> >         - in the caller thread, this would read
>>> >           the syncobj's additional u64 payload.
>>> >         - at some point the submission thread of
>>> >           queueB will try to submit the work, but
>>> >           first it will WAIT_FOR_AVAILABLE the
>>> >           u64 value read in the step above.
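>>> >
>>> > In code, the two halves would look roughly like
>>> > this (a sketch; the helper names are made up):
>>> >
>>> >     /* signal path, in the vkQueueSubmit() caller thread: */
>>> >     uint64_t point = syncobj_incr_binary_payload(semA_syncobj);
>>> >     queueA_thread_enqueue(job, /* signal semA at */ point);
>>> >
>>> >     /* wait path, in the vkQueueSubmit() caller thread: */
>>> >     uint64_t wait_point = syncobj_read_binary_payload(semA_syncobj);
>>> >     /* queueB's submission thread WAIT_FOR_AVAILABLEs
>>> >      * wait_point before actually submitting. */
>>> >     queueB_thread_enqueue(job, /* wait for */ wait_point);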
>>> >
>>> > Because we want the binary semaphores to be
>>> > shareable across processes, and would like this
>>> > to remain a single FD, the simplest location to
>>> > store this additional u64 payload would be the
>>> > DRM syncobj.
>>> > It would need an additional ioctl to read &
>>> > increment the value.
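>>> >
>>> > Something along these lines (a hypothetical uapi
>>> > sketch; names, layout & ioctl number all TBD):
>>> >
>>> >     struct drm_syncobj_binary_payload {
>>> >         __u32 handle;  /* syncobj handle */
>>> >         __u32 flags;   /* 0 = read, 1 = read & increment */
>>> >         __u64 value;   /* out: the u64 payload */
>>> >     };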
>>> >
>>> > What do you think?
>>> >
>>> > -Lionel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>