Threaded submission & semaphore sharing
zhoucm1
zhoucm1 at amd.com
Fri Aug 2 10:01:24 UTC 2019
On 2019-08-02 17:41, Lionel Landwerlin wrote:
> Hey David,
>
> On 02/08/2019 12:11, zhoucm1 wrote:
>>
>> Hi Lionel,
>>
>> For a binary semaphore, I guess everyone expects the application to
>> guarantee that the wait comes after the signal, whether the semaphore
>> is shared between processes or used within a single process.
>>
>> I think the two options below could fix your problem:
>>
>> a. Can we extend vkWaitForFence so that it is able to wait until the
>> fence is available? If the fence is available, then it's safe to do
>> the semaphore wait in vkQueueSubmit.
>>
>
> I'm sorry, but I don't understand what vkWaitForFence() has to do with
> this problem.
>
> The test case we're struggling with doesn't use that API.
>
>
> Can you maybe explain a bit more how it relates?
>
It relates because the application would end up with a pattern like:

vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()
vkQueueSubmit()
vkWaitForFenceAvailable()

Sorry, but that would make the application code very ugly.
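
As a rough C sketch (vkWaitForFenceAvailable() is hypothetical, it
doesn't exist anywhere today; fences and submit infos are elided):

    /* Hypothetical sketch of option (a): every submit must be
     * followed by a blocking wait before the next submit can safely
     * reuse the shared binary semaphore. */
    for (unsigned i = 0; i < iterations; i++) {
        vkQueueSubmit(queue1, 1, &submit1, fence1);
        vkWaitForFenceAvailable(device1, fence1); /* hypothetical */
        vkQueueSubmit(queue2, 1, &submit2, fence2);
        vkWaitForFenceAvailable(device2, fence2); /* hypothetical */
    }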
>
>
>> b. Make waitBeforeSignal valid for binary semaphores as well; that
>> way, it is reasonable to add wait/signal counting for binary
>> syncobjs.
>>
>
> Yeah essentially the change we're proposing internally makes binary
> semaphores use syncobj timelines.
>
Will you raise an MR to add the language for waitBeforeSignal support
of binary semaphores to the Vulkan spec?
-David
>
> There is just another u64 associated with them.
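>
> As a rough sketch (the struct and field names are made up for this
> mail, not our actual code), you can think of it as:
>
>     /* Illustrative only: a binary VkSemaphore backed by a timeline
>      * syncobj plus an extra 64-bit point. */
>     struct binary_semaphore {
>         uint32_t syncobj; /* DRM syncobj kernel handle */
>         uint64_t point;   /* u64 payload, bumped on each signal */
>     };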
>
>
> -Lionel
>
>
>>
>> -David
>>
>>
>> On 2019-08-02 14:27, Lionel Landwerlin wrote:
>>> On 02/08/2019 09:10, Koenig, Christian wrote:
>>>>
>>>>
>>>> On 02.08.2019 07:38, Lionel Landwerlin
>>>> <lionel.g.landwerlin at intel.com> wrote:
>>>>
>>>> On 02/08/2019 08:21, Koenig, Christian wrote:
>>>>
>>>>
>>>>
>>>> On 02.08.2019 07:17, Lionel Landwerlin
>>>> <lionel.g.landwerlin at intel.com> wrote:
>>>>
>>>> On 02/08/2019 08:08, Koenig, Christian wrote:
>>>>
>>>> Hi Lionel,
>>>>
>>>> Well that looks more like your test case is buggy.
>>>>
>>>> According to the code, the ctx1 queue always waits
>>>> for sem1 and the ctx2 queue always waits for sem2.
>>>>
>>>>
>>>> That's supposed to be the same underlying syncobj,
>>>> because it's exported from one VkDevice as an opaque
>>>> FD from sem1 and imported into sem2.
>>>>
>>>>
>>>> Well then that's still buggy and won't synchronize at all.
>>>>
>>>> When ctx1 waits for a semaphore and then signals the same
>>>> semaphore there is no guarantee that ctx2 will run in
>>>> between jobs.
>>>>
>>>> It's perfectly valid in this case to first run all jobs
>>>> from ctx1 and then all jobs from ctx2.
>>>>
>>>>
>>>> That's not really how I see the semaphores working.
>>>>
>>>> The spec describes VkSemaphore as an interface to an internal
>>>> payload opaque to the application.
>>>>
>>>>
>>>> When ctx1 waits on the semaphore, it waits on the payload put
>>>> there by the previous iteration.
>>>>
>>>>
>>>> And who says that it's not waiting for its own previous payload?
>>>
>>>
>>> That's what I understood from your previous comment: "there is no
>>> guarantee that ctx2 will run in between jobs"
>>>
>>>
>>>>
>>>> See, if the payload is a counter, this won't work either. Keep in
>>>> mind that this has the semantics of a semaphore: whoever grabs the
>>>> semaphore first wins and can run; everybody else has to wait.
>>>
>>>
>>> What performs the "grab" here?
>>>
>>> I thought that would be vkQueueSubmit().
>>>
>>> Since that is occurring from a single application thread, execution
>>> should then be ordered ctx1, ctx2, ctx1, ...
>>>
>>>
>>> Thanks for your time on this,
>>>
>>>
>>> -Lionel
>>>
>>>
>>>>
>>>> Then it proceeds to signal it by replacing the internal payload.
>>>>
>>>>
>>>> That's an implementation detail of our sync objects, but I don't
>>>> think that this behavior is part of the Vulkan specification.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>> ctx2 then waits on that and replaces the payload again with the
>>>> new internal synchronization object.
>>>>
>>>>
>>>> The internal payload is a dma fence in our case, and signaling
>>>> just replaces one dma fence with another, or puts one where
>>>> there was none before.
>>>>
>>>> So we should have created a dependency link between all the
>>>> submissions, and they should then be executed in the order of
>>>> the QueueSubmit() calls.
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>
>>>> It only prevents running both at the same time and as far
>>>> as I can see that still works even with threaded submission.
>>>>
>>>> You need at least two semaphores for a tandem submission.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>
>>>> This way there can't be any synchronisation between
>>>> the two.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 02.08.2019 06:55, Lionel Landwerlin
>>>> <lionel.g.landwerlin at intel.com> wrote:
>>>> Hey Christian,
>>>>
>>>> The problem boils down to the fact that we don't
>>>> immediately create dma fences when calling
>>>> vkQueueSubmit().
>>>> This is deferred to a thread.
>>>>
>>>> From a single application thread, you can
>>>> QueueSubmit() to 2 queues from 2 different devices.
>>>> Each QueueSubmit to one queue has a dependency on
>>>> the previous QueueSubmit on the other queue through
>>>> an exported/imported semaphore.
>>>>
>>>> From the API point of view, the state of the
>>>> semaphore should be changed after each QueueSubmit().
>>>> The problem is that it isn't, because of the thread,
>>>> and because you might have those 2 submission
>>>> threads tied to different VkDevice/VkInstance or
>>>> even different applications (synchronizing
>>>> themselves outside the Vulkan API).
>>>>
>>>> Hope that makes sense.
>>>> It's not really easy to explain by mail; the best
>>>> explanation is probably reading the test:
>>>> https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788
>>>>
>>>> Like David mentioned, you're not running into that
>>>> issue right now, because you only dispatch to the
>>>> thread under specific conditions.
>>>> But I could build a case to force that and likely
>>>> run into the same issue.
>>>>
>>>> -Lionel
>>>>
>>>> On 02/08/2019 07:33, Koenig, Christian wrote:
>>>>
>>>> Hi Lionel,
>>>>
>>>> Well could you describe once more what the
>>>> problem is?
>>>>
>>>> Cause I don't fully understand why a rather
>>>> normal tandem submission with two semaphores
>>>> should fail in any way.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 02.08.2019 06:28, Lionel Landwerlin
>>>> <lionel.g.landwerlin at intel.com> wrote:
>>>> There aren't CTS tests covering the issue I was
>>>> mentioning.
>>>> But we could add them.
>>>>
>>>> I don't have all the details regarding your
>>>> implementation, but even with the "semaphore thread",
>>>> I could see it running into the same issues.
>>>> What if a mix of binary & timeline semaphores
>>>> is handed to vkQueueSubmit()?
>>>>
>>>> For example, with queueA & queueB from 2 different VkDevices:
>>>>
>>>>     vkQueueSubmit(queueA, signal semA);
>>>>     vkQueueSubmit(queueA, wait on [semA, timelineSemB]);
>>>>         (with timelineSemB triggering a wait-before-signal)
>>>>     vkQueueSubmit(queueB, signal semA);
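>>>>
>>>> For reference, here's a rough sketch of how that second submit
>>>> might be expressed with VK_KHR_timeline_semaphore (the timeline
>>>> value 5 is just a placeholder for a point that hasn't been
>>>> signaled yet):
>>>>
>>>>     VkSemaphore waits[2] = { semA, timelineSemB };
>>>>     VkPipelineStageFlags stages[2] = {
>>>>         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
>>>>         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
>>>>     };
>>>>     /* The value for the binary semA is ignored; point 5 on
>>>>      * timelineSemB is not signaled yet, hence wait-before-signal. */
>>>>     uint64_t wait_values[2] = { 0, 5 };
>>>>     VkTimelineSemaphoreSubmitInfoKHR timeline_info = {
>>>>         .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO_KHR,
>>>>         .waitSemaphoreValueCount = 2,
>>>>         .pWaitSemaphoreValues = wait_values,
>>>>     };
>>>>     VkSubmitInfo submit = {
>>>>         .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
>>>>         .pNext = &timeline_info,
>>>>         .waitSemaphoreCount = 2,
>>>>         .pWaitSemaphores = waits,
>>>>         .pWaitDstStageMask = stages,
>>>>     };
>>>>     vkQueueSubmit(queueA, 1, &submit, VK_NULL_HANDLE);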
>>>>
>>>>
>>>> -Lionel
>>>>
>>>> On 02/08/2019 06:18, Zhou, David(ChunMing) wrote:
>>>> > Hi Lionel,
>>>> >
>>>> > The queue thread is a heavy thread which
>>>> > is always resident in the driver while the application
>>>> > is running, and our guys don't like that. So we switched
>>>> > to a semaphore thread: only when a waitBeforeSignal
>>>> > on a timeline happens do we spawn a thread to
>>>> > handle that wait. So we don't have this issue.
>>>> > By the way, I already pass all your CTS cases
>>>> > now. I suggest you switch to a semaphore
>>>> > thread instead of a queue thread as well. It
>>>> > works very well.
>>>> >
>>>> > -David
>>>> >
>>>> > -----Original Message-----
>>>> > From: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>>> > Sent: Friday, August 2, 2019 4:52 AM
>>>> > To: dri-devel <dri-devel at lists.freedesktop.org>;
>>>> > Koenig, Christian <Christian.Koenig at amd.com>;
>>>> > Zhou, David(ChunMing) <David1.Zhou at amd.com>;
>>>> > Jason Ekstrand <jason at jlekstrand.net>
>>>> > Subject: Threaded submission & semaphore sharing
>>>> >
>>>> > Hi Christian, David,
>>>> >
>>>> > Sorry to report this so late in the process,
>>>> but I think we found an issue not directly
>>>> related to syncobj timelines themselves but
>>>> with a side effect of the threaded submissions.
>>>> >
>>>> > Essentially we're failing a test in crucible:
>>>> > func.sync.semaphore-fd.opaque-fd
>>>> > This test creates a single binary semaphore and
>>>> > shares it between 2 VkDevice/VkQueue pairs.
>>>> > Then, in a loop, it submits workloads
>>>> > alternating between the 2 VkQueues, with each
>>>> > submit depending on the previous one.
>>>> > It does so by waiting on the VkSemaphore
>>>> > signaled in the previous iteration and
>>>> > re-signaling it.
>>>> >
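>>>> > In rough pseudo-C (submit() here is made-up shorthand for
>>>> > filling in a VkSubmitInfo and calling vkQueueSubmit()), the
>>>> > loop looks like:
>>>> >
>>>> >     /* sem1 and sem2 import the same underlying syncobj via an
>>>> >      * opaque FD, so they are effectively one semaphore. */
>>>> >     for (unsigned i = 0; i < N; i++) {
>>>> >         /* queue1 waits on the shared semaphore, re-signals it */
>>>> >         submit(queue1, /*wait*/ sem1, /*signal*/ sem1);
>>>> >         /* queue2 does the same through its imported copy */
>>>> >         submit(queue2, /*wait*/ sem2, /*signal*/ sem2);
>>>> >     }
>>>> >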
>>>> > The problem for us is that once things are
>>>> > dispatched to the submission threads, the
>>>> > ordering of the submissions is lost, because we
>>>> > have 2 devices and they both have their own
>>>> > submission thread.
>>>> >
>>>> > Jason suggested that we re-establish the
>>>> > ordering by having semaphores/syncobjs carry an
>>>> > additional uint64_t payload.
>>>> > This 64-bit integer would be an identifier that
>>>> > submission threads will WAIT_FOR_AVAILABLE on.
>>>> >
>>>> > The scenario would look like this:
>>>> > - vkQueueSubmit(queueA, signal on semA);
>>>> >      - in the caller thread, this would increment the
>>>> > syncobj's additional u64 payload and return it to userspace.
>>>> >      - at some point, the submission thread of queueA
>>>> > submits the workload and signals the syncobj of semA with
>>>> > the value returned in the caller thread of vkQueueSubmit().
>>>> > - vkQueueSubmit(queueB, wait on semA);
>>>> >      - in the caller thread, this would read the syncobj's
>>>> > additional u64 payload.
>>>> >      - at some point, the submission thread of queueB will
>>>> > try to submit the work, but first it will WAIT_FOR_AVAILABLE
>>>> > the u64 value read in the step above.
>>>> >
>>>> > Because we want the binary semaphores to be
>>>> shared across processes and would like this to
>>>> remain a single FD, the simplest location to
>>>> store this additional u64 payload would be the
>>>> DRM syncobj.
>>>> > It would need an additional ioctl to read &
>>>> increment the value.
>>>> >
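>>>> > As a strawman (all names below are made up for this mail;
>>>> > nothing like this exists in the kernel today), the ioctl's
>>>> > argument could look like:
>>>> >
>>>> >     /* Hypothetical uAPI: atomically read, and optionally
>>>> >      * increment, the extra u64 payload carried by a binary
>>>> >      * syncobj. */
>>>> >     struct drm_syncobj_binary_payload {
>>>> >         __u32 handle; /* in: syncobj handle */
>>>> >         __u32 flags;  /* in: e.g. increment-after-read */
>>>> >         __u64 point;  /* out: value to WAIT_FOR_AVAILABLE on */
>>>> >     };
>>>> >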
>>>> > What do you think?
>>>> >
>>>> > -Lionel