<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 2019年08月02日 17:41, Lionel Landwerlin
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:3b9f5b4c-827b-dfec-e7c8-a784b0573e85@intel.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">Hey David,</div>
      <div class="moz-cite-prefix"><br>
      </div>
      <div class="moz-cite-prefix">On 02/08/2019 12:11, zhoucm1 wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
        <p>Hi Lionel,</p>
        <p>For binary semaphore, I guess every one will think
          application will guarantee wait is behind the signal, whenever
          the semaphore is shared or used in internal-process. <br>
        </p>
        <p>I think below two options can fix your problem:<br>
        </p>
        <p>a. Can we extend vkWaitForFence so that it can be able to
          wait on fence-available? If fence is available, then it's safe
          to do semaphore wait in vkQueueSubmit.</p>
      </blockquote>
      <p><br>
      </p>
      <p>I'm sorry, but I don't understand what vkWaitForFence() has to
        do with this problem.</p>
      <p>They test case we're struggling with doesn't use that API.</p>
      <p><br>
      </p>
      <p>Can you maybe explain a bit more how it relates?<br>
      </p>
    </blockquote>
    vkQueueSubmit()<br>
    vkWaitForFenceAvailable()<br>
    vkQueueSubmit()<br>
    vkWaitForFenceAvailable()<br>
    vkQueueSubmit()<br>
    vkWaitForFenceAvailable()<br>
    <br>
    Sorry, that could lead application program very ugly.<br>
    <blockquote type="cite"
      cite="mid:3b9f5b4c-827b-dfec-e7c8-a784b0573e85@intel.com">
      <p> </p>
      <p><br>
      </p>
      <blockquote type="cite"
        cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
        <p>b. Make waitBeforeSignal is valid for binary semaphore as
          well, as that way, It is reasonable to add wait/signal
          counting for binary syncobj.<br>
        </p>
      </blockquote>
      <p><br>
      </p>
      <p>Yeah essentially the change we're proposing internally makes
        binary semaphores use syncobj timelines.</p>
    </blockquote>
    Will you raise up a MR to add the language of waitBeforeSignal
    support of binary semaphore to vulkan spec?<br>
    <br>
    -David<br>
    <blockquote type="cite"
      cite="mid:3b9f5b4c-827b-dfec-e7c8-a784b0573e85@intel.com">
      <p>There is just another u64 associated with them.<br>
      </p>
      <p><br>
      </p>
      <p>-Lionel<br>
      </p>
      <p><br>
      </p>
      <blockquote type="cite"
        cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
        <p> </p>
        <p><br>
        </p>
        <p>-David<br>
        </p>
        <br>
        <div class="moz-cite-prefix">On 2019年08月02日 14:27, Lionel
          Landwerlin wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:9bd985bb-1dfb-b28d-e1da-efa5b41464c8@intel.com">
          <div class="moz-cite-prefix">On 02/08/2019 09:10, Koenig,
            Christian wrote:<br>
          </div>
          <blockquote type="cite"
            cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
            <div dir="auto">
              <div><br>
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">Am 02.08.2019 07:38 schrieb
                    Lionel Landwerlin <a class="moz-txt-link-rfc2396E"
                      href="mailto:lionel.g.landwerlin@intel.com"
                      moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a>:<br
                      type="attribution">
                    <blockquote class="quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div>
                        <div>On 02/08/2019 08:21, Koenig, Christian
                          wrote:<br>
                        </div>
                        <blockquote>
                          <div dir="auto">
                            <div><br>
                              <div><br>
                                <div class="elided-text">Am 02.08.2019
                                  07:17 schrieb Lionel Landwerlin <a
                                    href="mailto:lionel.g.landwerlin@intel.com"
                                    moz-do-not-send="true">
                                    <lionel.g.landwerlin@intel.com></a>:<br
                                    type="attribution">
                                  <blockquote style="margin:0 0 0
                                    0.8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    <div>
                                      <div>On 02/08/2019 08:08, Koenig,
                                        Christian wrote:<br>
                                      </div>
                                      <blockquote>
                                        <div dir="auto">Hi Lionel,
                                          <div dir="auto"><br>
                                          </div>
                                          <div dir="auto">Well that
                                            looks more like your test
                                            case is buggy.</div>
                                          <div dir="auto"><br>
                                          </div>
                                          <div dir="auto">According to
                                            the code the ctx1 queue
                                            always waits for sem1 and
                                            ctx2 queue always waits for
                                            sem2.</div>
                                        </div>
                                      </blockquote>
                                      <p><br>
                                      </p>
                                      <p>That's supposed to be the same
                                        underlying syncobj because it's
                                        exported from one VkDevice as
                                        opaque FD from sem1 and imported
                                        into sem2.<br>
                                      </p>
                                    </div>
                                  </blockquote>
                                </div>
                              </div>
                            </div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">Well than that's still buggy
                              and won't synchronize at all.</div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">When ctx1 waits for a
                              semaphore and then signals the same
                              semaphore there is no guarantee that ctx2
                              will run in between jobs.</div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">It's perfectly valid in this
                              case to first run all jobs from ctx1 and
                              then all jobs from ctx2.</div>
                          </div>
                        </blockquote>
                        <p><br>
                        </p>
                        <p>That's not really how I see the semaphores
                          working.</p>
                        <p>The spec describe VkSemaphore as an interface
                          to an internal payload opaque to the
                          application.</p>
                        <p><br>
                        </p>
                        <p>When ctx1 waits on the semaphore, it waits on
                          the payload put there by the previous
                          iteration.</p>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </div>
              <div dir="auto"><br>
              </div>
              <div dir="auto">And who says that it's not waiting for
                it's own previous payload?</div>
            </div>
          </blockquote>
          <p><br>
          </p>
          <p>That's was I understood from you previous comment : "there
            is no guarantee that ctx2 will run in between jobs"</p>
          <p><br>
          </p>
          <blockquote type="cite"
            cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
            <div dir="auto">
              <div dir="auto"><br>
              </div>
              <div dir="auto">See if the payload is a counter this won't
                work either. Keep in mind that this has the semantic of
                a semaphore. Whoever grabs the semaphore first wins and
                can run, everybody else has to wait.</div>
            </div>
          </blockquote>
          <p><br>
          </p>
          <p>What performs the "grab" here?</p>
          <p>I thought that would be vkQueueSubmit().</p>
          <p>Since that occuring from a single application thread, that
            should then be ordered in execution of ctx1,ctx2,ctx1,...<br>
          </p>
          <p><br>
          </p>
          <p>Thanks for your time on this,</p>
          <p><br>
          </p>
          <p>-Lionel<br>
          </p>
          <p><br>
          </p>
          <blockquote type="cite"
            cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
            <div dir="auto">
              <div dir="auto"><br>
              </div>
              <div dir="auto">
                <div class="gmail_extra">
                  <div class="gmail_quote">
                    <blockquote class="quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div>
                        <p>Then it proceeds to signal it by replacing
                          the internal payload.</p>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </div>
              <div dir="auto"><br>
              </div>
              <div dir="auto">That's an implementation detail of our
                sync objects, but I don't think that this behavior is
                part of the Vulkan specification.</div>
              <div dir="auto"><br>
              </div>
              <div dir="auto">Regards,</div>
              <div dir="auto">Christian.</div>
              <div dir="auto">
                <div class="gmail_extra">
                  <div class="gmail_quote">
                    <blockquote class="quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div>
                        <p><br>
                        </p>
                        <p>ctx2 then waits on that and replaces the
                          payload again with the new internal
                          synchronization object.</p>
                        <p><br>
                        </p>
                        <p>The internal payload is a dma fence in our
                          case and signaling just replaces a dma fence
                          by another or puts one where there was none
                          before.</p>
                        <p>So we should have created a dependecy link
                          between all the submissions and then should be
                          executed in the order of QueueSubmit() calls.<br>
                        </p>
                        <p><br>
                        </p>
                        <p>-Lionel<br>
                        </p>
                        <p><br>
                        </p>
                        <blockquote>
                          <div dir="auto">
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">It only prevents running
                              both at the same time and as far as I can
                              see that still works even with threaded
                              submission.</div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">You need at least two
                              semaphores for a tandem submission.</div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">Regards,</div>
                            <div dir="auto">Christian.</div>
                            <div dir="auto">
                              <div>
                                <div class="elided-text">
                                  <blockquote style="margin:0 0 0
                                    0.8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    <div>
                                      <p><br>
                                      </p>
                                      <blockquote>
                                        <div dir="auto">
                                          <div dir="auto"><br>
                                          </div>
                                          <div dir="auto">This way there
                                            can't be any Synchronisation
                                            between the two.</div>
                                          <div dir="auto"><br>
                                          </div>
                                          <div dir="auto">Regards,</div>
                                          <div dir="auto">Christian.</div>
                                        </div>
                                        <div><br>
                                          <div class="elided-text">Am
                                            02.08.2019 06:55 schrieb
                                            Lionel Landwerlin <a
                                              href="mailto:lionel.g.landwerlin@intel.com"
                                              moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a>:<br type="attribution">
                                          </div>
                                        </div>
                                        <div>
                                          <div>Hey Christian,</div>
                                          <div><br>
                                          </div>
                                          <div>The problem boils down to
                                            the fact that we don't
                                            immediately create dma
                                            fences when calling
                                            vkQueueSubmit().</div>
                                          <div>This is delayed to a
                                            thread.</div>
                                          <div><br>
                                          </div>
                                          <div>From a single application
                                            thread, you can
                                            QueueSubmit() to 2 queues
                                            from 2 different devices.</div>
                                          <div>Each QueueSubmit to one
                                            queue has a dependency on
                                            the previous QueueSubmit on
                                            the other queue through an
                                            exported/imported semaphore.</div>
                                          <div><br>
                                          </div>
                                          <div>From the API point of
                                            view the state of the
                                            semaphore should be changed
                                            after each QueueSubmit().</div>
                                          <div>The problem is that it's
                                            not because of the thread
                                            and because you might have
                                            those 2 submission threads
                                            tied to different
                                            VkDevice/VkInstance or even
                                            different applications
                                            (synchronizing themselves
                                            outside the vulkan API).</div>
                                          <div><br>
                                          </div>
                                          <div>Hope that makes sense.</div>
                                          <div>It's not really easy to
                                            explain by mail, the best
                                            explanation is probably
                                            reading the test : <a
href="https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788"
                                              moz-do-not-send="true">
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788</a></div>
                                          <div><br>
                                          </div>
                                          <div>Like David mentioned
                                            you're not running into that
                                            issue right now, because you
                                            only dispatch to the thread
                                            under specific conditions.</div>
                                          <div>But I could build a case
                                            to force that and likely run
                                            into the same issue.<br>
                                          </div>
                                          <div><br>
                                          </div>
                                          <div>-Lionel<br>
                                          </div>
                                          <div><br>
                                          </div>
                                          <div>On 02/08/2019 07:33,
                                            Koenig, Christian wrote:<br>
                                          </div>
                                          <blockquote>
                                            <div>
                                              <div dir="auto">Hi Lionel,
                                                <div dir="auto"><br>
                                                </div>
                                                <div dir="auto">Well
                                                  could you describe
                                                  once more what the
                                                  problem is?</div>
                                                <div dir="auto"><br>
                                                </div>
                                                <div dir="auto">Cause I
                                                  don't fully understand
                                                  why a rather normal
                                                  tandem submission with
                                                  two semaphores should
                                                  fail in any way.</div>
                                                <div dir="auto"><br>
                                                </div>
                                                <div dir="auto">Regards,</div>
                                                <div dir="auto">Christian.</div>
                                              </div>
                                              <div><br>
                                                <div>Am 02.08.2019 06:28
                                                  schrieb Lionel
                                                  Landwerlin <a
                                                    href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a>:<br type="attribution">
                                                </div>
                                              </div>
                                            </div>
                                            <font size="2"><span
                                                style="font-size:11pt">
                                                <div>There aren't CTS
                                                  tests covering the
                                                  issue I was
                                                  mentioning.<br>
                                                  But we could add them.<br>
                                                  <br>
                                                  I don't have all the
                                                  details regarding your
                                                  implementation but
                                                  even with <br>
                                                  the "semaphore
                                                  thread", I could see
                                                  it running into the
                                                  same issues.<br>
                                                  What if a mix of
                                                  binary & timeline
                                                  semaphores are handed
                                                  to vkQueueSubmit()?<br>
                                                  <br>
                                                  For example with
                                                  queueA & queueB
                                                  from 2 different
                                                  VkDevice :<br>
                                                      
                                                  vkQueueSubmit(queueA,
                                                  signal semA);<br>
                                                      
                                                  vkQueueSubmit(queueA,
                                                  wait on [semA,
                                                  timelineSemB]); with <br>
                                                  timelineSemB
                                                  triggering a wait
                                                  before signal.<br>
                                                      
                                                  vkQueueSubmit(queueB,
                                                  signal semA);<br>
                                                  <br>
                                                  <br>
                                                  -Lionel<br>
                                                  <br>
                                                  On 02/08/2019 06:18,
                                                  Zhou, David(ChunMing)
                                                  wrote:<br>
                                                  > Hi Lionel,<br>
                                                  ><br>
                                                  > By the Queue
                                                  thread is a heavy
                                                  thread, which is
                                                  always resident in
                                                  driver during
                                                  application running,
                                                  our guys don't like
                                                  that. So we switch to
                                                  Semaphore Thread, only
                                                  when waitBeforeSignal
                                                  of timeline happens,
                                                  we spawn a thread to
                                                  handle that wait. So
                                                  we don't have your
                                                  this issue.<br>
                                                  > By the way, I
                                                  already pass all your
                                                  CTS cases for now. I
                                                  suggest you to switch
                                                  to Semaphore Thread
                                                  instead of Queue
                                                  Thread as well. It
                                                  works very well.<br>
                                                  ><br>
                                                  > -David<br>
                                                  ><br>
                                                  > -----Original
                                                  Message-----<br>
                                                  > From: Lionel
                                                  Landwerlin <a
                                                    href="mailto:lionel.g.landwerlin@intel.com"
moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a><br>
                                                  > Sent: Friday,
                                                  August 2, 2019 4:52 AM<br>
                                                  > To: dri-devel <a
href="mailto:dri-devel@lists.freedesktop.org" moz-do-not-send="true"><dri-devel@lists.freedesktop.org></a>;
                                                  Koenig, Christian <a
href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
                                                  Zhou, David(ChunMing)
                                                  <a
                                                    href="mailto:David1.Zhou@amd.com"
moz-do-not-send="true"><David1.Zhou@amd.com></a>; Jason Ekstrand <a
href="mailto:jason@jlekstrand.net" moz-do-not-send="true">
<jason@jlekstrand.net></a><br>
                                                  > Subject: Threaded
                                                  submission &
                                                  semaphore sharing<br>
                                                  ><br>
                                                  > Hi Christian,
                                                  David,<br>
                                                  ><br>
                                                  > Sorry to report
                                                  this so late in the
                                                  process, but I think
                                                  we found an issue not
                                                  directly related to
                                                  syncobj timelines
                                                  themselves but with a
                                                  side effect of the
                                                  threaded submissions.<br>
                                                  ><br>
                                                  > Essentially we're
                                                  failing a test in
                                                  crucible :<br>
                                                  >
                                                  func.sync.semaphore-fd.opaque-fd<br>
                                                  > This test create
                                                  a single binary
                                                  semaphore, shares it
                                                  between 2
                                                  VkDevice/VkQueue.<br>
                                                  > Then in a loop it
                                                  proceeds to submit
                                                  workload alternating
                                                  between the 2 VkQueue
                                                  with one submit
                                                  depending on the
                                                  other.<br>
                                                  > It does so by
                                                  waiting on the
                                                  VkSemaphore signaled
                                                  in the previous
                                                  iteration and
                                                  resignaling it.<br>
                                                  ><br>
                                                  > The problem for
                                                  us is that once things
                                                  are dispatched to the
                                                  submission thread, the
                                                  ordering of the
                                                  submission is lost.<br>
                                                  > Because we have 2
                                                  devices and they both
                                                  have their own
                                                  submission thread.<br>
                                                  ><br>
                                                  > Jason suggested
                                                  that we reestablish
                                                  the ordering by having
                                                  semaphores/syncobjs
                                                  carry an additional
                                                  uint64_t payload.<br>
                                                  > This 64bit
                                                  integer would
                                                  represent be an
                                                  identifier that
                                                  submission threads
                                                  will
                                                  WAIT_FOR_AVAILABLE on.<br>
                                                  ><br>
                                                  > The scenario
                                                  would look like this :<br>
                                                  >       -
                                                  vkQueueSubmit(queueA,
                                                  signal on semA);<br>
                                                  >           - in
                                                  the caller thread,
                                                  this would increment
                                                  the syncobj additional
                                                  u64 payload and return
                                                  it to userspace.<br>
                                                  >           - at
                                                  some point the
                                                  submission thread of
                                                  queueA submits the
                                                  workload and signal
                                                  the syncobj of semA
                                                  with value returned in
                                                  the caller thread of
                                                  vkQueueSubmit().<br>
                                                  >       -
                                                  vkQueueSubmit(queueB,
                                                  wait on semA);<br>
                                                  >           - in
                                                  the caller thread,
                                                  this would read the
                                                  syncobj additional<br>
                                                  > u64 payload<br>
                                                  >           - at
                                                  some point the
                                                  submission thread of
                                                  queueB will try to
                                                  submit the work, but
                                                  first it will
                                                  WAIT_FOR_AVAILABLE the
                                                  u64 value returned in
                                                  the step above<br>
                                                  ><br>
                                                  > Because we want
                                                  the binary semaphores
                                                  to be shared across
                                                  processes and would
                                                  like this to remain a
                                                  single FD, the
                                                  simplest location to
                                                  store this additional
                                                  u64 payload would be
                                                  the DRM syncobj.<br>
                                                  > It would need an
                                                  additional ioctl to
                                                  read & increment
                                                  the value.<br>
                                                  ><br>
                                                  > What do you
                                                  think?<br>
                                                  ><br>
                                                  > -Lionel<br>
                                                  <br>
                                                  <br>
                                                </div>
                                              </span></font></blockquote>
                                          <p><br>
                                          </p>
                                        </div>
                                      </blockquote>
                                      <p><br>
                                      </p>
                                    </div>
                                  </blockquote>
                                </div>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <p><br>
                        </p>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
          <p><br>
          </p>
        </blockquote>
        <br>
      </blockquote>
      <p><br>
      </p>
    </blockquote>
    <br>
  </body>
</html>