<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Lionel,</p>
    <p>For binary semaphore, I guess every one will think application
      will guarantee wait is behind the signal, whenever the semaphore
      is shared or used in internal-process. <br>
    </p>
    <p>I think below two options can fix your problem:<br>
    </p>
    <p>a. Can we extend vkWaitForFence so that it can be able to wait on
      fence-available? If fence is available, then it's safe to do
      semaphore wait in vkQueueSubmit.</p>
    <p>b. Make waitBeforeSignal is valid for binary semaphore as well,
      as that way, It is reasonable to add wait/signal counting for
      binary syncobj.<br>
    </p>
    <p><br>
    </p>
    <p>-David<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 2019年08月02日 14:27, Lionel Landwerlin
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:9bd985bb-1dfb-b28d-e1da-efa5b41464c8@intel.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">On 02/08/2019 09:10, Koenig,
        Christian wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
        <div dir="auto">
          <div><br>
            <div class="gmail_extra"><br>
              <div class="gmail_quote">Am 02.08.2019 07:38 schrieb
                Lionel Landwerlin <a class="moz-txt-link-rfc2396E"
                  href="mailto:lionel.g.landwerlin@intel.com"
                  moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a>:<br
                  type="attribution">
                <blockquote class="quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <div>On 02/08/2019 08:21, Koenig, Christian wrote:<br>
                    </div>
                    <blockquote>
                      <div dir="auto">
                        <div><br>
                          <div><br>
                            <div class="elided-text">Am 02.08.2019 07:17
                              schrieb Lionel Landwerlin <a
                                href="mailto:lionel.g.landwerlin@intel.com"
                                moz-do-not-send="true">
                                <lionel.g.landwerlin@intel.com></a>:<br
                                type="attribution">
                              <blockquote style="margin:0 0 0
                                0.8ex;border-left:1px #ccc
                                solid;padding-left:1ex">
                                <div>
                                  <div>On 02/08/2019 08:08, Koenig,
                                    Christian wrote:<br>
                                  </div>
                                  <blockquote>
                                    <div dir="auto">Hi Lionel,
                                      <div dir="auto"><br>
                                      </div>
                                      <div dir="auto">Well that looks
                                        more like your test case is
                                        buggy.</div>
                                      <div dir="auto"><br>
                                      </div>
                                      <div dir="auto">According to the
                                        code the ctx1 queue always waits
                                        for sem1 and ctx2 queue always
                                        waits for sem2.</div>
                                    </div>
                                  </blockquote>
                                  <p><br>
                                  </p>
                                  <p>That's supposed to be the same
                                    underlying syncobj because it's
                                    exported from one VkDevice as opaque
                                    FD from sem1 and imported into sem2.<br>
                                  </p>
                                </div>
                              </blockquote>
                            </div>
                          </div>
                        </div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">Well than that's still buggy and
                          won't synchronize at all.</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">When ctx1 waits for a semaphore
                          and then signals the same semaphore there is
                          no guarantee that ctx2 will run in between
                          jobs.</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">It's perfectly valid in this
                          case to first run all jobs from ctx1 and then
                          all jobs from ctx2.</div>
                      </div>
                    </blockquote>
                    <p><br>
                    </p>
                    <p>That's not really how I see the semaphores
                      working.</p>
                    <p>The spec describe VkSemaphore as an interface to
                      an internal payload opaque to the application.</p>
                    <p><br>
                    </p>
                    <p>When ctx1 waits on the semaphore, it waits on the
                      payload put there by the previous iteration.</p>
                  </div>
                </blockquote>
              </div>
            </div>
          </div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">And who says that it's not waiting for it's
            own previous payload?</div>
        </div>
      </blockquote>
      <p><br>
      </p>
      <p>That's was I understood from you previous comment : "there is
        no guarantee that ctx2 will run in between jobs"</p>
      <p><br>
      </p>
      <blockquote type="cite"
        cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
        <div dir="auto">
          <div dir="auto"><br>
          </div>
          <div dir="auto">See if the payload is a counter this won't
            work either. Keep in mind that this has the semantic of a
            semaphore. Whoever grabs the semaphore first wins and can
            run, everybody else has to wait.</div>
        </div>
      </blockquote>
      <p><br>
      </p>
      <p>What performs the "grab" here?</p>
      <p>I thought that would be vkQueueSubmit().</p>
      <p>Since that occuring from a single application thread, that
        should then be ordered in execution of ctx1,ctx2,ctx1,...<br>
      </p>
      <p><br>
      </p>
      <p>Thanks for your time on this,</p>
      <p><br>
      </p>
      <p>-Lionel<br>
      </p>
      <p><br>
      </p>
      <blockquote type="cite"
        cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
        <div dir="auto">
          <div dir="auto"><br>
          </div>
          <div dir="auto">
            <div class="gmail_extra">
              <div class="gmail_quote">
                <blockquote class="quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <p>Then it proceeds to signal it by replacing the
                      internal payload.</p>
                  </div>
                </blockquote>
              </div>
            </div>
          </div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">That's an implementation detail of our sync
            objects, but I don't think that this behavior is part of the
            Vulkan specification.</div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">Regards,</div>
          <div dir="auto">Christian.</div>
          <div dir="auto">
            <div class="gmail_extra">
              <div class="gmail_quote">
                <blockquote class="quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <p><br>
                    </p>
                    <p>ctx2 then waits on that and replaces the payload
                      again with the new internal synchronization
                      object.</p>
                    <p><br>
                    </p>
                    <p>The internal payload is a dma fence in our case
                      and signaling just replaces a dma fence by another
                      or puts one where there was none before.</p>
                    <p>So we should have created a dependecy link
                      between all the submissions and then should be
                      executed in the order of QueueSubmit() calls.<br>
                    </p>
                    <p><br>
                    </p>
                    <p>-Lionel<br>
                    </p>
                    <p><br>
                    </p>
                    <blockquote>
                      <div dir="auto">
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">It only prevents running both at
                          the same time and as far as I can see that
                          still works even with threaded submission.</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">You need at least two semaphores
                          for a tandem submission.</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">Regards,</div>
                        <div dir="auto">Christian.</div>
                        <div dir="auto">
                          <div>
                            <div class="elided-text">
                              <blockquote style="margin:0 0 0
                                0.8ex;border-left:1px #ccc
                                solid;padding-left:1ex">
                                <div>
                                  <p><br>
                                  </p>
                                  <blockquote>
                                    <div dir="auto">
                                      <div dir="auto"><br>
                                      </div>
                                      <div dir="auto">This way there
                                        can't be any Synchronisation
                                        between the two.</div>
                                      <div dir="auto"><br>
                                      </div>
                                      <div dir="auto">Regards,</div>
                                      <div dir="auto">Christian.</div>
                                    </div>
                                    <div><br>
                                      <div class="elided-text">Am
                                        02.08.2019 06:55 schrieb Lionel
                                        Landwerlin <a
                                          href="mailto:lionel.g.landwerlin@intel.com"
                                          moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a>:<br type="attribution">
                                      </div>
                                    </div>
                                    <div>
                                      <div>Hey Christian,</div>
                                      <div><br>
                                      </div>
                                      <div>The problem boils down to the
                                        fact that we don't immediately
                                        create dma fences when calling
                                        vkQueueSubmit().</div>
                                      <div>This is delayed to a thread.</div>
                                      <div><br>
                                      </div>
                                      <div>From a single application
                                        thread, you can QueueSubmit() to
                                        2 queues from 2 different
                                        devices.</div>
                                      <div>Each QueueSubmit to one queue
                                        has a dependency on the previous
                                        QueueSubmit on the other queue
                                        through an exported/imported
                                        semaphore.</div>
                                      <div><br>
                                      </div>
                                      <div>From the API point of view
                                        the state of the semaphore
                                        should be changed after each
                                        QueueSubmit().</div>
                                      <div>The problem is that it's not
                                        because of the thread and
                                        because you might have those 2
                                        submission threads tied to
                                        different VkDevice/VkInstance or
                                        even different applications
                                        (synchronizing themselves
                                        outside the vulkan API).</div>
                                      <div><br>
                                      </div>
                                      <div>Hope that makes sense.</div>
                                      <div>It's not really easy to
                                        explain by mail, the best
                                        explanation is probably reading
                                        the test : <a
href="https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788"
                                          moz-do-not-send="true">
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788</a></div>
                                      <div><br>
                                      </div>
                                      <div>Like David mentioned you're
                                        not running into that issue
                                        right now, because you only
                                        dispatch to the thread under
                                        specific conditions.</div>
                                      <div>But I could build a case to
                                        force that and likely run into
                                        the same issue.<br>
                                      </div>
                                      <div><br>
                                      </div>
                                      <div>-Lionel<br>
                                      </div>
                                      <div><br>
                                      </div>
                                      <div>On 02/08/2019 07:33, Koenig,
                                        Christian wrote:<br>
                                      </div>
                                      <blockquote>
                                        <div>
                                          <div dir="auto">Hi Lionel,
                                            <div dir="auto"><br>
                                            </div>
                                            <div dir="auto">Well could
                                              you describe once more
                                              what the problem is?</div>
                                            <div dir="auto"><br>
                                            </div>
                                            <div dir="auto">Cause I
                                              don't fully understand why
                                              a rather normal tandem
                                              submission with two
                                              semaphores should fail in
                                              any way.</div>
                                            <div dir="auto"><br>
                                            </div>
                                            <div dir="auto">Regards,</div>
                                            <div dir="auto">Christian.</div>
                                          </div>
                                          <div><br>
                                            <div>Am 02.08.2019 06:28
                                              schrieb Lionel Landwerlin
                                              <a
                                                href="mailto:lionel.g.landwerlin@intel.com"
                                                moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a>:<br type="attribution">
                                            </div>
                                          </div>
                                        </div>
                                        <font size="2"><span
                                            style="font-size:11pt">
                                            <div>There aren't CTS tests
                                              covering the issue I was
                                              mentioning.<br>
                                              But we could add them.<br>
                                              <br>
                                              I don't have all the
                                              details regarding your
                                              implementation but even
                                              with <br>
                                              the "semaphore thread", I
                                              could see it running into
                                              the same issues.<br>
                                              What if a mix of binary
                                              & timeline semaphores
                                              are handed to
                                              vkQueueSubmit()?<br>
                                              <br>
                                              For example with queueA
                                              & queueB from 2
                                              different VkDevice :<br>
                                                   vkQueueSubmit(queueA,
                                              signal semA);<br>
                                                   vkQueueSubmit(queueA,
                                              wait on [semA,
                                              timelineSemB]); with <br>
                                              timelineSemB triggering a
                                              wait before signal.<br>
                                                   vkQueueSubmit(queueB,
                                              signal semA);<br>
                                              <br>
                                              <br>
                                              -Lionel<br>
                                              <br>
                                              On 02/08/2019 06:18, Zhou,
                                              David(ChunMing) wrote:<br>
                                              > Hi Lionel,<br>
                                              ><br>
                                              > By the Queue thread
                                              is a heavy thread, which
                                              is always resident in
                                              driver during application
                                              running, our guys don't
                                              like that. So we switch to
                                              Semaphore Thread, only
                                              when waitBeforeSignal of
                                              timeline happens, we spawn
                                              a thread to handle that
                                              wait. So we don't have
                                              your this issue.<br>
                                              > By the way, I already
                                              pass all your CTS cases
                                              for now. I suggest you to
                                              switch to Semaphore Thread
                                              instead of Queue Thread as
                                              well. It works very well.<br>
                                              ><br>
                                              > -David<br>
                                              ><br>
                                              > -----Original
                                              Message-----<br>
                                              > From: Lionel
                                              Landwerlin <a
                                                href="mailto:lionel.g.landwerlin@intel.com"
                                                moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a><br>
                                              > Sent: Friday, August
                                              2, 2019 4:52 AM<br>
                                              > To: dri-devel <a
                                                href="mailto:dri-devel@lists.freedesktop.org"
                                                moz-do-not-send="true"><dri-devel@lists.freedesktop.org></a>;
                                              Koenig, Christian <a
                                                href="mailto:Christian.Koenig@amd.com"
                                                moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
                                              Zhou, David(ChunMing) <a
href="mailto:David1.Zhou@amd.com" moz-do-not-send="true"><David1.Zhou@amd.com></a>;
                                              Jason Ekstrand <a
                                                href="mailto:jason@jlekstrand.net"
                                                moz-do-not-send="true">
<jason@jlekstrand.net></a><br>
                                              > Subject: Threaded
                                              submission & semaphore
                                              sharing<br>
                                              ><br>
                                              > Hi Christian, David,<br>
                                              ><br>
                                              > Sorry to report this
                                              so late in the process,
                                              but I think we found an
                                              issue not directly related
                                              to syncobj timelines
                                              themselves but with a side
                                              effect of the threaded
                                              submissions.<br>
                                              ><br>
                                              > Essentially we're
                                              failing a test in crucible
                                              :<br>
                                              >
                                              func.sync.semaphore-fd.opaque-fd<br>
                                              > This test create a
                                              single binary semaphore,
                                              shares it between 2
                                              VkDevice/VkQueue.<br>
                                              > Then in a loop it
                                              proceeds to submit
                                              workload alternating
                                              between the 2 VkQueue with
                                              one submit depending on
                                              the other.<br>
                                              > It does so by waiting
                                              on the VkSemaphore
                                              signaled in the previous
                                              iteration and resignaling
                                              it.<br>
                                              ><br>
                                              > The problem for us is
                                              that once things are
                                              dispatched to the
                                              submission thread, the
                                              ordering of the submission
                                              is lost.<br>
                                              > Because we have 2
                                              devices and they both have
                                              their own submission
                                              thread.<br>
                                              ><br>
                                              > Jason suggested that
                                              we reestablish the
                                              ordering by having
                                              semaphores/syncobjs carry
                                              an additional uint64_t
                                              payload.<br>
                                              > This 64bit integer
                                              would represent be an
                                              identifier that submission
                                              threads will
                                              WAIT_FOR_AVAILABLE on.<br>
                                              ><br>
                                              > The scenario would
                                              look like this :<br>
                                              >       -
                                              vkQueueSubmit(queueA,
                                              signal on semA);<br>
                                              >           - in the
                                              caller thread, this would
                                              increment the syncobj
                                              additional u64 payload and
                                              return it to userspace.<br>
                                              >           - at some
                                              point the submission
                                              thread of queueA submits
                                              the workload and signal
                                              the syncobj of semA with
                                              value returned in the
                                              caller thread of
                                              vkQueueSubmit().<br>
                                              >       -
                                              vkQueueSubmit(queueB, wait
                                              on semA);<br>
                                              >           - in the
                                              caller thread, this would
                                              read the syncobj
                                              additional<br>
                                              > u64 payload<br>
                                              >           - at some
                                              point the submission
                                              thread of queueB will try
                                              to submit the work, but
                                              first it will
                                              WAIT_FOR_AVAILABLE the u64
                                              value returned in the step
                                              above<br>
                                              ><br>
                                              > Because we want the
                                              binary semaphores to be
                                              shared across processes
                                              and would like this to
                                              remain a single FD, the
                                              simplest location to store
                                              this additional u64
                                              payload would be the DRM
                                              syncobj.<br>
                                              > It would need an
                                              additional ioctl to read
                                              & increment the value.<br>
                                              ><br>
                                              > What do you think?<br>
                                              ><br>
                                              > -Lionel<br>
                                              <br>
                                              <br>
                                            </div>
                                          </span></font></blockquote>
                                      <p><br>
                                      </p>
                                    </div>
                                  </blockquote>
                                  <p><br>
                                  </p>
                                </div>
                              </blockquote>
                            </div>
                            <br>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <p><br>
                    </p>
                  </div>
                </blockquote>
              </div>
              <br>
            </div>
          </div>
        </div>
      </blockquote>
      <p><br>
      </p>
    </blockquote>
    <br>
  </body>
</html>