<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Hey David,</div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">On 02/08/2019 12:11, zhoucm1 wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
      <p>Hi Lionel,</p>
      <p>For binary semaphores, I guess everyone assumes the application
        guarantees that the wait comes after the signal, whether the
        semaphore is shared or used within a single process. <br>
      </p>
      <p>I think either of the two options below can fix your problem:<br>
      </p>
      <p>a. Can we extend vkWaitForFences so that it can wait on
        fence-available? If the fence is available, then it's safe to do
        the semaphore wait in vkQueueSubmit.</p>
    </blockquote>
    <p><br>
    </p>
    <p>I'm sorry, but I don't understand what vkWaitForFences() has to do
      with this problem.</p>
    <p>The test case we're struggling with doesn't use that API.</p>
    <p><br>
    </p>
    <p>Can you maybe explain a bit more how it relates?<br>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
      <p>b. Make waitBeforeSignal valid for binary semaphores as well;
        that way, it is reasonable to add wait/signal counting for
        binary syncobjs.<br>
      </p>
    </blockquote>
    <p><br>
    </p>
    <p>Yeah, essentially the change we're proposing internally makes
      binary semaphores use syncobj timelines.</p>
    <p>There is just another u64 associated with them.<br>
    </p>
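    <p>To make that a bit more concrete, here is a toy model of the
      idea in Python, with plain threads standing in for the per-device
      submission threads. The class and method names
      (TicketedSemaphore, reserve_signal, current_ticket, publish,
      wait_for_available) are made up for this sketch; this is not the
      syncobj ioctl interface, only an illustration of how an extra
      counter read and incremented in the caller thread could restore
      the ordering.</p>
    <pre>
```python
import threading


class TicketedSemaphore:
    """Toy stand-in for a binary semaphore carrying an extra u64 (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.next_ticket = 0    # bumped in the caller thread at submit time
        self.signaled = set()   # tickets whose signal has materialized

    def reserve_signal(self):
        # Caller thread, "signal" side: increment the counter and return it.
        with self._cond:
            self.next_ticket += 1
            return self.next_ticket

    def current_ticket(self):
        # Caller thread, "wait" side: read the counter without changing it.
        with self._cond:
            return self.next_ticket

    def publish(self, ticket):
        # Submission thread: the signal for this ticket now exists.
        with self._cond:
            self.signaled.add(ticket)
            self._cond.notify_all()

    def wait_for_available(self, ticket):
        # Submission thread: block until the awaited ticket has been published.
        # Ticket 0 means "nothing to wait for" (first submission).
        with self._cond:
            self._cond.wait_for(lambda: ticket == 0 or ticket in self.signaled)


def run(iterations=6):
    sem = TicketedSemaphore()
    jobs = {"A": [], "B": []}
    order = []

    # Single application thread: alternate submissions between queues A and B.
    for i in range(iterations):
        queue = "A" if i % 2 == 0 else "B"
        wait_ticket = sem.current_ticket()
        signal_ticket = sem.reserve_signal()
        jobs[queue].append((i, wait_ticket, signal_ticket))

    def submission_thread(queue):
        for job_id, wait_ticket, signal_ticket in jobs[queue]:
            sem.wait_for_available(wait_ticket)
            order.append(job_id)
            sem.publish(signal_ticket)

    # Start B's thread first on purpose: the tickets, not thread start order,
    # decide the execution order.
    threads = [threading.Thread(target=submission_thread, args=(q,))
               for q in ("B", "A")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```
    </pre>
    <p>In this model run() returns [0, 1, 2, 3, 4, 5], i.e. the order of
      the submit calls, even though the two submission threads are
      independent.</p>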
    <p><br>
    </p>
    <p>-Lionel<br>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:d23455fe-c74a-2ee0-a954-af86963e4d2f@amd.com">
      <p> </p>
      <p><br>
      </p>
      <p>-David<br>
      </p>
      <br>
      <div class="moz-cite-prefix">On 2019-08-02 14:27, Lionel
        Landwerlin wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:9bd985bb-1dfb-b28d-e1da-efa5b41464c8@intel.com">
        <div class="moz-cite-prefix">On 02/08/2019 09:10, Koenig,
          Christian wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
          <div dir="auto">
            <div><br>
              <div class="gmail_extra"><br>
                <div class="gmail_quote">On 02.08.2019 07:38, Lionel
                  Landwerlin <a class="moz-txt-link-rfc2396E"
                    href="mailto:lionel.g.landwerlin@intel.com"
                    moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a> wrote:<br
                    type="attribution">
                  <blockquote class="quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div>
                      <div>On 02/08/2019 08:21, Koenig, Christian wrote:<br>
                      </div>
                      <blockquote>
                        <div dir="auto">
                          <div><br>
                            <div><br>
                              <div class="elided-text">On 02.08.2019
                                07:17, Lionel Landwerlin <a
                                  href="mailto:lionel.g.landwerlin@intel.com"
                                  moz-do-not-send="true">
                                  <lionel.g.landwerlin@intel.com></a> wrote:<br
                                  type="attribution">
                                <blockquote style="margin:0 0 0
                                  0.8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  <div>
                                    <div>On 02/08/2019 08:08, Koenig,
                                      Christian wrote:<br>
                                    </div>
                                    <blockquote>
                                      <div dir="auto">Hi Lionel,
                                        <div dir="auto"><br>
                                        </div>
                                        <div dir="auto">Well that looks
                                          more like your test case is
                                          buggy.</div>
                                        <div dir="auto"><br>
                                        </div>
                                        <div dir="auto">According to the
                                          code the ctx1 queue always
                                          waits for sem1 and ctx2 queue
                                          always waits for sem2.</div>
                                      </div>
                                    </blockquote>
                                    <p><br>
                                    </p>
                                    <p>That's supposed to be the same
                                      underlying syncobj because it's
                                      exported from one VkDevice as
                                      opaque FD from sem1 and imported
                                      into sem2.<br>
                                    </p>
                                  </div>
                                </blockquote>
                              </div>
                            </div>
                          </div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">Well then that's still buggy
                            and won't synchronize at all.</div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">When ctx1 waits for a
                            semaphore and then signals the same
                            semaphore there is no guarantee that ctx2
                            will run in between jobs.</div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">It's perfectly valid in this
                            case to first run all jobs from ctx1 and
                            then all jobs from ctx2.</div>
                        </div>
                      </blockquote>
                      <p><br>
                      </p>
                      <p>That's not really how I see the semaphores
                        working.</p>
                      <p>The spec describes VkSemaphore as an interface
                        to an internal payload opaque to the
                        application.</p>
                      <p><br>
                      </p>
                      <p>When ctx1 waits on the semaphore, it waits on
                        the payload put there by the previous iteration.</p>
                    </div>
                  </blockquote>
                </div>
              </div>
            </div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">And who says that it's not waiting for its
              own previous payload?</div>
          </div>
        </blockquote>
        <p><br>
        </p>
        <p>That's what I understood from your previous comment: "there is
          no guarantee that ctx2 will run in between jobs"</p>
        <p><br>
        </p>
        <blockquote type="cite"
          cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
          <div dir="auto">
            <div dir="auto"><br>
            </div>
            <div dir="auto">See, if the payload is a counter, this won't
              work either. Keep in mind that this has the semantics of a
              semaphore. Whoever grabs the semaphore first wins and can
              run; everybody else has to wait.</div>
          </div>
        </blockquote>
        <p><br>
        </p>
        <p>What performs the "grab" here?</p>
        <p>I thought that would be vkQueueSubmit().</p>
        <p>Since that occurs from a single application thread, execution
          should then be ordered as ctx1, ctx2, ctx1, ...<br>
        </p>
        <p><br>
        </p>
        <p>Thanks for your time on this,</p>
        <p><br>
        </p>
        <p>-Lionel<br>
        </p>
        <p><br>
        </p>
        <blockquote type="cite"
          cite="mid:e2a1839e-1ee1-4ecb-9b18-af338046c0f1@email.android.com">
          <div dir="auto">
            <div dir="auto"><br>
            </div>
            <div dir="auto">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <blockquote class="quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div>
                      <p>Then it proceeds to signal it by replacing the
                        internal payload.</p>
                    </div>
                  </blockquote>
                </div>
              </div>
            </div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">That's an implementation detail of our sync
              objects, but I don't think that this behavior is part of
              the Vulkan specification.</div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">Regards,</div>
            <div dir="auto">Christian.</div>
            <div dir="auto">
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <blockquote class="quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div>
                      <p><br>
                      </p>
                      <p>ctx2 then waits on that and replaces the
                        payload again with the new internal
                        synchronization object.</p>
                      <p><br>
                      </p>
                      <p>The internal payload is a dma fence in our case
                        and signaling just replaces a dma fence by
                        another or puts one where there was none before.</p>
                      <p>So we should have created a dependency link
                        between all the submissions, and they should then
                        be executed in the order of the QueueSubmit() calls.<br>
                      </p>
                      <p><br>
                      </p>
                      <p>-Lionel<br>
                      </p>
                      <p><br>
                      </p>
                      <blockquote>
                        <div dir="auto">
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">It only prevents running both
                            at the same time and as far as I can see
                            that still works even with threaded
                            submission.</div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">You need at least two
                            semaphores for a tandem submission.</div>
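                          <div dir="auto">As a rough illustration of a
                            tandem submission with two semaphores, here
                            is a thread-level analogy in Python. Plain
                            threads stand in for the two contexts and
                            threading.Semaphore for the semaphores; the
                            names tandem, to_ctx1 and to_ctx2 are
                            hypothetical and this is not Vulkan
                            code.</div>
                          <pre>
```python
import threading


def tandem(iterations=3):
    # One semaphore per direction: ctx1 hands over to ctx2 and vice versa.
    to_ctx1 = threading.Semaphore(1)  # starts available so ctx1 goes first
    to_ctx2 = threading.Semaphore(0)
    order = []

    def worker(name, wait_sem, signal_sem):
        for _ in range(iterations):
            wait_sem.acquire()     # wait for the other side's signal
            order.append(name)     # "run the job"
            signal_sem.release()   # hand over to the other side

    threads = [
        threading.Thread(target=worker, args=("ctx2", to_ctx2, to_ctx1)),
        threading.Thread(target=worker, args=("ctx1", to_ctx1, to_ctx2)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```
                          </pre>
                          <div dir="auto">Because each side only ever
                            waits on the semaphore the other side
                            signals, the jobs strictly alternate
                            starting with ctx1, whichever thread
                            happens to start first.</div>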
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">Regards,</div>
                          <div dir="auto">Christian.</div>
                          <div dir="auto">
                            <div>
                              <div class="elided-text">
                                <blockquote style="margin:0 0 0
                                  0.8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  <div>
                                    <p><br>
                                    </p>
                                    <blockquote>
                                      <div dir="auto">
                                        <div dir="auto"><br>
                                        </div>
                                        <div dir="auto">This way there
                                          can't be any synchronization
                                          between the two.</div>
                                        <div dir="auto"><br>
                                        </div>
                                        <div dir="auto">Regards,</div>
                                        <div dir="auto">Christian.</div>
                                      </div>
                                      <div><br>
                                        <div class="elided-text">On
                                          02.08.2019 06:55, Lionel
                                          Landwerlin <a
                                            href="mailto:lionel.g.landwerlin@intel.com"
                                            moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a> wrote:<br type="attribution">
                                        </div>
                                      </div>
                                      <div>
                                        <div>Hey Christian,</div>
                                        <div><br>
                                        </div>
                                        <div>The problem boils down to
                                          the fact that we don't
                                          immediately create dma fences
                                          when calling vkQueueSubmit().</div>
                                        <div>This is delayed to a
                                          thread.</div>
                                        <div><br>
                                        </div>
                                        <div>From a single application
                                          thread, you can QueueSubmit()
                                          to 2 queues from 2 different
                                          devices.</div>
                                        <div>Each QueueSubmit to one
                                          queue has a dependency on the
                                          previous QueueSubmit on the
                                          other queue through an
                                          exported/imported semaphore.</div>
                                        <div><br>
                                        </div>
                                        <div>From the API point of view
                                          the state of the semaphore
                                          should be changed after each
                                          QueueSubmit().</div>
                                        <div>The problem is that it
                                          isn't, both because of the
                                          thread and because you might
                                          have those 2 submission threads
                                          tied to different
                                          VkDevice/VkInstance or even
                                          different applications
                                          (synchronizing themselves
                                          outside the Vulkan API).</div>
                                        <div><br>
                                        </div>
                                        <div>Hope that makes sense.</div>
                                        <div>It's not really easy to
                                          explain by mail, the best
                                          explanation is probably
                                          reading the test : <a
href="https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788"
                                            moz-do-not-send="true">
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788</a></div>
                                        <div><br>
                                        </div>
                                        <div>Like David mentioned you're
                                          not running into that issue
                                          right now, because you only
                                          dispatch to the thread under
                                          specific conditions.</div>
                                        <div>But I could build a case to
                                          force that and likely run into
                                          the same issue.<br>
                                        </div>
                                        <div><br>
                                        </div>
                                        <div>-Lionel<br>
                                        </div>
                                        <div><br>
                                        </div>
                                        <div>On 02/08/2019 07:33,
                                          Koenig, Christian wrote:<br>
                                        </div>
                                        <blockquote>
                                          <div>
                                            <div dir="auto">Hi Lionel,
                                              <div dir="auto"><br>
                                              </div>
                                              <div dir="auto">Well could
                                                you describe once more
                                                what the problem is?</div>
                                              <div dir="auto"><br>
                                              </div>
                                              <div dir="auto">Because I
                                                don't fully understand
                                                why a rather normal
                                                tandem submission with
                                                two semaphores should
                                                fail in any way.</div>
                                              <div dir="auto"><br>
                                              </div>
                                              <div dir="auto">Regards,</div>
                                              <div dir="auto">Christian.</div>
                                            </div>
                                            <div><br>
                                              <div>On 02.08.2019 06:28,
                                                Lionel Landwerlin <a
                                                  href="mailto:lionel.g.landwerlin@intel.com"
                                                  moz-do-not-send="true">
<lionel.g.landwerlin@intel.com></a> wrote:<br type="attribution">
                                              </div>
                                            </div>
                                          </div>
                                          <font size="2"><span
                                              style="font-size:11pt">
                                              <div>There aren't CTS
                                                tests covering the issue
                                                I was mentioning.<br>
                                                But we could add them.<br>
                                                <br>
                                                I don't have all the
                                                details regarding your
                                                implementation but even
                                                with <br>
                                                the "semaphore thread",
                                                I could see it running
                                                into the same issues.<br>
                                                What if a mix of binary
                                                & timeline
                                                semaphores are handed to
                                                vkQueueSubmit()?<br>
                                                <br>
                                                For example with queueA
                                                & queueB from 2
                                                different VkDevice :<br>
                                                    
                                                vkQueueSubmit(queueA,
                                                signal semA);<br>
                                                    
                                                vkQueueSubmit(queueA,
                                                wait on [semA,
                                                timelineSemB]); with <br>
                                                timelineSemB triggering
                                                a wait before signal.<br>
                                                    
                                                vkQueueSubmit(queueB,
                                                signal semA);<br>
                                                <br>
                                                <br>
                                                -Lionel<br>
                                                <br>
                                                On 02/08/2019 06:18,
                                                Zhou, David(ChunMing)
                                                wrote:<br>
                                                > Hi Lionel,<br>
                                                ><br>
                                                > The Queue Thread is a
                                                heavy thread that is
                                                always resident in the
                                                driver while the
                                                application is running;
                                                our guys don't like that.
                                                So we switched to a
                                                Semaphore Thread: only
                                                when a waitBeforeSignal
                                                on a timeline happens do
                                                we spawn a thread to
                                                handle that wait. So we
                                                don't have this issue of
                                                yours.<br>
                                                > By the way, I
                                                already pass all your
                                                CTS cases for now. I
                                                suggest you switch to a
                                                Semaphore Thread instead
                                                of a Queue Thread as
                                                well. It works very well.<br>
                                                ><br>
                                                > -David<br>
                                                ><br>
                                                > -----Original
                                                Message-----<br>
                                                > From: Lionel
                                                Landwerlin <a
                                                  href="mailto:lionel.g.landwerlin@intel.com"
                                                  moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a><br>
                                                > Sent: Friday,
                                                August 2, 2019 4:52 AM<br>
                                                > To: dri-devel <a
                                                  href="mailto:dri-devel@lists.freedesktop.org"
                                                  moz-do-not-send="true"><dri-devel@lists.freedesktop.org></a>;
                                                Koenig, Christian <a
                                                  href="mailto:Christian.Koenig@amd.com"
                                                  moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
                                                Zhou, David(ChunMing) <a
href="mailto:David1.Zhou@amd.com" moz-do-not-send="true"><David1.Zhou@amd.com></a>;
                                                Jason Ekstrand <a
                                                  href="mailto:jason@jlekstrand.net"
                                                  moz-do-not-send="true">
<jason@jlekstrand.net></a><br>
                                                > Subject: Threaded
                                                submission &
                                                semaphore sharing<br>
                                                ><br>
                                                > Hi Christian,
                                                David,<br>
                                                ><br>
                                                > Sorry to report
                                                this so late in the
                                                process, but I think we
                                                found an issue not
                                                directly related to
                                                syncobj timelines
                                                themselves but with a
                                                side effect of the
                                                threaded submissions.<br>
                                                ><br>
                                                > Essentially we're
                                                failing a test in
                                                crucible :<br>
                                                >
                                                func.sync.semaphore-fd.opaque-fd<br>
                                                > This test creates a
                                                single binary semaphore
                                                and shares it between 2
                                                VkDevice/VkQueue.<br>
                                                > Then in a loop it
                                                proceeds to submit
                                                workload alternating
                                                between the 2 VkQueue
                                                with one submit
                                                depending on the other.<br>
                                                > It does so by
                                                waiting on the
                                                VkSemaphore signaled in
                                                the previous iteration
                                                and resignaling it.<br>
                                                ><br>
                                                > The problem for us
                                                is that once things are
                                                dispatched to the
                                                submission thread, the
                                                ordering of the
                                                submission is lost.<br>
                                                > Because we have 2
                                                devices and they both
                                                have their own
                                                submission thread.<br>
                                                ><br>
                                                > Jason suggested
                                                that we reestablish the
                                                ordering by having
                                                semaphores/syncobjs
                                                carry an additional
                                                uint64_t payload.<br>
                                                > This 64-bit integer
                                                would be an identifier
                                                that submission threads
                                                will WAIT_FOR_AVAILABLE
                                                on.<br>
                                                ><br>
                                                > The scenario would
                                                look like this :<br>
                                                >       -
                                                vkQueueSubmit(queueA,
                                                signal on semA);<br>
                                                >           - in the
                                                caller thread, this
                                                would increment the
                                                syncobj additional u64
                                                payload and return it to
                                                userspace.<br>
                                                >           - at some
                                                point the submission
                                                thread of queueA submits
                                                the workload and signals
                                                the syncobj of semA with
                                                the value returned in the
                                                caller thread of
                                                vkQueueSubmit().<br>
                                                >       -
                                                vkQueueSubmit(queueB,
                                                wait on semA);<br>
                                                >           - in the
                                                caller thread, this
                                                would read the syncobj
                                                additional<br>
                                                > u64 payload<br>
                                                >           - at some
                                                point the submission
                                                thread of queueB will
                                                try to submit the work,
                                                but first it will
                                                WAIT_FOR_AVAILABLE the
                                                u64 value returned in
                                                the step above<br>
                                                ><br>
                                                > Because we want the
                                                binary semaphores to be
                                                shared across processes
                                                and would like this to
                                                remain a single FD, the
                                                simplest location to
                                                store this additional
                                                u64 payload would be the
                                                DRM syncobj.<br>
                                                > It would need an
                                                additional ioctl to read
                                                & increment the
                                                value.<br>
                                                ><br>
                                                > What do you think?<br>
                                                ><br>
                                                > -Lionel<br>
                                                <br>
                                                <br>
                                              </div>
                                            </span></font></blockquote>
                                        <p><br>
                                        </p>
                                      </div>
                                    </blockquote>
                                    <p><br>
                                    </p>
                                  </div>
                                </blockquote>
                              </div>
                              <br>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <p><br>
                      </p>
                    </div>
                  </blockquote>
                </div>
                <br>
              </div>
            </div>
          </div>
        </blockquote>
        <p><br>
        </p>
      </blockquote>
      <br>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>