<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 02/08/2019 08:21, Koenig, Christian
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:65e763e7-221c-4073-b777-b84c6d724f40@email.android.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="auto">
        <div><br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">Am 02.08.2019 07:17 schrieb Lionel
              Landwerlin <a class="moz-txt-link-rfc2396E" href="mailto:lionel.g.landwerlin@intel.com"><lionel.g.landwerlin@intel.com></a>:<br
                type="attribution">
              <blockquote class="quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div>
                  <div>On 02/08/2019 08:08, Koenig, Christian wrote:<br>
                  </div>
                  <blockquote>
                    <div dir="auto">Hi Lionel,
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">Well that looks more like your
                        test case is buggy.</div>
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">According to the code the ctx1
                        queue always waits for sem1 and ctx2 queue
                        always waits for sem2.</div>
                    </div>
                  </blockquote>
                  <p><br>
                  </p>
                  <p>That's supposed to be the same underlying syncobj
                    because it's exported from one VkDevice as opaque FD
                    from sem1 and imported into sem2.<br>
                  </p>
                </div>
              </blockquote>
            </div>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Well than that's still buggy and won't
          synchronize at all.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">When ctx1 waits for a semaphore and then signals
          the same semaphore there is no guarantee that ctx2 will run in
          between jobs.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">It's perfectly valid in this case to first run
          all jobs from ctx1 and then all jobs from ctx2.</div>
      </div>
    </blockquote>
    <p><br>
    </p>
    <p>That's not really how I see the semaphores working.</p>
    <p>The spec describe VkSemaphore as an interface to an internal
      payload opaque to the application.</p>
    <p><br>
    </p>
    <p>When ctx1 waits on the semaphore, it waits on the payload put
      there by the previous iteration.</p>
    <p>Then it proceeds to signal it by replacing the internal payload.</p>
    <p><br>
    </p>
    <p>ctx2 then waits on that and replaces the payload again with the
      new internal synchronization object.</p>
    <p><br>
    </p>
    <p>The internal payload is a dma fence in our case and signaling
      just replaces a dma fence by another or puts one where there was
      none before.</p>
    <p>So we should have created a dependecy link between all the
      submissions and then should be executed in the order of
      QueueSubmit() calls.<br>
    </p>
    <p><br>
    </p>
    <p>-Lionel<br>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:65e763e7-221c-4073-b777-b84c6d724f40@email.android.com">
      <div dir="auto">
        <div dir="auto"><br>
        </div>
        <div dir="auto">It only prevents running both at the same time
          and as far as I can see that still works even with threaded
          submission.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">You need at least two semaphores for a tandem
          submission.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Regards,</div>
        <div dir="auto">Christian.</div>
        <div dir="auto">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <blockquote class="quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div>
                  <p><br>
                  </p>
                  <blockquote>
                    <div dir="auto">
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">This way there can't be any
                        Synchronisation between the two.</div>
                      <div dir="auto"><br>
                      </div>
                      <div dir="auto">Regards,</div>
                      <div dir="auto">Christian.</div>
                    </div>
                    <div><br>
                      <div class="elided-text">Am 02.08.2019 06:55
                        schrieb Lionel Landwerlin <a
                          href="mailto:lionel.g.landwerlin@intel.com"
                          moz-do-not-send="true">
                          <lionel.g.landwerlin@intel.com></a>:<br
                          type="attribution">
                      </div>
                    </div>
                    <div>
                      <div>Hey Christian,</div>
                      <div><br>
                      </div>
                      <div>The problem boils down to the fact that we
                        don't immediately create dma fences when calling
                        vkQueueSubmit().</div>
                      <div>This is delayed to a thread.</div>
                      <div><br>
                      </div>
                      <div>From a single application thread, you can
                        QueueSubmit() to 2 queues from 2 different
                        devices.</div>
                      <div>Each QueueSubmit to one queue has a
                        dependency on the previous QueueSubmit on the
                        other queue through an exported/imported
                        semaphore.</div>
                      <div><br>
                      </div>
                      <div>From the API point of view the state of the
                        semaphore should be changed after each
                        QueueSubmit().</div>
                      <div>The problem is that it's not because of the
                        thread and because you might have those 2
                        submission threads tied to different
                        VkDevice/VkInstance or even different
                        applications (synchronizing themselves outside
                        the vulkan API).</div>
                      <div><br>
                      </div>
                      <div>Hope that makes sense.</div>
                      <div>It's not really easy to explain by mail, the
                        best explanation is probably reading the test :
                        <a
href="https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788"
                          moz-do-not-send="true">
https://gitlab.freedesktop.org/mesa/crucible/blob/master/src/tests/func/sync/semaphore-fd.c#L788</a></div>
                      <div><br>
                      </div>
                      <div>Like David mentioned you're not running into
                        that issue right now, because you only dispatch
                        to the thread under specific conditions.</div>
                      <div>But I could build a case to force that and
                        likely run into the same issue.<br>
                      </div>
                      <div><br>
                      </div>
                      <div>-Lionel<br>
                      </div>
                      <div><br>
                      </div>
                      <div>On 02/08/2019 07:33, Koenig, Christian wrote:<br>
                      </div>
                      <blockquote>
                        <div>
                          <div dir="auto">Hi Lionel,
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">Well could you describe once
                              more what the problem is?</div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">Cause I don't fully
                              understand why a rather normal tandem
                              submission with two semaphores should fail
                              in any way.</div>
                            <div dir="auto"><br>
                            </div>
                            <div dir="auto">Regards,</div>
                            <div dir="auto">Christian.</div>
                          </div>
                          <div><br>
                            <div>Am 02.08.2019 06:28 schrieb Lionel
                              Landwerlin <a
                                href="mailto:lionel.g.landwerlin@intel.com"
                                moz-do-not-send="true">
                                <lionel.g.landwerlin@intel.com></a>:<br
                                type="attribution">
                            </div>
                          </div>
                        </div>
                        <font size="2"><span style="font-size:11pt">
                            <div>There aren't CTS tests covering the
                              issue I was mentioning.<br>
                              But we could add them.<br>
                              <br>
                              I don't have all the details regarding
                              your implementation but even with <br>
                              the "semaphore thread", I could see it
                              running into the same issues.<br>
                              What if a mix of binary & timeline
                              semaphores are handed to vkQueueSubmit()?<br>
                              <br>
                              For example with queueA & queueB from
                              2 different VkDevice :<br>
                                   vkQueueSubmit(queueA, signal semA);<br>
                                   vkQueueSubmit(queueA, wait on [semA,
                              timelineSemB]); with <br>
                              timelineSemB triggering a wait before
                              signal.<br>
                                   vkQueueSubmit(queueB, signal semA);<br>
                              <br>
                              <br>
                              -Lionel<br>
                              <br>
                              On 02/08/2019 06:18, Zhou, David(ChunMing)
                              wrote:<br>
                              > Hi Lionel,<br>
                              ><br>
                              > By the Queue thread is a heavy
                              thread, which is always resident in driver
                              during application running, our guys don't
                              like that. So we switch to Semaphore
                              Thread, only when waitBeforeSignal of
                              timeline happens, we spawn a thread to
                              handle that wait. So we don't have your
                              this issue.<br>
                              > By the way, I already pass all your
                              CTS cases for now. I suggest you to switch
                              to Semaphore Thread instead of Queue
                              Thread as well. It works very well.<br>
                              ><br>
                              > -David<br>
                              ><br>
                              > -----Original Message-----<br>
                              > From: Lionel Landwerlin <a
                                href="mailto:lionel.g.landwerlin@intel.com"
                                moz-do-not-send="true"><lionel.g.landwerlin@intel.com></a><br>
                              > Sent: Friday, August 2, 2019 4:52 AM<br>
                              > To: dri-devel <a
                                href="mailto:dri-devel@lists.freedesktop.org"
                                moz-do-not-send="true"><dri-devel@lists.freedesktop.org></a>;
                              Koenig, Christian
                              <a href="mailto:Christian.Koenig@amd.com"
                                moz-do-not-send="true"><Christian.Koenig@amd.com></a>;
                              Zhou, David(ChunMing)
                              <a href="mailto:David1.Zhou@amd.com"
                                moz-do-not-send="true"><David1.Zhou@amd.com></a>;
                              Jason Ekstrand <a
                                href="mailto:jason@jlekstrand.net"
                                moz-do-not-send="true">
                                <jason@jlekstrand.net></a><br>
                              > Subject: Threaded submission &
                              semaphore sharing<br>
                              ><br>
                              > Hi Christian, David,<br>
                              ><br>
                              > Sorry to report this so late in the
                              process, but I think we found an issue not
                              directly related to syncobj timelines
                              themselves but with a side effect of the
                              threaded submissions.<br>
                              ><br>
                              > Essentially we're failing a test in
                              crucible :<br>
                              > func.sync.semaphore-fd.opaque-fd<br>
                              > This test create a single binary
                              semaphore, shares it between 2
                              VkDevice/VkQueue.<br>
                              > Then in a loop it proceeds to submit
                              workload alternating between the 2 VkQueue
                              with one submit depending on the other.<br>
                              > It does so by waiting on the
                              VkSemaphore signaled in the previous
                              iteration and resignaling it.<br>
                              ><br>
                              > The problem for us is that once
                              things are dispatched to the submission
                              thread, the ordering of the submission is
                              lost.<br>
                              > Because we have 2 devices and they
                              both have their own submission thread.<br>
                              ><br>
                              > Jason suggested that we reestablish
                              the ordering by having semaphores/syncobjs
                              carry an additional uint64_t payload.<br>
                              > This 64bit integer would represent be
                              an identifier that submission threads will
                              WAIT_FOR_AVAILABLE on.<br>
                              ><br>
                              > The scenario would look like this :<br>
                              >       - vkQueueSubmit(queueA, signal
                              on semA);<br>
                              >           - in the caller thread,
                              this would increment the syncobj
                              additional u64 payload and return it to
                              userspace.<br>
                              >           - at some point the
                              submission thread of queueA submits the
                              workload and signal the syncobj of semA
                              with value returned in the caller thread
                              of vkQueueSubmit().<br>
                              >       - vkQueueSubmit(queueB, wait on
                              semA);<br>
                              >           - in the caller thread,
                              this would read the syncobj additional<br>
                              > u64 payload<br>
                              >           - at some point the
                              submission thread of queueB will try to
                              submit the work, but first it will
                              WAIT_FOR_AVAILABLE the u64 value returned
                              in the step above<br>
                              ><br>
                              > Because we want the binary semaphores
                              to be shared across processes and would
                              like this to remain a single FD, the
                              simplest location to store this additional
                              u64 payload would be the DRM syncobj.<br>
                              > It would need an additional ioctl to
                              read & increment the value.<br>
                              ><br>
                              > What do you think?<br>
                              ><br>
                              > -Lionel<br>
                              <br>
                              <br>
                            </div>
                          </span></font></blockquote>
                      <p><br>
                      </p>
                    </div>
                  </blockquote>
                  <p><br>
                  </p>
                </div>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>