<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 28.05.20 at 21:35, Marek Olšák
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
      
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Thu, May 28, 2020 at 2:12
            PM Christian König &lt;<a href="mailto:christian.koenig@amd.com" moz-do-not-send="true">christian.koenig@amd.com</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <div>On 28.05.20 at 18:06, Marek Olšák wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Thu, May 28,
                      2020 at 10:40 AM Christian König &lt;<a href="mailto:christian.koenig@amd.com" target="_blank" moz-do-not-send="true">christian.koenig@amd.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">On 28.05.20 at
                      12:06, Michel Dänzer wrote:<br>
                      > On 2020-05-28 11:11 a.m., Christian König
                      wrote:<br>
                      >> Well we still need implicit sync [...]<br>
                      > Yeah, this isn't about "we don't want
                      implicit sync", it's about "amdgpu<br>
                      > doesn't ensure later jobs fully see the
                      effects of previous implicitly<br>
                      > synced jobs", requiring userspace to do
                      pessimistic flushing.<br>
                      <br>
                      Yes, exactly that.<br>
                      <br>
                      Some background: we also do this flushing for
                      explicit syncs. <br>
                      When this was implemented 2-3 years ago, we initially
                      did the flushing for <br>
                      implicit sync as well.<br>
                      <br>
                      That was immediately reverted and then implemented
                      differently because <br>
                      it caused severe performance problems in some use
                      cases.<br>
                      <br>
                      I'm not sure of the root cause of these performance
                      problems. My <br>
                      assumption was always that we then insert too many
                      pipeline syncs, but <br>
                      Marek doesn't seem to think it could be that.<br>
                      <br>
                      On the one hand I'm rather keen to remove the
                      extra handling and just <br>
                      always use the explicit handling for everything
                      because it simplifies <br>
                      the kernel code quite a bit. On the other hand I
                      don't want to run into <br>
                      this performance problem again.<br>
                      <br>
                      In addition, what the kernel does is a
                      "full" pipeline sync, i.e. <br>
                      we busy-wait for the full hardware pipeline to
                      drain. That might be <br>
                      overkill if you just want to do some flushing so
                      that the next shader <br>
                      sees the stuff written, but I'm not an expert on
                      that.<br>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>Do we busy-wait on the CPU or in WAIT_REG_MEM?</div>
                    <div><br>
                    </div>
                    <div>WAIT_REG_MEM is what UMDs do and should be
                      faster.</div>
                  </div>
                </div>
              </blockquote>
              <br>
              We use WAIT_REG_MEM to wait for an EOP fence value to
              reach memory.<br>
              <br>
              We use this for a couple of things, especially to make
              sure that the hardware is idle before changing VMID to
              page table associations.<br>
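              <br>
              As a rough illustration of what such a wait amounts to,
              here is a minimal host-side C model of polling a fence
              value in memory. It is only a sketch: the name
              wait_mem_ge, the "greater or equal" comparison, the
              poll limit and the absence of sequence-number wraparound
              handling are all assumptions made for the example, and
              it is not the real packet encoding or the amdgpu code.<br>
<pre>
```c
/* Hypothetical model of a WAIT_REG_MEM-style poll on a fence in memory.
 * Assumption: fence values only grow and never wrap around. */
typedef unsigned int u32;

/* Polls *fence_addr up to max_polls times.
 * Returns 1 as soon as the value has reached ref, 0 if the limit ran out. */
static int wait_mem_ge(const volatile u32 *fence_addr, u32 ref,
                       unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i != max_polls; i++) {
        if (*fence_addr >= ref)
            return 1;   /* fence value has reached memory */
    }
    return 0;           /* still pending after max_polls reads */
}
```
</pre>
              In this model the waiter only reads memory, so the
              stall lasts until whoever writes the fence makes the
              value visible.<br>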
              <br>
              What about your idea of having an extra dw in the shared
              BOs indicating that they are flushed?<br>
              <br>
              As far as I understand it, an EOS or other event might be
              sufficient for the caches as well. And you could insert
              the WAIT_REG_MEM directly before the first draw using the
              texture and not before the whole IB.<br>
              <br>
              Could be that we can optimize this even more than what we
              do in the kernel.<br>
              <br>
              Christian.<br>
            </div>
          </blockquote>
          <div><br>
          </div>
          Adding fences into BOs would be bad, because all UMDs would
          have to handle them.</div>
      </div>
    </blockquote>
    <br>
    Yeah, I already assumed that this is the biggest blocker.<br>
    <br>
    <blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">Is it possible to do this in the ring
          buffer:</div>
        <div class="gmail_quote">if (fence_signalled) {</div>
        <div class="gmail_quote">
          <div class="gmail_quote">   indirect_buffer(dependent_IB);<br>
          </div>
             indirect_buffer(other_IB);<br>
        </div>
        <div class="gmail_quote">} else {</div>
        <div class="gmail_quote">   indirect_buffer(other_IB);</div>
        <div class="gmail_quote">   wait_reg_mem(fence);<br>
        </div>
        <div class="gmail_quote">   indirect_buffer(dependent_IB);<br>
        </div>
        }</div>
    </blockquote>
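    <br>
    For reference, the ordering proposed above can be written out as a
    small host-side C sketch. This is only a model of the resulting
    command order: in the proposal the branch would be evaluated in the
    ring buffer at execution time, not by the CPU. The names cmd,
    cmd_list and build_submission are hypothetical and do not exist in
    the driver.<br>
<pre>
```c
/* Hypothetical model of the two command orders from the proposal. */
enum cmd { CMD_IB_DEPENDENT, CMD_IB_OTHER, CMD_WAIT_FENCE };
enum { MAX_CMDS = 3 };

struct cmd_list {
    enum cmd cmds[MAX_CMDS];
    int count;
};

/* Returns the commands in the order they would execute.
 * fence_signalled != 0: the dependency is already met, so the dependent IB
 * runs first and no wait is needed.
 * fence_signalled == 0: the independent IB runs first to hide the latency,
 * then the wait, then the dependent IB. */
static struct cmd_list build_submission(int fence_signalled)
{
    struct cmd_list list = { { CMD_IB_OTHER }, 0 };

    if (fence_signalled) {
        list.cmds[list.count++] = CMD_IB_DEPENDENT;
        list.cmds[list.count++] = CMD_IB_OTHER;
    } else {
        list.cmds[list.count++] = CMD_IB_OTHER;
        list.cmds[list.count++] = CMD_WAIT_FENCE;
        list.cmds[list.count++] = CMD_IB_DEPENDENT;
    }
    return list;
}
```
</pre>
    Either way the dependent IB only runs once its fence has signalled;
    the difference is whether other work can be scheduled ahead of the
    wait.<br>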
    <br>
    That might be possible, but it is at least not easy to implement.<br>
    <br>
    <blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">Or we might have to wait for a hw
          scheduler.<br>
        </div>
      </div>
    </blockquote>
    <br>
    I'm still fine with doing the pipeline sync for implicit sync as
    well, I just need somebody to confirm to me that this doesn't
    backfire in some cases.<br>
    <br>
    <blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote"><br>
        </div>
        <div class="gmail_quote">
          <div class="gmail_quote">Does the kernel sync when the driver
            fd is different, or when the context is different?</div>
        </div>
      </div>
    </blockquote>
    <br>
    Only when the driver fd is different.<br>
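    <br>
    Put as a minimal sketch, with hypothetical names (job_owner and
    needs_implicit_sync are made up for illustration and are not the
    amdgpu code), the rule is:<br>
<pre>
```c
/* Hypothetical description of who submitted a job. */
struct job_owner {
    int fd_id;   /* identifies the driver fd the job came from */
    int ctx_id;  /* identifies the context within that fd */
};

/* Returns 1 if the later job must be synced against the earlier one.
 * Only the fd is compared; a different context on the same fd does not
 * trigger a sync in this model. */
static int needs_implicit_sync(struct job_owner prev, struct job_owner next)
{
    return prev.fd_id != next.fd_id;
}
```
</pre>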
    <br>
    Christian.<br>
    <br>
    <blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div class="gmail_quote"><br>
          </div>
        </div>
        <div class="gmail_quote">Marek<br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>