<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    On 15.09.23 at 16:49, Felix Kuehling wrote:<br>
    <blockquote type="cite"
      cite="mid:1c4c51ad-a042-983a-ff66-cddfbb917c6f@amd.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div class="moz-cite-prefix">On 2023-09-15 6:19, Christian König
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:39f0c4d9-0959-73ed-9bca-43a342fb906a@gmail.com">Am
        15.09.23 um 10:53 schrieb Lang Yu: <br>
        <blockquote type="cite">On 09/14/ , Felix Kuehling wrote: <br>
          <blockquote type="cite">On 2023-09-14 10:02, Christian König
            wrote: <br>
          </blockquote>
          Do we still need to use a legacy flush to emulate a
          heavyweight flush <br>
          if we don't use SVM? And can I push this now? <br>
        </blockquote>
        <br>
        Felix needs to decide that. From what I understand, the KFD
        needs heavyweight flushes for secure SVM operation. <br>
      </blockquote>
      <p>Yes. We need to be able to guarantee to the kernel that the
        GPU will not access unmapped memory. There are two strategies in
        the driver to do this:</p>
      <ol>
        <li>Preempt GPU queues (which implies a heavy-weight TLB flush
          in the scheduler firmware)</li>
        <li>Invalidate page table entries and flush TLBs<br>
        </li>
      </ol>
      <p>#1 happens during MMU notifiers with XNACK off. #2 happens in
        MMU notifiers with XNACK on (not supported on GFX10.x) and when
        unified memory is munmapped. It's that last part I'm worried
        about. When memory is munmapped and given back to the OS, we
        need to be able to guarantee that the GPU won't access it any
        more. The same is true when GTT BOs and userptr BOs are freed.
        After unmapping them from the GPU page tables, we need a
        heavy-weight flush. I believe the same should apply to the
        graphics driver, but maybe that's implied through the CS and
        fence mechanisms that keep memory allocated while the GPU is
        accessing it.<br>
      </p>
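      <p>For illustration only, here is a rough sketch of that ordering.
        The helper names (clear_gpu_ptes, gpu_tlb_flush_heavyweight,
        release_pages_to_os) are placeholders, not the actual amdgpu/KFD
        entry points:</p>
      <pre>
#include <stdint.h>

/* Placeholders for the real driver entry points. */
void clear_gpu_ptes(uint64_t start, uint64_t npages);
void gpu_tlb_flush_heavyweight(void);
void release_pages_to_os(uint64_t start, uint64_t npages);

/*
 * Sketch of an unmap path: memory may only be handed back to the OS
 * once a heavy-weight flush guarantees that no GPU access using the
 * old translations can still be in flight.
 */
void unmap_range_from_gpu(uint64_t start, uint64_t npages)
{
        /* 1. Invalidate the page table entries first. */
        clear_gpu_ptes(start, npages);

        /* 2. Heavy-weight flush: waits for outstanding translations
         *    and in-flight accesses. A legacy flush only drops cached
         *    translations, which is why it may not be sufficient here.
         */
        gpu_tlb_flush_heavyweight();

        /* 3. Only now is it safe to return the pages to the kernel. */
        release_pages_to_os(start, npages);
}
</pre>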
      <p>A legacy flush has a slim chance of not being sufficient
        because memory accesses using old addresses can still be in
        flight in the GPU.<br>
      </p>
      <br>
      <blockquote type="cite"
        cite="mid:39f0c4d9-0959-73ed-9bca-43a342fb906a@gmail.com"> <br>
        If heavyweight flushes are buggy, papering over that by using
        legacy flushes is only a mediocre workaround. <br>
      </blockquote>
      <p>I agree. I'd like to avoid half-baked workarounds that will
        cause more headaches later on. I started an internal email
        thread with Tony to understand the requirements for heavy-weight
        flushes on the affected GPUs and find a better workaround.</p>
    </blockquote>
    <br>
    Thanks, then this patch should be put on hold until that stuff is
    cleared up.<br>
    <br>
    Regards,<br>
    Christian.<br>
    <br>
    <blockquote type="cite"
      cite="mid:1c4c51ad-a042-983a-ff66-cddfbb917c6f@amd.com">
      <p>Regards,<br>
          Felix<br>
      </p>
      <p><br>
      </p>
      <blockquote type="cite"
        cite="mid:39f0c4d9-0959-73ed-9bca-43a342fb906a@gmail.com"> <br>
        Regards, <br>
        Christian. <br>
        <br>
        <blockquote type="cite"> <br>
          Regards, <br>
          Lang <br>
          <br>
          <br>
          <blockquote type="cite">
            <blockquote type="cite">Am 14.09.23 um 15:59 schrieb Felix
              Kuehling: <br>
              <blockquote type="cite">On 2023-09-14 9:39, Christian
                König wrote: <br>
                <blockquote type="cite">Is a single legacy flush
                  sufficient to emulate a heavyweight <br>
                  flush as well? <br>
                  <br>
                  On previous generations we needed to issue at least
                  two legacy <br>
                  flushes for this. <br>
                </blockquote>
                I assume you are referring to the Vega20 XGMI
                workaround. That is a <br>
                very different issue. Because PTEs would be cached in
                L2, we had to <br>
                always use a heavy-weight flush that would also flush
                the L2 cache <br>
                as well, and follow that with another legacy flush to
                deal with race <br>
                conditions where stale PTEs could be re-fetched from L2
                before the <br>
                L2 flush was complete. <br>
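                <p>As a sketch in code, assuming illustrative names
                  rather than the real gmc_v9_0 functions, that
                  workaround was essentially:</p>
                <pre>
/* Illustrative only; not the actual gmc_v9_0 implementation. */
enum flush_type { FLUSH_LEGACY, FLUSH_HEAVYWEIGHT };

void flush_tlb(unsigned int vmid, enum flush_type type);

/*
 * Vega20 XGMI sketch: PTEs can be cached in L2, so a heavy-weight
 * flush (which also flushes L2) is followed by a legacy flush to
 * catch stale PTEs re-fetched from L2 before the L2 flush completed.
 */
void flush_tlb_vega20_xgmi(unsigned int vmid)
{
        flush_tlb(vmid, FLUSH_HEAVYWEIGHT);
        flush_tlb(vmid, FLUSH_LEGACY);
}
</pre>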
              </blockquote>
              No, we also have another (badly documented) workaround
              which issues a <br>
              legacy flush before each heavy-weight flush on some hw
              generations. See my <br>
              TLB flush cleanup patches. <br>
              <br>
              <blockquote type="cite">A heavy-weight flush guarantees
                that there are no more possible <br>
                memory accesses using the old PTEs. With physically
                addressed caches <br>
                on GFXv9, that includes a cache flush because the address
                translation <br>
                happened before putting data into the cache. I think the
                address <br>
                translation and cache architecture works differently on
                GFXv10. So <br>
                maybe the cache-flush is not required here. <br>
                <br>
                But even then, a legacy flush probably allows
                in-flight memory <br>
                accesses with old physical addresses to complete after
                the TLB <br>
                flush. So there is a small risk of corrupting memory
                that was <br>
                assumed to not be accessed by the GPU any more. Or when
                using IOMMU <br>
                device isolation it would result in IOMMU faults if the
                DMA mappings <br>
                are invalidated slightly too early. <br>
              </blockquote>
              Mhm, that's quite bad. Any idea how to avoid that? <br>
            </blockquote>
            A few ideas <br>
            <br>
              * Add an arbitrary delay and hope that it is longer than
            the FIFOs in <br>
                the HW <br>
              * Execute an atomic operation to memory on some GPU engine
            that could <br>
                act as a fence, maybe just a RELEASE_MEM on the CP to
            some writeback <br>
                location would do the job <br>
              * If needed, RELEASE_MEM could also perform a cache flush
            <br>
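            <p>As a rough illustration of the second idea, with
              hypothetical stand-ins for the ring and packet helpers
              (none of these are the actual CP/PM4 interfaces):</p>
            <pre>
#include <stdint.h>

/* Hypothetical stand-ins, not the real amdgpu ring helpers. */
void emit_tlb_invalidate(unsigned int vmid);
void emit_release_mem_write(uint64_t wb_gpu_addr, uint32_t seq);
uint32_t read_writeback_slot(uint64_t wb_gpu_addr);

/*
 * Idea: after the TLB invalidation, emit a RELEASE_MEM-style packet
 * that writes a sequence number to a writeback location. The idea is
 * that the write only lands once older work on that engine has
 * drained, so it could act as a fence for accesses that a legacy
 * flush alone might leave in flight.
 */
void flush_and_fence(unsigned int vmid, uint64_t wb_gpu_addr, uint32_t seq)
{
        emit_tlb_invalidate(vmid);
        emit_release_mem_write(wb_gpu_addr, seq);

        /* Busy-wait until the fence value has been written back. */
        while (read_writeback_slot(wb_gpu_addr) != seq)
                ;
}
</pre>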
            <br>
            Regards, <br>
               Felix <br>
            <br>
            <br>
            <blockquote type="cite">Regards, <br>
              Christian. <br>
              <br>
              <blockquote type="cite">Regards, <br>
                   Felix <br>
                <br>
                <br>
                <blockquote type="cite">And please don't push before
                  getting an rb from Felix as well. <br>
                  <br>
                  Regards, <br>
                  Christian. <br>
                  <br>
                  <br>
                   On 14.09.23 at 11:23, Lang Yu wrote: <br>
                  <blockquote type="cite">cyan_skilfish has problems
                    with other flush types. <br>
                    <br>
                    v2: fix incorrect ternary conditional operator
                     usage. (Yifan) <br>
                    <br>
                    Signed-off-by: Lang Yu <a
                      class="moz-txt-link-rfc2396E"
                      href="mailto:Lang.Yu@amd.com"
                      moz-do-not-send="true"><Lang.Yu@amd.com></a>
                    <br>
                    Cc: <a class="moz-txt-link-rfc2396E"
                      href="mailto:stable@vger.kernel.org"
                      moz-do-not-send="true"><stable@vger.kernel.org></a>
                    # v5.15+ <br>
                    --- <br>
                        drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++- <br>
                        1 file changed, 6 insertions(+), 1 deletion(-) <br>
                     <br>
                     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c <br>
                     index d3da13f4c80e..c6d11047169a 100644 <br>
                     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c <br>
                     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c <br>
                     @@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid, <br>
                        { <br>
                            bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, vmhub); <br>
                            struct amdgpu_vmhub *hub = &adev->vmhub[vmhub]; <br>
                     -    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, flush_type); <br>
                     +    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, <br>
                     +              (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0); <br>
                            u32 tmp; <br>
                            /* Use register 17 for GART */ <br>
                            const unsigned int eng = 17; <br>
                     @@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid, <br>
                            int r; <br>
                     +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0; <br>
                     + <br>
                            /* flush hdp cache */ <br>
                            adev->hdp.funcs->flush_hdp(adev, NULL); <br>
                     @@ -426,6 +429,8 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev, <br>
                            struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring; <br>
                            struct amdgpu_kiq *kiq = &adev->gfx.kiq[0]; <br>
                     +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0; <br>
                     + <br>
                            if (amdgpu_emu_mode == 0 && ring->sched.ready) { <br>
                                spin_lock(&adev->gfx.kiq[0].ring_lock); <br>
                                /* 2 dwords flush + 8 dwords fence */ <br>
                  </blockquote>
                </blockquote>
              </blockquote>
            </blockquote>
          </blockquote>
        </blockquote>
        <br>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>