<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 2023-09-14 10:02, Christian König
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:745145aa-76fb-bb17-6065-c5e29c37f3c6@amd.com">
      <br>
      <br>
      Am 14.09.23 um 15:59 schrieb Felix Kuehling:
      <br>
      <blockquote type="cite">
        <br>
        On 2023-09-14 9:39, Christian König wrote:
        <br>
        <blockquote type="cite">Is a single legacy flush sufficient to
          emulate an heavyweight flush as well?
          <br>
          <br>
          On previous generations we needed to issue at least two legacy
          flushes for this.
          <br>
        </blockquote>
        I assume you are referring to the Vega20 XGMI workaround. That
        is a very different issue. Because PTEs would be cached in L2,
        we always had to use a heavy-weight flush that also flushed
        the L2 cache, and follow that with another legacy flush to deal
        with race conditions where stale PTEs could be re-fetched from
        L2 before the L2 flush was complete.
        <br>
      </blockquote>
      <br>
      No, we also have another (badly documented) workaround which
      issues a legacy flush before each heavy-weight flush on some hw
      generations. See my TLB flush cleanup patches.
      <br>
      <br>
      <blockquote type="cite">
        <br>
        A heavy-weight flush guarantees that there are no more possible
        memory accesses using the old PTEs. With physically addressed
        caches on GFXv9 that includes a cache flush because the address
        translation happened before putting data into the cache. I think
        the address translation and cache architecture works differently
        on GFXv10. So maybe the cache flush is not required here.
        <br>
        <br>
        But even then a legacy flush probably allows in-flight memory
        accesses with old physical addresses to complete after the TLB
        flush. So there is a small risk of corrupting memory that was
        assumed to no longer be accessed by the GPU. Or, when using
        IOMMU device isolation, it would result in IOMMU faults if the
        DMA mappings are invalidated slightly too early.
        <br>
      </blockquote>
      <br>
      Mhm, that's quite bad. Any idea how to avoid that?
      <br>
    </blockquote>
    <p>A few ideas:<br>
    </p>
    <ul>
      <li>Add an arbitrary delay and hope that it is longer than the
        FIFOs in the HW</li>
      <li>Execute an atomic operation to memory on some GPU engine that
        could act as a fence; maybe just a RELEASE_MEM on the CP to some
        writeback location would do the job (rough sketch after the
        list)</li>
      <li>If needed, RELEASE_MEM could also perform a cache flush<br>
      </li>
    </ul>
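    <p>Rough, untested sketch of the second idea (the helper name is
      made up, and I'm assuming the existing KIQ polling-fence helpers
      are usable here; on gfx10 the compute/KIQ fence packet is a
      RELEASE_MEM, so waiting on it should drain accesses that are
      still in flight with the old translations):<br>
    </p>
    <pre>/* Hypothetical sketch, not the actual patch: emit a CP fence to the
 * KIQ ring's writeback location after the legacy invalidation and poll
 * it, so outstanding accesses complete before the pages are reused.
 */
static int gmc_v10_0_drain_after_inv(struct amdgpu_device *adev)
{
    struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
    uint32_t seq;
    int r;

    spin_lock(&adev->gfx.kiq[0].ring_lock);
    r = amdgpu_ring_alloc(ring, 8);    /* 8 dwords for the fence packet */
    if (r) {
        spin_unlock(&adev->gfx.kiq[0].ring_lock);
        return r;
    }

    /* RELEASE_MEM-style fence to the ring's writeback slot */
    r = amdgpu_fence_emit_polling(ring, &seq, MAX_KIQ_REG_WAIT);
    if (r) {
        amdgpu_ring_undo(ring);
        spin_unlock(&adev->gfx.kiq[0].ring_lock);
        return r;
    }

    amdgpu_ring_commit(ring);
    spin_unlock(&adev->gfx.kiq[0].ring_lock);

    if (amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout) < 1)
        return -ETIME;

    return 0;
}
</pre>
    <p>If the cache also needs flushing, the GCR fields in the
      RELEASE_MEM packet could probably cover that too, but that would
      need a dedicated packet rather than the generic fence helper.<br>
    </p>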
    <p>Regards,<br>
        Felix</p>
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:745145aa-76fb-bb17-6065-c5e29c37f3c6@amd.com">
      <br>
      Regards,
      <br>
      Christian.
      <br>
      <br>
      <blockquote type="cite">
        <br>
        Regards,
        <br>
          Felix
        <br>
        <br>
        <br>
        <blockquote type="cite">
          <br>
          And please don't push before getting an rb from Felix as well.
          <br>
          <br>
          Regards,
          <br>
          Christian.
          <br>
          <br>
          <br>
          Am 14.09.23 um 11:23 schrieb Lang Yu:
          <br>
          <blockquote type="cite">cyan_skilfish has problems with other
            flush types.
            <br>
            <br>
            v2: fix incorrect ternary conditional operator usage. (Yifan)
            <br>
            <br>
            Signed-off-by: Lang Yu <a class="moz-txt-link-rfc2396E" href="mailto:Lang.Yu@amd.com"><Lang.Yu@amd.com></a>
            <br>
            Cc: <a class="moz-txt-link-rfc2396E" href="mailto:stable@vger.kernel.org"><stable@vger.kernel.org></a> # v5.15+
            <br>
            ---
            <br>
              drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
            <br>
              1 file changed, 6 insertions(+), 1 deletion(-)
            <br>
            <br>
            <pre>diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d3da13f4c80e..c6d11047169a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t vmid,
 {
     bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, vmhub);
     struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
-    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
+    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
+              (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0);
     u32 tmp;
     /* Use register 17 for GART */
     const unsigned int eng = 17;
@@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
     int r;
+    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
+
     /* flush hdp cache */
     adev->hdp.funcs->flush_hdp(adev, NULL);
@@ -426,6 +429,8 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
     struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
     struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
+    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type : 0;
+
     if (amdgpu_emu_mode == 0 && ring->sched.ready) {
         spin_lock(&adev->gfx.kiq[0].ring_lock);
         /* 2 dwords flush + 8 dwords fence */
</pre>
          </blockquote>
          <br>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>