<div dir="ltr">ping<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 2, 2019 at 2:29 PM Marek Olšák <<a href="mailto:maraeo@gmail.com">maraeo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank">marek.olsak@amd.com</a>><br>
<br>
This RELEASE_MEM use has the Release semantic, which means we should write<br>
back but not invalidate. Invalidations only make sense with the Acquire<br>
semantic (ACQUIRE_MEM), or when RELEASE_MEM is used to do the combined<br>
Acquire-Release semantic, which is a barrier, not a fence.<br>
<br>
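
To spell out the split, here is a rough sketch (mine, not part of the
patch; the GCR macros are the ones from the hunk below, the variable
names are only illustrative):

  /* Release (fence): write back dirty data, don't invalidate. */
  u32 gcr_fence   = PACKET3_RELEASE_MEM_GCR_SEQ |
                    PACKET3_RELEASE_MEM_GCR_GL2_WB |
                    PACKET3_RELEASE_MEM_GCR_GLM_INV | /* must be set with GLM_WB */
                    PACKET3_RELEASE_MEM_GCR_GLM_WB;

  /* Combined Acquire-Release (barrier): the invalidate bits come back. */
  u32 gcr_barrier = gcr_fence |
                    PACKET3_RELEASE_MEM_GCR_GL2_INV |
                    PACKET3_RELEASE_MEM_GCR_GL1_INV |
                    PACKET3_RELEASE_MEM_GCR_GLV_INV;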

An undesirable side effect of doing invalidations on Release is that
caches are invalidated while shaders are running, because the Release
can execute in the middle of the next IB.

UMDs should emit ACQUIRE_MEM at the beginning of their IBs instead.
Doing cache invalidations as part of a fence (as in this case) does
nothing for correctness.
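
For reference, a rough sketch of what the Acquire side could look like,
written in the driver's ring-emit style (a UMD would put the equivalent
PM4 dwords at the start of its own IB). This is not part of the patch;
the full-range COHER values and the PACKET3_ACQUIRE_MEM_GCR_CNTL_*
helper names are my assumptions, not taken from this code:

  /* ACQUIRE_MEM - invalidate caches before the following draws/dispatches */
  amdgpu_ring_write(ring, PACKET3(PACKET3_ACQUIRE_MEM, 6));
  amdgpu_ring_write(ring, 0);          /* CP_COHER_CNTL */
  amdgpu_ring_write(ring, 0xffffffff); /* CP_COHER_SIZE: whole address range */
  amdgpu_ring_write(ring, 0xffffff);   /* CP_COHER_SIZE_HI */
  amdgpu_ring_write(ring, 0);          /* CP_COHER_BASE */
  amdgpu_ring_write(ring, 0);          /* CP_COHER_BASE_HI */
  amdgpu_ring_write(ring, 0x0000000a); /* POLL_INTERVAL */
  amdgpu_ring_write(ring, (PACKET3_ACQUIRE_MEM_GCR_CNTL_GL2_INV(1) | /* assumed macro names */
                           PACKET3_ACQUIRE_MEM_GCR_CNTL_GL1_INV(1) |
                           PACKET3_ACQUIRE_MEM_GCR_CNTL_GLV_INV(1) |
                           PACKET3_ACQUIRE_MEM_GCR_CNTL_GLM_INV(1) |
                           PACKET3_ACQUIRE_MEM_GCR_CNTL_GLM_WB(1))); /* GCR_CNTL */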

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 210d24511dc6..a30f5d4913b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4296,25 +4296,21 @@ static void gfx_v10_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
        bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;

        /* Interrupt not work fine on GFX10.1 model yet. Use fallback instead */
        if (adev->pdev->device == 0x50)
                int_sel = false;

        /* RELEASE_MEM - flush caches, send int */
        amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
        amdgpu_ring_write(ring, (PACKET3_RELEASE_MEM_GCR_SEQ |
                                 PACKET3_RELEASE_MEM_GCR_GL2_WB |
-                                PACKET3_RELEASE_MEM_GCR_GL2_INV |
-                                PACKET3_RELEASE_MEM_GCR_GL2_US |
-                                PACKET3_RELEASE_MEM_GCR_GL1_INV |
-                                PACKET3_RELEASE_MEM_GCR_GLV_INV |
-                                PACKET3_RELEASE_MEM_GCR_GLM_INV |
+                                PACKET3_RELEASE_MEM_GCR_GLM_INV | /* must be set with GLM_WB */
                                 PACKET3_RELEASE_MEM_GCR_GLM_WB |
                                 PACKET3_RELEASE_MEM_CACHE_POLICY(3) |
                                 PACKET3_RELEASE_MEM_EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
                                 PACKET3_RELEASE_MEM_EVENT_INDEX(5)));
        amdgpu_ring_write(ring, (PACKET3_RELEASE_MEM_DATA_SEL(write64bit ? 2 : 1) |
                                 PACKET3_RELEASE_MEM_INT_SEL(int_sel ? 2 : 0)));

        /*
         * the address should be Qword aligned if 64bit write, Dword
         * aligned if only send 32bit data low (discard data high)
-- 
2.17.1