<p style="font-family:Arial;">
Hi, I noticed that there are two ways to do DMA on AMD GPUs: the SDMA copy packet and the PM4 DMA packet.
</p>
<p style="font-family:Arial;">
I tested the PM4 DMA packet, <span style="font-family:Arial;">PACKET3_DMA_DATA. Most of the time it works. </span>
</p>
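<p style="font-family:Arial;">
For reference, here is a minimal sketch of how a 7-dword PACKET3_DMA_DATA copy packet might be assembled. The macro values are an assumption, reproduced from memory of the amdgpu kernel header soc15d.h, and build_dma_data is a hypothetical helper name, not necessarily the exact code used in my test. Check the encodings against the header for your ASIC.
</p>

```c
/* Assumed encodings, reproduced from memory of the amdgpu kernel header
 * soc15d.h. Verify them against the header for your ASIC before use. */
typedef unsigned int u32;        /* assumed to be 32 bits wide */
typedef unsigned long long u64;  /* assumed to be 64 bits wide */

#define PACKET_TYPE3 3u
#define PACKET3(op, n) ((PACKET_TYPE3 << 30) | \
                        (((op) & 0xFFu) << 8) | \
                        (((n) & 0x3FFFu) << 16))

#define PACKET3_DMA_DATA            0x50u
#define PACKET3_DMA_DATA_ENGINE(x)  ((u32)(x) << 0)   /* 0 = ME, 1 = PFP */
#define PACKET3_DMA_DATA_DST_SEL(x) ((u32)(x) << 20)  /* 0 = destination is an address */
#define PACKET3_DMA_DATA_SRC_SEL(x) ((u32)(x) << 29)  /* 0 = source is an address */
#define PACKET3_DMA_DATA_CP_SYNC    (1u << 31)

/* Hypothetical helper: writes one DMA_DATA packet into pm4[] that copies
 * nbytes from src_va to dst_va, and returns the number of dwords written. */
static u32 build_dma_data(u32 *pm4, u64 src_va, u64 dst_va, u32 nbytes)
{
    u32 i = 0;

    /* Header: the count field is the number of body dwords minus one. */
    pm4[i++] = PACKET3(PACKET3_DMA_DATA, 5);
    /* CONTROL: run on the ME, address-to-address copy, CP_SYNC set. */
    pm4[i++] = PACKET3_DMA_DATA_ENGINE(0) |
               PACKET3_DMA_DATA_DST_SEL(0) |
               PACKET3_DMA_DATA_SRC_SEL(0) |
               PACKET3_DMA_DATA_CP_SYNC;
    pm4[i++] = (u32)(src_va & 0xFFFFFFFFu);  /* SRC_ADDR_LO */
    pm4[i++] = (u32)(src_va >> 32);          /* SRC_ADDR_HI */
    pm4[i++] = (u32)(dst_va & 0xFFFFFFFFu);  /* DST_ADDR_LO */
    pm4[i++] = (u32)(dst_va >> 32);          /* DST_ADDR_HI */
    /* COMMAND: byte count in the low bits. The upper bits hold command
     * flags, so nbytes must fit the byte-count field of the target ASIC. */
    pm4[i++] = nbytes;

    return i;
}
```

<p style="font-family:Arial;">
The resulting dwords would go into the IB that is submitted on the ring. The sketch only shows the packet layout and does nothing about ordering against earlier work.
</p>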
<p style="font-family:Arial;">
But when I launch an OpenCL kernel followed by a host-to-GPU DMA packet, the kernel appears to read the new value written by the DMA packet that follows it.
</p>
<p style="font-family:Arial;">
Both the OpenCL kernel and the PM4 DMA packet are submitted through amdgpu_cs_ioctl, and both go to the same ring.
</p>
<p style="font-family:Arial;">
I am not familiar with the hardware details. My understanding is that, because the ring is FIFO, no explicit synchronization should be needed between the OpenCL kernel launch packet and the DMA packet, so this result looks odd. When I add synchronization (i.e. amdgpu_cs_wait_ioctl) before the DMA packet, everything works correctly.
</p>
<p style="font-family:Arial;">
Is this a hardware bug, or did I miss something?
</p>