[BUG] Data race when use PACKET3_DMA_DATA?

Chen Lei chenlei18s at ict.ac.cn
Thu Jun 3 00:29:14 UTC 2021

Hi Alex. Thanks for your quick reply.
I submit the OpenCL kernel packet first and then the DMA DATA packet, yet the OpenCL kernel reads the value written by the DMA DATA packet.
If I understand you correctly, that is because the CP engine continues on to process the DMA DATA packet after launching the OpenCL kernel. If so, is there any way to stall the CP engine until the OpenCL kernel is complete?

> -----Original Messages-----
> From: "Alex Deucher" <alexdeucher at gmail.com>
> Sent Time: 2021-06-02 21:37:51 (Wednesday)
> To: "Chen Lei" <chenlei18s at ict.ac.cn>
> Cc: "amd-gfx list" <amd-gfx at lists.freedesktop.org>
> Subject: Re: [BUG] Data race when use PACKET3_DMA_DATA?
> On Wed, Jun 2, 2021 at 8:44 AM Chen Lei <chenlei18s at ict.ac.cn> wrote:
> >
> > Hi, I noticed that there are two ways to do DMA on AMD GPUs: the SDMA copy packet and the PM4 DMA packet.
> >
> > I tested the PM4 DMA packet, PACKET3_DMA_DATA. Most of the time, it works.
> >
> > But when I launch an OpenCL kernel followed by a host-to-GPU DMA packet, it seems that the OpenCL kernel reads the new value written by the following DMA packet.
> >
> > Both the OpenCL kernel and the PM4 DMA packet are submitted using amdgpu_cs_ioctl, and they are submitted to the same ring.
> >
> > I am not familiar with the hardware details. As I understand it, because the ring is a FIFO, no explicit synchronization should be needed between the OpenCL kernel launch packet and the DMA packet, so the result looks weird. When I add synchronization (i.e. amdgpu_cs_wait_ioctl) before the DMA packet, everything is OK.
> >
> > Is it a hardware bug, or did I miss something?
> >
> The CP DMA engine is separate from the various CP micro engines.  When
> there is a DMA DATA packet, the DMA operation is offloaded to the CP
> DMA engine and the CP engine that processed the packet continues on to
> the next packet.  You need to use the ENGINE_SEL and CP_SYNC bits in
> the DMA DATA packet to specify the behavior you want.  The ENGINE_SEL
> bit selects which CP engine processes the packet (PFP or ME) and the
> CP_SYNC bit stops further packet processing on the selected engine
> until the DMA is complete.
> Alex
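
For illustration, the ENGINE_SEL and CP_SYNC bits that Alex describes could be set when building the packet roughly as follows. This is a minimal sketch in C, not driver code: the macro names, the 7-dword layout, the bit positions and the 21-bit byte-count limit are assumptions modelled on the amdgpu kernel headers and should be checked against the header for the target ASIC. build_dma_data and enum cp_engine are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings, modelled on the amdgpu kernel headers.
 * Verify every value against the header for your ASIC. */
#define PACKET_TYPE3        3u
#define PACKET3(op, n)      ((PACKET_TYPE3 << 30) |                 \
                             (((uint32_t)(op) & 0xFFu) << 8) |      \
                             (((uint32_t)(n) & 0x3FFFu) << 16))
#define PACKET3_DMA_DATA    0x50u
#define DMA_DATA_ENGINE(x)  ((uint32_t)(x) << 0)   /* 0 = ME, 1 = PFP */
#define DMA_DATA_DST_SEL(x) ((uint32_t)(x) << 20)  /* 0 assumed to mean "address" */
#define DMA_DATA_SRC_SEL(x) ((uint32_t)(x) << 29)  /* 0 assumed to mean "address" */
#define DMA_DATA_CP_SYNC    (1u << 31)
#define DMA_DATA_DWORDS     7

/* Hypothetical names for the two values of the ENGINE_SEL bit. */
enum cp_engine { CP_ENGINE_ME = 0, CP_ENGINE_PFP = 1 };

/* Hypothetical helper: writes one DMA DATA packet into pm4[] and
 * returns the number of dwords written. */
static size_t build_dma_data(uint32_t *pm4, enum cp_engine engine,
                             bool cp_sync, uint64_t src, uint64_t dst,
                             uint32_t byte_count)
{
    uint32_t control = DMA_DATA_ENGINE(engine) |
                       DMA_DATA_SRC_SEL(0) | DMA_DATA_DST_SEL(0);
    size_t i = 0;

    /* CP_SYNC: the selected engine stops processing further packets
     * until the DMA is complete. */
    if (cp_sync)
        control |= DMA_DATA_CP_SYNC;

    /* Assumed 21-bit BYTE_COUNT field; newer ASICs may allow more. */
    assert(byte_count < (1u << 21));

    pm4[i++] = PACKET3(PACKET3_DMA_DATA, 5);   /* header, 6 body dwords */
    pm4[i++] = control;                        /* CONTROL */
    pm4[i++] = (uint32_t)(src & 0xFFFFFFFFu);  /* SRC_ADDR_LO */
    pm4[i++] = (uint32_t)(src >> 32);          /* SRC_ADDR_HI */
    pm4[i++] = (uint32_t)(dst & 0xFFFFFFFFu);  /* DST_ADDR_LO */
    pm4[i++] = (uint32_t)(dst >> 32);          /* DST_ADDR_HI */
    pm4[i++] = byte_count;                     /* COMMAND | BYTE_COUNT */
    return i;
}
```

With cp_sync set, the engine chosen by the caller should not move on to the next packet until the copy has finished. That covers packets that follow the DMA DATA packet; whether anything further is needed to wait for a kernel dispatched before it is the question raised in the reply at the top of this message.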
