[Beignet] CL_MEM_USE_HOST_PTR involve extra copy?

Zhigang Gong zhigang.gong at linux.intel.com
Tue Dec 2 23:49:34 PST 2014


Forgot to mention, please be noted that the L3 cache in the paper
is LLC indeed but not the L3 cache I mentioned in my previous
note.

On Wed, Dec 03, 2014 at 03:48:18PM +0800, spring_wind wrote:
> Thanks for your reply.
> 
> Now, I really want to know whether clflush instruction evicts all memory line including the Last Level Cache(LLC), if so, hurt performance defenitly.
> 
> Here is paper I searched which said clflush indeed did like that.
> http://eprint.iacr.org/2013/448.pdf
> 
> 
> 
> 
> 
> 
> 
> At 2014-12-03 13:57:36, "Zhigang Gong" <zhigang.gong at linux.intel.com> wrote:
> >On Wed, Dec 03, 2014 at 01:46:51PM +0800, spring_wind wrote:
> >> 
> >> 
> >> I noticed that in kernel's source(3.12.13) drm_clflush_page used clflush instruction.
> >> I wonder whether the clfush instruction only push data to LLC or RAM? If it is RAM, is it needless since CPU and GPU share LLC.
> >
> >Right, they share the LLC, but doesn't share other level of caches,
> >so when swtich from CPU domain to GPU domain, CPU needs to flush CPU's
> >L1/L2 cache to LLC cache. When switch from GPU domain to CPU domain,
> >GPU needs to flush GPU's L3 cache to LLC cache.
> >
> >Please be noted, the L3 cache I mentioned is in GPU domain.
> >> 
> >> We are planning to use beignet to accelerate video analysis and image processing. The data flow will involve input and output between GPU and CPU, we expect it will be not a bottleneck.
> >Although cache flush between L3 <--> LLC <--> L2/L1 is not as heavy
> >as the flush between LLC and memory, it's still not free. Especially
> >if your application need to do that domain switch freqently.
> >
> >> 
> >> 
> >> 
> >> At 2014-12-03 10:36:28, "Zhigang Gong" <zhigang.gong at linux.intel.com> wrote:
> >> >On Wed, Dec 03, 2014 at 10:44:26AM +0800, spring_wind wrote:
> >> >> 1. clCreateBuffer(CL_MEM_USE_HOST_PTR)
> >> >> 2. clMapxx-->change the data pointed by host ptr--->clUnmapxxx
> >> >> 
> >> >> Is that right?
> >> >The usual use scenario is as below:
> >> >
> >> >1. clCreateBuffer(CL_MEM_USE_HOST_PTR)
> >> >2. clEnqueueNDRangeKernel() to enqueue some kernel to do some
> >> >   computation on GPU side and write to the above buffer.
> >> >3. clEnqueueMapBuffer() the buffer, and do some computation
> >> >   on CPU side and modify the buffer content pointed by host_ptr.
> >> >4. clEnqueueUnmapBuffer() to give the buffer back to GPU.
> >> >
> >> >The step 3 and 4 are synchronization points defined by OpenCL
> >> >spec which are mandatory on all platforms to access a buffer
> >> >pointed by host_ptr shared by GPU and CPU. At step3, not only
> >> >a GPU cache flush (flush from GPU's L3 cache to CPU's LLC cache),
> >> >but also need to wait for all the kernel enqueued in step 2
> >> >to be finished.
> >> >
> >> >
> >> >> 
> >> >> I think clUnmapxx must contain cache flush operation although it may not involve extra copy, correct me if am wrong.
> >> >
> >> >You are right.
> >> >The cache flush is always a must when we switch between GPU
> >> >domain and CPU domain. But you can see, the above flush is
> >> >not so heavy, as it's not a flush between cache and slow
> >> >memory.
> >> >
> >> >> I am very concerned about the beignet performance on our product.
> >> >
> >> >You are welcome to share your detail concerns with us if possible.
> >> >And there are also some performance hint in the 'docs/optimization-guide.mdwn'.
> >> >We will continue to update that document.
> >> >
> >> >Thanks,
> >> >Zhigang Gong.
> >> >
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 在 2014-12-03 10:14:10,"Guo, Yejun" <yejun.guo at intel.com> 写道:
> >> >> 
> >> >> 
> >> >> Just need to invoke clMapxx function before you access the host ptr from CPU side, and invoke clUnmapxx function after the CPU access. There is no extra copy inside clMap/Unmapxx if all the conditions are satisfied. You can refer to the code in runtime_use_host_ptr_buffer.cpp under beignet/utests.
> >> >> 
> >> >>  
> >> >> 
> >> >>  
> >> >> 
> >> >> From: spring_wind [mailto:spring_wind at yeah.net]
> >> >> Sent: Wednesday, December 03, 2014 10:02 AM
> >> >> To: Guo, Yejun
> >> >> Cc:beignet at lists.freedesktop.org
> >> >> Subject: Re:RE: [Beignet] CL_MEM_USE_HOST_PTR involve extra copy?
> >> >> 
> >> >>  
> >> >> 
> >> >> You mean if I use CL_MEM_USE_HOST_PTR and host ptr is page aligned, I change the data pointed by host ptr from CPU side, I don't have to do any flush opertaion or I should still call clMapxx function?
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 在 2014-12-03 08:37:15,"Guo, Yejun" <yejun.guo at intel.com> 写道:
> >> >> 
> >> >> 
> >> >> 
> >> >> Hi,
> >> >> 
> >> >>  
> >> >> 
> >> >> please check the latest code of beignet, there is no copy needed between CPU and GPU if the host_ptr provided by application is page aligned, and the page align limitation is expected to be removed some days later. You can also try CL_MEM_ALLOC_HOST_PTR to avoid the extra copy without align limitation.
> >> >> 
> >> >>  
> >> >> 
> >> >> Btw, this zero-copy is supported starting from linux kernel 3.16 and libdrm 2.4.58.
> >> >> 
> >> >>  
> >> >> 
> >> >>  
> >> >> 
> >> >> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of spring_wind
> >> >> Sent: Tuesday, December 02, 2014 8:38 PM
> >> >> To:beignet at lists.freedesktop.org
> >> >> Subject: [Beignet] CL_MEM_USE_HOST_PTR involve extra copy?
> >> >> 
> >> >>  
> >> >> 
> >> >> Hi:
> >> >> 
> >> >> Intel opencl optimization guide said using CL_MEM_USE_HOST_PTR can avoid extra copy between CPU and GPU, but I noticed that in beignet's implementation source it was not like that:
> >> >> 
> >> >> /* Copy the data if required */
> >> >> 
> >> >>   if (flags & CL_MEM_COPY_HOST_PTR || flags & CL_MEM_USE_HOST_PTR)
> >> >> 
> >> >>     cl_buffer_subdata(mem->bo, 0, sz, data);
> >> >> 
> >> >>  
> >> >> 
> >> >> Could someone give me an answer?
> >> >> 
> >> >>  
> >> >> 
> >> >>  
> >> >
> >> >> _______________________________________________
> >> >> Beignet mailing list
> >> >> Beignet at lists.freedesktop.org
> >> >> http://lists.freedesktop.org/mailman/listinfo/beignet
> >> >
> >> >_______________________________________________
> >> >Beignet mailing list
> >> >Beignet at lists.freedesktop.org
> >> >http://lists.freedesktop.org/mailman/listinfo/beignet
> >_______________________________________________
> >Beignet mailing list
> >Beignet at lists.freedesktop.org
> >http://lists.freedesktop.org/mailman/listinfo/beignet

> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet



More information about the Beignet mailing list