[Beignet] CL_MEM_USE_HOST_PTR involve extra copy?

Zhigang Gong zhigang.gong at linux.intel.com
Tue Dec 2 18:36:28 PST 2014


On Wed, Dec 03, 2014 at 10:44:26AM +0800, spring_wind wrote:
> 1. clCreateBuffer(CL_MEM_USE_HOST_PTR)
> 2. clMapxx-->change the data pointed by host ptr--->clUnmapxxx
> 
> Is that right?
The usual use scenario is as below:

1. clCreateBuffer(CL_MEM_USE_HOST_PTR)
2. clEnqueueNDRangeKernel() to enqueue some kernel to do some
   computation on GPU side and write to the above buffer.
3. clEnqueueMapBuffer() the buffer, and do some computation
   on CPU side and modify the buffer content pointed by host_ptr.
4. clEnqueueUnmapBuffer() to give the buffer back to GPU.

The step 3 and 4 are synchronization points defined by OpenCL
spec which are mandatory on all platforms to access a buffer
pointed by host_ptr shared by GPU and CPU. At step3, not only
a GPU cache flush (flush from GPU's L3 cache to CPU's LLC cache),
but also need to wait for all the kernel enqueued in step 2
to be finished.


> 
> I think clUnmapxx must contain cache flush operation although it may not involve extra copy, correct me if am wrong.

You are right.
The cache flush is always a must when we switch between GPU
domain and CPU domain. But you can see, the above flush is
not so heavy, as it's not a flush between cache and slow
memory.

> I am very concerned about the beignet performance on our product.

You are welcome to share your detail concerns with us if possible.
And there are also some performance hint in the 'docs/optimization-guide.mdwn'.
We will continue to update that document.

Thanks,
Zhigang Gong.

> 
> 
> 
> 
> 
> 
> 
> 在 2014-12-03 10:14:10,"Guo, Yejun" <yejun.guo at intel.com> 写道:
> 
> 
> Just need to invoke clMapxx function before you access the host ptr from CPU side, and invoke clUnmapxx function after the CPU access. There is no extra copy inside clMap/Unmapxx if all the conditions are satisfied. You can refer to the code in runtime_use_host_ptr_buffer.cpp under beignet/utests.
> 
>  
> 
>  
> 
> From: spring_wind [mailto:spring_wind at yeah.net]
> Sent: Wednesday, December 03, 2014 10:02 AM
> To: Guo, Yejun
> Cc:beignet at lists.freedesktop.org
> Subject: Re:RE: [Beignet] CL_MEM_USE_HOST_PTR involve extra copy?
> 
>  
> 
> You mean if I use CL_MEM_USE_HOST_PTR and host ptr is page aligned, I change the data pointed by host ptr from CPU side, I don't have to do any flush opertaion or I should still call clMapxx function?
> 
> 
> 
> 
> 
> 
> 在 2014-12-03 08:37:15,"Guo, Yejun" <yejun.guo at intel.com> 写道:
> 
> 
> 
> Hi,
> 
>  
> 
> please check the latest code of beignet, there is no copy needed between CPU and GPU if the host_ptr provided by application is page aligned, and the page align limitation is expected to be removed some days later. You can also try CL_MEM_ALLOC_HOST_PTR to avoid the extra copy without align limitation.
> 
>  
> 
> Btw, this zero-copy is supported starting from linux kernel 3.16 and libdrm 2.4.58.
> 
>  
> 
>  
> 
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of spring_wind
> Sent: Tuesday, December 02, 2014 8:38 PM
> To:beignet at lists.freedesktop.org
> Subject: [Beignet] CL_MEM_USE_HOST_PTR involve extra copy?
> 
>  
> 
> Hi:
> 
> Intel opencl optimization guide said using CL_MEM_USE_HOST_PTR can avoid extra copy between CPU and GPU, but I noticed that in beignet's implementation source it was not like that:
> 
> /* Copy the data if required */
> 
>   if (flags & CL_MEM_COPY_HOST_PTR || flags & CL_MEM_USE_HOST_PTR)
> 
>     cl_buffer_subdata(mem->bo, 0, sz, data);
> 
>  
> 
> Could someone give me an answer?
> 
>  
> 
>  

> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet



More information about the Beignet mailing list