[Mesa-dev] bad performance issue in GPU & CPU data sharing

Mon Jun 7 08:53:23 UTC 2021

> -----Original Message-----
> From: Palli, Tapani <tapani.palli at intel.com>
> Sent: Thursday, June 3, 2021 1:23 PM
> To: Zong, Wei <wei.zong at intel.com>; mesa-dev at lists.freedesktop.org
> Subject: Re: [Mesa-dev] bad performance issue in GPU & CPU data sharing
> 
> Hi;
> 
> On 5/31/21 12:33 PM, Zong, Wei wrote:
> > Hello,
> >
> > I'm using GLES shaders to run algorithms on image frames, and I get very
> > bad performance when sharing data between GPU and CPU, especially when
> > retrieving data from the GPU to the CPU.
> >
> > Basically, I use glGenBuffers / glBindBuffer /
> > glBufferData(target, size, data, usage) to create a GPU buffer object
> > and initialize its data store from a CPU data pointer. After the GLES
> > shader has finished processing, I use glMapBufferRange to retrieve the
> > processed image data back to the CPU. For some reason I have to do an
> > extra data copy from the GL map pointer to another CPU buffer, and
> > this is super slow.
> >
> > Here's the code snippet
> > https://github.com/intel/libxcam/blob/master/modules/gles/gl_buffer.cpp#L94
> >
> > https://github.com/intel/libxcam/blob/master/modules/gles/gl_buffer.cpp#L127
> >
> > I wonder if there is a more efficient way to share data between the CPU
> > and a GPU GLES shader?
> >
> > Thanks,
> >
> > Zong Wei
> >
> 
> Could you break down the use-case here a bit? Why do you need CPU access to
> the image? If I understand correctly, the camera pipeline renders to a
> dmabuf, this is imported to GLES for processing, and then you map it to
> the CPU for ... something?
> 
> Thanks;
> 
> // Tapani

Hi Tapani,
I get multiple input video frames from the decoder, copy these frames into EGL buffers, and use a GLES shader to stitch them into one high-resolution (8K) frame. I then map the stitched frame to the CPU and encode it into an H.264 stream. I don't have a GPU encoder that can handle 8K frames.

I noticed this in the glMapBufferRange API description at https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glMapBufferRange.xhtml:
"Mappings to the data stores of buffer objects may have nonstandard performance characteristics. For example, such mappings may be marked as uncacheable regions of memory, and in such cases reading from them may be very slow. To ensure optimal performance, the client should use the mapping in a fashion consistent with the values of GL_BUFFER_USAGE for the buffer object and of access. Using a mapping in a fashion inconsistent with these values is liable to be multiple orders of magnitude slower than using normal memory."

It seems I should find a way to map the buffer as cacheable memory. How can I get a cacheable mapping? And how would I use DMA buffers (dmabuf) here?
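
Until a cacheable mapping is available, one CPU-side mitigation for the extra copy might be to read the mapped pointer exactly once, front to back, and do all later work on the CPU copy. Below is a minimal sketch of that idea, not a measured fix. copy_mapped_rows and its parameters are hypothetical names (they are not libxcam or Mesa API), and `mapped` stands in for the pointer returned by glMapBufferRange with GL_MAP_READ_BIT. It does not make the mapping cacheable, and whether it helps on a given driver is an assumption that would need profiling.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy an image out of a mapped buffer in one forward
// pass so that every mapped byte is read at most once.
//   mapped     - source pointer (assumed to come from glMapBufferRange)
//   src_stride - bytes between row starts in the mapped buffer
//   dst        - ordinary CPU memory owned by the caller
//   dst_stride - bytes between row starts in dst
//   row_bytes  - payload bytes per row
//   rows       - number of rows
// Returns false if the arguments are inconsistent; nothing is copied then.
bool copy_mapped_rows(const std::uint8_t* mapped, std::size_t src_stride,
                      std::uint8_t* dst, std::size_t dst_stride,
                      std::size_t row_bytes, std::size_t rows)
{
    if (mapped == nullptr || dst == nullptr) {
        return false;
    }
    if (src_stride < row_bytes || dst_stride < row_bytes) {
        return false;
    }

    // Tightly packed on both sides: a single large sequential copy.
    if (src_stride == row_bytes && dst_stride == row_bytes) {
        std::memcpy(dst, mapped, row_bytes * rows);
        return true;
    }

    // Otherwise copy row by row, still strictly front to back, skipping
    // the padding bytes so they are never read from the mapping.
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst + y * dst_stride, mapped + y * src_stride, row_bytes);
    }
    return true;
}
```

The call would sit between glMapBufferRange and glUnmapBuffer. After it returns, the encoder would read only from `dst` and never touch the mapped pointer again.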

Thanks,
Zong Wei