[Libva] performances question with libva on i5 sandy bridge

Xiang, Haihao haihao.xiang at intel.com
Thu Jan 22 20:14:25 PST 2015


On Thu, 2015-01-22 at 08:16 +0100, Gilles Chanteperdrix wrote:
> On Thu, Jan 22, 2015 at 09:00:38AM +0800, Xiang, Haihao wrote:
> > On Wed, 2015-01-21 at 08:46 +0100, Gilles Chanteperdrix wrote:
> > > On Wed, Jan 21, 2015 at 02:32:34PM +0800, Xiang, Haihao wrote:
> > > > On Mon, 2015-01-19 at 07:37 +0100, Gilles Chanteperdrix wrote:
> > > > > On Mon, Jan 19, 2015 at 10:34:37AM +0800, Xiang, Haihao wrote:
> > > > > > On Mon, 2015-01-19 at 00:20 +0100, Gilles Chanteperdrix wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I am testing libva with FFmpeg on Linux to decode H.264 video. 
> > > > > > > Linux version is 3.4.29 
> > > > > > > FFmpeg version is 2.5.3 
> > > > > > > Mesa version is 10.4.0 
> > > > > > > libdrm version is 2.4.58 
> > > > > > > libva version is 1.5.0
> > > > > > > 
> > > > > > > From what I could gather from the documentation and examples, using
> > > > > > > vaDeriveImage should be preferred if it is available. However, I
> > > > > > > have compared, with top, the CPU consumed, and I observe that the
> > > > > > > following code:
> > > > > > > 
> > > > > > > #ifdef USE_VADERIVEIMAGE
> > > > > > > 	vrc = vaDeriveImage(ctx->display, buf->surface_id, &va_image);
> > > > > > > 	CHECK_VASTATUS(vrc, "vaDeriveImage");
> > > > > > > #else
> > > > > > > 	vrc = vaGetImage(ctx->display, buf->surface_id,
> > > > > > > 			0, 0, cctx->coded_width, cctx->coded_height,
> > > > > > > 			va_image.image_id);
> > > > > > > 	CHECK_VASTATUS(vrc, "vaGetImage");
> > > > > > > #endif
> > > > > > > 
> > > > > > > 	vrc = vaMapBuffer(ctx->display, va_image.buf, &data);
> > > > > > > 	CHECK_VASTATUS(vrc, "vaMapBuffer");
> > > > > > > 
> > > > > > > 	/* NV12 layout: copy the Y plane, then the half-height
> > > > > > > 	 * interleaved UV plane */
> > > > > > > 	memcpy(f->img[0], (unsigned char *)data + va_image.offsets[0],
> > > > > > > 		va_image.pitches[0] * cctx->coded_height);
> > > > > > > 	memcpy(f->img[1], (unsigned char *)data + va_image.offsets[1],
> > > > > > > 		va_image.pitches[1] * cctx->coded_height / 2);
> > > > > > > 
> > > > > > > 	vrc = vaUnmapBuffer(ctx->display, va_image.buf);
> > > > > > > 	CHECK_VASTATUS(vrc, "vaUnmapBuffer");
> > > > > > > 
> > > > > > > #ifdef USE_VADERIVEIMAGE
> > > > > > > 	vrc = vaDestroyImage(ctx->display, va_image.image_id);
> > > > > > > 	CHECK_VASTATUS(vrc, "vaDestroyImage");
> > > > > > > #endif
> > > > > > > 
> > > > > > > results in higher CPU consumption when compiled with
> > > > > > > USE_VADERIVEIMAGE. Is this normal, or am I doing something
> > > > > > > wrong? I can provide the complete code if needed.
> > > > > > 
> > > > > > It depends on the underlying memory format. Most surfaces used in
> > > > > > the driver are tiled, so a derived image is tiled too; the memory
> > > > > > returned for it is uncached, and reading data from it is slow. If
> > > > > > the image isn't tiled, the returned memory is cached.
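> > > > > > 
> > > > > > As an illustration, a minimal sketch of the cached path (assuming
> > > > > > an NV12 surface; dpy, surface, width and height are placeholders,
> > > > > > and error checks are elided):
> > > > > > 
> > > > > > 	VAImageFormat fmt = { .fourcc = VA_FOURCC_NV12 };
> > > > > > 	VAImage image;
> > > > > > 	void *data;
> > > > > > 
> > > > > > 	/* Create a linear image and blit the surface into it; the
> > > > > > 	 * mapping of a non-derived image like this one is cached,
> > > > > > 	 * so reading it back is fast. */
> > > > > > 	vaCreateImage(dpy, &fmt, width, height, &image);
> > > > > > 	vaGetImage(dpy, surface, 0, 0, width, height, image.image_id);
> > > > > > 	vaMapBuffer(dpy, image.buf, &data);
> > > > > > 	/* ... read planes via image.offsets[]/image.pitches[] ... */
> > > > > > 	vaUnmapBuffer(dpy, image.buf);
> > > > > > 	vaDestroyImage(dpy, image.image_id);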
> > > > > 
> > > > > Ok. Thanks for the explanation.
> > > > > 
> > > > > Is the result of vaMapBuffer always uncached, or only for
> > > > > a VA image obtained with vaDeriveImage?
> > > > 
> > > > You could try the following patch if you want to check whether the
> > > > result of vaMapBuffer() on an image is uncached.
> > > > http://lists.freedesktop.org/archives/libva/attachments/20140617/d9cc4b3c/attachment.bin
> > > 
> > > The patch applies to the 1.5.0 release with some offset.
> > > With this patch applied, I get the same (bad) performance with or
> > > without using vaDeriveImage.
> > > 
> > > But after some tests, I have found that the solution that avoids
> > > the most copies when displaying with OpenGL is vaCopySurfaceGLX.
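> > > 
> > > For reference, the GLX path looks roughly like this (a minimal
> > > sketch, assuming "tex" is a GL_TEXTURE_2D texture created
> > > beforehand; error checks elided):
> > > 
> > > 	#include <va/va_glx.h>
> > > 
> > > 	/* One-time setup: bind a GLX surface to the GL texture */
> > > 	void *gl_surface;
> > > 	vaCreateSurfaceGLX(dpy, GL_TEXTURE_2D, tex, &gl_surface);
> > > 
> > > 	/* Per frame: copy the decoded VA surface straight into the
> > > 	 * texture, without a round trip through system memory */
> > > 	vaCopySurfaceGLX(dpy, gl_surface, surface_id, VA_FRAME_PICTURE);
> > > 
> > > 	/* Teardown */
> > > 	vaDestroySurfaceGLX(dpy, gl_surface);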
> > > 
> > > Unfortunately, it is specific to GLX: while it works fine with
> > > the i965 driver, it does not seem to work with the vdpau-based VA
> > > driver, and I read yesterday that EGL is "the new thing", so I am
> > > going to look into EGL. If someone has some example code for
> > > running EGL on Linux with the X.Org server, I am interested.
> > 
> > If you want to use a VA surface or VA image with external APIs like
> > EGL, the best way is to use vaAcquireBufferHandle() to export the
> > low-level buffer handle (a DRM flink handle or PRIME fd); you can
> > then pass that handle to the external API. You can refer to some
> > examples in libyami: https://github.com/01org/libyami
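> > 
> > A minimal sketch of the export step (assuming libva >= 1.4 and a
> > driver that supports dma-buf export; error checks elided):
> > 
> > 	#include <string.h>
> > 	#include <va/va.h>
> > 	#include <va/va_drmcommon.h>
> > 
> > 	VAImage image;
> > 	VABufferInfo buf_info;
> > 
> > 	/* Derive an image to get a buffer backed by the surface... */
> > 	vaDeriveImage(dpy, surface_id, &image);
> > 
> > 	/* ...and export its backing store as a PRIME (dma-buf) fd */
> > 	memset(&buf_info, 0, sizeof(buf_info));
> > 	buf_info.mem_type = VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME;
> > 	vaAcquireBufferHandle(dpy, image.buf, &buf_info);
> > 
> > 	/* buf_info.handle now holds a dma-buf fd that EGL can import
> > 	 * via eglCreateImageKHR(..., EGL_LINUX_DMA_BUF_EXT, ...) */
> > 
> > 	vaReleaseBufferHandle(dpy, image.buf);
> > 	vaDestroyImage(dpy, image.image_id);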
> 
> Ok, thanks. I began trying EGL, but have run into a more basic
> problem: my test program gets a segmentation fault when it calls
> glCreateShader.
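
For what it's worth, a segfault in glCreateShader usually means GL
entry points are being called before any context is current. A
minimal, untested sketch of EGL bring-up on X11 (all names are
placeholders; error checks elided):

	#include <EGL/egl.h>
	#include <GLES2/gl2.h>
	#include <X11/Xlib.h>

	/* GL calls such as glCreateShader are only valid once
	 * eglMakeCurrent() has succeeded. "win" is an X window created
	 * to match the chosen config. */
	static void egl_init(Display *x_dpy, Window win)
	{
		static const EGLint cfg_attribs[] = {
			EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
			EGL_NONE
		};
		static const EGLint ctx_attribs[] = {
			EGL_CONTEXT_CLIENT_VERSION, 2,
			EGL_NONE
		};
		EGLDisplay dpy = eglGetDisplay((EGLNativeDisplayType)x_dpy);
		EGLConfig cfg;
		EGLint n;
		EGLContext ctx;
		EGLSurface surf;

		eglInitialize(dpy, NULL, NULL);
		eglChooseConfig(dpy, cfg_attribs, &cfg, 1, &n);
		ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);
		surf = eglCreateWindowSurface(dpy, cfg,
					      (EGLNativeWindowType)win, NULL);
		eglMakeCurrent(dpy, surf, surf, ctx);

		glCreateShader(GL_VERTEX_SHADER); /* safe only from here on */
	}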

Could you file a bug against libyami if it doesn't work for you?

Thanks
Haihao

