[Libva] performance question with libva on i5 Sandy Bridge

Xiang, Haihao haihao.xiang at intel.com
Wed Jan 21 17:00:38 PST 2015


On Wed, 2015-01-21 at 08:46 +0100, Gilles Chanteperdrix wrote:
> On Wed, Jan 21, 2015 at 02:32:34PM +0800, Xiang, Haihao wrote:
> > On Mon, 2015-01-19 at 07:37 +0100, Gilles Chanteperdrix wrote:
> > > On Mon, Jan 19, 2015 at 10:34:37AM +0800, Xiang, Haihao wrote:
> > > > On Mon, 2015-01-19 at 00:20 +0100, Gilles Chanteperdrix wrote:
> > > > > Hi,
> > > > > 
> > > > > I am testing libva with ffmpeg on Linux to decode h264 video. 
> > > > > Linux version is 3.4.29 
> > > > > FFmpeg version is 2.5.3 
> > > > > Mesa version is 10.4.0 
> > > > > libdrm version is 2.4.58 
> > > > > libva version is 1.5.0
> > > > > 
> > > > > From what I could gather from the documentation and examples, using
> > > > > vaDeriveImage should be preferred when it is available. However, I
> > > > > have compared the CPU consumption with top, and I observe that the
> > > > > following code:
> > > > > 
> > > > > #ifdef USE_VADERIVEIMAGE
> > > > > 	/* Map the surface's own (possibly tiled) buffer directly. */
> > > > > 	vrc = vaDeriveImage(ctx->display, buf->surface_id, &va_image);
> > > > > 	CHECK_VASTATUS(vrc, "vaDeriveImage");
> > > > > #else
> > > > > 	/* Copy the surface into a previously created VAImage. */
> > > > > 	vrc = vaGetImage(ctx->display, buf->surface_id,
> > > > > 			0, 0, cctx->coded_width, cctx->coded_height,
> > > > > 			va_image.image_id);
> > > > > 	CHECK_VASTATUS(vrc, "vaGetImage");
> > > > > #endif
> > > > > 
> > > > > 	/* Map the image and copy out the NV12 Y and UV planes. */
> > > > > 	vrc = vaMapBuffer(ctx->display, va_image.buf, &data);
> > > > > 	CHECK_VASTATUS(vrc, "vaMapBuffer");
> > > > > 
> > > > > 	memcpy(f->img[0], data + va_image.offsets[0],
> > > > > 		va_image.pitches[0] * cctx->coded_height);
> > > > > 	memcpy(f->img[1], data + va_image.offsets[1],
> > > > > 		va_image.pitches[1] * cctx->coded_height / 2);
> > > > > 
> > > > > 	vrc = vaUnmapBuffer(ctx->display, va_image.buf);
> > > > > 	CHECK_VASTATUS(vrc, "vaUnmapBuffer");
> > > > > 
> > > > > #ifdef USE_VADERIVEIMAGE
> > > > > 	/* The derived image only wraps the surface; destroy it per frame. */
> > > > > 	vrc = vaDestroyImage(ctx->display, va_image.image_id);
> > > > > 	CHECK_VASTATUS(vrc, "vaDestroyImage");
> > > > > #endif
> > > > > 
> > > > > results in higher CPU consumption when compiled with
> > > > > USE_VADERIVEIMAGE. Is this normal, or is there something I am doing
> > > > > wrong? I can provide the complete code if needed.
> > > > 
> > > > It depends on the underlying memory format. Most surfaces used in
> > > > the driver are tiled, so the derived images are tiled too; the memory
> > > > returned for them is uncached, and reading data from it is slow. If the
> > > > image isn't tiled, the returned memory is cached.
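
For reference, a minimal sketch of setting up such a linear (non-tiled)
image for the vaGetImage() path above, reusing ctx, cctx, vrc and
CHECK_VASTATUS from the snippet and assuming the decoder outputs NV12;
error handling and cleanup are elided:

	/* Create a linear NV12 VAImage once and reuse it for every frame;
	 * vaGetImage() copies the (possibly tiled) surface into it, and the
	 * memory mapped with vaMapBuffer() is then ordinary cached memory. */
	VAImageFormat fmt;
	VAImage va_image;

	memset(&fmt, 0, sizeof(fmt));
	fmt.fourcc = VA_FOURCC_NV12;

	vrc = vaCreateImage(ctx->display, &fmt,
			cctx->coded_width, cctx->coded_height, &va_image);
	CHECK_VASTATUS(vrc, "vaCreateImage");

	/* Per frame: vaGetImage() + vaMapBuffer() as in the #else branch of
	 * the snippet above, and vaDestroyImage() when tearing down. */
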
> > > 
> > > Ok. Thanks for the explanation.
> > > 
> > > Is the result of vaMapBuffer always uncached, or only for
> > > a VA image obtained with vaDeriveImage?
> > 
> > You could try the following patch if you want to know whether the result
> > of vaMapBuffer() on an image is uncached.
> > http://lists.freedesktop.org/archives/libva/attachments/20140617/d9cc4b3c/attachment.bin
> 
> The patch applies to the 1.5.0 release with some offset.
> With this patch applied, I get the same (bad) performance with or
> without using vaDeriveImage.
> 
> But after some tests, I have found that the solution which avoids
> the most copies for OpenGL display is to use vaCopySurfaceGLX.
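
For reference, a rough sketch of that vaCopySurfaceGLX path (this assumes
a current GLX context, a VADisplay obtained with vaGetDisplayGLX(), the
<va/va_glx.h> header, and an already-created GL texture tex, which is an
illustrative name; ctx, buf, vrc and CHECK_VASTATUS are reused from the
earlier snippet):

	/* Bind a GL texture to a VA/GLX surface once, then copy each decoded
	 * VA surface into it on the GPU, avoiding any CPU readback. */
	void *gl_surface;

	vrc = vaCreateSurfaceGLX(ctx->display, GL_TEXTURE_2D, tex, &gl_surface);
	CHECK_VASTATUS(vrc, "vaCreateSurfaceGLX");

	/* Per decoded frame: */
	vrc = vaCopySurfaceGLX(ctx->display, gl_surface, buf->surface_id,
			VA_FRAME_PICTURE);
	CHECK_VASTATUS(vrc, "vaCopySurfaceGLX");
	/* ... draw the texture ... */

	vaDestroySurfaceGLX(ctx->display, gl_surface);
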
> 
> Unfortunately, it is specific to GLX, and while it works fine with
> the i965 driver, it does not seem to work with the vdpau-based VA
> driver. I also read yesterday that EGL is "the new thing", so I am
> going to look into EGL. If someone has some example code for
> running EGL on Linux with the XOrg server, I am interested.

If you want to use a VA surface or VA image with external APIs like EGL,
the best way is to use vaAcquireBufferHandle() to export the low-level
buffer handle (a DRM flink handle or a PRIME fd); you can then use this
handle with the external API. You can refer to some examples in libyami:
https://github.com/01org/libyami
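
For example, a rough sketch of exporting a decoded surface as a dma-buf
(PRIME) fd, which you can then import into EGL with eglCreateImageKHR()
and EGL_LINUX_DMA_BUF_EXT. This reuses ctx, buf, vrc and CHECK_VASTATUS
from your snippet; the memory-type constant comes from
<va/va_drmcommon.h>; error handling and the EGL side are elided:

	VAImage va_image;
	VABufferInfo buf_info;

	/* Derive an image so we get a VABufferID referring to the surface's
	 * underlying buffer object. */
	vrc = vaDeriveImage(ctx->display, buf->surface_id, &va_image);
	CHECK_VASTATUS(vrc, "vaDeriveImage");

	/* Ask for a dma-buf (PRIME) file descriptor. */
	memset(&buf_info, 0, sizeof(buf_info));
	buf_info.mem_type = VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME;
	vrc = vaAcquireBufferHandle(ctx->display, va_image.buf, &buf_info);
	CHECK_VASTATUS(vrc, "vaAcquireBufferHandle");

	/* buf_info.handle is now the fd; va_image.pitches[]/offsets[]
	 * describe the plane layout needed for the EGL import. */

	/* ... create an EGLImage from the fd and draw ... */

	vrc = vaReleaseBufferHandle(ctx->display, va_image.buf);
	CHECK_VASTATUS(vrc, "vaReleaseBufferHandle");
	vrc = vaDestroyImage(ctx->display, va_image.image_id);
	CHECK_VASTATUS(vrc, "vaDestroyImage");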
