[Libva] Parse and render in different threads?

Daniel Vetter daniel at ffwll.ch
Wed Mar 20 06:14:21 PDT 2013


On Wed, Mar 20, 2013 at 9:02 AM, Andreas Larsson
<andreas.larsson at multiq.se> wrote:
> Yep ok, that's nice!
>
> Followup question(s):
>
> I'm currently filling in my buffers by calling vaCreateBuffer with NULL as the pointer argument to let the server allocate the memory for me; I then use vaMapBuffer to fill it in. This works very nicely and performance is good.
>
> However, I see that when I over-allocate my slice buffers the performance drops a bit. Currently I allocate 8 kB for each slice, fill it in, and then specify in the slice parameters exactly how much of the buffer I actually use. As I understand it, the over-allocation shouldn't affect performance, but it does. Are the buffers copied even though I allocate them in the server?

If you are on Ironlake, or on Sandybridge with an old kernel, it might
be that you're hitting the clflush overhead of CPU mmaps, which is
proportional to the size of the buffer, not to the data actually used.
I'm not sure of the internals, or whether libva has an interface to
upload with copies. But if one exists, we could use the kernel's
pwrite interface, which generally has higher throughput in this case,
with overhead proportional only to the data actually written.
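For what it's worth, vaCreateBuffer() also accepts a non-NULL data pointer, in which case the contents are copied at creation time and the buffer can be sized to exactly the bytes used. The two upload paths discussed here can be sketched roughly as follows (a minimal illustration with error handling omitted; whether the copy path ends up using pwrite internally is driver-specific and only an assumption):

```c
#include <string.h>
#include <va/va.h>

/* Path 1: server-side allocation plus CPU mmap, as described in the mail.
 * On affected platforms the mmap write-back (clflush) cost can scale with
 * the full allocated size, not just the bytes written. */
static VABufferID upload_via_map(VADisplay dpy, VAContextID ctx,
                                 const void *slice, size_t used, size_t alloc)
{
    VABufferID buf;
    void *p = NULL;
    vaCreateBuffer(dpy, ctx, VASliceDataBufferType, alloc, 1, NULL, &buf);
    vaMapBuffer(dpy, buf, &p);
    memcpy(p, slice, used);      /* only 'used' bytes are meaningful */
    vaUnmapBuffer(dpy, buf);
    return buf;
}

/* Path 2: let libva copy the data at creation time by passing a non-NULL
 * pointer; the buffer is then sized to exactly what is used. */
static VABufferID upload_via_copy(VADisplay dpy, VAContextID ctx,
                                  const void *slice, size_t used)
{
    VABufferID buf;
    vaCreateBuffer(dpy, ctx, VASliceDataBufferType, (unsigned int)used, 1,
                   (void *)slice, &buf);
    return buf;
}
```

If the map path shows cost proportional to the allocation, the copy path sized to the actual slice data is worth benchmarking against it.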

> Moreover, the documentation says the buffers are automatically de-allocated once they've been sent to vaRenderPicture. Is there any way to re-use the buffers and avoid the automatic de-allocation? I really don't see why I should create/destroy all the buffers each frame; it seems like a complete waste of resources.

Buffers, including any CPU mappings already set up, are already cached
_very_ aggressively inside libva (actually libdrm). So there should be
no need for you to do any caching on top.
-Daniel
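So the straightforward create-per-frame pattern should be fine in practice. A rough sketch of that lifecycle (real VA-API entry points, but simplified to a single slice-data buffer, with parameter buffers and error handling omitted):

```c
#include <va/va.h>

/* Per-frame loop relying on libva/libdrm's internal buffer cache: buffers
 * are created fresh each frame and handed to vaRenderPicture, after which
 * (per the libva documentation) they are released automatically.  The
 * underlying GPU memory is recycled by the libdrm buffer-object cache, so
 * no application-side pooling is needed. */
static void render_frame(VADisplay dpy, VAContextID ctx, VASurfaceID surf,
                         const void *slice_data, size_t slice_size)
{
    VABufferID buf;

    vaBeginPicture(dpy, ctx, surf);
    vaCreateBuffer(dpy, ctx, VASliceDataBufferType,
                   (unsigned int)slice_size, 1, (void *)slice_data, &buf);
    vaRenderPicture(dpy, ctx, &buf, 1);  /* consumes and releases 'buf' */
    vaEndPicture(dpy, ctx);              /* submits work to the driver */
}
```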

>
> Kind regards, Andreas Larsson
>
>
> On 20 Mar 2013, at 01:50, "Xiang, Haihao" <haihao.xiang at intel.com> wrote:
>
>> On Tue, 2013-03-19 at 08:02 +0000, Andreas Larsson wrote:
>>> Hi!
>>>
>>> Do I have to perform bitstream parsing and vaRenderPicture in separate threads to maintain best performance? I.e. does vaRenderPicture block, or are those calls buffered and handled asynchronously by the driver/chip, like OpenGL?
>>
>> VA runs in asynchronous mode.
>>
>>>
>>> As it is, I generate MPEG-2 data, and for each slice I call vaRenderPicture before I generate the next slice, so if vaRenderPicture blocked this would drain my performance completely.
>>>
>>> Kind regards, Andreas Larsson
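Given the asynchronous model described above, the per-slice flow can stay single-threaded: each vaRenderPicture call just queues work. A rough sketch (real VA-API entry points; parameter buffers and error handling omitted):

```c
#include <va/va.h>

/* Per-slice submission under the asynchronous model: nothing here blocks
 * on the hardware until vaSyncSurface is called (or the surface contents
 * are otherwise read back). */
static void submit_slices(VADisplay dpy, VAContextID ctx, VASurfaceID surf,
                          VABufferID *slice_bufs, int n_slices)
{
    vaBeginPicture(dpy, ctx, surf);
    for (int i = 0; i < n_slices; i++)
        vaRenderPicture(dpy, ctx, &slice_bufs[i], 1); /* queues and returns */
    vaEndPicture(dpy, ctx);    /* kicks off the hardware */
    vaSyncSurface(dpy, surf);  /* the only call that actually waits */
}
```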
>>>
>>> _______________________________________________
>>> Libva mailing list
>>> Libva at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/libva
>>
>>
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

