[gst-embedded] [gst-devel] [PATCH] audioconvert: add NEON acceleration for some conversions

Mon Aug 10 09:12:16 PDT 2009

On Aug 10, 2009, at 9:59 AM, Sebastian Dröge wrote:

> On Monday, 10.08.2009, at 09:41 -0500, Rob Clark wrote:
>> 1) convert default processing functions to __attribute__((weak)) so
>>    they can be overridden with architecture-specific accelerated
>>    functions (i.e. NEON, MMX, Altivec, etc.)
>> 2) override gst_audio_quantize_quantize_signed_tpdf_none() to use
>>    NEON vector instructions
>> 3) override gst_audio_convert_unpack_float_le() to use NEON vector
>>    instructions
>>
>> This speeds up audioconvert ~10x, at least for the 32-bit float ->
>> 16-bit int conversion needed to play AC-3 audio (e.g. DVDs) via ALSA
>
> Hi,
> first of all, could you file a bug for this and attach the patch
> there? :)

[RC] Hi Sebastian, I just wanted to send the patch here because it
might be interesting to others working on ARM (armv7) based processors.

A liboil / orc based solution is probably the better long-term
solution, although I'm not sure of the current state of liboil / orc on
armv7.  That, and I wanted an excuse to teach myself about NEON ;-)

So I don't know if you want to integrate this patch as-is, which is
why I didn't create an issue in Bugzilla yet.  I guess my next
side-project is to learn a bit more about liboil / orc.

> and then some comments on the patch itself:
> - Don't use __attribute__((weak)), it's not portable. Instead use
>  liboil to detect at runtime if the CPU supports a specific
>  instruction set and then use the appropriate function pointer to the
>  unpack/quantize function

[RC] oh, darn.. it was such a clever trick too..
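
For anyone following along, the function-pointer approach suggested above could look roughly like the sketch below. It is only an illustration under assumptions: quantize_fn, quantize_c, quantize_neon, cpu_has_neon and select_quantize are hypothetical names, the signature is not the one audioconvert really uses, and the CPU check is a stub standing in for whatever runtime detection liboil / orc would provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature, not the real audioconvert one. */
typedef void (*quantize_fn) (const float *src, int16_t *dst, size_t n);

/* Plain C fallback: scale to the 16-bit range, clamp, truncate. */
void
quantize_c (const float *src, int16_t *dst, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++) {
    float v = src[i] * 32768.0f;

    if (v > 32767.0f)
      v = 32767.0f;
    if (v < -32768.0f)
      v = -32768.0f;
    dst[i] = (int16_t) v;
  }
}

/* Placeholder for an accelerated version. In this sketch it only
 * forwards to the C code so the example builds on any architecture. */
void
quantize_neon (const float *src, int16_t *dst, size_t n)
{
  quantize_c (src, dst, n);
}

/* Stub for runtime CPU detection (assumption: a real build would ask
 * a library or the OS here). Always reports "no NEON". */
int
cpu_has_neon (void)
{
  return 0;
}

/* Pick the implementation once, at init time, instead of relying on
 * weak symbols being overridden at link time. */
quantize_fn
select_quantize (void)
{
  return cpu_has_neon ()? quantize_neon : quantize_c;
}
```

The caller stores the result of select_quantize() once and calls through the pointer afterwards, so the choice is made at runtime and does not depend on linker-specific behaviour.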

> - Add a configure check to see if the compiler supports the specific
>  instruction set and only compile that ARMv7 code then

[RC] I did put the whole file within a '#ifdef __ARM_NEON__ /
#endif', which should also work even if the compiler supports NEON
but the user doesn't pass '-mfpu=neon'.  But I admit that my
configure-foo is weak, so there is certainly a better way to do this.
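
A minimal sketch of that kind of compile-time guard is below. It is not taken from the patch: scale_floats is a hypothetical function, and the check also tests __ARM_NEON, which newer compilers define. When neither macro is defined, only the plain C loop is compiled.

```c
#include <stddef.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#define BUILD_HAS_NEON 1
#else
#define BUILD_HAS_NEON 0
#endif

/* Hypothetical example: multiply n floats in place by gain. */
void
scale_floats (float *data, size_t n, float gain)
{
  size_t i = 0;

#if BUILD_HAS_NEON
  /* Vector body: four floats per iteration. */
  for (; i + 4 <= n; i += 4) {
    float32x4_t v = vld1q_f32 (data + i);

    v = vmulq_n_f32 (v, gain);
    vst1q_f32 (data + i, v);
  }
#endif

  /* Scalar loop: handles everything without NEON, and the leftover
   * samples after the vector body with NEON. */
  for (; i < n; i++)
    data[i] *= gain;
}
```

A configure check would still be useful on top of this, for example to decide whether to add '-mfpu=neon' to the compiler flags for that one file.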

> - The start of a buffer might not be 16-byte aligned, or whatever
>  alignment is required by VFP. It's only guaranteed to be aligned to
>  the sample type, i.e. 2-byte aligned for 16-bit samples, etc.
>

[RC] AFAIK, VLDR/VSTR doesn't require 128-bit alignment, although the
cycle count is lower for aligned accesses.  So I guess it could be
made a bit faster by handling alignment a little better.  As-is, it is
a night-and-day difference, and the gstaudioconvert related functions
only show up a couple of pages down in the oprofile output.  Now it is
liba52 that needs some optimization ;-)
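
One common way to handle alignment better is to process a few samples with scalar code until the pointer reaches a 16-byte boundary, then run the vector loop, then finish the remainder with scalar code again. The sketch below only computes that split. All names (samples_until_aligned, split_t, split_for_alignment) are hypothetical, and it assumes the buffer is at least sample-aligned and that the alignment is a multiple of the sample size.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of samples to handle with scalar code before the pointer
 * reaches the requested alignment. */
size_t
samples_until_aligned (const void *p, size_t sample_size, size_t align)
{
  size_t misalign = (size_t) ((uintptr_t) p % align);

  if (misalign == 0)
    return 0;
  return (align - misalign) / sample_size;
}

typedef struct
{
  size_t head;                  /* scalar samples before the boundary */
  size_t body;                  /* samples for the aligned vector loop */
  size_t tail;                  /* scalar samples left at the end */
} split_t;

/* Split n samples into head / body / tail, where the body is a whole
 * number of vectors of `lanes` samples each. */
split_t
split_for_alignment (const void *buf, size_t n, size_t sample_size,
    size_t align, size_t lanes)
{
  split_t s;
  size_t head = samples_until_aligned (buf, sample_size, align);

  if (head > n)
    head = n;
  s.head = head;
  s.body = ((n - head) / lanes) * lanes;
  s.tail = n - head - s.body;
  return s;
}
```

Whether the extra head loop pays off depends on buffer sizes and on how much slower unaligned accesses really are on the target core, so it would need measuring.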

> In general this patch is a good idea though, something like this
> really needs to go into audioconvert at critical places for other
> architectures too.
>
> FYI, David Schleef has partially converted audioconvert to use orc[0].
> Together with the orc VFP backend this would obsolete your patch I
> guess.
>
> [0] http://cgit.freedesktop.org/~ds/gst-plugins-base/log/?h=orc

[RC] ok, I'll check out his patch..  that is almost certainly the
better long-term approach.  I just didn't know what the current state
of ORC for NEON/VFP was..

BR,
-R