[gst-embedded] [gst-devel] [PATCH] audioconvert: add NEON acceleration for some conversions
Rob Clark
rob at ti.com
Mon Aug 10 09:12:16 PDT 2009
On Aug 10, 2009, at 9:59 AM, Sebastian Dröge wrote:
> Am Montag, den 10.08.2009, 09:41 -0500 schrieb Rob Clark:
>> 1) convert default processing functions to __attribute__((weak)) so
>> they can be overridden with
>> architecture-specific accelerated functions (e.g. NEON, MMX,
>> Altivec)
>> 2) override gst_audio_quantize_quantize_signed_tpdf_none() to use
>> NEON vector instructions
>> 3) override gst_audio_convert_unpack_float_le() to use NEON vector
>> instructions
>>
>> This speeds up audioconvert ~10x, at least for the 32-bit float ->
>> 16-bit int conversion needed to play
>> AC-3 audio (e.g. DVDs) via ALSA
>
> Hi,
> first of all, could you file a bug for this and attach the patch
> there? :)
[RC] Hi Sebastian, I just wanted to send the patch here, because it
might be interesting to others working on ARM (armv7) based processors.
A liboil / orc based solution is probably the better long-term solution,
although I'm not sure of the current state of liboil / orc on armv7.
That, and I wanted an excuse to teach myself about NEON ;-)
So I don't know if you want to integrate this patch as-is, which is
why I didn't create an issue in bugzilla yet. I guess my next side-
project is to learn a bit more about liboil / orc.
> and then some comments on the patch itself:
> - Don't use __attribute__((weak)), it's not portable. Instead use
> liboil to
> detect at runtime if the CPU supports a specific instruction set and
> then use the appropriate function pointer to the unpack/quantize
> function
[RC] oh, darn.. it was such a clever trick too..
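
A minimal sketch of the function-pointer approach suggested above. All names here (convert_fn, convert_f32_to_s16_c, convert_f32_to_s16_fast, select_convert) are hypothetical and come neither from the patch nor from liboil, and the have_simd flag stands in for whatever runtime CPU check is actually used:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: convert n float samples in [-1, 1] to 16-bit ints. */
typedef void (*convert_fn)(const float *src, int16_t *dst, size_t n);

/* Portable fallback, always compiled. */
static void convert_f32_to_s16_c(const float *src, int16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float v = src[i] * 32767.0f;
        if (v > 32767.0f)  v = 32767.0f;   /* clamp to the int16 range */
        if (v < -32768.0f) v = -32768.0f;
        dst[i] = (int16_t)v;               /* truncates toward zero */
    }
}

/* Stand-in for an accelerated variant; a real one would use vector
 * instructions but must produce the same output. */
static void convert_f32_to_s16_fast(const float *src, int16_t *dst, size_t n)
{
    convert_f32_to_s16_c(src, dst, n);
}

/* Pick the implementation once; have_simd is assumed to come from a
 * runtime CPU feature check (not shown). */
static convert_fn select_convert(int have_simd)
{
    return have_simd ? convert_f32_to_s16_fast : convert_f32_to_s16_c;
}
```

The caller would store the result of select_convert() at init time and call through the pointer afterwards, so no compiler-specific attribute is needed.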
> - Add a configure check to see if the compiler supports the specific
> instruction set and only compile that ARMv7 code then
[RC] I did put the whole file within a '#ifdef __ARM_NEON__ /
#endif'.. which should also work even if the compiler supports NEON
but the user doesn't pass '-mfpu=neon'. But I admit that my configure-
foo is weak, so there is certainly a better way to do this.
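
For illustration, a guard of this kind could look roughly like the following. This is a sketch, not the code from the patch: scale_f32 is a hypothetical function, and the #else branch is an assumption added so the example builds everywhere (the patch relied on the weak default functions instead):

```c
#include <stddef.h>

#ifdef __ARM_NEON__
#include <arm_neon.h>

/* NEON path: only compiled when the compiler is targeting NEON,
 * e.g. when '-mfpu=neon' is given on 32-bit ARM. */
static void scale_f32(float *buf, size_t n, float k)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(buf + i);  /* load 4 floats */
        v = vmulq_n_f32(v, k);               /* multiply each by k */
        vst1q_f32(buf + i, v);               /* store 4 floats */
    }
    for (; i < n; i++)                       /* leftover samples */
        buf[i] *= k;
}

#else

/* Plain C path for every other build. */
static void scale_f32(float *buf, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= k;
}

#endif
```

Note that this only decides at compile time; it says nothing about whether the CPU the binary ends up running on has NEON.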
> - The start of a buffer might not be 16-byte aligned, or whatever
> alignment VFP requires. It's only guaranteed to be aligned to the
> sample type, i.e. 2-byte aligned for 16-bit samples, etc.
>
[RC] AFAIK, VLDR/VSTR doesn't require 128-bit alignment, although the
cycle count is lower for aligned accesses. So I guess it could be
made a bit faster by handling alignment a little better. As-is, it is
a night and day difference, and the gstaudioconvert related functions
only show up a couple of pages down in the oprofile output. Now it is
liba52 that needs some optimization ;-)
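
One common way to handle alignment "a little better" is to split the buffer into an unaligned head, an aligned body and a tail. The sketch below is plain C with hypothetical names (head_count, scale_split) and is not taken from the patch; the inner 4-sample loop marks the spot where aligned vector loads/stores would go:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of leading samples to process before p reaches a 16-byte
 * boundary (at most n). */
static size_t head_count(const float *p, size_t n)
{
    size_t head = 0;
    while (head < n && ((uintptr_t)(p + head) % 16) != 0)
        head++;
    return head;
}

static void scale_split(float *buf, size_t n, float k)
{
    size_t i = 0;
    size_t head = head_count(buf, n);

    for (; i < head; i++)            /* unaligned head, one sample at a time */
        buf[i] *= k;
    for (; i + 4 <= n; i += 4)       /* body: buf + i is 16-byte aligned here */
        for (size_t j = 0; j < 4; j++)
            buf[i + j] *= k;
    for (; i < n; i++)               /* tail: fewer than 4 samples left */
        buf[i] *= k;
}
```

Whether the extra head/tail handling pays off depends on buffer sizes and on how large the aligned-access gain is on the target core.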
> In general this patch is a good idea though, something like this
> really
> needs to go into audioconvert at critical places for other
> architectures
> too.
>
> FYI, David Schleef has partially converted audioconvert to use orc[0].
> Together with the orc VFP backend this would obsolete your patch I
> guess.
>
> [0] http://cgit.freedesktop.org/~ds/gst-plugins-base/log/?h=orc
[RC] ok, I'll check out his patch.. that is almost certainly the
better long-term approach. I just didn't know what the current
state of ORC for NEON/VFP was..
BR,
-R