audioconvert inefficient when converting only endianness

Wed Feb 1 07:58:36 PST 2012

Hi,

I have two main questions listed near the end, but first here is the 
background information ...

I have been evaluating the CPU load of various GStreamer pipelines. 
In one case of interest, I needed to use audioconvert simply to 
change the endianness of a PCM audio stream.

Server pipeline:
	alsasrc ! audioconvert ! \
	"audio/x-raw-int, endianness=(int)4321, \
	signed=(boolean)true, width=(int)16, depth=(int)16, \
	rate=(int)44100, channels=(int)2" ! \
	rtpL16pay ! udpsink
Client pipeline:
	udpsrc ! rtpL16depay ! queue2 ! audioconvert ! \
	"audio/x-raw-int, endianness=(int)1234, \
	signed=(boolean)true, width=(int)16, depth=(int)16, \
	rate=(int)44100, channels=(int)2" ! \
	alsasink

I found that the audioconvert plugin was consuming more than 50% 
of the processor budget for the entire chain. When I investigated 
this I noticed that audioconvert was always up-converting the 
input stream to at least 32-bit signed, presumably to retain 
precision for the potential conversions that it would do. But in 
the case that I need, where endianness conversion is the only 
change required, this up-conversion is unnecessary and wasteful 
in terms of CPU load and potentially memory consumption as well.
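
To make the cost difference concrete, here is a simplified model of the two approaches for 16-bit big-endian to little-endian conversion. It is only a sketch and not audioconvert's actual code: the function names are hypothetical, and the scaling used in the two-pass path is an assumption about how an up-conversion to 32-bit signed might look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-pass model (assumption): unpack each big-endian 16-bit sample into a
 * 32-bit signed intermediate, then pack it again as little-endian 16-bit. */
static void convert_via_32bit(const uint8_t *src, uint8_t *dst,
                              int32_t *tmp, size_t samples)
{
    size_t i;

    assert(src != NULL && dst != NULL && tmp != NULL);

    /* Pass 1: read big-endian, sign-extend, scale into the upper 16 bits. */
    for (i = 0; i < samples; i++) {
        uint32_t v = ((uint32_t)src[2 * i] << 8) | src[2 * i + 1];
        int32_t s = (v & 0x8000u) ? (int32_t)v - 65536 : (int32_t)v;
        tmp[i] = s * 65536;
    }

    /* Pass 2: scale back down and write little-endian. */
    for (i = 0; i < samples; i++) {
        uint32_t v = (uint32_t)tmp[i] >> 16;
        dst[2 * i] = (uint8_t)(v & 0xFFu);
        dst[2 * i + 1] = (uint8_t)(v >> 8);
    }
}

/* Direct model: a single pass with no intermediate buffer. */
static void swap16_direct(const uint8_t *src, uint8_t *dst, size_t samples)
{
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < samples; i++) {
        uint8_t hi = src[2 * i];      /* read both bytes before writing */
        uint8_t lo = src[2 * i + 1];
        dst[2 * i] = lo;
        dst[2 * i + 1] = hi;
    }
}
```

Both functions produce the same output bytes for the same input, but the two-pass model touches every sample twice and needs a temporary buffer twice the size of the input.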

I implemented a modification (see the end of my post for a diff 
showing my current, rough proposed changes) which first checks 
if endianness conversion is the only required change, and if so 
avoids the type conversion step. The performance improvement is 
dramatic for the 16-bit width case that I've tested. I also 
verified proper operation for 8, 32, and 64-bit width cases, but 
have not compared performance for these cases.

I tried to test the 24-bit width case, which supposedly is a 
valid setting for width, but I always get the error 
"size 4096 is not a multiple of unit size 6" as the pipeline is 
being set up and so have not been able to verify that case. 
I also tried to use "width=24, depth=32" but it did not help. 
(For all the other tests I used settings with depth == width.) 
If someone could explain how to test the 24-bit width case, 
please let me know and I'll give it a try.
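
The error message is at least consistent with simple arithmetic: a packed 24-bit stereo frame is 3 bytes x 2 channels = 6 bytes, and 4096 is not a multiple of 6. (The 0.10 caps definition expects depth to be no larger than width, so the padded form would be width=32, depth=24 rather than width=24, depth=32.) The sketch below shows that arithmetic and what a byte swap for packed 24-bit samples might look like. The helper names frame_size and swap24 are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per frame: one sample of `width` bits for each channel. */
static size_t frame_size(int width, int channels)
{
    assert(width > 0 && width % 8 == 0 && channels > 0);
    return (size_t)(width / 8) * (size_t)channels;
}

/* Reverse the byte order of packed 24-bit samples: bytes 0 and 2 trade
 * places, the middle byte stays where it is. */
static void swap24(const uint8_t *src, uint8_t *dst, size_t samples)
{
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < samples; i++) {
        uint8_t b0 = src[3 * i];      /* read first so src == dst also works */
        uint8_t b1 = src[3 * i + 1];
        uint8_t b2 = src[3 * i + 2];
        dst[3 * i] = b2;
        dst[3 * i + 1] = b1;
        dst[3 * i + 2] = b0;
    }
}
```

With these helpers, frame_size(24, 2) is 6 and 4096 % 6 leaves a remainder of 4, whereas a buffer of 4098 bytes would divide evenly. Whether the source element can be made to produce such a buffer size is a separate question that this sketch does not answer.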

Though the 8-bit case works as implemented, I think that it 
could be further optimized if the input buffer could simply be 
passed along as the output buffer rather than being copied. 
I attempted to use gst_buffer_replace(), but that causes the 
gst-launch process to hang. I have seen the 
gst_base_transform_is_passthrough() function in GstBaseTransform, 
which looks promising, but I'm not sure how or whether I could 
use it for this case. I would appreciate a recommendation or a 
reference to example code on the proper method for passing along 
a buffer without change from input to output.

Notice that I had to add a special clause to detect the 8-bit width 
case since the endianness comparison alone was not catching it. 
Endianness really doesn't matter for the 8-bit width case, since 
we are only talking about byte endianness, not bit endianness.
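
The check in the patch compares only the sample width and the endianness. If signedness, depth, or integer versus float also differed between input and output, a byte swap alone would presumably not be enough, so a stricter test may be worth considering. The following is a minimal sketch of such a test. The SampleFormat struct and the function name are hypothetical stand-ins for illustration, not the real audioconvert context, and the channel comparison here stands in for the patch's mix_passthrough check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the format fields the check needs. */
typedef struct {
    bool is_int;      /* integer samples (as opposed to float) */
    bool is_signed;   /* signed integer samples */
    int endianness;   /* 1234 = little-endian, 4321 = big-endian */
    int width;        /* bits of storage per sample */
    int depth;        /* bits actually used per sample */
    int channels;
} SampleFormat;

/* Returns true when the swap/copy fast path is safe, i.e. when the two
 * formats agree in everything except (possibly) byte order. */
static bool can_use_swap_fast_path(const SampleFormat *in,
                                   const SampleFormat *out)
{
    assert(in != NULL && out != NULL);

    if (in->is_int != out->is_int || in->is_signed != out->is_signed ||
        in->width != out->width || in->depth != out->depth ||
        in->channels != out->channels)
        return false;

    /* Byte order is meaningless for one-byte samples, so the endianness
     * field is ignored and the data can simply be copied. */
    if (in->width == 8)
        return true;

    /* For wider samples the fast path applies only when the byte order
     * really differs; identical formats are left to the normal path. */
    return in->endianness != out->endianness;
}
```

For example, 16-bit signed big-endian to 16-bit signed little-endian qualifies, while 16-bit signed to 16-bit unsigned does not, even when the endianness differs.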

So, here are my main questions:
	1. Is it better to optimize endianness conversion 
		within the audioconvert plugin or 
		to create a new plugin for this explicit purpose?
	2. What is the proper way to pass a buffer from input 
		to output without doing a copy?

Best Regards,
Ed Endejan

P.S. Below are the changes I've tested thus far:
---------------

--- gst-plugins-base-0.10.35/gst/audioconvert/audioconvert.c.org
+++ gst-plugins-base-0.10.35/gst/audioconvert/audioconvert.c
@@ -709,6 +709,81 @@
   return TRUE;
 }
 
+static inline void
+audio_convert_endianness_only (gpointer src, gpointer dst, 
+    gint width, gint samples)
+{
+  /* assumes src & dst already checked for NULL and samples > 0 */
+  /* which is a valid assumption since this is only called from */
+  /* within audio_convert_convert after such checks */
+  switch(width) {
+    case 64: {
+      uint64_t *sp = (uint64_t *)src;
+      uint64_t *dp = (uint64_t *)dst;
+      for(; samples; samples--) {
+        *dp = (*sp << 56) 
+            | ((*sp & 0x000000000000FF00) << 40) 
+            | ((*sp & 0x0000000000FF0000) << 24) 
+            | ((*sp & 0x00000000FF000000) <<  8) 
+            | ((*sp & 0x000000FF00000000) >>  8) 
+            | ((*sp & 0x0000FF0000000000) >> 24) 
+            | ((*sp & 0x00FF000000000000) >> 40) 
+            | (*sp >> 56);
+        sp++;
+        dp++;
+      }
+      break;
+    }
+    case 32: {
+      uint32_t *sp = (uint32_t *)src;
+      uint32_t *dp = (uint32_t *)dst;
+      for(; samples; samples--) {
+        *dp = (*sp << 24) 
+            | ((*sp & 0x0000FF00) << 8) 
+            | ((*sp & 0x00FF0000) >> 8) 
+            | (*sp >> 24);
+        sp++;
+        dp++;
+      }
+      break;
+    }
+#if 0
+    case 24: {
+      /* 24-bit case is not real */
+      /* attempting to use width=24 causes error "size 4096 is not a multiple of unit size 6" */
+      GST_WARNING ("24-bit endianness conversion ignored\n");
+      (void)memcpy((void *)dst, (void *)src, (size_t)(width/8 * samples));
+      /* gst_buffer_replace(&dst, src); */ /* just pass along the unmodified buffer */
+      break;
+    }
+#endif
+    case 16: {
+      uint16_t *sp = (uint16_t *)src;
+      uint16_t *dp = (uint16_t *)dst;
+      for(; samples; samples--) {
+        *dp = (*sp << 8) | (*sp >> 8);
+        sp++;
+        dp++;
+      }
+      break;
+    }
+    case 8: {
+      /* endianness conversion is a nop for 8-bit width */
+      (void)memcpy((void *)dst, (void *)src, (size_t)samples);
+      /* gst_buffer_replace(&dst, src); */ /* just pass along the unmodified buffer */
+      break;
+    }
+#if 0
+    default: {
+      GST_WARNING ("Unexpected width - endianness conversion ignored\n");
+      (void)memcpy((void *)dst, (void *)src, (size_t)(width/8 * samples));
+      /* gst_buffer_replace(&dst, src); */ /* just pass along the unmodified buffer */
+      break;
+    }
+#endif
+  }
+}
+
 gboolean
 audio_convert_convert (AudioConvertCtx * ctx, gpointer src,
     gpointer dst, gint samples, gboolean src_writable)
@@ -725,6 +800,14 @@
   if (samples == 0)
     return TRUE;
 
+  if (ctx->mix_passthrough && (ctx->in.width == ctx->out.width) 
+      && ((ctx->in.width == 8) || (ctx->in.endianness != ctx->out.endianness))) {
+    /* no need for type conversion if only endianness conversion is needed */
+    audio_convert_endianness_only (src, dst, ctx->out.width, samples * ctx->out.channels);
+
+    return TRUE;
+  }
+
   insize = ctx->in.unit_size * samples;
   outsize = ctx->out.unit_size * samples;

---------------