Encoding speech utterances in flac (discontinuous chunks problem)

Stefan Sauer ensonic at hora-obscura.de
Tue Feb 28 07:32:23 PST 2012


On 02/26/2012 10:09 PM, Alex K wrote:
> Hello, 
>
> I am working on extracting speech out of a live microphone stream. The
> speech must be in flac format and stored in memory for further
> processing. 
>
> Currently I am using pocketsphinx's vader plugin to do voice activity
> detection. And a fakesink in order to store the result in memory
> without writing it to file. 
>
> The pipeline that I currently have looks like this:
> "gconfaudiosrc ! audioconvert ! audioresample ! vader
> auto-threshold=true ! flacenc ! fakesink"
>
> The vader plugin provides two signals to indicate the start and end of
> a speech utterance:
> 1) vader-start
> 2) vader-stop
>
> I use the fakesink's handoff signal in order to buffer the incremental
> results, and finally I hook up to vader's "vader-stop" and
> "vader-start" signals to flush the buffer and further process it.
What exactly are you doing in the vader-start/stop signal handlers?

>  Currently I am just dumping the results to different files (each file
> is a different utterance) to play it back to examine it. 
>
> The problem is with flacenc. If I don't use flacenc but rather just
> dump the raw audio, the speech utterances are clearly marked. However
> if I add flacenc to the pipeline, the final 1 second of the previous
> utterance gets put into the start of the next utterance and messes up
> the result.
You might need to mark the first buffer of each new utterance with the
discont flag (GST_BUFFER_FLAG_DISCONT).
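
A minimal sketch of that bookkeeping, in plain Python rather than
GStreamer code. Chunk, UtteranceCollector and the method names are
hypothetical stand-ins (Chunk plays the role of a GstBuffer, its discont
field the role of the discont flag). In the real pipeline the flag would
have to be set on buffers before they reach flacenc, not in the fakesink
handoff, so treat this only as an illustration of the logic:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """Hypothetical stand-in for a GstBuffer: payload plus a discont marker."""
    data: bytes
    discont: bool = False


@dataclass
class UtteranceCollector:
    """Groups chunks into utterances and flags the first chunk of each one."""
    utterances: List[List[Chunk]] = field(default_factory=list)
    _current: List[Chunk] = field(default_factory=list)
    _active: bool = False

    def on_vader_start(self) -> None:
        # A new utterance begins: start from an empty chunk list.
        self._current = []
        self._active = True

    def push(self, data: bytes) -> None:
        if not self._active:
            return  # drop data that arrives outside an utterance
        # Only the first chunk after vader-start carries the discont marker.
        self._current.append(Chunk(data, discont=not self._current))

    def on_vader_stop(self) -> None:
        if self._active:
            # Hand the finished utterance over and reset for the next one.
            self.utterances.append(self._current)
            self._current = []
            self._active = False
```

Each entry in utterances then holds exactly the chunks seen between one
vader-start and the following vader-stop.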
>
> Another problem is that the audio data passed by the vader plugin is
> in discontinuous (in terms of timestamps) chunks. A speech might start
> at 1s and end at 5s. Then another speech segment might start at 15s
> and end at 18s. The problem is that the flacenc plugin doesn't like
> that and I'm not sure how to reset the clock at the end of each speech
> utterance. I tried using audiorate but that inserted X amount of
> silence at the beginning to compensate for the different timestamps.

Use a smaller buffer size on the capture side or write your own chunking
element. There is also a "removesilence" element and a "cutter" element
which you might want to check.
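
To illustrate what the timestamp handling in such a chunking element
might look like, here is a rough sketch in plain Python (not GStreamer
code). It restarts the timeline at zero whenever it sees a gap and
reports where each new utterance begins. The function name, the
(timestamp, duration) tuple layout and the max_gap_ns parameter are
assumptions made for this example, and whether flacenc is happy with the
result is untested:

```python
from typing import Iterable, Iterator, Tuple

SECOND = 1_000_000_000  # nanoseconds, the unit GStreamer uses for timestamps


def rebase_utterances(
    chunks: Iterable[Tuple[int, int]], max_gap_ns: int = 0
) -> Iterator[Tuple[int, int, bool]]:
    """Yield (new_timestamp, duration, starts_new_utterance) per chunk.

    `chunks` holds (timestamp, duration) pairs in nanoseconds. A chunk
    starts a new utterance when it is the first one, or when it begins
    more than `max_gap_ns` after the end of the previous chunk.
    """
    next_ts = 0
    expected = None  # where the next chunk should start if there is no gap
    for ts, duration in chunks:
        starts_new = expected is None or ts - expected > max_gap_ns
        if starts_new:
            next_ts = 0  # reset the clock for the new utterance
        yield next_ts, duration, starts_new
        next_ts += duration
        expected = ts + duration


# Speech from 1s to 5s (two chunks), then from 15s to 18s (one chunk).
example = [(1 * SECOND, 2 * SECOND), (3 * SECOND, 2 * SECOND),
           (15 * SECOND, 3 * SECOND)]
rebased = list(rebase_utterances(example))
```

With the input above, the first two chunks are rebased to 0s and 2s, and
the third chunk starts a new utterance at 0s again instead of at 15s.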

Stefan
>
> Can anyone help me find a reasonable solution to my problems? 
>
> Thank you in advance,
> Alex. 
