<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 02/26/2012 10:09 PM, Alex K wrote:
<blockquote
cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: times new roman,new york,times,serif;
font-size: 12pt;">
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">Hello, </div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><br>
</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">I am working on
extracting speech from a live microphone stream. The speech
must be in FLAC format and kept in memory for further
processing. </div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><br>
</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">Currently I am using
pocketsphinx's vader plugin for voice activity detection,
and a fakesink to keep the result in memory instead of
writing it to a file. </div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><br>
</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">The pipeline that I
currently have looks like this:</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">"gconfaudiosrc !
audioconvert ! audioresample ! vader auto-threshold=true !
flacenc ! fakesink"</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><br>
</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">The vader plugin provides
two signals to indicate the start and end of a speech
utterance:</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">1) vader-start</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">2) vader-stop</div>
<div style=""><br>
</div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">I use fakesink's
handoff signal to buffer the incremental results, and I
connect to vader's "vader-start" and "vader-stop" signals to
flush the buffer and process it further.</div>
</div>
</blockquote>
What exactly are you doing in the vader-start/stop signal handlers?
<br>
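For illustration, the buffering described above can be modelled
without GStreamer as a small state machine. This is a hypothetical
sketch: UtteranceCollector and its methods are made-up names, not
GStreamer or pocketsphinx API. In a real program on_handoff would be
called from fakesink's "handoff" signal (fakesink only emits it when
its signal-handoffs property is true), and on_start/on_stop from
vader's "vader-start"/"vader-stop" handlers.<br>
<pre wrap="">
```python
class UtteranceCollector:
    """Collects handoff data between start/stop events, one blob per utterance.

    Hypothetical model: nothing here is GStreamer API.
    """

    def __init__(self):
        self.in_speech = False
        self.pending = bytearray()  # data of the utterance in progress
        self.utterances = []        # finished utterances, oldest first

    def on_start(self):
        # Would be called from a "vader-start" handler: begin a fresh
        # utterance and drop anything buffered before it.
        self.pending.clear()
        self.in_speech = True

    def on_handoff(self, data):
        # Would be called from fakesink's "handoff" handler with the
        # buffer's bytes; data outside an utterance is ignored.
        if self.in_speech:
            self.pending.extend(data)

    def on_stop(self):
        # Would be called from a "vader-stop" handler: close the
        # utterance and keep it for further processing.
        if self.in_speech:
            self.utterances.append(bytes(self.pending))
        self.pending.clear()
        self.in_speech = False


collector = UtteranceCollector()
collector.on_start()
collector.on_handoff(b"ab")
collector.on_handoff(b"cd")
collector.on_stop()
collector.on_handoff(b"xx")  # arrives between utterances: ignored
collector.on_start()
collector.on_handoff(b"ef")
collector.on_stop()
print(collector.utterances)  # [b'abcd', b'ef']
```
</pre>
Note that in the pipeline above flacenc sits between vader and
fakesink, so encoded data may still arrive after vader-stop has
fired; a collector like this one would silently drop such late data,
so the ordering is worth checking.<br>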
<br>
<blockquote
cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: times new roman,new york,times,serif;
font-size: 12pt;">
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"> <span style="font-size:
12pt;">Currently I just dump the results to separate
files (one file per utterance) so that I can play them back
and examine them. </span></div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><span style="font-size:
12pt;"><br>
</span></div>
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><span style="font-size:
12pt;">The problem is with flacenc. If I leave flacenc out
and just dump the raw audio, the utterances are cleanly
separated. However, if I add flacenc to the pipeline, the
last second of the previous utterance ends up at the start
of the next utterance and corrupts the result.</span></div>
</div>
</blockquote>
You might need to mark the first buffer of each new utterance with
the DISCONT flag (GST_BUFFER_FLAG_DISCONT).<br>
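A GStreamer-free sketch of that idea: remember that an utterance has
started and flag the next buffer that goes by. Buffer, DISCONT and
DiscontMarker are hypothetical stand-ins; in a real pipeline the
equivalent step would set GST_BUFFER_FLAG_DISCONT on the buffer, for
example from a buffer probe on vader's source pad (the probe API
differs between GStreamer 0.10 and 1.x).<br>
<pre wrap="">
```python
from dataclasses import dataclass

# Hypothetical stand-in for GStreamer's GST_BUFFER_FLAG_DISCONT bit.
DISCONT = 1


@dataclass
class Buffer:          # hypothetical stand-in for a GstBuffer
    data: bytes
    flags: int = 0


class DiscontMarker:
    """Flags the first buffer that follows each utterance start."""

    def __init__(self):
        # Assumption: treat the first buffer of the stream as a start too.
        self.mark_next = True

    def on_vader_start(self):
        # Called when a new utterance begins.
        self.mark_next = True

    def process(self, buf):
        # Called for every buffer leaving vader, in stream order.
        if self.mark_next:
            buf.flags |= DISCONT
            self.mark_next = False
        return buf


marker = DiscontMarker()
first = marker.process(Buffer(b"a"))
second = marker.process(Buffer(b"b"))
marker.on_vader_start()
third = marker.process(Buffer(b"c"))
print([b.flags for b in (first, second, third)])  # [1, 0, 1]
```
</pre>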
<blockquote
cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: times new roman,new york,times,serif;
font-size: 12pt;">
<div style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><span style="font-size:
12pt;"><br>
</span></div>
<div><span><font face="'times new roman', 'new york', times,
serif" size="3">Another problem is that the vader plugin
passes the audio on in chunks that are discontinuous in terms
of timestamps. One utterance might start at 1 s and end at
5 s; the next might start at 15 s and end at 18 s. flacenc
doesn't cope with that, and I'm not sure how to reset the
clock at the end of each utterance. I tried audiorate, but
it inserted silence at the beginning to make up for the
difference in timestamps. <br>
</font></span></div>
</div>
</blockquote>
<br>
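Resetting the clock for each utterance, as asked above, can be
illustrated with a hypothetical, GStreamer-free sketch: split a
sequence of timestamped chunks wherever the timestamps jump, and
shift each resulting utterance so it starts at 0 s. Chunk and
split_and_rebase are made-up names, and how to achieve the same
effect inside a GStreamer pipeline is not shown here.<br>
<pre wrap="">
```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Chunk:           # hypothetical stand-in for a timestamped audio buffer
    pts: float         # presentation timestamp in seconds
    duration: float    # duration in seconds


def split_and_rebase(chunks, tolerance=1e-6):
    """Split a chunk sequence at timestamp gaps and shift each
    resulting utterance so that it starts at 0 s."""
    utterances = []
    expected = None  # where the next chunk would start if contiguous
    offset = 0.0
    for chunk in chunks:
        if expected is None or abs(chunk.pts - expected) > tolerance:
            # Gap in the timestamps: a new utterance begins here.
            utterances.append([])
            offset = chunk.pts
        utterances[-1].append(replace(chunk, pts=chunk.pts - offset))
        expected = chunk.pts + chunk.duration
    return utterances


# Two utterances, 1-2 s and 15-16 s, as vader might pass them on.
stream = [Chunk(1.0, 0.5), Chunk(1.5, 0.5), Chunk(15.0, 0.5), Chunk(15.5, 0.5)]
for utterance in split_and_rebase(stream):
    print([c.pts for c in utterance])  # [0.0, 0.5] for both utterances
```
</pre>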
Use a smaller buffer size on the capture side or write your own
chunking element. There are also the "removesilence" and "cutter"
elements, which you might want to check.<br>
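As a rough illustration of the "chunking element" idea, here is a
GStreamer-free Python sketch that re-slices incoming data into
fixed-size blocks and keeps the remainder until it is flushed.
Rechunker is a hypothetical name, and the 16 kHz, 16-bit mono format
used to size the blocks is an assumption for the example; a real
element would also have to carry timestamps along, which this sketch
ignores.<br>
<pre wrap="">
```python
class Rechunker:
    """Re-slices an arbitrary byte stream into fixed-size blocks."""

    def __init__(self, block_size):
        if block_size > 0:
            self.block_size = block_size
        else:
            raise ValueError("block_size must be positive")
        self.pending = bytearray()

    def push(self, data):
        """Append data and return every complete block now available."""
        self.pending.extend(data)
        blocks = []
        while len(self.pending) >= self.block_size:
            blocks.append(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]
        return blocks

    def flush(self):
        """Return the short remainder, e.g. when an utterance ends."""
        rest = bytes(self.pending)
        self.pending.clear()
        return rest


# Assumed format: 20 ms of 16 kHz, 16-bit mono audio = 640 bytes.
BLOCK_20MS = 16000 * 2 // 50

chunker = Rechunker(BLOCK_20MS)
blocks = chunker.push(bytes(1500))        # e.g. one oversized capture buffer
print(len(blocks), len(chunker.flush()))  # 2 220
```
</pre>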
<br>
Stefan<br>
<blockquote
cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: times new roman,new york,times,serif;
font-size: 12pt;">
<div><span><font face="'times new roman', 'new york', times,
serif" size="3"><br>
</font></span></div>
<div><font face="'times new roman', 'new york', times, serif">Can
anyone help me find a reasonable solution to my problems? </font></div>
<div><font face="'times new roman', 'new york', times, serif"><br>
</font></div>
<div><font face="'times new roman', 'new york', times, serif">Thank
you in advance,</font></div>
<div><font face="'times new roman', 'new york', times, serif">Alex. </font></div>
</div>
<pre wrap="">
_______________________________________________
gstreamer-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:gstreamer-devel@lists.freedesktop.org">gstreamer-devel@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel">http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel</a>
</pre>
</blockquote>
<br>
</body>
</html>