<html><body><div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt"><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">Hello, </div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><br></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">I am working on extracting speech from a live microphone stream. The speech must be in FLAC format and stored in memory for further processing. </div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><br></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">Currently I am using pocketsphinx's vader plugin for voice activity detection, and a fakesink to keep the result in memory without writing it to a file. </div><div
style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><br></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">The pipeline that I currently have looks like this:</div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">"gconfaudiosrc ! audioconvert ! audioresample ! vader auto-threshold=true ! flacenc ! fakesink"</div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><br></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">The vader plugin provides two signals to indicate the start and end of a speech utterance:</div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">1) vader-start</div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">2) vader-stop</div><div style="font-family: 'times new
roman', 'new york', times, serif; font-size: 12pt; "><br></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">I use the fakesink's handoff signal to buffer the incremental results, and I hook up to vader's "vader-start" and "vader-stop" signals to flush the buffer and process it further. <span style="font-size: 12pt; ">For now I am just dumping the results to separate files (one file per utterance) so that I can play them back and examine them. </span></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><span style="font-size: 12pt; "><br></span></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><span style="font-size: 12pt; ">The problem is with flacenc. If I don't use flacenc and just dump the raw audio, the speech utterances are clearly delimited. However, if I add flacenc to the pipeline, the
final 1 second of the previous utterance ends up at the start of the next utterance and corrupts the result.</span></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><span style="font-size: 12pt; "><br></span></div><div><span><font face="'times new roman', 'new york', times, serif" size="3">Another problem is that the audio data passed on by the vader plugin arrives in </font><font face="'times new roman', 'new york', times, serif">discontinuous</font><font face="'times new roman', 'new york', times, serif" size="3"> chunks (in terms of timestamps). One speech segment might start at 1s and end at 5s, and the next might start at 15s and end at 18s. The flacenc plugin doesn't handle that well, and I'm not sure how to reset the clock at the end of each speech utterance. I tried using audiorate, but that inserted X amount of silence at the beginning to compensate for the different
timestamps. </font></span></div><div><span><font face="'times new roman', 'new york', times, serif" size="3"><br></font></span></div><div><font face="'times new roman', 'new york', times, serif">Can anyone help me find a reasonable solution to my problems? </font></div><div><font face="'times new roman', 'new york', times, serif"><br></font></div><div><font face="'times new roman', 'new york', times, serif">Thank you in advance,</font></div><div><font face="'times new roman', 'new york', times, serif">Alex. </font></div></div></body></html>