<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> </head> <body bgcolor="#ffffff" text="#000000"> On 02/26/2012 10:09 PM, Alex K wrote: <blockquote cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com" type="cite"> <div style="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family: times new roman,new york,times,serif; font-size: 12pt;"> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">Hello, </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"><br> </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">I am working on extracting speech out of a live microphone stream. The speech must be in  flac format and stored in memory for further processing. </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"><br> </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">Currently I am using pocketsphinx's vader plugin to do voice activity detection. And a fakesink in order to store the result in memory without writing it to file. </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"><br> </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">The pipeline that I currently have looks like this:</div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">"gconfaudiosrc ! audioconvert ! audioresample ! vader auto-threshold=true ! flacenc ! 
fakesink"</div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"><br> </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">The vader plugin provides two signals to indicate the start and end of a speech utterance:</div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">1) vader-start</div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">2) vader-stop</div> <div style=""><br> </div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;">I use the fakesink's handoff signal in order to buffer the incremental results, and finally I hook up to vader's "vader-stop" and "vader-start" signals to flush the buffer and further process it.</div> </div> </blockquote> What exactly are you doing in the vader-start/stop signal handlers? <br> <br> <blockquote cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com" type="cite"> <div style="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family: times new roman,new york,times,serif; font-size: 12pt;"> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"> <span style="font-size: 12pt;">Currently I am just dumping the results to different files (each file is a different utterance) to play it back to examine it. </span></div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"><span style="font-size: 12pt;"><br> </span></div> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"><span style="font-size: 12pt;">The problem is with flacenc. If I don't use flacenc but rather just dump the raw audio, the speech utterances are clearly marked. 
However, if I add flacenc to the pipeline, the final 1 second of the previous utterance gets put into the start of the next utterance and messes up the result.</span></div> </div> </blockquote> You might need to mark the first buffer of each new utterance with a discont flag.<br> <blockquote cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com" type="cite"> <div style="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family: times new roman,new york,times,serif; font-size: 12pt;"> <div style="font-family: 'times new roman','new york',times,serif; font-size: 12pt;"><span style="font-size: 12pt;"><br> </span></div> <div><span><font face="'times new roman', 'new york', times, serif" size="3">Another problem is that the audio data passed by the vader plugin is in </font><font face="'times new roman', 'new york', times, serif">discontinuous</font><font face="'times new roman', 'new york', times, serif" size="3"> (in terms of timestamps) chunks. A speech might start at 1s and end at 5s. Then another speech segment might start at 15s and end at 18s. The problem is that the flacenc plugin doesn't like that and I'm not sure how to reset the clock at the end of each speech utterance. I tried using audiorate but that inserted X amount of silence at the beginning to compensate for the different timestamps. <br> </font></span></div> </div> </blockquote> <br> Use a smaller buffer size on the capture side or write your own chunking element. 
There is also a "removesilence" element and a "cutter" element which you might want to check.<br> <br> Stefan<br> <blockquote cite="mid:1330290555.25060.YahooMailNeo@web161002.mail.bf1.yahoo.com" type="cite"> <div style="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family: times new roman,new york,times,serif; font-size: 12pt;"> <div><span><font face="'times new roman', 'new york', times, serif" size="3"><br> </font></span></div> <div><font face="'times new roman', 'new york', times, serif">Can anyone help me find a reasonable solution to my problems? </font></div> <div><font face="'times new roman', 'new york', times, serif"><br> </font></div> <div><font face="'times new roman', 'new york', times, serif">Thank you in advance,</font></div> <div><font face="'times new roman', 'new york', times, serif">Alex. </font></div> </div> <pre wrap=""> <fieldset class="mimeAttachmentHeader"></fieldset> _______________________________________________ gstreamer-devel mailing list <a class="moz-txt-link-abbreviated" href="mailto:gstreamer-devel@lists.freedesktop.org">gstreamer-devel@lists.freedesktop.org</a> <a class="moz-txt-link-freetext" href="http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel">http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel</a> </pre> </blockquote> <br> </body> </html>