webrtcdsp voice activity detection

Nicolas Dufresne nicolas at ndufresne.ca
Thu Mar 31 13:14:41 UTC 2022


Le jeudi 31 mars 2022 à 10:16 +0100, Rob Agar via gstreamer-devel a écrit :
> Thanks Nicolas, that's very helpful :)   
> It seems to be working now - any tips for getting the best results,
> specifically for detecting a person speaking with background noise?  Or, to
> make it even harder, background chatter?
> I see that webrtc-audio-processing is up to version 1.1 in the source repo...
> have there been any significant changes or improvements since the 0.3.1 release
> that Ubuntu has packaged?

I'm not certain exactly; this library is an extract from libwebrtc, where it is
normally private API, so the changes aren't exactly well documented. I think it's
mostly performance that got improved. They did drop the beamforming support,
though: it worked, but was impossible to enable in practice. There is a generic
noise reduction filter that can certainly help a little; it will likely use PSNR
to try and cut out the background. It won't perform as well as beamforming, but
the latter requires information about the mic array, which is what made it
impossible to enable. The next level up would probably be an ML-based filter.
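
If it helps, here is a rough, untested sketch of turning those knobs from C. The
property names (echo-cancel, noise-suppression, noise-suppression-level,
voice-detection) come from gst-inspect-1.0 webrtcdsp on a recent release, so
double-check them against the 0.3.1-based build Ubuntu ships; the pipeline itself
is just an illustration:

#include <gst/gst.h>

int
main (int argc, char **argv)
{
  GError *error = NULL;
  GstElement *pipeline, *dsp;

  gst_init (&argc, &argv);

  /* webrtcdsp on its own is enough for noise suppression and voice detection;
   * echo cancellation would also need a webrtcechoprobe in the playback path,
   * so it is disabled here. */
  pipeline = gst_parse_launch (
      "autoaudiosrc ! audioconvert ! audioresample ! "
      "webrtcdsp name=dsp echo-cancel=false ! audioconvert ! autoaudiosink",
      &error);
  if (pipeline == NULL) {
    g_printerr ("Failed to build pipeline: %s\n", error->message);
    return 1;
  }

  dsp = gst_bin_get_by_name (GST_BIN (pipeline), "dsp");
  g_object_set (dsp, "noise-suppression", TRUE, "voice-detection", TRUE, NULL);
  /* enum properties can be set by their nick through this helper */
  gst_util_set_object_arg (G_OBJECT (dsp), "noise-suppression-level", "high");
  gst_object_unref (dsp);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}

Enabling voice-detection is also what makes the element post the
"voice-activity" messages discussed below.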

>   
> Rob
> 
> On 30/03/2022 18:18, Nicolas Dufresne via gstreamer-devel wrote:
>  
> > Le mercredi 30 mars 2022 à 16:37 +0100, Rob Agar via gstreamer-devel a écrit :
> >  
> > > Hi all 
> > > I'm looking at the webrtcdsp plugin with a view to using it to detect people
> > > speaking, but it's not terribly clear how to use it programmatically.  
> > > Is there any example code for voice activity detection? 
> > I must admit, the documentation could be improved. The voice activity will be
> > delivered using an element message. So you can handle this similarly to other
> > messages (EOS, ERROR, etc.). The type is GST_MESSAGE_ELEMENT, and it will contain
> > a GstStructure named "voice-activity". So something like:
> > 
> > const GstStructure *s = gst_message_get_structure (msg);
> > 
> > if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_ELEMENT && s != NULL &&
> >     gst_structure_has_name (s, "voice-activity")) {
> >   gboolean has_voice = FALSE;
> > 
> >   gst_structure_get_boolean (s, "stream-has-voice", &has_voice);
> >   . . .
> > }
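> > 
> > To actually receive it, attach a watch to the pipeline's bus. A minimal,
> > untested sketch (the callback name is arbitrary; a sync handler or
> > gst_bus_timed_pop_filtered() would work just as well):
> > 
> > static gboolean
> > on_bus_message (GstBus * bus, GstMessage * msg, gpointer user_data)
> > {
> >   const GstStructure *s = gst_message_get_structure (msg);
> > 
> >   if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_ELEMENT && s != NULL &&
> >       gst_structure_has_name (s, "voice-activity")) {
> >     gboolean has_voice = FALSE;
> >     guint64 stream_time = GST_CLOCK_TIME_NONE;
> > 
> >     gst_structure_get_boolean (s, "stream-has-voice", &has_voice);
> >     gst_structure_get_uint64 (s, "stream-time", &stream_time);
> >     g_print ("voice %s at %" GST_TIME_FORMAT "\n",
> >         has_voice ? "detected" : "gone", GST_TIME_ARGS (stream_time));
> >   }
> >   return TRUE;  /* keep the watch installed */
> > }
> > 
> > /* after building the pipeline: */
> > GstBus *bus = gst_element_get_bus (pipeline);
> > gst_bus_add_watch (bus, on_bus_message, NULL);
> > gst_object_unref (bus);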
> > 
> > As a reference, here's the code that emits the message:
> > 
> >   s = gst_structure_new ("voice-activity",
> >       "stream-time", G_TYPE_UINT64, stream_time,
> >       "stream-has-voice", G_TYPE_BOOLEAN, stream_has_voice, NULL);
> > 
> >   GST_LOG_OBJECT (self, "Posting voice activity message, stream %s voice",
> >       stream_has_voice ? "now has" : "no longer has");
> > 
> >   gst_element_post_message (GST_ELEMENT (self),
> >       gst_message_new_element (GST_OBJECT (self), s));
> > 
> > 
> > If you need finer-grained information, the voice activity state is also
> > available in the GstAudioLevelMeta that is attached to every buffer.
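> > 
> > A minimal, untested sketch of reading that meta from a buffer probe, assuming
> > GStreamer 1.20 or newer (where GstAudioLevelMeta lives in the audio library);
> > placing the probe on webrtcdsp's src pad is just one option:
> > 
> > #include <gst/audio/audio.h>
> > 
> > static GstPadProbeReturn
> > level_probe (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
> > {
> >   GstBuffer *buffer = GST_PAD_PROBE_INFO_BUFFER (info);
> >   GstAudioLevelMeta *meta = gst_buffer_get_audio_level_meta (buffer);
> > 
> >   if (meta != NULL) {
> >     /* meta->level is the audio level in -dBov (0 loudest, 127 silence),
> >      * meta->voice_activity is the per-buffer VAD flag */
> >     g_print ("level=-%u dBov voice=%d\n", meta->level, meta->voice_activity);
> >   }
> >   return GST_PAD_PROBE_OK;
> > }
> > 
> > /* attach it, e.g. on webrtcdsp's src pad: */
> > GstPad *pad = gst_element_get_static_pad (dsp, "src");
> > gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_BUFFER, level_probe, NULL, NULL);
> > gst_object_unref (pad);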
> > 
> > Nicolas


