webrtcdsp voice activity detection

Nicolas Dufresne nicolas at ndufresne.ca
Wed Mar 30 17:18:57 UTC 2022

Le mercredi 30 mars 2022 à 16:37 +0100, Rob Agar via gstreamer-devel a écrit :
> Hi all 
> I'm looking at the webrtcdsp plugin with a view to using it to detect people
> speaking, but it's not terribly clear how to use it programmatically.  
> Is there any example code for voice activity detection? 

I must admit, the documentation could be improved. The voice activity is
delivered as an element message, so you can handle it much like other bus
messages (EOS, ERROR, etc.). The message type is GST_MESSAGE_ELEMENT, and it
carries a GstStructure whose name is "voice-activity". So something like:

if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_ELEMENT) {
  const GstStructure *s = gst_message_get_structure (msg);
  if (gst_structure_has_name (s, "voice-activity")) {
    gboolean has_voice = FALSE;
    gst_structure_get_boolean (s, "stream-has-voice", &has_voice);
    . . .
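Putting it together, here is a minimal sketch of a program with a bus watch
reacting to those messages. Note that webrtcdsp only posts them when its
voice-detection property is enabled; the pipeline string (autoaudiosrc into
webrtcdsp with echo cancellation disabled, so no webrtcechoprobe is needed)
is just an assumption for illustration:

```c
#include <gst/gst.h>

static gboolean
bus_cb (GstBus * bus, GstMessage * msg, gpointer user_data)
{
  if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_ELEMENT) {
    const GstStructure *s = gst_message_get_structure (msg);

    if (gst_structure_has_name (s, "voice-activity")) {
      gboolean has_voice = FALSE;
      guint64 stream_time = GST_CLOCK_TIME_NONE;

      gst_structure_get_boolean (s, "stream-has-voice", &has_voice);
      gst_structure_get_uint64 (s, "stream-time", &stream_time);

      g_print ("%" GST_TIME_FORMAT ": voice %s\n",
          GST_TIME_ARGS (stream_time), has_voice ? "started" : "stopped");
    }
  }
  return TRUE;                  /* keep the watch installed */
}

int
main (int argc, char **argv)
{
  GError *error = NULL;
  GstElement *pipeline;
  GstBus *bus;

  gst_init (&argc, &argv);

  /* example pipeline, an assumption for illustration */
  pipeline = gst_parse_launch ("autoaudiosrc ! webrtcdsp "
      "voice-detection=true echo-cancel=false ! autoaudiosink", &error);
  if (pipeline == NULL) {
    g_printerr ("Failed to build pipeline: %s\n", error->message);
    return 1;
  }

  bus = gst_element_get_bus (pipeline);
  gst_bus_add_watch (bus, bus_cb, NULL);
  gst_object_unref (bus);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}
```

You can also get a quick look at the messages without writing any code by
running a similar pipeline with gst-launch-1.0 -m, which prints all bus
messages to the terminal.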

As a reference, here's the code that emits the message:

  s = gst_structure_new ("voice-activity",
      "stream-time", G_TYPE_UINT64, stream_time,
      "stream-has-voice", G_TYPE_BOOLEAN, stream_has_voice, NULL);

  GST_LOG_OBJECT (self, "Posting voice activity message, stream %s voice",
      stream_has_voice ? "now has" : "no longer has");

  gst_element_post_message (GST_ELEMENT (self),
      gst_message_new_element (GST_OBJECT (self), s));

If you need finer-grained information, the voice activity state is also
available in the GstAudioLevelMeta attached to every output buffer.
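To read that meta, you can attach a buffer probe downstream of webrtcdsp.
The sketch below assumes GStreamer >= 1.20 (where GstAudioLevelMeta lives in
the gst-audio library) and that the probe is installed on the element's src
pad:

```c
#include <gst/gst.h>
#include <gst/audio/audio.h>

static GstPadProbeReturn
level_probe_cb (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  GstBuffer *buffer = GST_PAD_PROBE_INFO_BUFFER (info);
  GstAudioLevelMeta *meta = gst_buffer_get_audio_level_meta (buffer);

  if (meta != NULL) {
    /* level is the RMS value in negated dBov (0 = loudest, 127 = quietest),
     * voice is the per-buffer voice activity flag */
    g_print ("level: -%u dBov, voice: %s\n", meta->level,
        meta->voice ? "yes" : "no");
  }
  return GST_PAD_PROBE_OK;
}

static void
install_probe (GstElement * dsp)
{
  GstPad *pad = gst_element_get_static_pad (dsp, "src");

  gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_BUFFER,
      level_probe_cb, NULL, NULL);
  gst_object_unref (pad);
}
```

The element message is convenient for UI-level "someone is speaking" state
changes, while the meta gives you a per-buffer reading if you need to follow
the level closely.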
