webrtcdsp voice detection

Fri Apr 15 17:54:22 UTC 2022

On Fri, Apr 15, 2022 at 08:47, Dejan Cotra <Dejan.Cotra at nttdata.com>
wrote:

> Hi Nicolas,
>
> Thank you that was very helpful.
>
> I have one similar question. I am also playing around with the facedetect
> element. I know that I can retrieve information about a face from the bus
> message emitted by the facedetect element.
>
> Is there a way to retrieve information about a face from the video frame
> meta? Something similar to voice_activity in GstAudioLevelMeta?
>

I'm currently away from a real computer, so I can't check the code, but the
"in-band" way is GstVideoRegionOfInterestMeta. ROIs are rectangles with a
type (a simple string). So facedetect should be adding ROI meta to the frames.

A typical use case is to detect these with a pad probe and set a qp-delta
(not sure of the name) so that capable encoders can be told to work harder
on the details of that rectangle.
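
As a rough illustration of that use case, here is a minimal sketch in plain
Python. It is a stand-in for the logic a pad probe would run, not GStreamer
binding code: the Roi class only mirrors the idea of a typed rectangle, and
the "face" type string, the "roi/x264enc" hint name and the "delta-qp" field
are all assumptions, in line with the uncertainty about the exact name above.

```python
from dataclasses import dataclass, field


@dataclass
class Roi:
    """Plain-Python stand-in for a region of interest attached to a frame."""
    roi_type: str  # the type is just a string, e.g. "face" (assumed value)
    x: int
    y: int
    w: int
    h: int
    # Per-encoder hints as (name, fields) pairs; layout is hypothetical.
    params: list = field(default_factory=list)


def tag_rois(rois, wanted_type="face", delta_qp=-5):
    """Attach an encoder hint to every ROI of the wanted type.

    A negative QP delta asks a capable encoder to spend more bits on the
    rectangle. Returns the number of ROIs that were tagged.
    """
    tagged = 0
    for roi in rois:
        if roi.roi_type != wanted_type:
            continue  # leave other kinds of regions untouched
        # Hint name and field name are assumptions, not a confirmed API.
        roi.params.append(("roi/x264enc", {"delta-qp": delta_qp}))
        tagged += 1
    return tagged


# Hypothetical regions found on one frame.
frame_rois = [Roi("face", 10, 20, 64, 64), Roi("text", 0, 0, 5, 5)]
tagged_count = tag_rois(frame_rois)
```

In a real pipeline the same loop would sit inside a buffer probe on the
encoder's sink pad and walk the ROI metas on each buffer.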

This could also be used for other purposes. Note that the ONNX elements (the
ML plugins we have) tend to prefer having more shapes than just rectangles,
so they have their own meta.

> Br,
> Dejan
>
> -----Original Message-----
> From: Nicolas Dufresne <nicolas at ndufresne.ca>
> Sent: Friday, April 8, 2022 3:31 PM
> To: Discussion of the development of and with GStreamer <
> gstreamer-devel at lists.freedesktop.org>
> Cc: Dejan Cotra <Dejan.Cotra at nttdata.com>
> Subject: Re: webrtcdsp voice detection
>
> On Friday, April 8, 2022 at 11:10 +0000, Dejan Cotra via gstreamer-devel
> wrote:
>
> [...]
> >
> > I know that I can retrieve information from webrtcdsp voice detection
> > via bus messages. I receive a GST_MESSAGE_ELEMENT message from the
> > webrtcdsp element with a payload like this:
> >
> > voice-activity, stream-time=(guint64)2640000000,
> > stream-has-voice=(boolean)false;
> >
> > The question is whether I can retrieve information about voice
> > detection in some other way, such as from the meta of each sample that
> > I pull from the appsink element, or something similar.
>
> It also sets the voice_activity boolean in GstAudioLevelMeta (along with
> the audio amplitude). This is per buffer, not per audio sample, so you get
> feedback every 10 ms, more or less.
>
> Nicolas
>
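
To make the timing in the quoted exchange concrete, here is a small worked
example in plain Python. It assumes stream-time is expressed in nanoseconds
(the usual GStreamer clock unit) and that buffers are exactly 10 ms long; the
function names and the 48 kHz and 16 kHz sample rates are only illustrative.

```python
NS_PER_SECOND = 1_000_000_000
NS_PER_MS = 1_000_000


def stream_time_seconds(stream_time_ns):
    """Convert a stream-time value in nanoseconds to seconds."""
    return stream_time_ns / NS_PER_SECOND


def buffer_index(stream_time_ns, buffer_ms=10):
    """Index of the buffer starting at stream_time_ns, assuming fixed-size buffers."""
    return stream_time_ns // (buffer_ms * NS_PER_MS)


def samples_per_buffer(sample_rate_hz, buffer_ms=10):
    """Number of audio samples per channel covered by one buffer."""
    return sample_rate_hz * buffer_ms // 1000


# stream-time value taken from the quoted voice-activity message.
quoted_stream_time = 2640000000

seconds = stream_time_seconds(quoted_stream_time)  # 2.64
index = buffer_index(quoted_stream_time)           # 264
samples_48k = samples_per_buffer(48000)            # 480
samples_16k = samples_per_buffer(16000)            # 160
```

So one voice_activity flag per buffer summarises a few hundred audio samples
rather than describing each one individually.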