<div dir="ltr"><div>This sounds like ONVIF metadata streaming.<br></div><div>As far as RTSP and RTP are concerned, it is a standard substream; instead of audio, it just carries frames of XML.<br></div><div>Even security cameras that do not connect via ONVIF use this normalized XML.<br></div><div><br></div><div> <a href="https://www.onvif.org/specs/srv/analytics/ONVIF-Analytics-Service-Spec.pdf">https://www.onvif.org/specs/srv/analytics/ONVIF-Analytics-Service-Spec.pdf</a></div><div>Section 5.2 defines the coordinate system and basic layout, and the ONVIF Streaming Spec describes how it fits into RTSP.</div><div><br></div><div><a href="https://www.onvif.org/specs/stream/ONVIF-Streaming-Spec.pdf">https://www.onvif.org/specs/stream/ONVIF-Streaming-Spec.pdf</a></div><div><br></div><div>Disclaimer: I am not an ONVIF proponent, fan, or official, far from it, but they seem to have gotten this part right, and people are using it to interoperate.<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 6, 2022 at 8:38 AM Manuel Wagesreither via gstreamer-devel &lt;<a href="mailto:gstreamer-devel@lists.freedesktop.org">gstreamer-devel@lists.freedesktop.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br> <br> For scientific purposes I'd like to send a video stream together with textual data over an RTP stream. The textual data amend the visual stream and contain information such as X, Y coordinates of points of interest. 
Thus, they need to be in sync with the video.<br> <br> I searched the internet for quite a bit and this is the most promising thing I found: <a href="https://gist.github.com/justinjoy/e0f563194bbff432d2ab46954f98c6a6" rel="noreferrer" target="_blank">https://gist.github.com/justinjoy/e0f563194bbff432d2ab46954f98c6a6</a><br> <br> Sadly, as I do not have prior GStreamer experience, I can't even find the line of code where the text actually gets fed into the stream. It's not clear to me where the text gets taken from.<br> <br> Here's a Stack Overflow question [1] which claims that RFC 4103 [2] and RFC 4396 [3] are the suggested ways to encapsulate text in RTP, but also that GStreamer has no support for them yet.<br> <br> [1] <a href="https://stackoverflow.com/questions/60553497/gstreamer-rtp-transmission-of-video-text" rel="noreferrer" target="_blank">https://stackoverflow.com/questions/60553497/gstreamer-rtp-transmission-of-video-text</a><br> [2] <a href="https://datatracker.ietf.org/doc/html/rfc4103" rel="noreferrer" target="_blank">https://datatracker.ietf.org/doc/html/rfc4103</a><br> [3] <a href="https://www.rfc-editor.org/rfc/rfc4396" rel="noreferrer" target="_blank">https://www.rfc-editor.org/rfc/rfc4396</a><br> <br> I feel a bit lost and would appreciate any advice.<br> <br> Thanks in advance,<br> Manuel<br> </blockquote></div>
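<div>To give a rough idea of what one of those XML frames might look like for X, Y points of interest, here is a minimal Python sketch. It is not taken from the spec text: the function names (to_normalized, build_metadata_frame) are made up for illustration, and it assumes the normalized coordinate system as I understand it (x from -1 on the left to 1 on the right, y from -1 at the bottom to 1 at the top), so please check it against section 5.2 before relying on it. Each point is written as a CenterOfGravity with a zero-size BoundingBox around it, which is also just an assumption about how one would represent a bare point.</div>
<pre>
```python
import xml.etree.ElementTree as ET

# Namespace used by the ONVIF schema types (prefix "tt" by convention).
ONVIF_NS = "http://www.onvif.org/ver10/schema"


def to_normalized(px, py, width, height):
    """Map pixel coordinates (origin top-left, y down) to the assumed
    normalized system (origin at image center, y up, range -1..1)."""
    nx = 2.0 * px / width - 1.0
    ny = 1.0 - 2.0 * py / height
    return nx, ny


def build_metadata_frame(utc_time, points, width, height):
    """Build one metadata XML document for a list of (px, py) points.

    Hypothetical helper: element layout follows my reading of the
    analytics spec, with a degenerate bounding box for each point.
    """
    ET.register_namespace("tt", ONVIF_NS)
    tt = "{%s}" % ONVIF_NS

    stream = ET.Element(tt + "MetadataStream")
    analytics = ET.SubElement(stream, tt + "VideoAnalytics")
    # UtcTime ties the frame of metadata to the matching video frame.
    frame = ET.SubElement(analytics, tt + "Frame", {"UtcTime": utc_time})

    for object_id, (px, py) in enumerate(points):
        nx, ny = to_normalized(px, py, width, height)
        x_text = "%.4f" % nx
        y_text = "%.4f" % ny
        obj = ET.SubElement(frame, tt + "Object", {"ObjectId": str(object_id)})
        appearance = ET.SubElement(obj, tt + "Appearance")
        shape = ET.SubElement(appearance, tt + "Shape")
        # Zero-size box around the point (assumption, see note above).
        ET.SubElement(shape, tt + "BoundingBox", {
            "left": x_text, "right": x_text,
            "top": y_text, "bottom": y_text,
        })
        ET.SubElement(shape, tt + "CenterOfGravity", {"x": x_text, "y": y_text})

    return ET.tostring(stream, encoding="unicode")


if __name__ == "__main__":
    # Two points of interest in a 1920x1080 frame.
    print(build_metadata_frame("2022-07-06T08:38:00Z",
                               [(960, 540), (100, 200)], 1920, 1080))
```
</pre>
<div>The resulting string is what would go out as the payload of the metadata substream, one document per video frame that has something to report. How it gets packetized and signalled in RTSP is covered by the Streaming Spec, not by this sketch.</div>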