I am working on an IP camera RTSP relay server using GStreamer. The app
simply relays the given RTSP stream link to another endpoint after
embedding some metadata in it.

A required feature is that a client can ask the app to write an image
based on the received metadata. The requested image could be from a
moment back in time (1 minute max). We wish to achieve that with
minimal CPU usage.

Obviously, the app doesn't do any decoding when relaying the H264/H265
stream, but it will decode when such a request is made. Currently, the app
pulls the relayed stream (with metadata) and, using appsink, stores the
H264/H265 packets in a queue as GstSamples for later reference.
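For context, the caching side boils down to something like this simplified sketch. It is plain Python with no GStreamer calls: `CachedSample` is a hypothetical stand-in for a GstSample (just the buffer PTS and its bytes), and `SampleCache` is a hypothetical ring buffer that keeps only the last minute of samples. In the real app, `push()` would be fed from the appsink new-sample handler.

```python
from collections import deque
from dataclasses import dataclass

SECOND_NS = 1_000_000_000


@dataclass
class CachedSample:
    """Hypothetical stand-in for a GstSample pulled from appsink."""
    pts_ns: int   # buffer presentation timestamp, in nanoseconds
    data: bytes   # encoded H264/H265 data carried by the sample


class SampleCache:
    """Ring buffer that keeps only the most recent `window_ns` of samples."""

    def __init__(self, window_ns=60 * SECOND_NS):
        self.window_ns = window_ns
        self.samples = deque()

    def push(self, sample):
        self.samples.append(sample)
        # Evict samples older than the retention window (1 minute by default).
        while sample.pts_ns - self.samples[0].pts_ns > self.window_ns:
            self.samples.popleft()


# Usage: feed 120 samples spaced 1 s apart; only the last 60 s are kept.
cache = SampleCache()
for t in range(120):
    cache.push(CachedSample(t * SECOND_NS, b""))
print(len(cache.samples))  # 61
```

Because eviction happens on every push, memory stays bounded by roughly one minute of encoded data and nothing is decoded while relaying.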

When a request is made for an image, I locate the required frame (via the
metadata), take a range of GstSamples around it, and kick off another
pipeline whose only purpose is to write the required image.
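The range selection is roughly the following. This is a simplified sketch: `sample_range` and its `before`/`after` defaults are hypothetical names and values. They pick the cached sample whose PTS is closest to the requested timestamp and return slice indices covering a fixed number of samples on each side, with no regard to where keyframes fall.

```python
from bisect import bisect_left


def sample_range(pts_list, target_pts, before=150, after=150):
    """Return (start, end) slice indices of up to `before` samples before and
    `after` samples after the sample closest to target_pts.

    pts_list must be sorted in ascending order.
    """
    i = bisect_left(pts_list, target_pts)
    # bisect_left gives the first PTS >= target; step back if the previous
    # sample is at least as close (or if the target is past the end).
    if i > 0 and (i == len(pts_list)
                  or target_pts - pts_list[i - 1] <= pts_list[i] - target_pts):
        i -= 1
    start = max(0, i - before)
    end = min(len(pts_list), i + after + 1)
    return start, end


# Usage: 100 samples with PTS 0, 10, ..., 990; nearest to 503 is index 50.
print(sample_range(list(range(0, 1000, 10)), 503, before=5, after=5))  # (45, 56)
```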

Now my problem mainly lies in the range of GstSamples: it takes about 300
samples just to get past the H264/H265 parsers dropping frames and so on
before the image in question is finally written. In some cases that range
is not enough, because the parser hasn't found the SPS/PPS NALs yet and
keeps dropping frames, so the image is never written.
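To see whether a selected range even contains the parameter sets, the cached sample data can be scanned for NAL unit types. This sketch assumes the appsink caps use `stream-format=byte-stream` (Annex B start codes); with length-prefixed `avc`/`hvc1` samples the SPS/PPS are typically carried in the caps' `codec_data` instead, and this scan would not find them. The H264 types used are SPS=7, PPS=8 (IDR slice=5); for H265 they are VPS=32, SPS=33, PPS=34. `nal_types` and `has_parameter_sets` are hypothetical helpers.

```python
from typing import List

# NAL unit types a decoder needs before it can output any frame.
PARAM_SET_TYPES = {
    "h264": {7, 8},        # SPS, PPS
    "h265": {32, 33, 34},  # VPS, SPS, PPS
}


def nal_types(data: bytes, codec: str = "h264") -> List[int]:
    """Return the NAL unit types found in an Annex B byte-stream buffer."""
    types = []
    i = 0
    while True:
        # 3-byte start code; this also matches the tail of a 4-byte 00 00 00 01.
        i = data.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= len(data):
            break
        header = data[i + 3]
        if codec == "h264":
            types.append(header & 0x1F)         # 5-bit nal_unit_type
        else:
            types.append((header >> 1) & 0x3F)  # 6-bit nal_unit_type (H265)
        i += 3
    return types


def has_parameter_sets(data: bytes, codec: str = "h264") -> bool:
    """True if every parameter-set NAL type for `codec` appears in `data`."""
    return PARAM_SET_TYPES[codec] <= set(nal_types(data, codec))


# Usage: SPS, PPS, then an IDR slice (payload bytes are dummies).
au = b"\x00\x00\x00\x01\x67\xaa\x00\x00\x01\x68\xbb\x00\x00\x01\x65\xcc"
print(nal_types(au))           # [7, 8, 5]
print(has_parameter_sets(au))  # True
```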

Is there a way to get around this problem? Or is there another approach to
writing images on demand without having to fully commit to decoding the
entire stream? I'm open to ideas/suggestions.

Thanks in advance.

