Reading frame numbers in a local video file with precision
Will McElderry
wm-gstreamer at switchd.net
Tue Jan 23 15:37:48 UTC 2024
Hi Ken,
Short(?) version:
1. I think nobody can say what that number is for sure - you'd have to
look at the pipeline. (see tutorial 11:
https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=python#getting-pipeline-graphs
)
2. I'd guess that the number is an average number of units per frame
that is unlikely to be predictable in advance so will not transfer to
other files (e.g. number of audio samples/frame or number of
bytes/frame). You'd be better off using FPS and multiplying by time -
mainly because FPS is programmatically obtainable and so transferable
(though not necessarily set correctly by the encoding process).
3. If I understand correctly, Nicolas was saying nothing tracks the frame
index in the low-level data, so higher-level objects cannot establish
the frame index after a seek.
4. The only reliable method seems to be to scan the file to step through
all frames and build a lookup table (e.g. from frame PTS to index
number)
5. There are gotchas when attempting to seek to the time from the lookup
table, but looking up frame index from the time should work well (I
imagine!)
6. How do you know you're really at frame 12345? May be worth double
checking?
Long version:
Please take all of my comments as opinion, guesses and with a healthy
dose of uncertainty bounded by my lack of knowledge and experience with
gstreamer internals!
My understanding is that the meaning of your 'kludge' number depends on
what's inside the playbin and which sink(s) the playbin sends your query
to.
Have you inspected the pipeline graph?
https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=python#getting-pipeline-graphs
As you can see (probably have seen?) from:
https://gstreamer.freedesktop.org/documentation/gstreamer/gstformat.html?gi-language=python#GstFormat
The meaning of 'DEFAULT' depends on what element receives the request (a
video stream MAY return frame numbers - but that clearly isn't happening
in your case: the pipeline is giving something else, so it may not be the
video stream replying?).
A very _uninformed_ guess would be that your video file has an audio
stream and what's happening is you're getting the audio-sink replying to
the query. In that case the 'DEFAULT' would indicate audio samples
through the file and the kludge number you are working out would be the
ratio of audio samples to video frames. Essentially the number of audio
samples would be generated at a specified number of samples per second,
and therefore it's really just a measure of time.
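To make that guess concrete, here is a minimal sketch of the arithmetic. The 48 kHz sample rate and 30 fps frame rate used in the note below are purely illustrative assumptions (nothing in this thread says what the file actually contains), and samples_per_frame is a hypothetical helper name.

```c
#include <assert.h>

/* Audio samples elapsed per video frame, assuming constant sample and
 * frame rates. The frame rate is passed as a fraction
 * (fps_num / fps_den), the way frame rates are usually expressed. */
double samples_per_frame(int sample_rate_hz, int fps_num, int fps_den)
{
    assert(sample_rate_hz > 0 && fps_num > 0 && fps_den > 0);
    return (double)sample_rate_hz * fps_den / fps_num;
}
```

With those assumed values, samples_per_frame(48000, 30, 1) is 1600, which is close to, but not exactly, the 1599.49 divisor in the code quoted below, so this remains a guess rather than an explanation.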
IF that's not a million miles from what's going on (i.e. the query is
giving you a number which is basically equivalent to time, but with a
different scale) you'd probably find it easier to query stream time
position and FPS, then multiply the two together. Both of these
approaches would require a constant frame rate to have any hope of being
accurate, but you could estimate it and it may be good enough, depending
on video data and the accuracy you require. It would certainly be more
transferable between videos if you query the frame rate!
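A minimal sketch of the "time multiplied by FPS" conversion, assuming a constant frame rate and a timestamp in nanoseconds (the unit GStreamer uses for time). The function name is hypothetical, and the frame rate is assumed to have been read elsewhere as a fraction (for example from the video caps); that part is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL /* timestamps are in nanoseconds */

/* Estimate the zero-based frame index for a frame timestamp (PTS),
 * assuming a constant frame rate of fps_num / fps_den.
 * Rounds to the nearest frame so that timestamps truncated to whole
 * nanoseconds (e.g. at 30000/1001 fps) do not land one frame early. */
uint64_t frame_index_from_pts(uint64_t pts_ns, uint32_t fps_num,
                              uint32_t fps_den)
{
    assert(fps_num > 0 && fps_den > 0);
    uint64_t den = (uint64_t)fps_den * NSEC_PER_SEC;
    /* Note: pts_ns * fps_num can overflow 64 bits for extremely long
     * streams; this sketch does not guard against that. */
    return (pts_ns * fps_num + den / 2) / den;
}
```

Because it rounds to the nearest frame, this is intended for timestamps that sit at the start of a frame (such as a buffer PTS), not for arbitrary positions part-way through a frame. Inside a GStreamer application, gst_util_uint64_scale would be the more usual way to do the scaling without overflow.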
Another thing the number may be is 'average number of bytes per frame',
though the number you've given looks very low for that, and I'd still
think you'd be better off using the time as a better approximating
factor. It (almost certainly) will not be predictable or transfer
between files, unless your footage is from an indoor scene where nothing
ever happens...
I should flag: I think Nicolas has addressed this from a slightly
different angle in the other thread:
If I understand correctly he was saying:
he believes there is no information in the MP4 container or H264 stream
that tracks the frame number.
The inference I make from that is if a seek occurs, there is no
component able to "answer the question" what frame index is currently
being viewed. Before the seek there _may_ be a counter somewhere that
ticks up every frame, but after a seek nothing can really *know* what
number frame it has ended up at.
Or to phrase it another way: the MP4 container and H264 stream do not
care about frame numbers, they only care about timing. You can ask them
what time the pipeline is at, but not what frame number is being
processed. If it is not available from the container or the stream there
is no hope of any higher level objects getting access to that
information after a seek, so stepping through the stream is the only way
to keep track of the frame index.
If your video stream is nice and has a constant frame rate, you can use
that to convert from time to frame index, but otherwise the information
isn't available to you without scanning the file and building the lookup
table yourself!
On scanning the video and building a lookup: you can
extract the (stream time) PTS and frame index by stepping through each
frame. As I do that I also take a hash of the pixel data - because I
like to be certain that the pixel data really is what I've asked for!
Some other libraries I have used demonstrated they don't always return
the same pixels after a seek as they would if the code were stepping
only, so you may want to consider if you want to hash frames or not, or
just trust the metadata...
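A rough sketch of what such a table could look like, with the frame index being simply the position in the array. All names here are hypothetical, and FNV-1a is just one cheap choice of hash, not necessarily what anyone in this thread uses. In a real pipeline the PTS would presumably come from each buffer (GST_BUFFER_PTS) and the pixels from the mapped buffer data while stepping through the file; that wiring is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t pts_ns;     /* presentation timestamp of the frame */
    uint64_t pixel_hash; /* hash of the decoded pixel data */
} FrameEntry;

typedef struct {
    FrameEntry *entries; /* entries[i] describes frame index i */
    size_t count;
    size_t capacity;
} FrameTable;

/* 64-bit FNV-1a hash over a byte buffer. */
uint64_t fnv1a_64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;          /* FNV prime */
    }
    return h;
}

/* Append one frame, in stepping order. Returns 0 on success and
 * -1 if memory could not be allocated. */
int frame_table_append(FrameTable *t, uint64_t pts_ns,
                       const unsigned char *pixels, size_t len)
{
    assert(t != NULL);
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 1024;
        FrameEntry *grown = realloc(t->entries, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        t->entries = grown;
        t->capacity = new_cap;
    }
    t->entries[t->count].pts_ns = pts_ns;
    t->entries[t->count].pixel_hash = fnv1a_64(pixels, len);
    t->count++;
    return 0;
}
```

Start from a zero-initialised FrameTable, call frame_table_append once per frame, and free(table.entries) when done.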
Once you have the lookup, you can use it to either:
(usage 1) 'seek to the frame' by looking up the corresponding time
and seeking to that time
or (usage 2) identify which frame index the pipeline is operating on
by requesting the buffer's PTS (or frame hash) and looking that up to
get the frame index
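For (usage 2), the lookup itself can be a plain binary search, sketched below. It assumes the table is an array of PTS values in ascending order (frames recorded in presentation order) and that a buffer's PTS matches the recorded value exactly; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the frame index whose recorded PTS equals pts_ns, or -1 if
 * no frame in the table has that PTS. pts_table must be sorted in
 * ascending order. */
long frame_index_for_pts(const uint64_t *pts_table, size_t count,
                         uint64_t pts_ns)
{
    assert(pts_table != NULL || count == 0);
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pts_table[mid] < pts_ns)
            lo = mid + 1; /* target is in the upper half */
        else
            hi = mid;     /* target is at mid or below */
    }
    if (lo < count && pts_table[lo] == pts_ns)
        return (long)lo;
    return -1;
}
```

Returning -1 on a miss, rather than the nearest frame, is a deliberate choice here: an unexpected PTS is worth noticing instead of being silently rounded away.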
I've been away a while as I've been exploring exactly how (usage 1)
would work: the 'seeking to the corresponding time' is more complex than
it sounds. The corresponding time isn't the frame PTS, or necessarily
just before as one would expect, but depends on the type of frame
(I-frame/P-frame) and the two prior frames' PTSs and durations as well.
I'll post full details in the other thread "soon" to ensure I'm not
misunderstanding the evidence and maybe help anyone else with the same
usage.
Your current question is about identifying the frame index (usage 2)
from the pipeline's state, so the approach Nicolas suggests may well
work for this (I haven't seen anything in my testing that would suggest
an issue with the approach).
Finally, I'd have to admit: I'm a little surprised if your code really
is seeking to frame '12345' as you expect: specifically the
'GST_SEEK_FLAG_KEY_UNIT' would suggest that it will seek to a key frame
near frame '12345', which is probably not 12345, unless you are quite
lucky (although that may be why you chose that frame, but I suspect
it's just a placeholder?), or maybe something else I don't get is
happening?
I'd encourage you to build the lookup containing buffer.pts and frame
index, then confirm the next frame after your seek yields the buffer.pts
you expect for your target frame number, to be sure that you've got the
accuracy you expect.
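That check could be sketched as follows, assuming you already have an array of per-frame PTS values from a full scan and have captured the PTS of the first buffer delivered after the seek. The function name and the way the result is reported are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare where a seek actually landed with where it was meant to
 * land. On success returns 0 and stores (landed - target) in
 * *offset_frames, so 0 means the seek was frame-accurate. Returns -1
 * if target_index is out of range or observed_pts_ns is not in the
 * table. */
int seek_error_in_frames(const uint64_t *pts_table, size_t count,
                         size_t target_index, uint64_t observed_pts_ns,
                         long *offset_frames)
{
    assert(pts_table != NULL && offset_frames != NULL);
    if (target_index >= count)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (pts_table[i] == observed_pts_ns) {
            *offset_frames = (long)i - (long)target_index;
            return 0;
        }
    }
    return -1;
}
```

A negative offset would be consistent with the seek having snapped back to an earlier key frame, which is the behaviour I'd suspect from 'GST_SEEK_FLAG_KEY_UNIT'.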
Having written all that, I've been surprised before, so I won't mind
being shown to be wrong this time! I just wanted to make you aware of
my thoughts so you can consider if you want to double check or help
correct my misunderstanding.
I hope there are some ideas in there that help you move forward!
All the best,
Will.
On 2024-01-17 19:55, Kenneth Feingold via gstreamer-devel wrote:
> Hi Will,
> Thanks very much for working with me on this, to whatever limited extent
> is possible :-)
> My application has a very simple pipeline. It uses playbin to show a
> video file in a gtkgl window:
>
> data.playbin = gst_element_factory_make ("playbin", "playbin");
> videosink = gst_element_factory_make ("glsinkbin", "glsinkbin");
> gtkglsink = gst_element_factory_make ("gtkglsink", "gtkglsink");
>
> When I seek to a frame like this:
>
> gst_element_seek_simple (data->playbin, GST_FORMAT_DEFAULT,
> GST_SEEK_FLAG_FLUSH |GST_SEEK_FLAG_KEY_UNIT, 12345);
> send_seek_event (data);
>
> /*and then in my seek function:*/
> gst_element_send_event (data->video_sink, seek_event);
>
> It actually takes me to precise frame #12345.
>
> But, when I try to retrieve frame numbers while playing the file:
> /* Query the current position in frames */
> if (gst_element_query_position (data->playbin, GST_FORMAT_DEFAULT,
> &current)) {
> framenum=(current/1599.49);
> g_print ("Current Frame: %ld\n", framenum);
> }
>
> I need to use that specific (kludge) factor
> "framenum=(current/1599.49);" in order to get *close* to the app
> giving me the right frame number. What is curious is that with a
> different video file having a different length I need to use a
> different divisor to get the "right" value. Is this related to media
> duration/stream time?
>
> I am working with mp4 and mov files, and I am wondering if compression
> between frames is a factor? Would uncompressed video yield greater
> accuracy?
>
> And, as I was wondering in my earlier post here, what values do you
> think this:
> gst_element_query_position (data->playbin, GST_FORMAT_DEFAULT,
> &current))
>
> is giving me (without my kludge)?
>
> Thanks again!
> Ken