<div dir="auto"><div dir="ltr"><div dir="ltr">Hi Will,<div><br></div><div>Thank you for your extraordinarily thorough and thoughtful consideration of my question, and for your patience with my newbie lack of understanding of where the issues really lie. I'm going to take the time to really work through all of the points and address your questions when I have better answers, but this guidance will surely help me get going in a less context-blind direction and make some real progress.</div><div dir="auto"><br></div><div dir="auto">My application is, basically, a player that replicates old computer interactive laserdisc functionality using video files made with the Domesday Duplicator (<a href="https://github.com/simoninns/DomesdayDuplicator/wiki/Overview" target="_blank" rel="noreferrer">https://github.com/simoninns/DomesdayDuplicator/wiki/Overview</a>), and by using the ld-analyse application (<a href="https://github.com/happycube/ld-decode/wiki/ld-analyse" target="_blank" rel="noreferrer">https://github.com/happycube/ld-decode/wiki/ld-analyse</a>) I can read the original frame numbers from the discs in the video files. But you are correct: I need to examine more closely the difference between seeking to an exact frame and to a nearby frame when the image changes little from one frame to the next. I may well have been fooling myself that seeking to "12345", or any other frame, was landing exactly there, when the frame that actually is "12345" looks just like its neighbors. I will try some files with timecode burned in for comparison. </div><div dir="auto"><br></div><div dir="auto">You (and Nicolas) have offered some really helpful pointers (as well as a true paradigm shift for me in how to think about frames in the GStreamer context), not the least of which is patiently pointing me to the available GStreamer debug tools and to other ways of thinking about the nature of H.264 video and stream position. 
I have lots of work to do.</div><div dir="auto"><br></div><div dir="auto">Again, thanks!</div><div dir="auto"><br></div><div dir="auto">All the best,</div><div dir="auto">Ken</div><div dir="auto"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jan 23, 2024 at 10:37 AM Will McElderry <<a href="mailto:wm-gstreamer@switchd.net" target="_blank" rel="noreferrer">wm-gstreamer@switchd.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Ken,<br>
<br>
<br>
Short(?) version:<br>
<br>
1. I think nobody can say what that number is for sure - you'd have to <br>
look at the pipeline. (see tutorial 11: <br>
<a href="https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=python#getting-pipeline-graphs" rel="noreferrer noreferrer" target="_blank">https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=python#getting-pipeline-graphs</a> <br>
)<br>
2. I'd guess that the number is an average number of units per frame <br>
that is unlikely to be predictable in advance and so will not transfer to <br>
other files (e.g. number of audio samples per frame or number of <br>
bytes per frame). You'd be better off using FPS and multiplying by time, <br>
mainly because FPS is programmatically obtainable and so transferable <br>
(though not necessarily set correctly by the encoding process).<br>
3. If I understand correctly, Nicolas was saying that nothing tracks the <br>
frame index in the low-level data, so higher-level code cannot establish <br>
the frame index after a seek.<br>
4. The only reliable method seems to be to scan the file to step through <br>
all frames and build a lookup table (e.g. from frame PTS to index <br>
number)<br>
5. There are gotchas when attempting to seek to the time from the lookup <br>
table, but looking up the frame index from the time should work well (I <br>
imagine!)<br>
6. How do you know you're really at frame 12345? May be worth double <br>
checking?<br>
<br>
<br>
Long version:<br>
<br>
Please take all of my comments as opinion, guesses and with a healthy <br>
dose of uncertainty bounded by my lack of knowledge and experience with <br>
gstreamer internals!<br>
<br>
<br>
My understanding is that the meaning of your 'kludge' number depends on <br>
what's inside the playbin and which sink(s) the playbin sends your query <br>
to.<br>
Have you inspected the pipeline graph?<br>
<a href="https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=python#getting-pipeline-graphs" rel="noreferrer noreferrer" target="_blank">https://gstreamer.freedesktop.org/documentation/tutorials/basic/debugging-tools.html?gi-language=python#getting-pipeline-graphs</a><br>
<br>
As you can see (probably have seen?) from:<br>
<br>
<a href="https://gstreamer.freedesktop.org/documentation/gstreamer/gstformat.html?gi-language=python#GstFormat" rel="noreferrer noreferrer" target="_blank">https://gstreamer.freedesktop.org/documentation/gstreamer/gstformat.html?gi-language=python#GstFormat</a><br>
The meaning of 'DEFAULT' depends on which element receives the request (a <br>
video stream MAY return frame numbers - but that clearly isn't happening <br>
in your case, since the pipeline is giving you something else; it may not <br>
be the video stream replying?).<br>
<br>
A very _uninformed_ guess would be that your video file has an audio <br>
stream and what's happening is you're getting the audio-sink replying to <br>
the query. In that case the 'DEFAULT' would indicate audio samples <br>
through the file and the kludge number you are working out would be the <br>
ratio of audio samples to video frames. Essentially the number of audio <br>
samples would be generated at a specified number of samples per second, <br>
and therefore it's really just a measure of time.<br>
<br>
IF that's not a million miles from what's going on (i.e. the query is <br>
giving you a number which is basically equivalent to time, but with a <br>
different scale) you'd probably find it easier to query stream time <br>
position and FPS, then multiply the two together. Both of these <br>
approaches would require a constant frame rate to have any hope of being <br>
accurate, but you could estimate it and it may be good enough, depending <br>
on video data and the accuracy you require. It would certainly be more <br>
transferable between videos if you query the frame rate!<br>
<br>
Another thing the number may be is 'average number of bytes per frame', <br>
though the number you've given looks very low for that, and I'd still <br>
think you'd be better off using the time as a better approximating <br>
factor. It (almost certainly) will not be predictable or transfer <br>
between files, unless your footage is from an indoor scene where nothing <br>
ever happens...<br>
<br>
<br>
I should flag: I think Nicolas has addressed this from a slightly <br>
different angle in the other thread:<br>
If I understand correctly, he was saying: he believes there is no <br>
information in the MP4 container or H264 stream that tracks the frame <br>
number.<br>
The inference I make from that is if a seek occurs, there is no <br>
component able to "answer the question" what frame index is currently <br>
being viewed. Before the seek there _may_ be a counter somewhere that <br>
ticks up every frame, but after a seek nothing can really *know* what <br>
number frame it has ended up at.<br>
<br>
Or to phrase it another way: the MP4 container and H264 stream do not <br>
care about frame numbers; they only care about timing. You can ask them <br>
what time the pipeline is at, but not what frame number is being <br>
processed. If that information is not available from the container or <br>
the stream, there is no hope of any higher-level object getting access <br>
to it after a seek, so stepping through the stream is the only way to <br>
keep track of the frame index.<br>
<br>
If your video stream is nice and has a constant frame rate, you can use <br>
that to convert from time to frame index, but otherwise the information <br>
isn't available to you without scanning the file and building the lookup <br>
table yourself!<br>
<br>
<br>
<br>
<br>
To expand on scanning the video and building a lookup: you can <br>
extract the (stream time) PTS and frame index by stepping through each <br>
frame. As I do that I also take a hash of the pixel data - because I <br>
like to be certain that the pixel data really is what I've asked for! <br>
Some other libraries I have used demonstrated that they don't always <br>
return the same pixels after a seek as they would when stepping only, <br>
so you may want to consider whether to hash frames or not, or just <br>
trust the metadata...<br>
<br>
<br>
Once you have the lookup, you can use it to either:<br>
(usage 1) 'seek to the frame' by looking up the corresponding time <br>
and seeking to that time<br>
or (usage 2) identify which frame index the pipeline is operating on <br>
by requesting the buffer's PTS (or frame hash) and looking that up to <br>
get the frame index<br>
<br>
<br>
I've been away a while as I've been exploring exactly how (usage 1) <br>
would work: the 'seeking to the corresponding time' is more complex than <br>
it sounds. The corresponding time isn't the frame's PTS, or necessarily <br>
just before it as one would expect, but depends on the type of frame <br>
(I-frame/P-frame) and on the two prior frames' PTSs and durations as well.<br>
I'll post full details in the other thread "soon" to ensure I'm not <br>
misunderstanding the evidence and maybe help anyone else with the same <br>
usage.<br>
<br>
Your current question is about identifying the frame index (usage 2) <br>
from the pipeline's state, so the approach Nicolas suggests may well <br>
work for this (I haven't seen anything in my testing that would suggest <br>
an issue with the approach).<br>
<br>
<br>
Finally, I have to admit I'm a little surprised if your code really is <br>
seeking to frame '12345' as you expect: the 'GST_SEEK_FLAG_KEY_UNIT' <br>
flag means it will seek to a key frame near frame '12345', which is <br>
probably not 12345 unless you are quite lucky (although that may be why <br>
you chose that frame - I suspect it's just a placeholder?). Or maybe <br>
something else I don't get is happening?<br>
I'd encourage you to build the lookup containing buffer.pts and frame <br>
index, then confirm the next frame after your seek yields the buffer.pts <br>
you expect for your target frame number, to be sure that you've got the <br>
accuracy you expect.<br>
Having written all that, I've been surprised before, so I won't mind <br>
being shown to be wrong this time! I just wanted to make you aware of <br>
my thoughts so you can consider if you want to double check or help <br>
correct my misunderstanding.<br>
<br>
I hope there are some ideas in there that help you move forward!<br>
<br>
<br>
All the best,<br>
<br>
Will.<br>
<br>
<br>
On 2024-01-17 19:55, Kenneth Feingold via gstreamer-devel wrote:<br>
> Hi Will,<br>
> Thanks very much for working with me on this, to whatever limited extent<br>
> is possible :-)<br>
> My application has a very simple pipeline. It uses playbin to show a<br>
> video file in a gtkgl window:<br>
> <br>
> data.playbin = gst_element_factory_make ("playbin", "playbin");<br>
> videosink = gst_element_factory_make ("glsinkbin", "glsinkbin");<br>
> gtkglsink = gst_element_factory_make ("gtkglsink", "gtkglsink");<br>
> <br>
> When I seek to a frame like this:<br>
> <br>
> gst_element_seek_simple (data->playbin, GST_FORMAT_DEFAULT,<br>
> GST_SEEK_FLAG_FLUSH |GST_SEEK_FLAG_KEY_UNIT, 12345);<br>
> send_seek_event (data);<br>
> <br>
> /*and then in my seek function:*/<br>
> gst_element_send_event (data->video_sink, seek_event);<br>
> <br>
> It actually takes me to precise frame #12345.<br>
> <br>
> But, when I try to retrieve frame numbers while playing the file:<br>
> /* Query the current position in frames */<br>
> if (gst_element_query_position (data->playbin, GST_FORMAT_DEFAULT,<br>
> &amp;current)) {<br>
> framenum=(current/1599.49);<br>
> g_print ("Current Frame: %ld\n", framenum);<br>
> }<br>
> <br>
> I need to use that specific (kludge) factor<br>
> "framenum=(current/1599.49);" in order to get *close* to the app<br>
> giving me the right frame number. What is curious is that with a<br>
> different video file having a different length I need to use a<br>
> different divisor to get the "right" value. Is this related to media<br>
> duration/stream time?<br>
> <br>
> I am working with mp4 and mov files, and I am wondering if compression<br>
> between frames is a factor? Would uncompressed video yield greater<br>
> accuracy?<br>
> <br>
> And, as I was wondering in my earlier post here, what values do you<br>
> think this:<br>
> gst_element_query_position (data->playbin, GST_FORMAT_DEFAULT,<br>
> &amp;current))<br>
> <br>
> is giving me (without my kludge)?<br>
> <br>
> Thanks again!<br>
> Ken<br>
</blockquote></div></div></div>