Identifying h264 frame types?

Will McElderry wm-gstreamer at switchd.net
Mon Feb 5 00:32:50 UTC 2024


<snippet>
> Encoded H.264 frames coming out of qtdemux should have the DELTA_UNIT
> flag set on buffers if it's a P/B frame, and should have the
> DELTA_UNIT flag cleared (not set) if it's a key/IDR frame. This will
> be based on information in the container.
>  Tim
</snippet>

Hi All,

@Tim: Thanks for your input! It's really appreciated and /almost/
works! It is certainly much faster too.


Short version:
I now have two methods to identify key frames and they don't agree!
I suspect the new method (inspecting buffers from qtdemux for the
DELTA_UNIT flag) is not working quite as I expect: the first frame in a
file never appears to be a key frame, and every key frame it reports is
_one_ frame early compared to the seek(KEY_UNIT | SNAP_AFTER | FLUSH)
method of identifying key frames. I'm also suspicious of my own limited
knowledge, though!
Can anyone comment?

Thank you in advance!


More details:

I have two methods to identify key frames:

1. Seeking from the previous key frame's PTS (initially time 0) with the
flags Gst.SeekFlags.KEY_UNIT | Gst.SeekFlags.SNAP_AFTER |
Gst.SeekFlags.FLUSH, repeated to find each key frame in the file in turn
(very slow, especially when decoding frames!)

2. Using a pipeline of the form:
     filesrc location=my.mp4 ! qtdemux ! tee name=t
       t. ! video/x-h264 ! queue ! appsink name=h264_appsink
       t. ! h264parse ! nvh264dec ! queue ! appsink name=frame_data_appsink
    (Acknowledgement: I've left out a couple of elements for clarity. I
    have also tried moving 'h264parse' before the tee, with the same
    results.)
    In this method I inspect each sample I pull from h264_appsink to see
    whether its buffer has the DELTA_UNIT flag set.
    As a sanity check, I compare its timestamp with the stream time of
    the sample pulled from frame_data_appsink (the two timestamps always
    match).
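
The DELTA_UNIT check in method 2 can be sketched in plain Python as
below. This is only an illustration under stated assumptions:
BufferFlags is a stand-in for Gst.BufferFlags (the value 1 << 13 is
assumed to match GST_BUFFER_FLAG_DELTA_UNIT), key_frame_indices is a
hypothetical helper, and the (pts, flags) pairs are synthetic rather
than pulled from h264_appsink.

```python
from enum import IntFlag


class BufferFlags(IntFlag):
    # Stand-in for Gst.BufferFlags; the value is an assumption made so
    # that this sketch runs without GStreamer installed.
    DELTA_UNIT = 1 << 13


def key_frame_indices(frames):
    """Return (index, pts_ms) for every frame whose DELTA_UNIT flag is cleared.

    frames is an iterable of (pts_ns, flags) pairs in stream order.
    """
    return [
        (index, pts_ns / 1e6)
        for index, (pts_ns, flags) in enumerate(frames)
        if not flags & BufferFlags.DELTA_UNIT
    ]


# Synthetic input: 30 frames at 15 fps with a key frame every 15 frames.
FPS = 15
GOP = 15
frames = [
    (i * 1_000_000_000 // FPS, 0 if i % GOP == 0 else BufferFlags.DELTA_UNIT)
    for i in range(30)
]
keys = key_frame_indices(frames)
print(keys)
```

In the real pipeline the pairs would come from the buffer of each
sample pulled from h264_appsink, with the flag tested on that buffer
(for example via its has_flags() method) instead of on a plain integer.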

Input video:
Generated using:
    appsrc ! video/x-raw,...,framerate=15 ! nvh264enc gop-size=15 ... ! h264parse ! mp4mux ! filesink location=...
  (again, hiding details in an attempt to increase clarity)



What I see:
NB: times are given in ms as I find them easier to read.
method 1 yields frame times:   [66.66, 1066.66, 2066.66, 3066.66, 4066.66, ...]
                frame indices: [0, 15, 30, 45, 60, ...]
method 2 yields frame times:   [1000, 2000, 3000, 4000, ...]
                frame indices: [14, 29, 44, 59, ...]
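
The relationship between the two result sets can be checked with a few
lines of Python. A small sketch, assuming the 15 fps input described
above and that frame index n carries a PTS of (n + 1) frame durations,
which is what the numbers listed here imply; index_to_ms and the
variable names are illustrative only.

```python
FPS = 15  # framerate of the test file

# Key-frame indices as reported above (first few entries only).
method1 = [0, 15, 30, 45, 60]  # seek with KEY_UNIT | SNAP_AFTER | FLUSH
method2 = [14, 29, 44, 59]     # buffers with DELTA_UNIT cleared


def index_to_ms(index):
    """PTS in ms, assuming frame n sits at (n + 1) frame durations."""
    return (index + 1) * 1000 / FPS


# Method 2 has no entry for frame 0, so pair its entries with method 1
# starting from method 1's second entry.
offsets = [a - b for a, b in zip(method1[1:], method2)]
print(offsets)  # [1, 1, 1, 1]

# Times implied by the method 2 indices under the assumption above.
print([round(index_to_ms(i), 2) for i in method2])
```

A constant offset of 1 with a missing first entry is exactly the pattern
described in this post; the sketch restates the observation and does not
explain it.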

What I expect:
I'm doing something wrong, but I cannot see what.

I intuitively expect a file to start with an I-frame, followed by
another I-frame every GOP-size frames (though I also expect that an
encoder heuristic for picking 'good times' to introduce new I-frames
may upset that regular pattern).
That would tie up with the results from method 1, but then why would
the new method be off by one frame (one frame early, and missing the
first key frame)?
Can I reliably assume the flag is one frame early?
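
One independent cross-check is to look at the NAL unit types inside each
buffer, since an IDR slice has nal_unit_type 5 in the H.264 NAL header.
A rough sketch, assuming the buffers are in length-prefixed 'avc' stream
format with 4-byte lengths (the real prefix size is given by the avcC
codec data and is not checked here); nal_types_avc and contains_idr are
hypothetical helpers and the sample bytes are made up.

```python
NAL_NON_IDR_SLICE = 1
NAL_IDR_SLICE = 5
NAL_SPS = 7
NAL_PPS = 8


def nal_types_avc(data, length_size=4):
    """List the nal_unit_type of each NAL unit in a length-prefixed buffer."""
    types = []
    pos = 0
    while pos + length_size <= len(data):
        nal_len = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        if nal_len == 0 or pos + nal_len > len(data):
            break  # truncated or malformed input: stop rather than guess
        # nal_unit_type is the low 5 bits of the first NAL header byte.
        types.append(data[pos] & 0x1F)
        pos += nal_len
    return types


def contains_idr(data, length_size=4):
    """True if any NAL unit in the buffer is an IDR slice."""
    return NAL_IDR_SLICE in nal_types_avc(data, length_size)


def pack(nals):
    """Build a made-up length-prefixed buffer from raw NAL units."""
    return b"".join(len(n).to_bytes(4, "big") + n for n in nals)


key_buf = pack([b"\x67\x00", b"\x68\x00", b"\x65\x88"])  # SPS, PPS, IDR
delta_buf = pack([b"\x41\x9a"])                          # non-IDR slice
print(nal_types_avc(key_buf), contains_idr(key_buf))
print(nal_types_avc(delta_buf), contains_idr(delta_buf))
```

Comparing this per-buffer result with the DELTA_UNIT flag on the same
buffer would show whether the flag and the bitstream agree on which
frame is the key frame.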

My intuition isn't exactly worth much though as I don't have knowledge 
in this area, so I'd not be too surprised to hear I'm looking in the 
wrong place.

Can anyone who knows more comment?

NB: In case it's relevant, I'm running GStreamer 1.20.3 as packaged in
Ubuntu 22.04.


Thanks again!

Will.

