Problem with seeking "subparse"

Andy Robinson andy at seventhstring.com
Thu Dec 31 14:38:20 UTC 2020


GST 1.18.2 on Mac Big Sur (and for all I know it might well happen on 
Windows too).

I find that subparse gets confused by seeking. I attach a simple
subtitle file, Test.srt, in "subrip" format; obviously you can put these
subtitles on whatever video you have at hand.

The pipeline looks like this:

gst-launch-1.0 \
    textoverlay name=ov ! autovideosink \
    filesrc location=my-video.mp4 ! decodebin ! videoconvert ! videoscale ! ov.video_sink \
    filesrc location=Test.srt ! subparse ! ov.text_sink

But of course I am doing this programmatically, and this pipeline works
fine as long as you don't seek it. I don't think it's possible to seek
with gst-launch, is it?

However, if you programmatically seek this pipeline to 8 seconds, with
GST_DEBUG=subparse:7 set, subparse reports parsing errors. I have
attached a file, subparse_log.txt, showing the crucial lines.

The crucial lines from the source are these, at line 1060 in the 
function parse_subrip in gstsubparse.c, dealing with "state 2" 
(expecting subtitle text):

       if (in_seg) {
         state->start_time = clip_start;
         state->duration = clip_stop - clip_start;
       } else {
         state->state = 0;
         return NULL;
       }

That is, if we are out of segment (parsing lines before the ones we are
interested in), then throw away the subtitle text and transition
immediately to state 0 (expecting sequence number). IMHO this is wrong:
the next thing we are in fact going to see is either another line of
subtitle text or a blank line.

The problem is then compounded by the fact that in state 0 the parser 
accepts almost anything - even a blank line - as a valid sequence 
number, and transitions to state 1 (expecting timestamps).

These two factors cause the parsing errors to cascade, often destroying 
the first 2 or 3 timestamps that we *did* want to see.

Looking at the log I've attached, we see the segment event, start time 8 
secs, and then:

State 0. Parsing line '1'
State 1. Parsing line '00:00:01,000 --> 00:00:05,000'
parse_subrip_time: parsing timestamp '00:00:01,000'
parse_subrip_time: parsing timestamp '00:00:05,000'
State 2. Parsing line '<i>Test message 1</i>'
    // At this point we transition to state 0 which is wrong -
    // we should still be in state 2, waiting for blank line.
State 0. Parsing line ''
    // Here we wrongly transition to state 1 because the
    // blank line we just saw has been wrongly accepted as
    // a valid sequence number. Now we are lost!
State 1. Parsing line '2'
error parsing subrip time line '2'
State 0. Parsing line '00:00:07,000 --> 00:00:12,000'
    // I haven't looked into why that was not accepted as a sequence
    // number. But it wasn't, because we are still in state 0.
State 0. Parsing line '<i>Another test message'
    // However, that was accepted as a sequence number!
    // So we transition to state 1.
State 1. Parsing line 'on two lines</i>'
error parsing subrip time line 'on two lines</i>'

It seems to me that two fixes are needed:

1) The parser should only transition from state 2 to state 0 when it 
sees a blank line.

2) In order to re-synchronise after any error (e.g. after a format error 
in the subtitle file), it should only transition from state 0 to state 1 
when it sees a line with a single decimal number on it.

Can anyone suggest a workaround?

My Humax TV hard disk recorder shows the same symptoms: after a seek,
several subtitles often go missing before it gets back in sync. I
wonder why!

Regards,
Andy Robinson, Seventh String Software, www.seventhstring.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test.srt
Type: application/x-subrip
Size: 281 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/gstreamer-devel/attachments/20201231/59041c1f/attachment.bin>
-------------- next part --------------
0:00:00.779415000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1856:handle_buffer:<ift!subparse> pushing newsegment event with time segment start=0:00:08.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x01, time=0:00:08.000000000, base=0:00:00.000000000, position 0:00:08.000000000, duration 99:99:99.999999999
0:00:00.779425000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line '1'
0:00:00.779430000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 1. Parsing line '00:00:01,000 --> 00:00:05,000'
0:00:00.779435000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:905:parse_subrip_time: parsing timestamp '00:00:01,000'
0:00:00.779441000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:905:parse_subrip_time: parsing timestamp '00:00:05,000'
0:00:00.779445000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 2. Parsing line '<i>Test message 1</i>'
0:00:00.779449000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line ''
0:00:00.779453000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 1. Parsing line '2'
0:00:00.779516000  1663 0x7f8d10855940 DEBUG               subparse gstsubparse.c:1044:parse_subrip: error parsing subrip time line '2'
0:00:00.779521000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line '00:00:07,000 --> 00:00:12,000'
0:00:00.779525000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line '<i>Another test message'
0:00:00.779529000  1663 0x7f8d10855940 LOG                 subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 1. Parsing line 'on two lines</i>'
0:00:00.779533000  1663 0x7f8d10855940 DEBUG               subparse gstsubparse.c:1044:parse_subrip: error parsing subrip time line 'on two lines</i>'

