Problem with seeking "subparse"
Andy Robinson
andy at seventhstring.com
Thu Dec 31 14:38:20 UTC 2020
GST 1.18.2 on Mac Big Sur (and for all I know it might well happen on
Windows too).
I find that subparse gets confused by seeking. I attach a simple
subtitle file, Test.srt, in SubRip format; you can put these
subtitles on whatever video you have at hand.
The pipeline looks like this:
    gst-launch-1.0 \
      textoverlay name=ov ! autovideosink \
      filesrc location=my-video.mp4 ! decodebin ! videoconvert ! videoscale ! ov.video_sink \
      filesrc location=Test.srt ! subparse ! ov.text_sink
Of course I am actually doing this programmatically. The pipeline works
fine as long as you don't seek it, and I don't think it's possible to
seek with gst-launch?
However if you programmatically seek this pipeline to 8 seconds with
GST_DEBUG=subparse:7 then subparse produces errors. I have attached a
file subparse_log.txt showing the crucial lines.
The crucial lines from the source are these, at line 1060 in the
function parse_subrip in gstsubparse.c, dealing with "state 2"
(expecting subtitle text):
    if (in_seg) {
      state->start_time = clip_start;
      state->duration = clip_stop - clip_start;
    } else {
      state->state = 0;
      return NULL;
    }
That is, if we are out of segment (parsing lines before the ones we are
interested in) then throw away the subtitle text and transition
immediately to state 0 (expecting sequence number). IMHO this is wrong:
the next thing we are in fact going to see is either another line of
subtitle text or a blank line, not a sequence number.
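
To make the in-segment test concrete, here is a minimal standalone sketch
of the clipping step. It is not the GStreamer code: clip_cue and Cue are
hypothetical names, times are plain milliseconds, and the segment is
assumed to have a start but no stop, as in the seek described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: a cue after clipping, in milliseconds. */
typedef struct {
  uint64_t start;
  uint64_t duration;
} Cue;

/* Clip the cue [start, stop) against a segment that begins at seg_start
 * and never ends. Returns false if the cue lies entirely before the
 * segment (the "out of segment" case); otherwise fills *out and
 * returns true. */
static bool clip_cue(uint64_t seg_start, uint64_t start, uint64_t stop,
                     Cue *out)
{
  if (stop <= seg_start)
    return false;                 /* cue ends before the seek position */

  uint64_t clip_start = start < seg_start ? seg_start : start;
  out->start = clip_start;
  out->duration = stop - clip_start;
  return true;
}
```

With a segment starting at 8000 ms, the first cue in the log (1 s to 5 s)
is out of segment, while the second (7 s to 12 s) is in segment and is
clipped to start at 8000 ms with a duration of 4000 ms. That second cue is
one of the ones we do want to see.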
The problem is then compounded by the fact that in state 0 the parser
accepts almost anything - even a blank line - as a valid sequence
number, and transitions to state 1 (expecting timestamps).
These two factors cause the parsing errors to cascade, often destroying
the first 2 or 3 timestamps that we *did* want to see.
Looking at the log I've attached, we see the segment event, start time 8
secs, and then:
State 0. Parsing line '1'
State 1. Parsing line '00:00:01,000 --> 00:00:05,000'
parse_subrip_time: parsing timestamp '00:00:01,000'
parse_subrip_time: parsing timestamp '00:00:05,000'
State 2. Parsing line '<i>Test message 1</i>'
// At this point we transition to state 0 which is wrong -
// we should still be in state 2, waiting for blank line.
State 0. Parsing line ''
// Here we wrongly transition to state 1 because the
// blank line we just saw has been wrongly accepted as
// a valid sequence number. Now we are lost!
State 1. Parsing line '2'
error parsing subrip time line '2'
State 0. Parsing line '00:00:07,000 --> 00:00:12,000'
// I haven't checked why that was not accepted as a sequence
// number, but it wasn't, because we are still in state 0.
State 0. Parsing line '<i>Another test message'
// However that was accepted as a sequence number!
// so we transition to state 1.
State 1. Parsing line 'on two lines</i>'
error parsing subrip time line 'on two lines</i>'
It seems to me that two fixes are needed:
1) The parser should only transition from state 2 to state 0 when it
sees a blank line.
2) In order to re-synchronise after any error (e.g. after a format error
in the subtitle file), it should only transition from state 0 to state 1
when it sees a line with a single decimal number on it.
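
The two fixes can be sketched as a small standalone state machine. This
is only a model under stated assumptions, not a patch to gstsubparse.c:
SrtModel, feed_line, is_sequence_number and parse_times are hypothetical
names, times are in milliseconds, the segment has no stop, and the model
just counts cues instead of producing buffers.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the three SubRip parser states. */
enum { EXPECT_NUMBER = 0, EXPECT_TIMES = 1, EXPECT_TEXT = 2 };

typedef struct {
  int state;
  bool in_seg;          /* does the current cue overlap the segment? */
  long seg_start_ms;    /* seek position */
  int cues_emitted;     /* cues that would have been pushed downstream */
} SrtModel;

/* Fix 2: a sequence number is one or more decimal digits, nothing else. */
static bool is_sequence_number(const char *line)
{
  if (*line == '\0')
    return false;                       /* blank line is not a number */
  for (; *line != '\0'; line++)
    if (!isdigit((unsigned char) *line))
      return false;
  return true;
}

/* Parse "HH:MM:SS,mmm --> HH:MM:SS,mmm" into milliseconds. */
static bool parse_times(const char *line, long *start_ms, long *stop_ms)
{
  int h1, m1, s1, ms1, h2, m2, s2, ms2;

  if (sscanf(line, "%d:%d:%d,%d --> %d:%d:%d,%d",
             &h1, &m1, &s1, &ms1, &h2, &m2, &s2, &ms2) != 8)
    return false;
  *start_ms = ((h1 * 60L + m1) * 60L + s1) * 1000L + ms1;
  *stop_ms = ((h2 * 60L + m2) * 60L + s2) * 1000L + ms2;
  return true;
}

static void feed_line(SrtModel *p, const char *line)
{
  long start, stop;

  switch (p->state) {
    case EXPECT_NUMBER:
      /* Anything that is not a number is skipped, so the parser
       * re-synchronises at the next real sequence number. */
      if (is_sequence_number(line))
        p->state = EXPECT_TIMES;
      break;
    case EXPECT_TIMES:
      if (parse_times(line, &start, &stop)) {
        p->in_seg = stop > p->seg_start_ms;
        p->state = EXPECT_TEXT;
      } else {
        p->state = EXPECT_NUMBER;
      }
      break;
    case EXPECT_TEXT:
      /* Fix 1: only a blank line ends the cue, in or out of segment. */
      if (*line == '\0') {
        if (p->in_seg)
          p->cues_emitted++;
        p->state = EXPECT_NUMBER;
      }
      break;
  }
}
```

Feeding this model the eight lines shown in the log above, followed by a
blank line, with seg_start_ms set to 8000, skips the first cue cleanly
and counts the second one, ending back in state 0. A real parser would
also have to flush the last cue at end of file, which this sketch
ignores.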
Can anyone suggest a workaround?
My Humax TV hard disk recorder shows the same symptoms: after a seek,
several subtitles often go missing before they get back in sync.
I wonder why!
Regards,
Andy Robinson, Seventh String Software, www.seventhstring.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test.srt
Type: application/x-subrip
Size: 281 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/gstreamer-devel/attachments/20201231/59041c1f/attachment.bin>
-------------- next part --------------
0:00:00.779415000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1856:handle_buffer:<ift!subparse> pushing newsegment event with time segment start=0:00:08.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x01, time=0:00:08.000000000, base=0:00:00.000000000, position 0:00:08.000000000, duration 99:99:99.999999999
0:00:00.779425000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line '1'
0:00:00.779430000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 1. Parsing line '00:00:01,000 --> 00:00:05,000'
0:00:00.779435000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:905:parse_subrip_time: parsing timestamp '00:00:01,000'
0:00:00.779441000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:905:parse_subrip_time: parsing timestamp '00:00:05,000'
0:00:00.779445000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 2. Parsing line '<i>Test message 1</i>'
0:00:00.779449000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line ''
0:00:00.779453000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 1. Parsing line '2'
0:00:00.779516000 1663 0x7f8d10855940 DEBUG subparse gstsubparse.c:1044:parse_subrip: error parsing subrip time line '2'
0:00:00.779521000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line '00:00:07,000 --> 00:00:12,000'
0:00:00.779525000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 0. Parsing line '<i>Another test message'
0:00:00.779529000 1663 0x7f8d10855940 LOG subparse gstsubparse.c:1880:handle_buffer:<ift!subparse> State 1. Parsing line 'on two lines</i>'
0:00:00.779533000 1663 0x7f8d10855940 DEBUG subparse gstsubparse.c:1044:parse_subrip: error parsing subrip time line 'on two lines</i>'