[Accessibility] TTS API: say_text/say_defered (was: TTS API interface description)
Hynek Hanke
hanke at brailcom.org
Thu May 11 13:34:47 PDT 2006
Hello,
so there is a suggestion to modify the say_text() and say_defered()
functions:
1) Argument(s) should be added to specify where speaking should
terminate.
Previously, the above functions only had arguments specifying where
speaking should start. For the very same reasons (maintaining SSML
context etc.), it was proposed to also add such a capability for the
end of synthesis inside the message.
It seems to me, however, that this can be achieved using index marks
and events, without having this capability explicitly in the TTS API.
More precisely, it is possible on the word/sentence level for both
AUDIO_OUTPUT_RETRIEVAL and AUDIO_OUTPUT_PLAYBACK. It is not possible on
the phoneme level. We decided not to handle phoneme-level events and
synchronization in this early version of the API because of the great
difficulty it presents.
The mechanism would be as follows: say the application wants to speak
the message only up to character position 3124. Regardless of whether
the audio is being played on the audio device or the audio data are
delivered to the application, it waits until an event past character
position 3124 is reached and then stops the process (stops synthesis
and audio, discards the remaining blocks of data received).
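
A minimal sketch of this application-side logic, written in Python just
for illustration; the event fields and the stop callback are my
assumptions, not something prescribed by the draft API:

    def speak_until(events, stop_position, stop):
        # Consume synthesizer events (word/sentence boundaries, index
        # marks) until one falls past stop_position, then stop
        # synthesis/audio and discard whatever data would still arrive.
        for event in events:
            if event["char_position"] > stop_position:
                stop(event["message_id"])
                break

    # Example: stop once an event past character position 3124 arrives.
    events = [
        {"message_id": 7, "char_position": 3100},
        {"message_id": 7, "char_position": 3130},
    ]
    speak_until(events, 3124, stop=lambda mid: print("stopping message", mid))
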
2) It should be possible to start speaking from a given character
position in the message.
I think this is a reasonable requirement. I'll add ``character position''
as another position type for the above two functions, if nobody
protests.
The ``speak from character position'' capability cannot currently be
achieved by relying on index marks/events alone, as the events at the
given character position might not be known yet and the audio might not
have been produced yet.
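
For illustration, an extended call could look something like the
following Python sketch; the parameter names and the position-type
constant are my assumptions, not the final API:

    POSITION_CHAR = "char"          # hypothetical position-type value
    POSITION_INDEX_MARK = "mark"

    def say_text(text, position, position_type=POSITION_CHAR):
        # Hypothetical wrapper: start synthesis at the given character
        # offset instead of at an index mark.
        if position_type == POSITION_CHAR:
            print("synthesizing from character", position, ":", text[position:])
        else:
            print("synthesizing from index mark", position)

    say_text("Hello world, this is a test message.", position=13)
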
3) It should be possible to start speaking from the position where
defer() was last called.
As in 1), this capability can already be achieved by the application
using index marks/events. It remembers the last event from the given
message and then requests that speaking start from that place.
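
A small sketch of this bookkeeping, again in Python and with
illustrative names only:

    last_event = {}    # message_id -> last event position received

    def on_event(message_id, position):
        # Remember the last event received for each message.
        last_event[message_id] = position

    def resume(message_id, say_defered):
        # Ask for speaking to continue from the remembered place.
        say_defered(message_id, position=last_event[message_id])

    on_event(7, 3124)
    resume(7, say_defered=lambda mid, position: print("resuming", mid, "at", position))
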
Adding the proposed value of '0' to say_defered(), meaning ``speak from
the place where defer() was last called'', seems redundant and would
introduce an inconsistency between say_text() and say_defered(). Remember
that not all synthesizers which support say_text() will support
say_defered(). There is no way to extend say_text() with this feature,
as say_text() is not necessarily preceded by any defer() call.
With regards,
Hynek Hanke