[Accessibility] Updated requirements document
Olaf Schmidt
ojschmidt@kde.org
Thu Jan 6 14:43:25 PST 2005
[Milan Zamazal, Monday, 15 November 2004 23:42]
> OPEN ISSUE:
>
> - Should an application be able to determine if SHOULD HAVE and
> NICE TO HAVE features are supported or not?
Yes, because the higher-level speech framework might otherwise decide to
avoid the features, or to emulate them.
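One way to meet this would be a simple capability query in the driver
interface. A minimal sketch in C, assuming hypothetical feature names and
a hypothetical driver_supports() function (none of these are defined in
the requirements document):

    /* Illustrative feature identifiers -- names are my own invention. */
    typedef enum {
        FEATURE_MARKERS,         /* SHOULD HAVE: marker events        */
        FEATURE_AUDIO_RETRIEVAL  /* NICE TO HAVE: raw audio retrieval */
    } driver_feature;

    /* Returns 1 if the driver implements the feature, 0 otherwise.
       The higher-level framework can check this before deciding
       whether to use a feature directly or to emulate it. */
    int driver_supports(driver_feature feature);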
> 3.1. MUST HAVE: An application will be able to specify the default
> voice to use for a particular synthesizer, and will be able to
> change the default voice in between `speak' requests.
Selecting a default language here would also be needed, because in some
rare cases a voice may be able to speak several languages. Perhaps we
could also make the setting of the default voice language-specific, but I
guess this would complicate things too much.
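As a rough illustration, the setters might look like this (the function
names are my own invention, not taken from the requirements document):

    /* Select the default language and voice for all following speak
       requests.  Both can be changed again between requests.  A real
       driver would probably use standard language tags such as "de". */
    void driver_set_default_language(const char *language);
    void driver_set_default_voice(const char *voice);  /* e.g. "male1" */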
> - Still not clear consensus on how to return the synthesized audio
> data (if at all). The main issue here is mostly with how to
> align marker and other time-related events with the audio being played
> on the audio output device.
>
I see three possibilities here:
1. Return a series of raw audio streams (as a function result or to a
callback function), one stream per segment between markers. It would then
be the task of the application to play the right stream whenever it
wished to jump to a certain marker.
2. Return a single raw audio stream together with the information that
marker A starts at time A1, i.e. after A2 bytes (as a function result or
to a callback function); see the sketch after this list.
3. Use a library like PortAudio to handle the playing in the speech
drivers themselves.
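For possibility 2, the marker information could travel alongside the
stream. A minimal sketch, assuming a hypothetical callback-based
interface:

    #include <stddef.h>  /* size_t */

    /* One entry per marker found in the text -- illustrative only. */
    typedef struct {
        const char *marker_name;  /* name from the SSML <mark/> tag       */
        double      time_offset;  /* seconds from stream start (A1 above) */
        size_t      byte_offset;  /* bytes from stream start (A2 above)   */
    } marker_info;

    /* Called once synthesis is finished.  To jump to a marker, the
       application seeks to the matching byte_offset in the stream. */
    typedef void (*synthesis_done_cb)(const char *audio, size_t audio_length,
                                      const marker_info *markers,
                                      size_t marker_count);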
> - Not clear on how to (or if we even should) specify the audio
> format to be used by a synthesizer.
>
A multimedia developer told me that the format of raw, uncompressed audio
data is recognised by all multimedia frameworks, so I don't think we need
to pass any special information back to the applications.
> - Implementation issue: Will the interaction with the driver be
> synchronous or asynchronous? For example, will a call to `speak'
> wait to return until all the audio has been processed?
I think both synchronous and asynchronous operation would be possible. In
the asynchronous case, we could use an id for every call and a callback
function for passing the audio stream. In the synchronous case, the speak
function could simply return a pointer to the audio stream.
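A rough sketch of both variants (all names are hypothetical):

    #include <stddef.h>  /* size_t */

    typedef void (*audio_cb)(int request_id,
                             const char *audio, size_t length);

    /* Asynchronous variant: returns an id immediately; the audio
       arrives later through the registered callback. */
    int speak_async(const char *ssml_text, audio_cb callback);

    /* Synchronous variant: blocks until synthesis is finished and
       returns the audio directly; the caller frees the buffer. */
    char *speak_sync(const char *ssml_text, size_t *length_out);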
> If not,
> what happens when a call to "speak" is made while the synthesizer
> is still processing a prior call to "speak?"
>
This should be up to the driver. An SSML tag at the end of the first text
snippet might change the parameters that are used for the second text
snippet, so at least the XML parsing of the first call needs to be
finished before the second is synthesised.
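As a concrete illustration (reusing the hypothetical speak_async() from
above; note that a strict SSML parser would insist on well-formed
documents, so this carry-over behaviour would be driver-specific):

    /* The tag at the end of the first snippet could change the voice
       for everything that follows, so the driver must finish parsing
       the first call before it synthesises the second one. */
    speak_async("First snippet. <voice name=\"male2\">", my_callback);
    speak_async("Second snippet -- possibly spoken by male2.", my_callback);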
Olaf
--
KDE Accessibility Project