[Accessibility] Updated requirements document
Olaf Schmidt
ojschmidt@kde.org
Thu Jan 6 14:43:25 PST 2005
[Milan Zamazal, Monday, 15 November 2004 23:42]
> OPEN ISSUE:
>
> - Should an application be able to determine if SHOULD HAVE and
> NICE TO HAVE features are supported or not?
Yes, because the higher-level speech framework might otherwise decide to
avoid the features, or to emulate them.
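One way to meet this would be a simple capability query in the driver
interface. A minimal sketch in C, assuming hypothetical feature names and
a hypothetical driver_supports() function (none of these are defined in
the requirements document):

    /* Illustrative feature identifiers -- names are my own invention. */
    typedef enum {
        FEATURE_MARKERS,         /* SHOULD HAVE: marker events        */
        FEATURE_AUDIO_RETRIEVAL  /* NICE TO HAVE: raw audio retrieval */
    } driver_feature;

    /* Returns 1 if the driver implements the feature, 0 otherwise.
       The higher-level framework can check this before deciding
       whether to use a feature directly or to emulate it. */
    int driver_supports(driver_feature feature);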
> 3.1. MUST HAVE: An application will be able to specify the default
> voice to use for a particular synthesizer, and will be able to
> change the default voice in between `speak' requests.
Selecting a default language here would also be needed, because in some
rare cases a voice may be able to speak several languages. Perhaps we
could also make the setting of the default voice language-specific, but I
guess this would complicate things too much.
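As a rough illustration, the setters might look like this (the function
names are my own invention, not taken from the requirements document):

    /* Select the default language and voice for all following speak
       requests.  Both can be changed again between requests.  A real
       driver would probably use standard language tags such as "de". */
    void driver_set_default_language(const char *language);
    void driver_set_default_voice(const char *voice);  /* e.g. "male1" */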
> - Still not clear consensus on how to return the synthesized audio
> data (if at all). The main issue here is mostly with how to
> align marker and other time-related events with the audio being played
> on the audio output device.
>
I see three possibilities here:
1. Return a series of raw audio streams (as a function result or to a
callback function), one stream per segment between markers. It would then
be the task of the application to play the right stream whenever it
wished to jump to a certain marker.
2. Return a single raw audio stream together with the information that
marker A starts at time A1, i.e. after A2 bytes (as a function result or
to a callback function); see the sketch after this list.
3. Use a library like PortAudio to handle the playing in the speech
drivers themselves.
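For possibility 2, the marker information could travel alongside the
stream. A minimal sketch, assuming a hypothetical callback-based
interface:

    #include <stddef.h>  /* size_t */

    /* One entry per marker found in the text -- illustrative only. */
    typedef struct {
        const char *marker_name;  /* name from the SSML <mark/> tag       */
        double      time_offset;  /* seconds from stream start (A1 above) */
        size_t      byte_offset;  /* bytes from stream start (A2 above)   */
    } marker_info;

    /* Called once synthesis is finished.  To jump to a marker, the
       application seeks to the matching byte_offset in the stream. */
    typedef void (*synthesis_done_cb)(const char *audio, size_t audio_length,
                                      const marker_info *markers,
                                      size_t marker_count);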
> - Not clear on how to (or if we even should) specify the audio
> format to be used by a synthesizer.
>
A multimedia developer told me that the format of raw, uncompressed audio
data is recognised by all multimedia frameworks, so I don't think we need
to pass any special information back to the applications.
> - Implementation issue: Will the interaction with the driver be
> synchronous or asynchronous? For example, will a call to `speak'
> wait to return until all the audio has been processed?
I think both synchronous and asynchronous operation would be possible. In
the asynchronous case, we could use an id for every call and a callback
function for passing the audio stream. In the synchronous case, the speak
function could simply return a pointer to the audio stream.
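A rough sketch of both variants (all names are hypothetical):

    #include <stddef.h>  /* size_t */

    typedef void (*audio_cb)(int request_id,
                             const char *audio, size_t length);

    /* Asynchronous variant: returns an id immediately; the audio
       arrives later through the registered callback. */
    int speak_async(const char *ssml_text, audio_cb callback);

    /* Synchronous variant: blocks until synthesis is finished and
       returns the audio directly; the caller frees the buffer. */
    char *speak_sync(const char *ssml_text, size_t *length_out);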
> If not,
> what happens when a call to "speak" is made while the synthesizer
> is still processing a prior call to "speak?"
>
This should be up to the driver. An SSML tag at the end of the first text
snippet might change the parameters that are used for the second text
snippet, so at least the XML parsing of the first call needs to be
finished before the second is synthesised.
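As a concrete illustration (reusing the hypothetical speak_async() from
above; note that a strict SSML parser would insist on well-formed
documents, so this carry-over behaviour would be driver-specific):

    /* The tag at the end of the first snippet could change the voice
       for everything that follows, so the driver must finish parsing
       the first call before it synthesises the second one. */
    speak_async("First snippet. <voice name=\"male2\">", my_callback);
    speak_async("Second snippet -- possibly spoken by male2.", my_callback);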
Olaf
--
KDE Accessibility Project