[Accessibility] TTS API interface description + reqs update

Gary Cramblitt garycramblitt at comcast.net
Sun Apr 30 05:18:42 PDT 2006


On Saturday 29 April 2006 08:31, Hynek Hanke wrote:
> Hello,
>
> I've updated the requirements document with the latest suggestions
> and tried to incorporate what we discussed on one FSG meeting and
> a subsequent meeting of a subgroup specifically about TTS API.

2.4.  I suggest adding a method that returns a list of supported sound icon 
names, so that applications can discover what is supported.  Also, I think we 
should provide a recommended list of "standard" sound icons, along with .wav 
files to make it easy for implementors to support them.
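
To illustrate the discovery method, here is a rough sketch.  The function 
names and the icon names are made up for the example and are not part of the 
requirements document; a NULL-terminated array is just one possible way to 
return the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical icon names; a real driver would report what it ships. */
static const char *const supported_icons[] = { "start", "end", "error", NULL };

/* Hypothetical discovery call: returns a NULL-terminated list of names. */
const char *const *list_sound_icons(void)
{
    return supported_icons;
}

/* Convenience check an application might build on top of the list. */
int sound_icon_supported(const char *name)
{
    assert(name != NULL);
    for (const char *const *p = list_sound_icons(); *p != NULL; p++) {
        if (strcmp(*p, name) == 0)
            return 1;
    }
    return 0;
}
```

An application could call sound_icon_supported() before requesting an icon 
and fall back to spoken text when it returns 0.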

2.5.  The defer() method implies a heap of "paused" messages.  I'm wondering 
if that is really necessary.  Is there an upper limit on the number of 
messages in the heap?  To ease the burden on implementors, I suggest changing

bool_t can_defer_message;

to

int can_defer_message;

where the value is the max heap depth (number of defers) supported.  0 means 
no defer capability.  -1 means no maximum depth (constrained only by 
available memory).  1 means only one defer at a time is supported.
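
A minimal sketch of how a caller might interpret the integer value.  Only 
can_defer_message comes from the document; driver_caps_t and may_defer() are 
hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability record; only the field name is from the draft. */
typedef struct {
    int can_defer_message; /* 0 = no defer, -1 = no maximum, N > 0 = max depth */
} driver_caps_t;

/* Returns true if one more message may be deferred while `deferred`
 * messages are already being held.  Any negative value is treated as
 * "no maximum" here, which is an assumption of this sketch. */
bool may_defer(const driver_caps_t *caps, size_t deferred)
{
    assert(caps != NULL);
    if (caps->can_defer_message == 0)
        return false;
    if (caps->can_defer_message < 0)
        return true;
    return deferred < (size_t)caps->can_defer_message;
}
```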

For the discard(message_id_t message) method, I suggest allowing applications 
to pass 0 as the message id, meaning "discard the last deferred message".  
This is especially appropriate if only one defer at a time is supported, 
because the application then does not need to keep track of message ids.
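
Here is one way an implementation could handle the 0 id, as a sketch only.  
The typedef for message_id_t, the fixed-size store, and the store_* function 
names are assumptions made for the example; the document does not specify 
them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int message_id_t;        /* assumed to be an integer type */
#define LAST_DEFERRED ((message_id_t)0)   /* 0 = "the last deferred message" */
#define MAX_DEFERRED  8                   /* arbitrary limit for the sketch */

/* Hypothetical store of deferred message ids, oldest first. */
typedef struct {
    message_id_t ids[MAX_DEFERRED];
    size_t count;
} defer_store_t;

/* Records a deferred message; 0 is reserved and cannot be a real id. */
bool store_defer(defer_store_t *s, message_id_t id)
{
    assert(s != NULL);
    if (id == LAST_DEFERRED || s->count == MAX_DEFERRED)
        return false;
    s->ids[s->count++] = id;
    return true;
}

/* Discards the given message, or the most recently deferred one if id is 0. */
bool store_discard(defer_store_t *s, message_id_t id)
{
    assert(s != NULL);
    if (s->count == 0)
        return false;
    size_t i = s->count - 1;              /* default: last deferred */
    if (id != LAST_DEFERRED) {
        for (i = 0; i < s->count && s->ids[i] != id; i++)
            ;
        if (i == s->count)
            return false;                 /* unknown id */
    }
    for (; i + 1 < s->count; i++)         /* close the gap */
        s->ids[i] = s->ids[i + 1];
    s->count--;
    return true;
}
```

With a maximum depth of 1, an application would only ever call 
store_discard(&s, 0) and never needs to remember an id.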

We should add a sentence to the discard() method saying that applications 
should take care to discard unneeded messages, lest the defer() heap grow 
unnecessarily.

2.7.  AFAIK, all existing speech synthesizers produce raw PCM or WAV audio.  
Rather than deal with the complexity of other formats (Ogg, FLAC, etc.), I 
suggest that we limit the API to these two formats, and furthermore allow 
only uncompressed WAV files.  The downside to this suggestion is that 
compression might be desirable for network efficiency.  But perhaps that 
could be implemented internally?  What are the use cases that would apply here?

I was looking at the wav file format here:

http://www.borg.com/~jglatt/tech/wave.htm

and noticed that it includes a "Cue Chunk" that is almost the same as the 
index marking capability we need.  Unfortunately, the Cue Chunk format seems 
rather complex and not exactly a good fit for our needs.  Too bad.

It would be helpful if eventually there were a Use Case section in the 
document that explains how to use the API for common situations.   This would 
help people to understand the API and the rationale for some of the decisions 
we've made, such as defer().

I have some spelling and wording changes I will post in a separate message.

-- 
Gary Cramblitt (aka PhantomsDad)

