[Accessibility] TTS API interface description + reqs update
Gary Cramblitt
garycramblitt at comcast.net
Sun Apr 30 05:18:42 PDT 2006
On Saturday 29 April 2006 08:31, Hynek Hanke wrote:
> Hello,
>
> I've updated the requirements document with the latest suggestions
> and tried to incorporate what we discussed on one FSG meeting and
> a subsequent meeting of a subgroup specifically about TTS API.
2.4. I suggest adding a method that returns a list of supported sound icon
names, so that applications can discover what is supported. Also, I think we
should provide a recommended list of "standard" sound icons, along with .wav
files to make it easy for implementors to support them.
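To make the discovery idea concrete, here is a minimal C sketch of what such a call might look like on the implementor's side. The function names tts_list_sound_icons() and tts_has_sound_icon() and the icon names in the table are hypothetical placeholders. They are not part of the draft API and not a proposed "standard" list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of sound icons this driver can play.
 * The names are placeholders, not an agreed "standard" list. */
static const char *const supported_icons[] = {
    "beep", "error", "message", "warning"
};

#define NUM_ICONS (sizeof supported_icons / sizeof supported_icons[0])

/* Hypothetical discovery call: copies up to `max` icon names into
 * `names` and returns the total number of icons supported, so a caller
 * can pass max == 0 first to learn how large an array it needs. */
size_t tts_list_sound_icons(const char **names, size_t max)
{
    assert(names != NULL || max == 0);
    for (size_t i = 0; i < NUM_ICONS && i < max; i++)
        names[i] = supported_icons[i];
    return NUM_ICONS;
}

/* Hypothetical convenience check: returns 1 if `name` is supported. */
int tts_has_sound_icon(const char *name)
{
    for (size_t i = 0; i < NUM_ICONS; i++)
        if (strcmp(supported_icons[i], name) == 0)
            return 1;
    return 0;
}
```

Calling it once with max == 0 lets an application learn the count before allocating an array of names.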
2.5. The defer() method implies a heap of "paused" messages. I'm wondering
if that is really necessary. Is there an upper limit on the number of
messages in the heap? To ease the burden on implementors, I suggest changing
bool_t can_defer_message;
to
int can_defer_message;
where the value is the max heap depth (number of defers) supported. 0 means
no defer capability. -1 means no maximum depth (constrained only by
available memory). 1 means only one defer at a time is supported.
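A rough sketch of how an application (or a wrapper library) might interpret the proposed int can_defer_message value before calling defer(). The function and macro names are hypothetical; only the meanings of 0, -1 and positive values come from the proposal above.

```c
#include <assert.h>

/* Special values proposed for the int can_defer_message field. */
#define TTS_DEFER_NONE        0   /* no defer capability */
#define TTS_DEFER_UNLIMITED (-1)  /* limited only by available memory */

/* Returns 1 if one more defer() is allowed, given the driver's
 * can_defer_message value and how many messages are already deferred. */
int tts_can_defer_another(int can_defer_message, int deferred_count)
{
    assert(deferred_count >= 0);
    if (can_defer_message == TTS_DEFER_UNLIMITED)
        return 1;
    if (can_defer_message <= TTS_DEFER_NONE)  /* 0, or an invalid negative */
        return 0;
    return deferred_count < can_defer_message;
}
```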
For the discard(message_id_t message) method, I suggest that applications can
pass a 0 for the message id, which means "discard the last deferred message".
This is especially appropriate if only one defer at a time is supported; the
application then does not need to keep track of message IDs.
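Here is a minimal C sketch of the discard() semantics I have in mind, assuming message_id_t is an unsigned integer for which 0 is never a valid ID. The fixed-size array, MAX_DEFERRED and the tts_* names are illustrative assumptions, not part of the draft; a real implementation would report its limit through can_defer_message.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int message_id_t;   /* assumption: 0 is never a valid ID */

#define MAX_DEFERRED 4               /* would be reported via can_defer_message */

static message_id_t deferred[MAX_DEFERRED];
static size_t n_deferred = 0;

/* Records a deferred message; returns 0 on success, -1 if the heap is full. */
int tts_defer(message_id_t message)
{
    assert(message != 0);
    if (n_deferred == MAX_DEFERRED)
        return -1;
    deferred[n_deferred++] = message;
    return 0;
}

/* Discards a deferred message. Passing 0 discards the most recently
 * deferred one. Returns 0 on success, -1 if nothing matched. */
int tts_discard(message_id_t message)
{
    if (n_deferred == 0)
        return -1;
    if (message == 0) {
        n_deferred--;
        return 0;
    }
    for (size_t i = 0; i < n_deferred; i++) {
        if (deferred[i] == message) {
            /* shift later entries down to preserve deferral order */
            for (size_t j = i + 1; j < n_deferred; j++)
                deferred[j - 1] = deferred[j];
            n_deferred--;
            return 0;
        }
    }
    return -1;
}

size_t tts_deferred_count(void) { return n_deferred; }
```

Note that discard(0) only ever removes the most recently deferred message, so an application that defers one message at a time never needs to remember an ID.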
We should add a sentence to the description of the discard() method advising
applications to discard messages they no longer need, lest the defer() heap
grow unnecessarily.
2.7. AFAIK, all existing speech synthesizers produce raw PCM or WAV audio.
Rather than deal with the complexity of other formats (Ogg, FLAC, etc.), I
suggest that we limit the API to these two formats, and further restrict WAV
to uncompressed files. The downside of this suggestion is that compression
might be desirable for network efficiency, but perhaps that could be handled
internally by the implementation? What use cases would apply here?
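To show how little separates the two formats: an uncompressed WAV file is essentially raw PCM preceded by a small header. The following C sketch fills in the canonical 44-byte header (format tag 1 = PCM), assuming the audio is stored as a single data chunk; real WAV files may carry additional chunks. The function name wav_pcm_header() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void put_u16le(uint8_t *p, uint16_t v) { p[0] = v & 0xff; p[1] = v >> 8; }
static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;         p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff; p[3] = (v >> 24) & 0xff;
}

/* Fills `hdr` with the canonical 44-byte header of an uncompressed PCM
 * WAV file whose data chunk holds `data_bytes` bytes of interleaved
 * little-endian samples. */
void wav_pcm_header(uint8_t hdr[44], uint16_t channels,
                    uint32_t sample_rate, uint16_t bits_per_sample,
                    uint32_t data_bytes)
{
    uint16_t block_align = channels * (bits_per_sample / 8);
    assert(block_align > 0);

    memcpy(hdr, "RIFF", 4);
    put_u32le(hdr + 4, 36 + data_bytes);             /* bytes after this field */
    memcpy(hdr + 8, "WAVE", 4);

    memcpy(hdr + 12, "fmt ", 4);
    put_u32le(hdr + 16, 16);                         /* fmt body size for PCM */
    put_u16le(hdr + 20, 1);                          /* 1 = uncompressed PCM */
    put_u16le(hdr + 22, channels);
    put_u32le(hdr + 24, sample_rate);
    put_u32le(hdr + 28, sample_rate * block_align);  /* byte rate */
    put_u16le(hdr + 32, block_align);
    put_u16le(hdr + 34, bits_per_sample);

    memcpy(hdr + 36, "data", 4);
    put_u32le(hdr + 40, data_bytes);                 /* raw PCM follows */
}
```

An implementation that produces raw PCM internally could emit WAV simply by writing this header before the samples.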
I was looking at the WAV file format description here:
http://www.borg.com/~jglatt/tech/wave.htm
and noticed that it includes a "Cue Chunk" that is almost the same as the
index marking capability we need. Unfortunately, the Cue Chunk format seems
rather complex and not exactly a good fit for our needs. Too bad.
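For reference, a rough C sketch of writing a Cue Chunk whose cue points mark sample offsets, based on my reading of the layout on that page (a point count followed by 24-byte cue point records). It assumes all audio sits in one "data" chunk, so the chunk and block start fields are 0. As far as I can tell, each cue point carries only a numeric identifier and a sample position; attaching a name to a mark would need yet another chunk, which is part of why it fits our index marks poorly. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;         p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff; p[3] = (v >> 24) & 0xff;
}

/* Total size of a cue chunk with n points: 8-byte chunk header,
 * 4-byte point count, 24 bytes per cue point. */
size_t cue_chunk_size(uint32_t n) { return 8 + 4 + 24 * (size_t)n; }

/* Writes a cue chunk marking each entry of `offsets` (in sample frames)
 * as a cue point with identifier i + 1. `out` must hold
 * cue_chunk_size(n) bytes. Returns the number of bytes written. */
size_t wav_cue_chunk(uint8_t *out, const uint32_t *offsets, uint32_t n)
{
    assert(out != NULL && (offsets != NULL || n == 0));
    memcpy(out, "cue ", 4);
    put_u32le(out + 4, 4 + 24 * n);        /* chunk body size */
    put_u32le(out + 8, n);                 /* dwCuePoints */
    uint8_t *p = out + 12;
    for (uint32_t i = 0; i < n; i++, p += 24) {
        put_u32le(p, i + 1);               /* dwIdentifier: a number, not a name */
        put_u32le(p + 4, offsets[i]);      /* dwPosition (no playlist assumed) */
        memcpy(p + 8, "data", 4);          /* fccChunk */
        put_u32le(p + 12, 0);              /* dwChunkStart */
        put_u32le(p + 16, 0);              /* dwBlockStart */
        put_u32le(p + 20, offsets[i]);     /* dwSampleOffset */
    }
    return cue_chunk_size(n);
}
```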
It would be helpful if eventually there were a Use Case section in the
document that explains how to use the API for common situations. This would
help people to understand the API and the rationale for some of the decisions
we've made, such as defer().
I have some spelling and wording changes I will post in a separate message.
--
Gary Cramblitt (aka PhantomsDad)