[Accessibility] TTS API - voice selection

Sat May 13 03:57:36 PDT 2006

Hello Jonathan,

thanks for your work on this. Your feedback is very useful!

> Jonathan Duddington writes on Fri 05/12/2006:
> 1. Can an application call set_voice_by_properties() without a previous
> call of set_driver() ?  i.e. use a default (preferred) synthesizer
> which has been set up in Speech Dispatcher as a whole rather than
> particular synthesizer which is known to the application?

I'd say no. If this is desired, it can be achieved by a high-level
interface like Speech Dispatcher. I don't see a need to have this
supported in the low-level interface.

The intention with the TTS API is to keep it to the smallest set
of functionality that still covers all the needs.
As we have seen, it gets quite complex even so.

> 2. set_voice_by_properties() states that fields in voice_description_t
> may be "blank".  Does "blank" mean a NULL pointer, or a pointer to a
> zero character (i.e. an empty string), or either?  This should be
> specified.

Yes, this needs to be clarified. I think it should be 0 for age and
a NULL pointer for the string values. It may be different for
implementations in other languages.
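
To make that convention concrete, here is a minimal C sketch. The
struct layout and all names in it are assumptions for illustration
only, not the actual definition from the TTS API draft; in particular,
gender is shown as a string only for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for voice_description_t; the real layout
   in the TTS API draft may differ. */
typedef struct {
    const char *language;  /* NULL means blank */
    const char *gender;    /* NULL means blank (string is an assumption) */
    unsigned int age;      /* 0 means blank */
} example_voice_description_t;

/* A string field is blank only if the pointer itself is NULL. */
static int example_string_is_blank(const char *s)
{
    return s == NULL;
}

/* Returns 1 if every field is blank under the convention above. */
int example_all_fields_blank(const example_voice_description_t *d)
{
    assert(d != NULL);
    return example_string_is_blank(d->language)
        && example_string_is_blank(d->gender)
        && d->age == 0;
}
```

Under this convention an empty string "" counts as a specified value,
not as blank, which is one reason the rule has to be written down.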

> 3. list_drivers() and list_voices() return lists of driver and voice
> descriptions.  It should be specified how the end of these lists is
> indicated.

Yes, this should be specified as well.
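
For illustration only (nothing has been decided here): one common C
convention is to return an array of pointers whose last entry is
NULL. The type and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver description, reduced to a single field. */
typedef struct {
    const char *name;
} example_driver_t;

/* Count the entries of a NULL-terminated array of pointers.
   The terminating NULL itself is not counted. */
size_t example_list_length(example_driver_t *const *list)
{
    size_t n = 0;

    assert(list != NULL);  /* a missing list is a caller bug in this sketch */
    while (list[n] != NULL)
        n++;
    return n;
}
```

The alternative would be to return the number of entries separately;
either way the document should state which one is used.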

> 4. Can an application call set_voice_by_properties() with a
> voice_description_t which has ALL its fields blank?  

I think the application should specify at least the desired
language (falling back to some default of its own). Otherwise the
user could get different languages when switching synthesizers,
even if both support the language they need.

The doc says that if the language is not supported, the synthesizer
will choose a different language.

> i.e. to choose the default voice in the default language (as currently
> set in Speech Dispatcher).  For example, an email client may not know
> the language of the text that it's displaying.

Again, a task well suited for a high-level interface like Speech
Dispatcher.

> 5. Consider an application which wants to indicate text spoken either
> with the "default" voice or an "alternate" voice.  For example, an
> "alternate voice" may be wanted by a word processor speaking italic
> text, or a web browser speaking "blockquote" text, or an email client
> speaking quoted text.
> 
> Does the application simply call set_voice_by_properties() with blank
> voice_description fields and variant=0 to select the default voice, and
> then variant=1 to select an alternative voice (and variant=2 for an
> additional level of quoting)?  Or should that be variant=1, 2, and 3?

The binding mechanism here is the one described in the SSML specs.
I think what you describe is the intention. Not all the fields
need to be left blank. You could for example ask for variants
of female voices in the same way (and one bright day in the future,
you might even get them :)

But the specs are not very clear on this. For example, it is not
clear to me how this should work with age. Are
	(age=4, variant=1)
	(age=5, variant=1)
and
	(age=5, variant=1)
	(age=4, variant=2)
the same two voices? Or do all four requests actually give one voice
only, because the synthesizer has a voice of age 4 and another of age 7?

Does
	(age=12, variant=1)
select the preferred child voice or does it select the voice most
closely matching the age parameter? If the latter is the case, how do
I ask for several variants of a ``child'' voice, no matter what the
exact age is?
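
The closest-age reading can be sketched as below, using the ages 4
and 7 from the example above. The function name is hypothetical, and
this is only one possible interpretation, not something the specs
are known to mandate.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute difference between two ages. */
static unsigned int example_age_distance(unsigned int a, unsigned int b)
{
    return a > b ? a - b : b - a;
}

/* Return the index of the voice whose age is closest to `wanted`.
   On a tie, the earlier voice in the array wins. */
size_t example_closest_age(const unsigned int *ages, size_t count,
                           unsigned int wanted)
{
    size_t best = 0;
    size_t i;

    assert(ages != NULL && count > 0);
    for (i = 1; i < count; i++) {
        if (example_age_distance(ages[i], wanted)
                < example_age_distance(ages[best], wanted))
            best = i;
    }
    return best;
}
```

With voices of age 4 and 7, requests for age 4 and age 5 both resolve
to the age-4 voice, and a request for age 12 resolves to the age-7
voice. Under this reading there is no way to say "any child voice",
which is exactly the problem.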

Variant is a positive integer according to SSML specs. So counting
starts from 1. I'll make this clear in the TTS API document.
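
A sketch of 1-based variant selection, assuming the synthesizer has
already built an ordered list of the voices matching the other
properties. The function name and the choice to return NULL for an
out-of-range variant are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the voice for a 1-based variant number from an ordered list
   of matching voice names. Returns NULL if variant is 0 or larger
   than the number of matching voices. */
const char *example_pick_variant(const char *const *matching,
                                 size_t count, unsigned int variant)
{
    assert(matching != NULL);
    if (variant == 0 || variant > count)
        return NULL;
    return matching[variant - 1];  /* variant 1 is the first match */
}
```

So the default voice would be variant=1, and the first and second
alternate voices variant=2 and variant=3.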

> 6. While speaking SSML, if a <voice> <s> or <p> tag within the text
> indicates a change to a language which is not supported by the current
> synthesizer, is it intended that the synthesizer should report this to
> S.D. which could then use a different synthesizer for that part of the
> text?  Or is it intended that SSML tags can only change to a voice
> within the same synthesizer?

I don't know. Of course, allowing switching only inside the synthesizer
is the easy way. I think we do not have many multi-lingual messages
properly marked with their languages coming from assistive technologies
today, so maybe we can leave this issue for future versions
of the API.

I'll shortly update the document with all recent suggestions from
the mailing list.

With regards,
Hynek Hanke