[Accessibility] RE: Multimedia framework requirements

Pete Brunet brunet at us.ibm.com
Mon Mar 7 22:53:59 PST 2005


Thanks Marc, I agree with you that quick start/stop times are needed. 
However, I am still trying to understand what you mean by latency.  I 
thought quick start time equals low latency.  Can you describe a scenario 
with a quick start time but (relatively) long latency?  Thanks, Pete

=====
Pete Brunet, (512) 838-4594, TL 678-4594, brunet at us.ibm.com, ws4g
IBM Accessibility Architecture and Development, 11501 Burnet Road, MS 
9026D020, Austin, TX  78758



"Marc Mulcahy" <marc at plbb.net> 
03/07/2005 02:11 PM

To
Pete Brunet/Austin/IBM at IBMUS, <accessibility at lists.freedesktop.org>
cc

Subject
RE: [Accessibility] RE: Multimedia framework requirements


Pete, it's just another case of needing the ability to start and stop 
speech immediately-- not a case of needing low-latency audio.  The 
scenario is:
 
* Speech is talking
* User presses a key
* Speech is interrupted (could be by halting DMA and/or resetting the 
sound card)
* Key is echoed (synthesis is started and audio starts streaming to the 
audio device)
 
Note that hardware latency isn't a factor in either the start or stop 
scenario.  In the stop scenario, what matters is how fast the sound card 
can be reset (DMA halted), which has nothing to do with hardware latency.
 
In the start case, the perceived response time for the user depends on 
how fast the sound card can start transferring data from RAM to the 
hardware audio buffer, not on how big the transferred chunks are.
 
When you start mixing in software, the latency of the software mixer does 
become a factor, since the stream to the sound card is then continuous.  
But when characterizing the accessibility requirement, I think specifying 
low latency is the wrong terminology-- what we need are quick start and 
shut-up times.
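
To make the distinction concrete, here is a rough sketch of the stop/echo 
sequence using the ALSA PCM API.  It is illustrative only-- it assumes an 
already-opened, already-configured 16-bit mono playback handle, and error 
handling is left out:

    #include <alsa/asoundlib.h>

    /* Sketch: cut off in-flight speech and echo a key.  Assumes "handle"
     * is an open, configured playback PCM and "key_audio" holds the
     * freshly synthesized samples for the echoed character. */
    static void shut_up_and_echo(snd_pcm_t *handle,
                                 const short *key_audio,
                                 snd_pcm_uframes_t frames)
    {
        snd_pcm_drop(handle);     /* stop now, discarding queued samples */
        snd_pcm_prepare(handle);  /* re-arm the stream */
        snd_pcm_writei(handle, key_audio, frames);  /* start the echo */
    }

The shut-up time there is bounded by how fast the driver can reset the 
stream, not by how much audio was queued.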
 
Marc
-----Original Message-----
From: accessibility-bounces at lists.freedesktop.org 
[mailto:accessibility-bounces at lists.freedesktop.org]On Behalf Of Pete 
Brunet
Sent: Monday, March 07, 2005 12:24 AM
To: accessibility at lists.freedesktop.org
Subject: [Accessibility] RE: Multimedia framework requirements


Marc, how does the need for instant echoing of keyed characters when 
entering text fit in with this situation?  Thanks, Pete 

=====
Pete Brunet, (512) 838-4594, TL 678-4594, brunet at us.ibm.com, ws4g
IBM Accessibility Architecture and Development, 11501 Burnet Road, MS 
9026D020, Austin, TX  78758 

---------------------------------------------------------------------- 
Date: Sat, 5 Mar 2005 17:55:27 -0700 
From: "Marc Mulcahy" <marc at plbb.net> 
Subject: RE: [Accessibility] Multimedia framework requirements 
To: "Gary Cramblitt" <garycramblitt at comcast.net>, 
                 <accessibility at lists.freedesktop.org> 
Message-ID: <KKEGJCDELINGIGICHANAGEKBEDAA.marc at plbb.net> 
Content-Type: text/plain;                 charset="iso-8859-2" 

Well, for what it's worth, here is my $.02. 

1. We in the accessibility community will never succeed in trying to 
re-invent the multimedia server.  There have been many attempts, with 
varying degrees of success, even by people with multimedia expertise.  So 
I think the right approach is to focus on selecting an existing solution 
that comes closest to what we need, and either living with it or proposing 
changes which will bring it closer to what we need. 

2. The biggest oversight in gnome-speech was that it did not directly 
handle the audio coming out of software synthesizers.  Given my experience 
with several commercial and open source speech engines, I came to the 
conclusion that the speech framework *must* have control over the audio 
samples and where they go.  If we leave it up to the speech engines, they 
will all implement things differently, and we have much less chance of 
providing a good experience for the end user.  Having control over the 
audio gives us better control over quick startup and stop times, as well 
as the ability to route speech to different destinations-- files, 
headsets, speakers, telephone lines, etc. 
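
As a sketch of what I mean (the names below are made up for illustration, 
not an existing API), the engine would never touch the audio device 
itself; it would hand raw samples to the framework, and the framework 
would decide whether they go to the sound card, a file, or a phone line:

    #include <stddef.h>

    /* Hypothetical sink interface -- illustration only. */
    typedef struct speech_audio_sink {
        unsigned int sample_rate;   /* e.g. 22050 */
        unsigned int channels;      /* e.g. 1 for mono speech */

        /* The engine calls this for every block of 16-bit PCM it
         * produces; the framework decides where the samples go. */
        void (*deliver)(struct speech_audio_sink *sink,
                        const short *samples, size_t n_samples);

        /* The framework calls this to cut the engine off mid-utterance. */
        void (*cancel)(struct speech_audio_sink *sink);

        void *user_data;            /* framework-private state */
    } speech_audio_sink;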

3. To my mind, ALSA comes the closest to what we need in an audio 
framework on Linux.  It's now standard, and provides methods for mixing 
audio streams on sound cards which can't do it in hardware.  The 
prioritization of audio-- i.e., muting the MP3 player when the computer 
needs to speak something or when a user receives an internet phone call-- 
is the only piece which appears to be missing. 
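
For reference, opening a stream through ALSA's plugin layer looks roughly 
like this (a sketch, with error checks trimmed).  On cards without 
hardware mixing, the "default" device can be routed through the dmix 
plugin (or dmix can be opened directly) so that several streams mix in 
software:

    #include <alsa/asoundlib.h>

    /* Sketch: open a mono 16-bit playback stream via the plugin layer. */
    int open_speech_pcm(snd_pcm_t **handle, unsigned int rate)
    {
        snd_pcm_hw_params_t *hw;

        if (snd_pcm_open(handle, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
            return -1;

        snd_pcm_hw_params_alloca(&hw);
        snd_pcm_hw_params_any(*handle, hw);
        snd_pcm_hw_params_set_access(*handle, hw,
                                     SND_PCM_ACCESS_RW_INTERLEAVED);
        snd_pcm_hw_params_set_format(*handle, hw, SND_PCM_FORMAT_S16_LE);
        snd_pcm_hw_params_set_channels(*handle, hw, 1);
        snd_pcm_hw_params_set_rate_near(*handle, hw, &rate, NULL);

        return snd_pcm_hw_params(*handle, hw);  /* commit the parameters */
    }

The prioritization piece would still have to live above this layer.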

Another audio-related aside...  I think there's been some 
mischaracterization of a requirement.  Everyone seems to suggest that what 
we need is low latency in an audio server or environment, and I'm not 
convinced that this is the case.  You need low latency, or at least good 
synchronization, if for example you want to animate a character using 
text-to-speech as the voice.  But I think from an accessibility point of 
view, what we really need is quick start and shut-up times, not 
necessarily low latency, although low latency is better.  For example, 
from a blind usability point of view, I don't care if the app sends the 
sound card a 128 KB buffer of audio or a 1 KB buffer of audio, as long as 
the sound stops immediately when I press a key, and as long as it starts 
immediately when there's something to be spoken. 

My experience shows that low latency is in fact not necessarily desirable 
when working with speech.  Presumably speech is a background process which 
goes on while other, more intensive tasks are happening in the 
foreground-- copying a file, filtering audio, or something of that sort.  
The lower the latency, the harder it is to keep speech happy in the 
background, especially during periods of high disk activity or network 
load. 

Rather than feeding the sound card 1 KB blocks of data, I'd rather 
synthesize 64 KB of data, dump it to the sound card, and let the DMA 
controller transfer it while the processor does something else.  And as 
long as I can shut it up immediately, the user doesn't know the 
difference. 
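
A rough sketch of that pattern, where synth_fill() is a made-up stand-in 
for whatever the synthesizer provides:

    #include <alsa/asoundlib.h>

    /* Hypothetical synthesizer call, named only for illustration: fills
     * "buf" with up to "max" frames and returns how many it produced. */
    extern snd_pcm_uframes_t synth_fill(short *buf, snd_pcm_uframes_t max);

    /* Sketch: feed the card big chunks and let DMA do the work.  32768
     * frames of 16-bit mono is 64 KB, roughly 1.5 seconds at 22.05 kHz. */
    #define CHUNK_FRAMES 32768

    void speak(snd_pcm_t *handle, volatile int *interrupted)
    {
        static short chunk[CHUNK_FRAMES];
        snd_pcm_uframes_t n;

        while (!*interrupted && (n = synth_fill(chunk, CHUNK_FRAMES)) > 0) {
            /* Copies samples into the driver's ring buffer; the DMA
             * controller streams them out while we synthesize more. */
            snd_pcm_writei(handle, chunk, n);
        }
        /* A keystroke handler can still cut this off at any moment with
         * snd_pcm_drop() followed by snd_pcm_prepare(), so responsiveness
         * does not depend on the chunk size. */
    }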

Marc 

-----Original Message----- 
From: accessibility-bounces at lists.freedesktop.org 
[mailto:accessibility-bounces at lists.freedesktop.org]On Behalf Of Gary 
Cramblitt 
Sent: Saturday, March 05, 2005 6:20 AM 
To: accessibility at lists.freedesktop.org 
Subject: Re: [Accessibility] Multimedia framework requirements 


On Friday 04 March 2005 03:43 pm, Hynek Hanke wrote: 
> 2 Audio requirements 

You may want to think about supported audio formats.  Most existing 
synths seem to produce .wav files (Microsoft RIFF) or .au. 

Also, there's the issue of how to deliver the audio to the audio 
framework.  Streams could be more efficient than files?  The TTS API 
discussion has this as an unresolved item. 
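
For what it's worth, the plain PCM .wav layout is simple enough that the 
framework could parse it directly.  A sketch of the canonical 44-byte 
header (little-endian fields; real files may carry extra chunks before 
"data", and the struct name here is just for illustration):

    #include <stdint.h>

    /* Canonical header of a plain PCM RIFF/WAVE file (44 bytes). */
    struct wav_header {
        char     riff[4];        /* "RIFF" */
        uint32_t riff_size;      /* file size minus 8 */
        char     wave[4];        /* "WAVE" */
        char     fmt[4];         /* "fmt " */
        uint32_t fmt_size;       /* 16 for plain PCM */
        uint16_t audio_format;   /* 1 = uncompressed PCM */
        uint16_t channels;
        uint32_t sample_rate;    /* samples per second, e.g. 22050 */
        uint32_t byte_rate;      /* sample_rate * channels * bits / 8 */
        uint16_t block_align;    /* channels * bits / 8 */
        uint16_t bits_per_sample;
        char     data[4];        /* "data" */
        uint32_t data_size;      /* bytes of PCM audio that follow */
    };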

-- 
Gary Cramblitt (aka PhantomsDad) 
KDE Text-to-Speech Maintainer 
http://accessibility.kde.org/developer/kttsd/index.php 

------------------------------

