[Accessibility] Multimedia framework requirements

Hynek Hanke hanke at brailcom.org
Fri Mar 4 07:43:51 PST 2005


Hello all,

This message is meant as a continuation of the efforts launched at the recent
Free Standards Group meeting *to ensure there is a reasonable multimedia
framework for accessibility in the near future* and to strengthen the
cooperation between accessibility and multimedia developers.

Since this list is a general accessibility list and it aims at developing
standards and recommendations with the immediate goal of writing the code and
fixing the problem, I think it's the right place for this discussion. Many
accessibility developers are already present and I sent additional
invitation letters to some of them last week. I hope all the major
accessibility projects are present here. I've also sent invitation letters to
the developers of GStreamer, KDE Multimedia, NMM, MAS, ALSA and Agnula.

What follows is not just a statement of goal and a technical document, but also
an explanation of the problem and the description of our motivation. I believe
this is very important for the people who are new to the problem or might not
have thought about it extensively before.


1) Motivation
-------------

We found there is a need for a good multimedia framework when developing
accessibility solutions. At Brailcom, we discovered that there is at present
not a single good component that we could use for audio output from speech
synthesis. The fact that all the synthesizers use their own means of audio
output instead of some standard interface confirms the problem.

Currently, users and developers in Free Software face these issues:

* There is no common architecture for working with multimedia.

  a) This means confusion for the developers. Having a choice is certainly
good, but it is really bad to have to investigate all the existing solutions
each time one writes a multimedia application, and to rewrite the media output
layer for a different framework several times, only to discover that none of
them does what is needed.

  b) It also means that users have to switch between different mutually
incompatible systems when working with different applications (e.g. it's not
possible to open /dev/dsp more than once on many sound cards, it's not possible
to centrally control the volume...) and the developers have to support multiple
systems and spend their time developing bridges between these systems. For
visually impaired users, this incompatibility is even worse, since they must
use a certain media framework supported by their accessibility solutions
(because of speech synthesis) and can't just skip from one to another all the
time. Because of that, many media applications are inaccessible for them.

* The developers have to reinvent the wheel each time they want to implement
multimedia output. A huge amount of code for data format decoding and
encoding, for audio output to the underlying kernel drivers, and for other
tasks is written over and over instead of being centralized in one place.

* As far as we know, none of the solutions we have looked into works well
for accessibility or fulfills even the basic needs described below, not even
for the simplest task of audio output.

* A Slashdot discussion following the announcement of the availability of the
Media Application Server in February 2003 unanimously showed that the absence
of a good common media framework is a great pain not just for the developers,
but also for the users (in all areas, not just accessibility). It doesn't seem
the situation has improved significantly since then.
[http://developers.slashdot.org/article.pl?sid=03/02/03/2137213]


2) Proposal to work on accessibility requirements
-------------------------------------------------

It is not clear how accessibility developers can overcome these problems so
that they can focus on actually solving accessibility questions and so that
the users are happy with how the whole system works. The issue of multimedia
output is very complex and there have been many failed attempts at solving
it. Additionally, we have discovered that multimedia developers are not aware
of what accessibility needs, and the requirements of accessibility may be a
bit unusual for them. A conclusion of the recent Free Standards Group meeting
in Hawaii was that we will try to formulate our requirements, divide them into
several categories of importance, and then look for a way to fulfill them
in the near future, hopefully with the help of the multimedia community.

There are several possible ways this could be done:
	1) Develop our own multimedia server (...and probably fail again)
	2) Extend an existing solution to fulfill our needs
	3) Publish a document about what other developers need to
	do for us so that accessibility works
	4) ...

This short list should serve just to give an idea of where we might be
heading. In the first round of discussion, we should regard these questions as
``out of scope'' and focus on figuring out what our requirements really are
and on ensuring that we can use a solution that fulfills these requirements.

Below is a draft of the core requirements that were brought up in
Hawaii or in private discussions. The points are quite general and the draft
doesn't include a division by importance, because we first need to agree on
the general points and only then can we try to define some things more
precisely. I think we are in a bit more difficult situation than with the TTS
API, where the consensus was already partially developed.


3) Requirements draft
---------------------

1 General requirements

1.1 Good documentation

Reason: Documentation is absolutely essential for users as well as for
application developers. Still, many projects in this area don't give
it enough importance and the manuals are old or nonexistent.

1.2 Portability

Reason: One of the reasons for having a common media framework is the
architecture-independent character of the interface. Applications should not
have to care about different media systems on different architectures.

1.3 Network transparency

Reason: An accessible desktop must be accessible over the network. This means
that not just the graphical output is redirected (which is possible with X),
but also the audio output. It's also sometimes convenient for the user
to run the media server on a different system to minimize system load.
Network transparency is an established practice in GNU systems.

1.4 Capability of handling different formats

Reason: The application shouldn't have to care about the particular data format
and duplicate the code needed for its encoding and decoding. Software speech
synthesis systems can use advanced features like inserting user-configurable
sound files into the speech to signal certain characters or events, so
they will return data in multiple formats.

1.5 Extensibility

Reason: It's essential that plugins for new codecs and devices, as well as
completely new capabilities, can be added easily, since there is so much
diversity in the world of multimedia.
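
To illustrate requirements 1.4 and 1.5 together, here is a minimal sketch in
Python of a decoder plugin registry. It is only an illustration under stated
assumptions: all names (register_decoder, decode, the "RAW1" header) are
invented for this example and do not belong to any existing framework.

```python
# Hypothetical sketch: none of these names come from an existing framework.
from typing import Callable, Dict, List

# A decoder takes encoded bytes and returns signed 16-bit PCM samples.
Decoder = Callable[[bytes], List[int]]

_decoders: Dict[bytes, Decoder] = {}


def register_decoder(magic: bytes, decoder: Decoder) -> None:
    """Register a decoder plugin for data that starts with `magic`."""
    _decoders[magic] = decoder


def decode(data: bytes) -> List[int]:
    """Pick a decoder by the leading bytes; the client never names a format."""
    for magic, decoder in _decoders.items():
        if data.startswith(magic):
            return decoder(data)
    raise ValueError("no decoder plugin registered for this data")


def decode_raw_s16le(data: bytes) -> List[int]:
    """Toy plugin: a made-up 4-byte 'RAW1' header, then little-endian samples."""
    payload = data[4:]
    return [int.from_bytes(payload[i:i + 2], "little", signed=True)
            for i in range(0, len(payload) - 1, 2)]


# Adding support for a new format is one call; decode() itself never changes.
register_decoder(b"RAW1", decode_raw_s16le)
```

The point of the design is that the application only ever calls decode(), and
a new codec is added without touching the application or the dispatch code.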

1.6 Existence of a programmer's interface that allows easy use of the
basic features as well as more complicated use of the more advanced
capabilities (flow diagrams, etc.) if available.

Reason: The programmer should not be forced to follow the complicated internal
design and to duplicate the code needed to establish the underlying design if
he wants to perform just the basic tasks. On the other hand, the programmer
should be allowed to control the whole process completely when doing a
specific task.
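
As a rough illustration of requirement 1.6, the following Python sketch shows
one possible shape of such a two-level interface. The names Pipeline, play and
halve_volume are hypothetical and chosen only for this example; a real
framework would of course look different.

```python
# Hypothetical sketch; the class and function names are invented for illustration.
from typing import Callable, List

Samples = List[int]
Stage = Callable[[Samples], Samples]


class Pipeline:
    """Advanced interface: the caller assembles the processing stages explicitly."""

    def __init__(self) -> None:
        self.stages: List[Stage] = []

    def add(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)
        return self  # returning self allows chained add() calls

    def run(self, samples: Samples) -> Samples:
        for stage in self.stages:
            samples = stage(samples)
        return samples


def halve_volume(samples: Samples) -> Samples:
    """Example stage: halve every sample using integer division."""
    return [s // 2 for s in samples]


def play(samples: Samples, sink: Callable[[Samples], None]) -> None:
    """Simple interface: one call, with a default (empty) pipeline built inside."""
    sink(Pipeline().run(samples))
```

A client with basic needs calls play(samples, sink) and never sees the
pipeline; a client with special needs builds its own, for example
Pipeline().add(halve_volume).run(samples).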

2 Audio requirements

2.1 ``Real-time audio'' output

	2.1.1 Immediate start and stop of playback (~20ms)

	2.1.2 Notification of the client application when the playback
              terminates

	2.1.3 Capability to tell the current status of the playback

Reason: Accessibility tools need to be able to quickly start and stop
utterances according to the user's actions. There is a similar requirement in
professional audio applications, in games and elsewhere. (20ms means 50
starts/stops per second, which should be enough even for such situations as
autorepeat of keys on the keyboard, while still allowing network transport and
higher level processing of requests.)
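
The arithmetic behind the 20ms figure, together with a toy model of points
2.1.2 and 2.1.3, can be sketched in Python as follows. Only the 20ms budget
comes from this draft; the sample rates used in the usage note and all the
names (Playback, Status, on_end) are assumptions made for illustration, and
the sketch plays no actual audio.

```python
# Hypothetical sketch; only the 20 ms figure comes from the draft.
from enum import Enum
from typing import Callable, List


def frames_for_latency(sample_rate_hz: int, latency_ms: int) -> int:
    """Number of audio frames that fit into the latency budget."""
    return sample_rate_hz * latency_ms // 1000


def max_events_per_second(latency_ms: int) -> int:
    """How many start/stop requests per second the budget allows."""
    return 1000 // latency_ms


class Status(Enum):
    IDLE = "idle"
    PLAYING = "playing"
    STOPPED = "stopped"
    FINISHED = "finished"


class Playback:
    """Models 2.1.2 (notification) and 2.1.3 (status query); plays no audio."""

    def __init__(self) -> None:
        self.status = Status.IDLE
        self._listeners: List[Callable[[Status], None]] = []

    def on_end(self, listener: Callable[[Status], None]) -> None:
        """Register a callback invoked when playback is stopped or finishes."""
        self._listeners.append(listener)

    def start(self) -> None:
        self.status = Status.PLAYING

    def stop(self) -> None:
        self._end(Status.STOPPED)

    def finish(self) -> None:
        """Called by the (imaginary) server when the data runs out."""
        self._end(Status.FINISHED)

    def _end(self, status: Status) -> None:
        if self.status is not Status.PLAYING:
            return  # nothing is playing, so there is nothing to report
        self.status = status
        for listener in self._listeners:
            listener(status)
```

For example, at an assumed sample rate of 44100 Hz the 20ms budget
corresponds to frames_for_latency(44100, 20) = 882 frames, so any buffering
between the client and the sound card has to stay below that size.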

2.2 Ability to play several audio streams at once and mix them together
*without any effort of the client application itself*.

Reason: It must be possible to run several applications using audio output
without the fear that one of them will block the output for the rest. For
accessibility this is essential since the speech output must always pass
through, but shouldn't block any other media output.
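
What "mixing without any effort of the client" means on the server side can
be sketched in a few lines of Python. This is a simplified illustration that
assumes all streams already use the same sample rate and signed 16-bit
samples; the function name mix is hypothetical, and a real server would also
have to handle resampling, channels and timing.

```python
# Hypothetical sketch; assumes equal sample rates and signed 16-bit samples.
from itertools import zip_longest
from typing import Iterable, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix(streams: Iterable[List[int]]) -> List[int]:
    """Sum samples position by position, clipping on overflow.

    Shorter streams are padded with silence (zeros), so a short speech
    message and a long piece of music can be mixed without blocking.
    """
    mixed = []
    for frame in zip_longest(*streams, fillvalue=0):
        total = sum(frame)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Each client just hands over its own samples; none of them needs to know that
other streams exist.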

2.3 Ability to separately control the volume of different streams in one place

Reason: The user needs to be able to specify volume levels so that he is
sure, for example, that nothing is louder than the voice he needs to hear
to be able to control the application he is working with. Also, when using
headphones, the user wants to eliminate extremely loud beeps or other sounds.
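
A minimal Python sketch of such central volume control could look like this.
The class name VolumeControl and the stream names are assumptions made for the
example only, and the gain is a simple linear factor applied to integer
samples before any mixing takes place.

```python
# Hypothetical sketch; class and stream names are invented for illustration.
from typing import Dict, List


class VolumeControl:
    """One central table of gains, keyed by stream name."""

    def __init__(self) -> None:
        self._gains: Dict[str, float] = {}

    def set_gain(self, stream: str, gain: float) -> None:
        """Set a linear gain between 0.0 (silent) and 1.0 (unchanged)."""
        if not 0.0 <= gain <= 1.0:
            raise ValueError("gain must be between 0.0 and 1.0")
        self._gains[stream] = gain

    def apply(self, stream: str, samples: List[int]) -> List[int]:
        """Scale samples by the stream's gain; unknown streams pass unchanged."""
        gain = self._gains.get(stream, 1.0)
        # int() truncates toward zero, so the result stays an integer sample.
        return [int(s * gain) for s in samples]
```

With this, the user can for instance call set_gain("beeps", 0.25) once and be
sure that notification sounds stay quieter than the speech stream, whichever
application produces them.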

2.4 Compatibility with the basic low-level sound architectures (at least on the
GNU/Linux platform at the beginning).

Reason: Accessibility needs a solution that works. If a media framework can
only work on some sound architectures and can't work on others in wide use,
that would mean the users of the not (yet) supported architectures are excluded
from using the computer. Unlike other projects, we can't switch to a
technically better alternative unless it already works for most of today's
users.

3 Video requirements

If there are any accessibility requirements on video output in multimedia
frameworks, these should probably be dealt with in a different document.


4) What needs to be done
------------------------

I ask the participants of this mailing list:

* To give their opinion about this issue and especially about section 3
(Requirements draft) of this email. Please resist sliding into arguing about
which multimedia framework is better, since that would probably not lead
anywhere and is off-topic for now.

* To invite to this mailing list any other developers who would like to
seriously participate but have not been invited so far.


Thank you,
Hynek Hanke
Brailcom
http://www.freebsoft.org/

