[Bidi] Automatic text direction

Omer Zak w1 at zak.co.il
Mon Feb 28 06:43:25 PST 2005


On Mon, 2005-02-28 at 16:21 +0200, Ely Levy wrote:
> On Mon, 28 Feb 2005, Omer Zak wrote:
> 
> > 1. By file type - regular text files and HTML files specify explicit
> > directionality by different means.  Windows-1255 and UTF-8 files
> > encode the explicit directionality marks differently.
> > Besides, it is an outgrowth of MIME file type handling.
> 
> So you would have a rule on how to parse the file to get the right
> encoding? Do we need to make a lib that parses it by file type?

Yes, why not?
Reuse existing Unix, MIME and XML tools and facilities.
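
As a rough illustration of reusing existing MIME facilities, here is a
minimal Python sketch that maps a filename to a rule set through the
standard mimetypes module.  The rule-set names and the RULESETS table
are made up for this example; nothing in ATDSS defines them yet.

```python
import mimetypes

# Hypothetical mapping from MIME type to the name of an ATDSS rule set.
# Both XML MIME types are listed because the guess differs between systems.
RULESETS = {
    "text/plain": "plain-text",
    "text/html": "markup",
    "text/xml": "markup",
    "application/xml": "markup",
}


def ruleset_for(filename, default="plain-text"):
    """Return the (hypothetical) rule-set name for a file, chosen by MIME type."""
    mime, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions give mime == None and fall back to the default.
    return RULESETS.get(mime, default)
```

This only looks at the filename extension.  Identifying a file by its
contents would need a separate tool, such as the Unix file(1) command.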

> > 2. I did mention by program (and by GUI control) below.
> 
> Are we going to override locale information?

The out-of-the-box ATDSS rules may want to take into account the current
locale but users will be able to override it.

Consider the use case of someone who usually works in English (with
occasional use of Hebrew words), but once in a while gets a technical
paper written in Hebrew with several technical terms written in English
inside it.

We'll support this by allowing the user to specify an alternate ATDSS,
and this alternate ATDSS will override the locale information.
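
A minimal sketch of that precedence, in Python.  The two default paths
are the ones suggested earlier in this thread; the "alternate" argument
and the function name are assumptions for illustration only.

```python
import os

SYSTEM_ATDSS = "/etc/bidi/autobidi"
USER_ATDSS = os.path.join("~", ".bidi", "autobidi")


def pick_atdss(alternate=None, exists=os.path.exists):
    """Return the path of the style sheet to use, or None if none is found.

    Order of precedence: the alternate ATDSS named by the user, then the
    per-user file, then the system-wide file.
    """
    candidates = [alternate, os.path.expanduser(USER_ATDSS), SYSTEM_ATDSS]
    for path in candidates:
        # Skip an unset alternate and any file that is not there.
        if path and exists(path):
            return path
    return None
```

The "exists" parameter is there only so that the lookup order can be
exercised without touching the real filesystem.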

> > 3. We want to handle HTML/XML tags in an appropriate way, to be
> > specified in the ATDSS (Automatic Text Direction Style Sheet) rules.

Anyway, another way to view my ATDSS suggestion is as a means to allow
the community to quickly evolve a future high-quality ATD (automatic
text direction) standard by having everyone try various approaches,
then find and document the approach which works best for them.
                                                 --- Omer

> > On Mon, 2005-02-28 at 13:02 +0200, Ely Levy wrote:
> > > Nice idea :)
> > > But why do you need to do it by file type?
> > > Doesn't it make more sense to do it by program?
> > > I mean, we always want to ignore XML tags in direction setting, no?
> > > Why would we want an option to disable it?
> > >
> > > Ely Levy
> > > System group
> > > Hebrew University
> > > Jerusalem Israel
> > >
> > >
> > >
> > > On Mon, 28 Feb 2005, Omer Zak wrote:
> > >
> > > > I suggest that a standard be set for an "automatic text direction style
> > > > sheet" rather than for "automatic text direction".  This is because
> > > > people may have different expectations from an automatic text direction
> > > > algorithm implementation, according to the language with which they work
> > > > most of the time at the moment.
> > > >
> > > > The style sheet (or rather configuration file) will, by default, be in a
> > > > standard location (say, /etc/bidi/autobidi, overridable by
> > > > ~/.bidi/autobidi).
> > > >
> > > > For each relevant file type (identified by filename extension, MIME
> > > > type, or file contents, according to standard Unix mechanisms), the
> > > > style sheet will define some rules for setting automatic text
> > > > direction.
> > > >
> > > > The case of *.xml files will be handled specifically - the automatic
> > > > text direction rules will be associated with the schema, which is
> > > > associated with the *.xml file.
> > > >
> > > > Another special case is the text displayed in GUI controls of various
> > > > types in various applications.  For this, we need a standard way to
> > > > identify the application, the application instance, and the control
> > > > within the application to which the rules apply.
> > > >
> > > > About the format of the rules themselves:
> > > > I suggest that some predicates be defined, according to the rules
> > > > already proposed and implemented for automatic text direction.
> > > > The predicates will be combined by means of Scheme code.
> > > >                                                   --- Omer
> > > >
> > > > On Mon, 2005-02-28 at 11:54 +0200, Ely Levy wrote:
> > > > > Hey,
> > > > > I think we should create a standard for automatic text direction.
> > > > > There are a few issues with how to detect the text direction (RTL or
> > > > > LTR): for example, numbers should not set the direction; there should
> > > > > be a manual override; and split text needs a rule (for example, the
> > > > > gaim nick is in English and the line is in Hebrew or Arabic: what
> > > > > should the text direction be? Should it be split, with the nick on one
> > > > > side and the text on the other? Should it ignore the nick and detect
> > > > > the direction from the text?)
> > > > > What about HTML text? How do we ignore tags as direction setters?
> > > > >
> > > > > The big question: can good autodetection actually be done? Or should
> > > > > we give up on it and do manual setting only?
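
Two of the issues raised in the quoted messages (digits must not set the
direction, and markup tags should be ignorable) can be sketched in
Python as a "first strong character" check.  This is only one common
heuristic, not a rule agreed on in this thread; the function names, the
strip_tags switch and the crude tag regex are assumptions made for the
example.

```python
import re
import unicodedata

# Crude tag matcher, for illustration only; it is not a real HTML/XML parser.
TAG = re.compile(r"<[^>]*>")


def first_strong(text):
    """Return "RTL" or "LTR" for the first strongly directional character.

    Digits, spaces and punctuation have weak or neutral bidi classes, so
    they are skipped and never set the direction.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew, Arabic and similar letters
            return "RTL"
        if bidi_class == "L":           # Latin and other left-to-right letters
            return "LTR"
    return None


def detect_direction(text, strip_tags=False, default="LTR"):
    """Guess the direction of text, optionally ignoring markup tags."""
    if strip_tags:
        text = TAG.sub(" ", text)
    return first_strong(text) or default
```

With strip_tags left off, the Latin letters inside a tag such as <b>
count as strong characters, which is exactly the problem with HTML text
raised above.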

-- 
My own blog is at http://www.livejournal.com/users/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html


