[Uim] [Docs] Terminology problems

Martin Swift martin at swift.is
Mon Mar 12 16:17:15 EET 2007


 [ I'm terribly sorry about this horribly long email and hope that it
   doesn't scare too many readers from making it through. If anyone
   would like to get an abstract or discuss this live, please let me
   know. ]

In this past month I've gone full circle a few times on this issue,
and though I'm still not completely sure about the details, I haven't
seen a robust enough argument for the "input method framework" /
"input method" proposal.

Furthermore, I find little evidence for the past use of this
terminology, making it questionable whether changing to it is worth
the temporary confusion.

Both impressions may, of course, be due to my short-sightedness, but
please bear with me.

Below, I've tried to clarify my thoughts and note where I am still
confused, in the hope that this can be cleared up.

On Tue, Feb 06, 2007 at 01:06:12PM +0100, Jan Willem Stumpel wrote:
> Martin Swift wrote:
> 
> > Hi Jan,
> 
> > [..] In your terminology, a monolithic software that converted 
> > to a single language is an "input method" while if it were 
> > modularized and catered to multiple characters, an "input 
> > method framework".
> 
> But if modularized, it would not monolithic! So yes, I would call
> it a framework.

Err ... I'm not sure what you're getting at there. I wasn't
disagreeing with you, but setting up the terms to lead up to this:

On Tue, Feb 06, 2007 at 11:17:36AM +0900, Martin Swift wrote:
> > What about modularized software that "could" be extended to use other
> > IMs but isn't and won't? Inversely, what about monolithic software
> > that clumps together various character conversions?

Granted, this is a hypothetical, but it raises the issue of where to
draw the line between "input methods" and "input method frameworks",
which I don't think you've adequately explained. The way I understand
it, the "input method framework" / "input method" terminology you
suggested may be presented as:

                    one ruleset       multiple rulesets
                +----------------------------------------------
  monolithic    |  "input method"    "input method collection"
                |
  modularized   |   ???              "input method framework"

I couldn't see how you defined the lower off-diagonal cell in your
terminology. Perhaps this example will explain my confusion: if one
were to write software that was functionally identical to uim, but had
only one "input method/conversion engine", what would this be called
in your terminology?

Also, what is the purpose of giving different labels to software that
differs only in design structure ("collection" vs. "framework")?

As I fail to see the benefit of separating these terms, I think it is
worth having a single, unambiguous term for software that is
functionally identical, regardless of internal structure or range of
capabilities (Windows and Linux are both OSes), for the sake of
simplicity.

Returning to Jeroen's reply to my original email and bringing up the
issue of consistency with past terminology, it is valuable to note
that his understanding was drawn from experience with documentation.
Though I may have phrased my email a little naïvely, I didn't approach
this without any background research and had, in fact, come to the
same conclusion as he had.

The point is that, hypotheticals aside, choosing a terminology
irrespective of tradition may cause some confusion as old
documentation is phased out. Cooperation from related projects is,
furthermore, not guaranteed.

From what I've found, these are some of the terms stated by the
projects themselves:

  Uim:       Input method library
  SCIM:      Input method
  IIIMF:     Input method framework

  Anthy:     Conversion engine
  Canna:     Conversion system
  FreeWnn:   Conversion system

  scim-m17n: Input module
  scim-uim:  Input module

  PRIME:     Input Method Editor

My impression is that this supports the understanding that Jeroen and
I currently have. Judging from forum discussions, there is, however,
still a lot of confusion among users.

At this stage, there are two things I would be very interested in
finding out:

* Jan's rationale for why the change of terms is useful and comments
  on my arguments against.

* Some opinions from those on the inside who have worked on the
  "input methods/conversion engines" on how these tools are best
  labelled, in particular in terms of self-consistency and
  backwards-compatibility with past (though very unstandardized)
  terminology.

On Tue, Feb 06, 2007 at 01:06:12PM +0100, Jan Willem Stumpel wrote:
> I think uim is a *framework* which has a few *input methods*
> shipped with it (as a sort of demos, but actually usable) by
> default (like hangul2/3, romaja, py). In principle, it could be
> shipped without them; but that would not be good marketing -- the
> user should be able to do something with uim "out of the box".

This is precisely the sort of ambiguity that I think is unhelpful. The
software gets a different label depending on what it ships with, not
on what it does.

> Interestingly, in the case of uim, when you install m17nlib, none
> of the methods offered by m17n are accessible to uim by default.
> You can make them available (or un-available) individually.

Isn't this the case with all the "input methods/conversion
engines"?
  uim-pref-gtk > Global settings > Input method deployment > Enabled
  input methods

> > It seems more logical to me to use:
> 
> > * "input method" for any software that offers a complete system
> >  of converting input.
> 
> What do you mean by "complete" here? A complete method for one
> language, or something capable of doing input in all the world's
> languages?

I meant the former. I'm not so sure about this any more, though, as
uim could easily be distributed without any "input methods/conversion
engines".

> I chose my terms because "input method" has traditionally been
> used for things like anthy and canna (for one language). Things
> like uim / scim / IIIMF (with extensibility for all languages) are
> relatively new. So it may make sense to invent some new word for
> them.

I'm not sure this is the case. According to
  <http://anthy.sourceforge.jp/cgi-bin/hikien/hiki.cgi?uim%27s+history>
uim was started as a sub-project of Anthy in 2002. Anthy was
registered with Sourceforge at the end of March, 2002.

Furthermore,
  <http://anthy.sourceforge.jp/cgi-bin/hikien/hiki.cgi?What+is+anthy%3F>
refers to UIM as a "multilingual collection of input methods" and
itself as a "kana-kanji conversion engine",
"anthy-the-conversion-engine" and a "Japanese input system".

 ... *sigh*.

> > In this language, Anthy would be a used by the "input method" 
> > uim as a "conversion engine". Does uim delegate it's job to the
> > "input method" Anthy, or does uim use an internal part of
> > Anthy (then better termed "conversion engine")? If all uim does
> > is delegate its job to other input methods, then, yes, Jan's 
> > terminology of "input method frameworks" might fit better.
> 
> I _think_ that is what uim does (delegating the actual work to
> anthy, only handling things like catching the keystrokes, and
> pushing anthy's output to the application that the
> keystrokes came from), but the experts should answer this.

Yes, expert response would be greatly appreciated.

> I am not so sure that it is useful (for user-level documentation)
> to make a distinction between an input method (or as you call it,
> a conversion engine) and a "dictionary" (the actual set of
> internal rules that the method/engine uses). For the user, the
> "method" (or "engine") is how it behaves. And this is completely
> determined by the rule set it happens to use at the given moment.

By dictionary I mean, for example, Anthy's user defined dictionary
(uim-dict-gtk), so I would say that the term is necessary.

> I suppose you would consider the m17n lib to be one "conversion
> engine" with many different "dictionaries". But I think that
> calling it a "collection of many different input methods" (which
> can be plugged into a framework like uim/scim/IIIMF) is more
> understandable to users.

But earlier you said

  "If such a thing exists (perhaps m17nlib is one, but it is not
  monolithic) I would call it an "input method collection"."

Don't these conflict? Is m17nlib a "collection of many different
input methods", yet at the same time not an "input method collection"
because it isn't monolithic? I'm a little confused by this.

Kudos to those who made it this far... ;-)

Cheers,
Martin

-- 
✌


