[Clipart] Authority Control

Nicu Buculei nicu at apsro.com
Sun Jan 23 23:38:07 PST 2005


Jonadab the Unsightly One wrote:
> I've run it over (a copy of) the 0.9 release, and after some basic
> consolidation (such as case folding) it came up with the statistics
> below.  (Keywords that only occur once in the whole collection are not
> listed.)  I'd like to have a couple of additional sets of eyeballs
> besides mine look over this list.  I know there are pairs of keywords
> in there that are functionally equivalent and should be combined.
> 
> There are some that I already know need to be combined:
> bug,bugs
> mammal,mammals
> bird,birds
> animal,animals

bill,gates
tool,tools
sign,signs
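
A minimal sketch of how such pairs could be merged, assuming a hand-maintained table of equivalents. The names MERGE_PAIRS, canonical and merge are hypothetical, and keeping the singular form is only an assumption; nothing here is the script that produced the statistics.

```python
# Hypothetical table of (form to keep, variant to fold into it).
# Keeping the singular is an arbitrary choice made for this sketch.
MERGE_PAIRS = [
    ("bug", "bugs"),
    ("mammal", "mammals"),
    ("bird", "birds"),
    ("animal", "animals"),
    ("tool", "tools"),
    ("sign", "signs"),
]

# Map each variant to the form we keep.
CANONICAL = {variant: keep for keep, variant in MERGE_PAIRS}


def canonical(keyword):
    """Return the kept form of a keyword, or the keyword itself if unlisted."""
    return CANONICAL.get(keyword, keyword)


def merge(keywords):
    """Fold equivalent keywords together, dropping duplicates, keeping order."""
    return list(dict.fromkeys(canonical(k) for k in keywords))


merged = merge(["birds", "bird", "tools", "sign"])
```

An explicit table keeps every merge reviewable by hand, which matters for pairs like "action" and "actions" that may not mean the same thing.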

> Those may be combined already on the list below, as are versions of
> keywords that differ only in capitalization.  Plus I also already know
> we need to strip leading whitespace.  And I already know we want to

Stripping leading whitespace will make the list easier to parse and 
make other potential problems easier to spot.
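
As a rough illustration of that consolidation step (whitespace stripping plus case folding, then counting), here is a sketch. It assumes the keywords of each image are already available as a list of strings; normalize and keyword_counts are hypothetical names, not part of any existing tool.

```python
from collections import Counter


def normalize(keyword):
    """Strip surrounding whitespace and fold case."""
    return keyword.strip().casefold()


def keyword_counts(keyword_lists):
    """Count in how many images each normalized keyword appears."""
    counts = Counter()
    for keywords in keyword_lists:
        # A set counts each keyword once per image; empty strings are skipped.
        counts.update({normalize(k) for k in keywords if k.strip()})
    return counts


# Made-up sample data: one inner list per image.
sample = [[" Bird", "animal"], ["bird", "Animal ", "tool"]]
counts = keyword_counts(sample)

# Like the statistics in the quoted message, omit keywords seen only once.
repeated = {k: n for k, n in counts.items() if n > 1}
```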

> remove the "unsorted" keyword from images that aren't in the unsorted
> directory.  But I'm sure there are more on the list than that.  For
> example, are "action" and "actions" used in the same sense?  What
> about "application" and "apps"?

I believe "apps" comes from the subdirectory name in the icon themes.

We can also remove the spurious entries introduced by the system, like:
improvisedkeywordparse
hash
0x996c42c

and meaningless ones, which were part of longer sentences, like:
un
u
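
A sketch of how those could be filtered out, assuming the junk falls into three groups: known system artifacts, hex addresses, and leftover sentence fragments. The sets and the is_junk and clean names are hypothetical and would need extending by hand after reviewing the real list.

```python
import re

# Hypothetical block lists, seeded with the examples above.
SYSTEM_ARTIFACTS = {"improvisedkeywordparse", "hash"}
FRAGMENTS = {"un", "u"}

# Matches memory-address-like strings such as 0x996c42c.
HEX_ADDRESS = re.compile(r"0x[0-9a-f]+")


def is_junk(keyword):
    """Return True for keywords that carry no meaning."""
    k = keyword.strip().casefold()
    return (
        k in SYSTEM_ARTIFACTS
        or k in FRAGMENTS
        or HEX_ADDRESS.fullmatch(k) is not None
    )


def clean(keywords):
    """Drop junk keywords, keeping the rest in their original order."""
    return [k for k in keywords if not is_junk(k)]


cleaned = clean(["bird", "0x996c42c", "hash", "u", "un", "tools"])
```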

-- 
nicu


