[Clipart] Re: coordinate the new release

Nathan Eady eady at galion.lib.oh.us
Thu Sep 1 10:09:54 PDT 2005


Nicu Buculei wrote:

> I suspect something wrong with the counting:

The counting is not perfect.  There are known issues with it.  Among
other things, it does not take author or title into account, only
the filename.  It discounts duplicates that are caused by the same
file being in more than one directory, but it does not discount
duplicates due to the same file appearing with slightly different
names.  Going the other direction, if there are two different files
with the same filename, in different directories (e.g., some of
the icon sets have overlapping filenames if you throw out the path,
but the images are different), it only counts one.
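
To make that behaviour concrete, here is a minimal sketch of
filename-only counting.  It is not the actual count-unique-images.pl
(whose source is not shown here); the function name and the sample
paths are made up for illustration, and it assumes the count is
simply the number of distinct filenames once the directory part is
thrown away.

```python
import os

def count_unique_basenames(paths):
    """Count distinct filenames, ignoring the directory part of each path."""
    return len({os.path.basename(p) for p in paths})

# Hypothetical paths, chosen to show each case described above.
paths = [
    "shapes/stars/star01.svg",
    "incoming/star01.svg",       # same file in a second directory: counted once
    "shapes/stars/star_01.svg",  # same image, slightly different name: counted again
    "icons/set_a/home.svg",
    "icons/set_b/home.svg",      # different image, same filename: counted once
]

print(count_unique_basenames(paths))  # 3
```

Five paths give a count of 3: one duplicate is correctly discounted,
one near-duplicate name is over-counted, and one genuinely different
image is under-counted.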

I think it is a pretty good approximation, however.  If we use
it that way (e.g., if at Christmas time count-unique-images.pl
says 10537 and so based on that we say, "over 10 thousand"),
I think that's a reasonable approach.  Otherwise we could spend
a lot of effort on an exact count that might be better spent in
other ways.

> at 0.16 we had 4442 images,  a few hours before 0.17 the
> incoming folder contained 600-700 files (some archives, some
> non-clipart, some not valid). Simple math tells us we can't be
> at 6584

There is a single collection in shapes/stars that accounts for
rather a large portion of the excess.  I do not recall the exact
number now, but I remember that we did meet the 5K goal without it.
It does, however, account for a sizeable portion of the margin by
which we exceeded the goal.

The images in it are very simple, and counting them all as unique
images is not entirely fair, but OTOH they don't take up very much
space, because they are small and compress well[1], so if the only
real harm they're doing is inflating the raw image count, that is
IMO not a very big deal.  Right now this seems significant (by raw
image count, they're roughly on the order of 20% of the collection),
but over the long haul this should get lost in the underflow.  When
we hit 35K, nobody will care that that includes 1K of fairly similar
star-shaped geometric constructions.
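
As a rough check on the "lost in the underflow" point, using only
the round figures above (1K stars out of a 35K collection), the
arithmetic can be sketched like this.  The helper name is made up
for illustration.

```python
def share_percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total

# 1K star images in a 35K collection, per the round numbers above.
print(round(share_percent(1000, 35000), 1))  # 2.9
```

So the same images drop from roughly 20% of the count today to
under 3% at that size.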

My advice is to just hype on the fact that we met and exceeded
the 5K goal, and not make a big deal about exactly how much we
passed it by.

[1] Well, the .svg compresses well.  The .png, less well.
    These probably could benefit from being greyscale .png
    images, actually, or indexed.


