[Clipart] 0.16 Release

Jonadab the Unsightly One jonadab at bright.net
Sun Jul 31 03:37:51 PDT 2005


We should be able to release 0.16 on Tuesday.  I have it completed
here now, but it has not been uploaded to the server yet.  What I'm
planning to do this month is burn the files onto a CD, carry it to
work, put the files on a system there, then come home Monday night,
ssh onto the server, and scp the files over.  That way they don't have
to travel over my slow dialup connection.

It is a good release, with some new artwork.  Various details...

The top of the statistics file looks like this:
  951	Danny Allen
  705	Nicu Buculei
  453	Jose Hevia
  366	Gerald G.
  290	Jean-Victor Balin
  251	Andy Fitzsimon
  187	Alan Horkan
  137	Benji Park
  106	Architetto Francesco Rollandin
   84	John Cliff
   82	yves GUILLOU

(These counts do not account for duplicates.  Also, we have not done
authority control on the author names yet, so some of these people
have more submissions under other names a bit further down the list.)
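
To illustrate what authority control would change in these totals, here
is a rough Python sketch.  It assumes the statistics file is simply a
count, whitespace, and an author name on each line, as in the excerpt
above.  The alias table and the "nicu" entry with 12 submissions are
invented for the example and are not taken from the real file.

```python
from collections import Counter

def parse_stats(text):
    """Parse lines of the form '<count> <author>' into (count, author) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace (a tab in the real file).
        count, author = line.split(None, 1)
        rows.append((int(count), author.strip()))
    return rows

def merge_aliases(rows, aliases):
    """Sum counts per canonical author name, highest total first."""
    totals = Counter()
    for count, author in rows:
        totals[aliases.get(author, author)] += count
    return totals.most_common()

# The first two lines come from the list above; the third is made up.
sample = "  951\tDanny Allen\n  705\tNicu Buculei\n   12\tnicu\n"
# Hypothetical alias table mapping a variant spelling to one canonical name.
aliases = {"nicu": "Nicu Buculei"}
merged = merge_aliases(parse_stats(sample), aliases)
```

With a real alias table, the same merge would fold the scattered
entries further down the list into the totals near the top.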

This release fleshes out the people collection rather nicely, thanks
to a number of contributors, some of whom submitted several nice
images each to this previously rather thin category.

There is now an XML index of the metadata, called index.xml, so that
people who are writing tools in languages other than Perl don't have
to try to parse keywords.idx.  Both of these files (despite the name
of the latter, which is retained for historical reasons and
compatibility) index the authors and the titles in addition to the
keywords.  (The search page has not yet been updated to take full
advantage of this; at this time it still only looks at keywords.)
This is generated by the new version of the authority control script,
which can be found in the tools release.
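
For anyone wanting to read index.xml from another language, a sketch in
Python might look roughly like the following.  The element and
attribute names used here (index, image, file, title, author, keyword)
are assumptions made up for illustration; the real layout of index.xml
is not described in this message, so check the file itself before
relying on any of them.  The sample records are placeholders as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the real index.xml may be structured differently.
SAMPLE = """<index>
  <image file="recreation/games/cards/example.svg">
    <title>Example card</title>
    <author>Example Author</author>
    <keyword>cards</keyword>
    <keyword>games</keyword>
  </image>
  <image file="people/example2.svg">
    <title>Example person</title>
    <author>Another Author</author>
    <keyword>people</keyword>
  </image>
</index>"""

def search(root, term):
    """Return the file attribute of every image whose title, author, or
    keywords contain term (case-insensitive substring match)."""
    term = term.lower()
    hits = []
    for image in root.iter("image"):
        fields = [el.text or "" for el in image
                  if el.tag in ("title", "author", "keyword")]
        if any(term in field.lower() for field in fields):
            hits.append(image.get("file"))
    return hits

root = ET.fromstring(SAMPLE)
by_keyword = search(root, "cards")
by_author = search(root, "another")
```

Searching authors and titles as well as keywords is the kind of lookup
the search page does not do yet.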

Nicu's cards collection, which was rather a mess in 0.15, is now
neatly organized into subdirectories similar to the directory
structure in the zipfile he submitted; they can be found under
recreation/games/cards/

I did already upload the failed-files archive, in case someone on the
list wants to have a look through it.  It looks big, but it's really
not that bad.  Here's how it breaks down...

/unrepaired (13M uncompressed)
    These are the unrepaired versions of files that the repair scripts
    got replacement copies of from the upload log.  Most of them have
    HASH in them; a few are zero bytes; all of them have been replaced.
/wrongformat (12M uncompressed)
    These are files that did not appear to be SVG or WMF.  Most of the
    bulk here is in the .avi and the various raster images, but there
    are also some PDFs and one .ai and assorted other things.  I
    didn't really look very hard at any of them.
/duplicates (8.5M uncompressed)
    These are files that appear to be exact or near-exact duplicates
    of other files that were processed for the release.  I tried to
    keep the most recent version in most cases and put the older one
    here, but it may be that occasionally the newer one ended up here.
    Some of these duplicates were uploaded by users, but others were
    created by my repair scripts, which in some cases spit out
    multiple bitwise-identical copies of a repaired image; I kept
    one of each for processing and put the rest here.  I know that
    there are still duplicates in the collection.
/not-public-domain (2.3M uncompressed)
    Various things that are unlikely to be public domain.  Most of the
    tuxen ended up here this time, as per the discussion on the list.
    We probably still need to go over the collection for other things
    that need to be in this category.  It may be that we should have a
    separate collection for things that are not public domain but
    worth distributing anyway; tux images seem like they belong in
    that category.
/not-clip-art (380K uncompressed)
    Vector images that pass the technical tests but do not appear
    to resemble clip art in any meaningful way.
/needs-repair (172K uncompressed)
    Images that still need repairs.
/no-data (16K uncompressed)
    Some files that were submitted that appear to contain only
    metadata, and no actual data.
/not-wellformed-xml (412K uncompressed)
    Files that appear to be (supposed to be) SVG, but XML::Twig
    doesn't like them.  I have not investigated the details of why.
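
The bitwise-identical copies mentioned under /duplicates are the easy
case to detect.  One possible approach, sketched here in Python, is to
group files by a content hash and keep the most recently modified copy
of each group.  This is only an illustration under those assumptions
and not the script actually used for the release; the function names
are made up, and near-exact duplicates would need a different
comparison.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def exact_duplicates(root):
    """Group files under root by content.  For each group of two or more,
    return (kept, extras), where kept is the most recently modified copy."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    result = []
    for paths in groups.values():
        if len(paths) > 1:
            # Newest first; equal timestamps fall back to name order.
            paths.sort(key=lambda p: (-p.stat().st_mtime, p.name))
            result.append((paths[0], paths[1:]))
    return result

# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.svg").write_text("<svg/>")
    (tmp / "b.svg").write_text("<svg/>")
    (tmp / "c.svg").write_text("<svg><g/></svg>")
    found = [(kept.name, sorted(p.name for p in extras))
             for kept, extras in exact_duplicates(tmp)]
```

In the demonstration, a.svg and b.svg end up in one group and c.svg is
left alone; which of the two is kept depends on their timestamps.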

There are also, as usual, assorted loose files in the failed-files
archive (about 27 of them), which failed the svg_validate process in
one way or another.
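
For anyone who wants to look into the /not-wellformed-xml files listed
above, a quick way to see why a file is rejected is to run it through
any XML parser and print the error.  The sketch below uses Python's
standard xml.etree.ElementTree as a stand-in; it is not XML::Twig and
may not reject exactly the same files.  The two inputs are made-up
examples.

```python
import xml.etree.ElementTree as ET

def wellformedness_error(data):
    """Return None if data parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None

# A well-formed document, and one with an unclosed <rect> element.
good = wellformedness_error('<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>')
bad = wellformedness_error('<svg><rect></svg>')
```

Here good is None, while bad holds the parser's description of the
problem, including the line and column where it gave up.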

-- 
Open Clip Art Library:  Drawing Together
http://www.openclipart.org/



