[Clipart] 0.16 Release
Jonadab the Unsightly One
jonadab at bright.net
Sun Jul 31 03:37:51 PDT 2005
We should be able to release 0.16 on Tuesday. I have it completed
here now, but it has not been uploaded to the server yet. What I'm
planning to do this month is burn the files onto a CD, carry it to
work, put the files on a system there, then come home Monday night,
ssh onto the server, and scp the files over. That way they don't have
to travel over my slow dialup connection.
It is a good release, with some new artwork. Various details...
The top of the statistics file looks like this:
951 Danny Allen
705 Nicu Buculei
453 Jose Hevia
366 Gerald G.
290 Jean-Victor Balin
251 Andy Fitzsimon
187 Alan Horkan
137 Benji Park
106 Architetto Francesco Rollandin
84 John Cliff
82 yves GUILLOU
(These counts don't take duplicates into account. Also, we have not yet
done authority control on the author names, so some of these people
have more submissions under variant forms of their names further down
the list.)
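The statistics file is simply a per-author count of submissions, highest first. The following is a hypothetical sketch of how such a listing could be produced by counting author names; it is not the script that generated the real file. How the author of each image is looked up is left out and assumed to be done elsewhere. Like the real file, it does no authority control, so variant spellings of one person's name are counted separately.

```python
from collections import Counter
from typing import Iterable


def author_statistics(authors: Iterable[str]) -> list[str]:
    """Return 'count name' lines, most prolific author first.

    Each element of `authors` is the author name recorded for one
    submitted file (ASSUMPTION: gathered elsewhere). Names are counted
    exactly as given, with no authority control.
    """
    counts = Counter(authors)
    # Sort by descending count, then alphabetically so ties are stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{count} {name}" for name, count in ordered]


if __name__ == "__main__":
    sample = ["Danny Allen", "Nicu Buculei", "Danny Allen",
              "Jose Hevia", "Danny Allen", "Nicu Buculei"]
    print("\n".join(author_statistics(sample)))
```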
This release really fleshes out the people collection rather nicely,
thanks to a number of contributors, several of whom each submitted
multiple nice images to this previously rather thin category.
There is now an XML index of the metadata, called index.xml, so that
people who are writing tools in languages other than Perl don't have
to try to parse keywords.idx. Both of these files (despite the name
of the latter, which is retained for historical reasons and
compatibility) index the authors and the titles in addition to the
keywords. (The search page has not yet been updated to take full
advantage of this; at this time it still only looks at keywords.)
This is generated by the new version of the authority control script,
which can be found in the tools release.
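For anyone writing a tool against index.xml, reading it with a standard XML parser could look roughly like this. The element and attribute names used here (image, path, title, author, keyword) are placeholders invented for illustration; the actual layout of index.xml is not described above, so check the file and adjust. Unlike the current search page, this hypothetical search matches titles and authors as well as keywords.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Entry:
    path: str
    title: str = ""
    author: str = ""
    keywords: list[str] = field(default_factory=list)


def load_index(xml_text: str) -> list[Entry]:
    """Parse the index into Entry objects.

    HYPOTHETICAL layout, not confirmed for the real index.xml:
    <index><image path="..."><title/><author/><keyword/>...</image></index>
    """
    root = ET.fromstring(xml_text)
    entries = []
    for image in root.iter("image"):
        entries.append(Entry(
            path=image.get("path", ""),
            title=image.findtext("title", default=""),
            author=image.findtext("author", default=""),
            keywords=[k.text or "" for k in image.findall("keyword")],
        ))
    return entries


def search(entries: list[Entry], term: str) -> list[str]:
    """Return paths whose title, author, or any keyword contains `term`
    (case-insensitive substring match)."""
    term = term.lower()
    return [e.path for e in entries
            if any(term in text.lower() for text in [e.title, e.author, *e.keywords])]
```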
Nicu's cards collection, which was rather a mess in 0.15, is now
neatly organized into subdirectories similar to the directory
structure in the zipfile he submitted; they can be found under
I did already upload the failed-files archive, in case someone on the
list wants to have a look through it. It looks big, but it's really
not that bad. Here's how it breaks down...
/unrepaired (13M uncompressed)
These are the original, unrepaired versions of files for which the
repair scripts retrieved replacement copies from the upload log. Most
of them have HASH in them; a few are zero bytes; all of them have been
replaced.
/wrongformat (12M uncompressed)
These are files that did not appear to be SVG or WMF. Most of the
bulk here is in the .avi and the various raster images, but there
are also some PDFs and one .ai and assorted other things. I
didn't really look very hard at any of them.
/duplicates (8.5M uncompressed)
These are files that appear to be exact or near-exact duplicates
of other files that were processed for the release. I tried to
keep the most recent version in most cases and put the older one
here, but it may be that occasionally the newer one ended up here.
Some of these duplicates were uploaded by users, but others were
created by my repair scripts, which in some cases spit out
multiple bitwise-identical copies of a repaired image; I kept
one of each for processing and put the rest here. I know that
there are still duplicates in the collection.
/not-public-domain (2.3M uncompressed)
Various things that are unlikely to be public domain. Most of the
tuxen ended up here this time, as per the discussion on the list.
We probably still need to go over the collection for other things
that need to be in this category. It may be that we should have a
separate collection for things that are not public domain but
worth distributing anyway; tux images seem like they belong in
/not-clip-art (380K uncompressed)
Vector images that pass the technical tests but do not appear
to resemble clip art in any meaningful way.
/needs-repair (172K uncompressed)
Images that still need repairs.
/no-data (16K uncompressed)
Some files that were submitted that appear to contain only
metadata, and no actual data.
/not-wellformed-xml (412K uncompressed)
Files that appear to be intended as SVG, but that XML::Twig does not
accept as well-formed XML. I have not investigated why.
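Several of the categories above boil down to simple per-file checks. This is a hypothetical Python sketch of such a triage using only the standard library; it is not the scripts used for this release (those used XML::Twig for the XML checks). Assumptions: the "HASH" damage is taken to be a stringified Perl hash reference such as HASH(0x81d2c4); SVG is recognized by an `<svg` tag in the first 4 KB; and the set of "non-drawing" element names used for the no-data check is a guess. The returned names borrow the directory names above, with "needs-repair" standing for the HASH/zero-byte damage.

```python
import xml.etree.ElementTree as ET

# Placeable WMF files start with the 32-bit key 0x9AC6CDD7, stored little-endian.
WMF_PLACEABLE_KEY = b"\xd7\xcd\xc6\x9a"

# ASSUMPTION: SVG child elements that carry no drawing data.
NON_DRAWING = {"metadata", "defs", "title", "desc", "namedview"}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag."""
    return tag.rsplit("}", 1)[-1]


def sniff_format(data: bytes) -> str:
    """Guess 'wmf', 'svg' or 'other' from the first bytes of a file."""
    if data.startswith(WMF_PLACEABLE_KEY):
        return "wmf"
    # Non-placeable WMF header: type 1 or 2, header size 9 WORDs,
    # version 0x0100 or 0x0300 (all little-endian).
    if (data[0:2] in (b"\x01\x00", b"\x02\x00")
            and data[2:4] == b"\x09\x00"
            and data[4:6] in (b"\x00\x01", b"\x00\x03")):
        return "wmf"
    # Heuristic: an <svg tag near the start, after any XML declaration or DOCTYPE.
    if b"<svg" in data[:4096].lower():
        return "svg"
    return "other"


def classify(data: bytes) -> str:
    """Return a failed-files category name, or 'ok' if no problem was found."""
    # ASSUMPTION: HASH damage looks like a stringified Perl hashref, e.g. HASH(0x81d2c4).
    if not data or b"HASH(0x" in data:
        return "needs-repair"
    fmt = sniff_format(data)
    if fmt == "wmf":
        return "ok"  # WMF contents are not checked further in this sketch
    if fmt != "svg":
        return "wrongformat"
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "not-wellformed-xml"
    # If every child is metadata-like (or there are none), nothing gets drawn.
    if all(local_name(child.tag) in NON_DRAWING for child in root):
        return "no-data"
    return "ok"
```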
There are also, as usual, assorted loose files in the failed-files
archive, which failed the svg_validate process in one way or another.
About 27 of them.
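For the bitwise-identical copies described under /duplicates, one simple approach is to group files by a cryptographic hash of their contents and keep only the newest copy in each group. This is a hypothetical sketch, not the repair scripts used here. It catches only exact duplicates, not near-exact ones, and it uses file modification time as a stand-in for "most recent version", which is an assumption. It also reads every file fully into memory, which is fine for a sketch but not for very large trees.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def split_exact_duplicates(files):
    """files: iterable of (name, data, mtime) tuples.

    Returns (keep, duplicates): for each group of byte-identical files,
    the newest one is kept and the rest are listed as duplicates.
    """
    groups = defaultdict(list)
    for name, data, mtime in files:
        groups[hashlib.sha256(data).hexdigest()].append((mtime, name))
    keep, duplicates = [], []
    for members in groups.values():
        members.sort(reverse=True)  # newest first; equal mtimes ordered by name
        keep.append(members[0][1])
        duplicates.extend(name for _, name in members[1:])
    return sorted(keep), sorted(duplicates)


def scan_directory(root):
    """Apply split_exact_duplicates to every regular file under `root`."""
    files = [(str(p), p.read_bytes(), p.stat().st_mtime)
             for p in Path(root).rglob("*") if p.is_file()]
    return split_exact_duplicates(files)
```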