Fwd: Re: [Clipart] metadata: aren't the keywords actually categories (and can keywords be added)?
Andrew Archibald
andrew.archibald at sympatico.ca
Fri Apr 15 21:55:43 PDT 2005
Mike Traum wrote:
> Andrew,
> I understand the issues you raise with the big xml file proposal.
> But, I don't think symbolic links will work, and I do think that hard
> links are a very messy solution to the problem. It doesn't scale well
> - what happens if openclipart has 50,000 images? So many duplications
> in the package will just end up in bloat.
What's messy about hard links? They're a pretty good approximation to
shared copy-on-write: no duplication at all. In fact, having the same file
in multiple directories is what they're for. I have no idea what Windows
thinks of them, though (I haven't used Windows in a long time).
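To illustrate the no-duplication point, here is a minimal Python sketch. It assumes a filesystem that supports hard links and that all directories live on the same filesystem; the function name and the "animals"/"pets" directories are made up for the example.

```python
import os
import tempfile

def link_into_categories(source_path, category_dirs):
    """Hard-link source_path into each directory and return the new paths."""
    linked = []
    for directory in category_dirs:
        os.makedirs(directory, exist_ok=True)
        target = os.path.join(directory, os.path.basename(source_path))
        # os.link adds a second directory entry for the same data on disk;
        # no bytes are copied.
        os.link(source_path, target)
        linked.append(target)
    return linked

def demo():
    """Create one file, link it into a second category, report what the OS sees."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "animals", "cat.svg")  # hypothetical layout
        os.makedirs(os.path.dirname(original))
        with open(original, "w") as handle:
            handle.write("<svg/>")
        links = link_into_categories(original, [os.path.join(root, "pets")])
        # st_nlink counts directory entries; samefile confirms both names
        # refer to one underlying file.
        return os.stat(original).st_nlink, os.path.samefile(original, links[0])
```

Both names point at the same data, so a file browser sees the image in each category while the package stores it once.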
> How about a flat file structure with no path whatsoever? I think this
> would make the most sense.
That's nearly as useless for people using a file browser as one big XML
file. Perhaps each package could single out one keyword as the most
important (if the package doesn't specify, just pick one at random) and it
could wind up in the corresponding category directory. Or when the
categories are made (essentially from search queries) they could be given a
(perhaps implicit) priority.
That last suggestion in detail:
When making a localized package, one specifies a category tree; each
category is specified by a set of keywords (or more generally, by a boolean
metadata expression, regular expression, etcetera). The file gets dumped
in the directory corresponding to the first matching category (files that
fall through go in the root "miscellaneous" category). A big XML catalog
goes in the root directory as well. An interactive tool, then, simply
presents images as being in all matching categories. A file browser sees
the files in an appropriate category (but only one). The system could even
include (optional) hardlinks so a file appears in all the category
directories where it belongs.
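A rough Python sketch of that first-match rule follows. It assumes the simplest case, where a category matches if the image shares at least one keyword with it; the function name and the sample categories are invented for illustration, and a real tool might use boolean or regular expressions instead.

```python
def assign_categories(keywords, category_rules):
    """Return (primary_category, all_matching_categories) for one image.

    category_rules is an ordered list of (name, keyword_set) pairs; order
    doubles as the priority. Matching on any shared keyword is an assumption.
    """
    keywords = set(keywords)
    matches = [name for name, wanted in category_rules if keywords & set(wanted)]
    # The file itself is stored under the first match; images that match
    # nothing fall through to the root "miscellaneous" category.
    primary = matches[0] if matches else "miscellaneous"
    return primary, matches


# Hypothetical category tree for one localized package.
rules = [
    ("animals", {"cat", "dog"}),
    ("cartoons", {"cartoon"}),
]
primary, everywhere = assign_categories(["cartoon", "cat"], rules)
```

Here `primary` is where a file browser would find the image, while `everywhere` is what an interactive tool (or the optional hardlinks) would use.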
Keep in mind that OCAL and other SVG clipart collections could reach the
multi-gigabyte mark, with hundreds of thousands of files - imagine a street
maps collection, or mass-conversion of multiple-CD professional clipart
collections.
A suitably designed collection format could accommodate offline storage as
well - the index is online, but when you pick an image it tells you "That
image is on CD #347, go get it".
> Regarding the application I'm proposing, sure, I'll be able to
> support pretty much anything. But, this all seems to be up in the air
> right now, and I'd like to see some data definitions of proposed xml
> files and a roadmap on the package structure before I start a project
> based on all of that.
There's really no hurry to design an all-encompassing database design.
We'll throw the first one away anyway. We should just be careful to record
enough information in the images that we won't have to go through and
manually change all of them later. The whole point of using XML is that if
we change something, it's easy to write a hack that reads the one format
and writes the other.
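As an example of the kind of hack I mean, here is a short Python sketch using the standard xml.etree.ElementTree module. The element names (catalog, image, keywords, keyword) are purely hypothetical, since no format has been defined yet; it rewrites a comma-separated keywords element into one keyword element per word.

```python
import xml.etree.ElementTree as ET

def split_keywords(xml_text):
    """Convert <keywords>a, b</keywords> into <keyword>a</keyword><keyword>b</keyword>."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of the images first, since the tree is modified below.
    for image in list(root.iter("image")):
        old = image.find("keywords")
        if old is None:
            continue  # already in the new format, or no keywords at all
        image.remove(old)
        for word in (old.text or "").split(","):
            word = word.strip()
            if word:
                ET.SubElement(image, "keyword").text = word
    return ET.tostring(root, encoding="unicode")


# Hypothetical old-format catalog entry.
old_catalog = '<catalog><image name="cat.svg"><keywords>cat, pet</keywords></image></catalog>'
new_catalog = split_keywords(old_catalog)
```

As long as the keywords are recorded somewhere in the first place, a conversion like this is a few lines, which is why getting the information in matters more than getting the layout right.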
Andrew