Fwd: Re: [Clipart] metadata: aren't the keywords actually categories (and can keywords be added)?

Jon Phillips jon at rejon.org
Sat Apr 16 09:02:35 PDT 2005

On Sat, 2005-04-16 at 05:10 -0700, Mike Traum wrote:
> Andrew,
> Regarding hard links, I thought it was the same as a copy (oddly
> enough - I've been working on unix/linux systems for 10+ years).
> After doing a bit of research, it appears that it's not (it's an
> inode reference) on ext2 and ext3 filesystems, and probably others.
> And, tar does support this. Zip does not, but that's to be expected
> and probably acceptable. So, I agree, this would be the best
> solution by far. You'd have a download that isn't bloated because of
> duplicate files. You'd have an extraction that isn't bloated on
> filesystems that support it. The only issue is extraction on
> filesystems that don't support it (Windows XP / NTFS doesn't, I
> believe), but I think we can all live with that.
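
For readers following the hard link discussion, here is a minimal sketch of the two behaviours described above: a hard link shares an inode with the original file, and a tar archive records the second name as a link rather than a second copy. It assumes a filesystem that supports hard links and uses Python's standard tarfile module. The package layout (pkg/animals, pkg/mammals, cat.svg) is made up for illustration and is not the actual openclipart layout.

```python
import os
import tarfile
import tempfile


def hardlink_demo():
    """Create a file plus a hard link, archive both, and report what tar stored."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical package layout: one image filed under two categories.
        pkg = os.path.join(root, "pkg")
        os.makedirs(os.path.join(pkg, "animals"))
        os.makedirs(os.path.join(pkg, "mammals"))

        original = os.path.join(pkg, "animals", "cat.svg")
        with open(original, "w") as handle:
            handle.write("<svg/>")

        # The second name points at the same inode; no data is duplicated.
        linked = os.path.join(pkg, "mammals", "cat.svg")
        os.link(original, linked)
        same_inode = os.stat(original).st_ino == os.stat(linked).st_ino

        archive = os.path.join(root, "pkg.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(pkg, arcname="pkg")

        # Count how the two names were stored in the archive.
        with tarfile.open(archive) as tar:
            members = [m for m in tar.getmembers() if m.name.endswith("cat.svg")]
            regular_entries = sum(1 for m in members if m.isreg())
            link_entries = sum(1 for m in members if m.islnk())

    return same_inode, regular_entries, link_entries
```

On a filesystem with hard link support, the archive should hold the file contents once and a link entry for the second name, which is what keeps the download from bloating.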

I don't think we can follow this line of thinking. The clipart should
not be about exclusivity, as what we are trying to make accessible is
the clipart and not a file browser, or only a system for unix/linux/etc.
In all honesty, support of Windows is a major priority for projects like
Inkscape and the GIMP. For Inkscape, the number of Windows downloads is
much much much larger than for Linux/Unix.

> It appears the 'link' command in perl can make hardlinks, but it
> looks like it will fail if a hardlink cannot be made (or, maybe it'll
> just copy?), so this will need to be taken into account in the perl
> packaging scripts. 
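
The concern above, that a link call may fail where hard links are unavailable, is usually handled by falling back to a plain copy. The following is only a sketch of that pattern, written in Python rather than in the Perl packaging scripts themselves; link_or_copy is a hypothetical helper name, not something that exists in the openclipart tools.

```python
import os
import shutil


def link_or_copy(source, destination):
    """Hard-link source to destination, copying instead if linking fails.

    Returns "linked" or "copied" so a packaging script can report what it did.
    """
    try:
        os.link(source, destination)
        return "linked"
    except (OSError, NotImplementedError, AttributeError):
        # Linking can fail on filesystems or platforms without hard link
        # support, across filesystem boundaries, or if destination exists.
        shutil.copy2(source, destination)
        return "copied"
```

With a fallback like this, packages could still be generated on a filesystem without hard link support; the result would just be larger.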

I thoroughly disagree with doing packages this way because it locks in
the system to certain file systems. Remember: The file hierarchy in the
packages is temporary. Everything is going to be moved to a document
management system on the web server, and then customized packages, and
releases can be built by anyone. This is the goal. I think it best to
keep our eyes on the prize.

> Also, this does place the requirement that packages be generated on a
> filesystem that supports this. Is this acceptable to the openclipart
> Release Manager, whoever he may be?
> 
> Regarding the data definition, I definitely agree that having the data
> in the images is the highest priority. And I also agree that doing a
> data definition now will likely result in changes. But, right now
> there is NO data definition for certain things that are critical to
> write the application I'm proposing. For example, I previously had
> discussions about transforming the .idx in the package to xml, as
> well as a future planned 'hierarchy-en.xml' file. The hierarchy file
> may not be necessary for the clip organizer app, but an index
> certainly would be of great help (especially since it's already
> there, but in perl form). Otherwise, importing an openclipart package
> into a client app will take FOREVER, being that every individual file
> would need to be read and parsed.
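
To illustrate why a package-level index helps, here is a small sketch that writes an XML index once and lets a client load it in a single parse instead of opening every image. The element and attribute names (index, file, path, keyword) are invented for this example; no schema had been agreed on the list at this point, and the existing .idx format is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_index(entries):
    """Serialize (path, keywords) pairs into a hypothetical XML index."""
    root = ET.Element("index")
    for path, keywords in entries:
        file_el = ET.SubElement(root, "file", {"path": path})
        for keyword in keywords:
            ET.SubElement(file_el, "keyword").text = keyword
    return ET.tostring(root, encoding="unicode")


def load_index(xml_text):
    """Parse the index once and return a keyword -> list of paths mapping."""
    by_keyword = {}
    for file_el in ET.fromstring(xml_text).iter("file"):
        for keyword_el in file_el.iter("keyword"):
            by_keyword.setdefault(keyword_el.text, []).append(file_el.get("path"))
    return by_keyword
```

A client application could then answer keyword lookups from the mapping without touching the individual SVG files until one is actually opened.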

Go ahead and make a schema for this index, or even better find one that
is already standardized, and see how you can plug this into the project.
The interest is there, so I think if you just do the work it will be
supported. Also, please join up on the project. If you email me your
username, email, gpg/pgp key, and ssh key, I will get you access to our
fd.o project/web server. Hopefully that will empower you further.

> Again, thanks for clarifying the hardlink thing - I think that's a
> perfect solution...
> mike
> --- Andrew Archibald <andrew.archibald at sympatico.ca> wrote:
> > Mike Traum wrote:
> > > Andrew,
> > > I understand the issues you raise with the big xml file proposal.
> > > But, I don't think symbolic links will work, and I do think that
> > > hard links is a very messy solution to the problem. It doesn't
> > > scale well - what happens if openclipart has 50,000 images? So many
> > > duplications in the package will just end up in bloat.
> > 
> > What's messy about hard links?  They're a pretty good approximation
> > to shared copy-on-write; no duplication at all, in fact having the
> > same file in multiple directories is what they're for.  I have no
> > idea what Windows thinks of them, though (haven't used Windows in a
> > long time).
> > 
> > > How about a flat file structure with no path whatsoever? I think
> > > this would make the most sense.
> > 
> > That's nearly as useless for people using a file browser as one big
> > XML file.  Perhaps each package could single out one keyword as the
> > most important (if the package doesn't specify, just pick one at
> > random) and it could wind up in the corresponding category
> > directory.  Or when the categories are made (essentially from search
> > queries) they could be given a (perhaps implicit) priority.
> > 
> > That last suggestion in detail:
> > When making a localized package, one specifies a category tree; each
> > category is specified by a set of keywords (or more generally, by a
> > boolean metadata expression, regular expression, etcetera).  The file
> > gets dumped in the directory corresponding to the first matching
> > category (files that fall through go in the root "miscellaneous"
> > category).  A big XML catalog goes in the root directory as well.  An
> > interactive tool, then, simply presents images as being in all
> > matching categories.  A file browser sees the files in an appropriate
> > category (but only one).  The system could even include (optional)
> > hardlinks so a file appears in all the category directories where it
> > belongs.
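
The first-match rule described above can be sketched in a few lines. This assumes the simplest reading of the proposal: a category is an ordered (directory, keyword set) pair, and a file matches a category when it carries at least one of that category's keywords. The function names and the sample tree are hypothetical; boolean or regular-expression matching would need a richer predicate.

```python
def assign_category(keywords, category_tree, fallback="miscellaneous"):
    """Return the directory of the first category that matches, else fallback.

    category_tree is an ordered list of (directory, keyword_set) pairs, so
    list order acts as the implicit priority.
    """
    keyword_set = set(keywords)
    for directory, category_keywords in category_tree:
        if keyword_set & set(category_keywords):
            return directory
    return fallback


def matching_categories(keywords, category_tree):
    """Return every matching directory, as an interactive tool would show."""
    keyword_set = set(keywords)
    return [
        directory
        for directory, category_keywords in category_tree
        if keyword_set & set(category_keywords)
    ]
```

A file browser would see the file only in the directory returned by assign_category, while an interactive tool (or the optional hard links) could use matching_categories to present it everywhere it belongs.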
> > 
> > Keep in mind that OCAL and other SVG clipart collections could
> > reach the multi-gigabyte mark, with hundreds of thousands of files -
> > imagine a street maps collection, or mass-conversion of multiple-CD
> > professional clipart collections.
> > 
> > A suitably designed collection format could accommodate offline
> > storage as well - the index is online, but when you pick an image it
> > tells you "That image is on CD #347, go get it".
> > 
> > > Regarding the application I'm proposing, sure, I'll be able to
> > > support pretty much anything. But, this all seems to be up in the
> > > air right now, and I'd like to see some data definitions of
> > > proposed xml files and a roadmap on the package structure before I
> > > start a project based on all of that.
> > 
> > There's really no hurry to design an all-encompassing database
> > design.  We'll throw the first one away anyway.  We should just be
> > careful to record enough information in the images that we won't
> > have to go through and manually change all of them later.  The whole
> > point of using XML is so if we change something, it's easy to write
> > a hack that reads the one format and writes the other.
> > 
> > Andrew
> > _______________________________________________
> > clipart mailing list
> > clipart at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/clipart
> > 
Jon Phillips

USA PH 510.499.0894
jon at rejon.org

Inkscape (http://inkscape.org)
Open Clip Art Library (www.openclipart.org)
CVS Book (http://cvsbook.ucsd.edu)
Scale Journal (http://scale.ucsd.edu)
