Fwd: Re: [Clipart] metadata: aren't the keywords actually categories (and can keywords be added)?

Sat Apr 16 05:10:48 PDT 2005

Andrew,
Regarding hard links, I thought a hard link was the same as a copy
(oddly enough - I've been working on unix/linux systems for 10+
years). After doing a bit of research, it appears that it's not (it's
another directory entry for the same inode) on ext2 and ext3
filesystems, and probably others.

And, tar does support this. Zip does not, but that's to be expected
and probably acceptable. So, I agree, this would be the best
solution by far. You'd have a download that isn't bloated because of
duplicate files. You'd have an extraction that isn't bloated on
filesystems that support it. The only issue is extraction on
filesystems that don't support it (FAT doesn't; NTFS itself has hard
links, but I'm not sure Windows XP extraction tools create them),
but I think we can all live with that.
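
To illustrate the tar behaviour, here is a minimal Python sketch (not
the project's perl packaging code; the directory and file names are
made up for the example). It hard-links one image into two category
directories, archives the tree, and reports which members tar stored
as links rather than as second copies of the data.

```python
import os
import tarfile
import tempfile


def build_linked_archive(workdir):
    """Create a tiny package tree with one hard-linked image and tar it.

    All names here (pkg, animals, pets, cat.svg) are hypothetical.
    Returns the path of the archive that was written.
    """
    pkg = os.path.join(workdir, "pkg")
    os.makedirs(os.path.join(pkg, "animals"))
    os.makedirs(os.path.join(pkg, "pets"))

    original = os.path.join(pkg, "animals", "cat.svg")
    with open(original, "w") as handle:
        handle.write("<svg/>" * 1000)

    # Second directory entry for the same inode; no data is duplicated.
    os.link(original, os.path.join(pkg, "pets", "cat.svg"))

    archive = os.path.join(workdir, "pkg.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(pkg, arcname="pkg")
    return archive


def list_members(archive):
    """Return (name, kind, size) for every non-directory member."""
    rows = []
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if member.isdir():
                continue
            kind = "hardlink" if member.islnk() else "file"
            rows.append((member.name, kind, member.size))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for row in list_members(build_linked_archive(tmp)):
            print(row)
```

The link member carries no file data of its own, which is where the
download saving would come from. This assumes the tree is built on a
filesystem that allows hard links.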

Perl's 'link' function can make hard links, but it simply returns
false if a hard link cannot be made (it does not fall back to a
copy), so this will need to be taken into account in the perl
packaging scripts.
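
A rough sketch of the fallback I have in mind, written in Python only
for illustration (the real scripts are perl, and the function name
link_or_copy is my own invention): try the hard link first, and copy
the file if the filesystem refuses.

```python
import os
import shutil
import tempfile


def link_or_copy(source, destination):
    """Hard-link source to destination, copying if linking fails.

    Returns "link" or "copy" so the caller can log what happened.
    An existing destination is treated as an error, not overwritten.
    """
    try:
        os.link(source, destination)
        return "link"
    except FileExistsError:
        raise
    except OSError:
        # For example: a filesystem without hard links, or source and
        # destination on different devices.
        shutil.copy2(source, destination)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "cat.svg")
        with open(src, "w") as handle:
            handle.write("<svg/>")
        print(link_or_copy(src, os.path.join(tmp, "cat-again.svg")))
```

Either way the file ends up in place; the only difference is whether
the package directory grows.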

Also, this does place the requirement that packages be generated on a
filesystem that supports this. Is this acceptable to the openclipart
Release Manager, whoever he may be?

Regarding the data definition, I definitely agree that having the data
in the images is the highest priority. And I also agree that doing a
data definition now will likely result in changes. But, right now
there is NO data definition for certain things that are critical to
write the application I'm proposing. For example, I previously had
discussions about transforming the .idx in the package to xml, as
well as a future planned 'hierarchy-en.xml' file. The hierarchy file
may not be necessary for the clip organizer app, but an index
certainly would be of great help (especially since it's already
there, but in perl form). Otherwise, importing an openclipart package
into a client app will take FOREVER, being that every individual file
would need to be read and parsed.
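
To make the point concrete, here is a small Python sketch of what a
client could do with a single index file. The element and attribute
names (index, image, path, title, keyword) are placeholders of my own,
not a proposed data definition, and the entries are invented.

```python
import xml.etree.ElementTree as ET


def build_index(entries):
    """Build an index element from (path, title, keywords) tuples."""
    root = ET.Element("index")
    for path, title, keywords in entries:
        image = ET.SubElement(root, "image", {"path": path})
        ET.SubElement(image, "title").text = title
        for word in keywords:
            ET.SubElement(image, "keyword").text = word
    return root


def find_by_keyword(root, keyword):
    """Return paths of images tagged with keyword, using only the index."""
    return [
        image.get("path")
        for image in root.findall("image")
        if any(k.text == keyword for k in image.findall("keyword"))
    ]


if __name__ == "__main__":
    index = build_index([
        ("animals/cat.svg", "Cat", ["animal", "pet"]),
        ("transport/bus.svg", "Bus", ["vehicle"]),
    ])
    xml_text = ET.tostring(index, encoding="unicode")
    print(xml_text)

    # A client reads the one index file instead of every image.
    reloaded = ET.fromstring(xml_text)
    print(find_by_keyword(reloaded, "pet"))
```

The import then costs one parse of the index rather than one parse
per image.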

Again, thanks for clarifying the hardlink thing - I think that's a
perfect solution...

mike

--- Andrew Archibald <andrew.archibald at sympatico.ca> wrote:

> Mike Traum wrote:
> > Andrew,
> > I understand the issues you raise with the big xml file proposal.
> > But, I don't think symbolic links will work, and I do think that
> > hard links are a very messy solution to the problem. It doesn't
> > scale well - what happens if openclipart has 50,000 images? So
> > many duplications in the package will just end up in bloat.
> 
> What's messy about hard links?  They're a pretty good approximation
> to shared copy-on-write; no duplication at all, in fact having the
> same file in multiple directories is what they're for.  I have no
> idea what Windows thinks of them, though (haven't used Windows in a
> long time).
> 
> > How about a flat file structure with no path whatsoever? I think
> > this would make the most sense.
> 
> That's nearly as useless for people using a file browser as one big
> XML file.  Perhaps each package could single out one keyword as the
> most important (if the package doesn't specify, just pick one at
> random) and it could wind up in the corresponding category
> directory.  Or when the categories are made (essentially from search
> queries) they could be given a (perhaps implicit) priority.
> 
> That last suggestion in detail:
> When making a localized package, one specifies a category tree; each
> category is specified by a set of keywords (or more generally, by a
> boolean metadata expression, regular expression, etcetera).  The
> file gets dumped in the directory corresponding to the first
> matching category (files that fall through go in the root
> "miscellaneous" category).  A big XML catalog goes in the root
> directory as well.  An interactive tool, then, simply presents
> images as being in all matching categories.  A file browser sees the
> files in an appropriate category (but only one).  The system could
> even include (optional) hardlinks so a file appears in all the
> category directories where it belongs.
> 
> Keep in mind that OCAL and other SVG clipart collections could reach
> the multi-gigabyte mark, with hundreds of thousands of files -
> imagine a street maps collection, or mass-conversion of multiple-CD
> professional clipart collections.
> 
> A suitably designed collection format could accommodate offline
> storage as well - the index is online, but when you pick an image it
> tells you "That image is on CD #347, go get it".
> 
> > Regarding the application I'm proposing, sure, I'll be able to
> > support pretty much anything. But, this all seems to be up in the
> > air right now, and I'd like to see some data definitions of
> > proposed xml files and a roadmap on the package structure before I
> > start a project based on all of that.
> 
> There's really no hurry to design an all-encompassing database
> design.  We'll throw the first one away anyway.  We should just be
> careful to record enough information in the images that we won't
> have to go through and manually change all of them later.  The whole
> point of using XML is so if we change something, it's easy to write
> a hack that reads the one format and writes the other.
> 
> Andrew
