[Clipart] Change needed for how openclipart is managed

Bryce Harrington bryce at bryceharrington.com
Fri Mar 25 01:20:20 PST 2005


On Wed, Mar 23, 2005 at 11:47:02PM -0800, Jon Phillips wrote:
> On Wed, 2005-03-23 at 18:02 +0100, Christian Fredrik Kalager Schaller wrote:
> > On Tue, 2005-03-22 at 15:24 -0500, Nathan Eady wrote:
> > > Executive summary:
> > > I agree in principle that we need a repository tool to help us manage
> > > the collection better -- but I don't think a source-code-oriented
> > > version-control system like CVS is the tool we need.
> > > 
> > > > So why can't we put the clipart collection into CVS or Subversion
> > > 
> > > CVS and Subversion don't have the repository features we need.  They
> > > don't support the metadata we need, among other things.  Fundamentally,
> > > they were designed for source code, and what we're managing isn't
> > > source code.
> >
> > In my opinion there isn't much of a difference between an XML SVG file and
> > a C file for instance. Our images are not binary blobs, but files with
> > plaintext source, like code.

Well, the crucial issue here for this particular point is the means of
editing.

With source code files, you typically edit the files with a text editor,
and in this situation diffs provided by CVS are useful for tracking and
assisting with the editing work.  I.e., if three people add some lines
of code to the same file, then they can be merged and issues resolved by
viewing the contents of the file in a text editor and adjusting as
needed.

With an SVG file, the principal means of editing would be a graphics
program, not a text editor.  It is true that you *could* edit an SVG
file, view diff lines, and review patch errors in a text editor, but I
don't think that would be a preferable way of handling it.  Thus, the
features of a code-oriented VCS like CVS don't buy you very much.  Even
though SVG is not a binary format, in essence it shares the same issues
that binary files have, and the SVG files may as well be binary blobs.
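
To make that concrete, here is a rough Python sketch.  The two SVG
snippets are made up for illustration (they are not from the
collection); the second is the same shape nudged slightly, the way a
graphics program would rewrite it on save.

```python
import difflib

# Two hypothetical serializations of the same drawing. The second one was
# nudged slightly in a graphics editor, which rewrote all the path data.
before = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="leaf" d="M 10.0,20.0 C 15.5,25.2 30.1,28.9 40.0,20.0"/>
</svg>
"""
after = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="leaf" d="M 12.5,21.0 C 18.0,26.2 32.6,29.9 42.5,21.0"/>
</svg>
"""

# Line-based unified diff, the same kind of output a code VCS would show.
diff = list(difflib.unified_diff(
    before.splitlines(), after.splitlines(),
    fromfile="leaf.svg (before)", tofile="leaf.svg (after)", lineterm=""))

# Keep only the removed/added content lines, skipping the file headers.
changed = [line for line in diff
           if line.startswith(("-", "+"))
           and not line.startswith(("---", "+++"))]

print("\n".join(diff))
```

The diff reports the whole path line as removed and re-added.  It tells
you that something changed, but not that the leaf moved a little to the
right, and merging two such edits by hand would mean reconciling
coordinate strings.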

> > > Bryce is working on a document repository system to address this need,
> > > but it is not ready for use yet, as far as I am aware.
> > > 
> > >  > If we use CVS we could collaborate much better
> > > 
> > > I am not convinced of this.

Well, looking at CVS simply as a networked shared file system with
version control capabilities, CVS sort of looks like it could improve
things.  However, the features we really need - author-level tracking,
per-document states and workflow, flexibility in hierarchical
re-sorting, and file-level filtering based on properties - are not
provided by CVS, and are more typically found in document management
systems.
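
As a rough illustration of what those features mean in practice, here
is a minimal Python sketch.  It is not dms code; the class, the state
names, the transitions, and the property keys are all assumptions made
up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical workflow: which states a document may move to from each state.
TRANSITIONS = {
    "submitted": {"reviewed", "rejected"},
    "reviewed": {"published", "rejected"},
    "published": set(),
    "rejected": set(),
}

@dataclass
class Document:
    path: str                      # location in the hierarchy
    author: str                    # author-level tracking
    state: str = "submitted"       # per-document workflow state
    properties: dict = field(default_factory=dict)

    def advance(self, new_state):
        """Move to new_state if the workflow allows it."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

def by_author(docs, author):
    """All documents contributed by one author."""
    return [d for d in docs if d.author == author]

def filter_docs(docs, **wanted):
    """File-level filtering: keep documents whose properties all match."""
    return [d for d in docs
            if all(d.properties.get(k) == v for k, v in wanted.items())]

def resort(docs, key):
    """Hierarchical re-sorting: regroup documents by any property."""
    groups = {}
    for d in docs:
        groups.setdefault(d.properties.get(key, "unsorted"), []).append(d.path)
    return groups

docs = [
    Document("animals/cat.svg", "alice", properties={"format": "svg", "theme": "animals"}),
    Document("flags/se.svg", "bob", properties={"format": "svg", "theme": "flags"}),
    Document("shots/cat.png", "alice", properties={"format": "png", "theme": "animals"}),
]
docs[0].advance("reviewed")
```

None of this is hard to write, but none of it maps onto what CVS
tracks, which is lines of text in files at fixed paths.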

> > > There are things a document repository can do for us, that our current
> > > system does not do, but CVS would not do them either.  For instance,
> > > any arbitrary author needs to be able to update their own contributions,
> > > but we cannot require all the artists to have CVS installed on their
> > > workstations, have CVS accounts on the freedesktop.org server or, for
> > > that matter, know what CVS is.
> > Getting a system long term which lets artists maintain their own artwork
> > might be fine. But such a system is probably quite hard to create in the
> > sense that there are a lot of considerations that need to be taken into
> > account. Short term using CVS instead of the current manual setup could
> > at least make more of us able to work efficiently as people like myself
> > could accomplish my goal without being dependent on others.
> 
> Right, well, we have not really considered this as interfaces to CVS are
> not that great. It would be better for us to put work into Bryce's
> Document Management System (DMS). Unfortunately, we have all been so
> tied up with other work and jobs that it has been hard to do.

Yeah, I'm sure I've compounded the problem by taking the lead on
developing this when I have such a full plate already.

However, dms is fairly well along already.  Much of the underlying
framework is there and works, such as being able to check documents in
and out.  The properties handling has not been implemented, and that is
a pretty huge chunk, but the coding for it is not that complex.  Like
Jon said, with only a few days, a lot of it could be hacked into shape.
Unfortunately, whatever free time I've been able to scrape together over
the past few months I've devoted to the Inkscape gtkmm conversion.

> > > Worse, CVS is not metadata-aware at *all*, so in some respects it
> > > would be an actual step backwards for us.
> > I am guessing I am missing something as you mentioned the metadata issue above too,
> > please explain.
> 
> Well, DMS is a general document management system that stores metadata
> externally from the document. SVG is different in that it allows metadata
> internally. We want to support some other formats that will require
> metadata to be stored externally. Also, storing a file's metadata
> independently will allow other manipulations. I'm not the most fluent
> in the reasonings, but they are good. ;)

That's correct.  We will have things like screenshots, patterns, fonts,
and other non-SVG files, each of which may or may not support metadata,
so we can't always count on having it stored in the file itself.  Thus,
the need to have separate metadata files.

The idea is to parse and store that metadata into a read-only database
to enable fast querying capabilities on the data.  The metadata file is
the authoritative location of the info, but a database permits fast
access to it.
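
A minimal sketch of that arrangement in Python, under some loud
assumptions: the "key: value" sidecar format, the ".meta" suffix, the
table layout, and the function names are invented here and are not
what dms actually uses.

```python
import sqlite3
from pathlib import Path

def parse_metadata(path):
    """Parse 'key: value' lines from a sidecar file (assumed format)."""
    props = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            props[key.strip()] = value.strip()
    return props

def build_index(root):
    """Rebuild the query index from scratch out of the metadata files.

    The files stay authoritative; the database is only a derived cache.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE props (doc TEXT, key TEXT, value TEXT)")
    for meta in sorted(Path(root).glob("*.meta")):
        doc = meta.name[:-len(".meta")]   # e.g. cat.png.meta -> cat.png
        for key, value in parse_metadata(meta).items():
            conn.execute("INSERT INTO props VALUES (?, ?, ?)",
                         (doc, key, value))
    conn.commit()
    # From here on the connection refuses writes; changes go to the files.
    conn.execute("PRAGMA query_only = ON")
    return conn

def find_docs(conn, key, value):
    """Names of all documents whose property `key` equals `value`."""
    rows = conn.execute(
        "SELECT doc FROM props WHERE key = ? AND value = ? ORDER BY doc",
        (key, value))
    return [row[0] for row in rows]
```

To change a document's metadata you would edit its sidecar file and
rebuild the index, never the other way around.  That keeps the database
disposable: if it is lost or stale, it can always be regenerated.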

> > > I'm also not convinced it scales well enough.
> >
> > Our release tarballs contain tons of PNG files for instance. Also
> > having the SVG files in a system like CVS will enable us to clean up
> > stuff as more people can clean.
> 
> Yes, you are right. There is so much cleaning that needs to happen and
> right now we are getting primarily first wave clip art. We really need
> to do cleaning, but this all waits on an easy way to do this.

Note that this can certainly be done with the system as it exists now,
by simply downloading the package, making the modifications, and then
uploading the changes back into the system.  In some ways, it is easier
to do this *without* CVS - it can be difficult to move or rename
directories or files under CVS, as an example.

> If you are serious about this, then we need to really push getting DMS
> ready for the primetime:
> http://openclipart.org/cgi-bin/wiki.pl?DocumentManagementSystem

I can imagine that this probably looks pretty intimidating at first
glance, but I feel that this is among the best code I've ever written in
terms of being clean and well structured.  It's lacking features, but
with the structure in place I think it should now be straightforward to
add them.

A lot of the features will require knowledge and ideas from the rest of
the team.  For instance, I've made some guesses about how people would
want to add things or edit existing things, but I think other developers
would have better ideas and insights into how it should work than me.

If others are interested in working on it, definitely count me in too.
I can set aside some time in the coming weeks for it, and with more
people working on it I think we could see some rapid progress with it.  

Bryce


