[Clipart] Document::Manager 0.12

Fri May 20 13:35:57 PDT 2005

On Fri, May 20, 2005 at 01:52:35PM -0400, Nathan Eady wrote:
> >   * Tarball generation - I want to be able to provide the same
> >     functionality our current release scripts have, as a minimum.
> 
> If there's checkout, we can get by without tarball/zipfile generation
> initially, doing them from the command-line (as we have been anyway)
> until they are implemented.

Will it be enough if I can dump all .svg's of a given state into a
single dir?  Do we have other tools for organizing them into a
hierarchy?  That's the hard part here.

> >   * Per-document commenting - Just a really basic mechanism to append
> >     to a log of comments on a given image, to give us some
> >     proto-tracking capabilities.
> 
> Since our current system doesn't have this, we could go ahead and
> adopt the DMS before this is implemented, IMO, and add it subsequently.

Okay.  I think it can be done fairly easily though, and I imagine would
be pretty handy.

> >   * Querying.  I want to extend ls_docs to allow more powerful
> >     searching of the repository, so folks can create canned scripts to
> >     do things like, list all svg's for a given artist, list the most
> >     recent svg's, etc.
> 
> Again, our current system doesn't do this (except for keyword
> searching), so we would not be losing any functionality if we
> adopted the DMS before these features are working.

Okay.  This one is a bit harder to do, so leaving it for later will save
some time.

> >   * Optimization.  I have found that dms works ok with <100 items in
> >     it, but it isn't scaling well.  
> 
> Okay, that would be a roadblock for our adoption.  There are *well*
> past 100 cliparts in our collection now :-)

Yeah.  Fortunately, when using it from the command line you can just fire
off the script and go watch a movie. ;-)  Using it from the web would be
rather more annoying...

One of the causes seems to be that the daemon is calculating certain
things like "last document id" on a per-usage basis, rather than doing
it only once at startup.  I think the scalability problem will
disappear once I sort this out.  I'll put this high on the todo list for
this weekend.
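
To illustrate the kind of fix meant here, a minimal sketch (in Python, for
illustration only; this is not the dms code, and DocumentStore,
add_document and the scans counter are hypothetical names).  It assumes
document ids are plain integers and that the expensive step is scanning
the repository for the highest one:

```python
class DocumentStore:
    """Hypothetical stand-in for the daemon's repository (not the real dms API)."""

    def __init__(self, existing_ids):
        self._ids = set(existing_ids)
        self.scans = 0  # counts full scans, only to make the cost visible
        # Scan the whole repository once, at startup, and remember the result.
        self._last_id = self._scan_last_id()

    def _scan_last_id(self):
        # The expensive step: looks at every document id in the repository.
        self.scans += 1
        return max(self._ids, default=0)

    def add_document(self):
        # Per-usage path: bump the cached value instead of rescanning.
        self._last_id += 1
        self._ids.add(self._last_id)
        return self._last_id
```

With this arrangement each add costs the same no matter how many
documents are already stored, which is the behavior wanted for the
non-searching operations.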

Also, I've noticed that the performance is much better when running
locally on my box than when running between my box and the fdo server,
so perhaps for the web interface (which will run client/server locally
on the fdo system), I'll also see a similar performance boost.

> > I did a test of inserting all of
> >     April's library, and while I was able to easily submit all the
> >     docs, doing operations with that many items was unacceptably slow.
> 
> It may be noted that some operations that we do with our current system
> take significant time to process the whole collection.  For instance,
> running the clipart-authority-control script over the whole collection
> takes about 10-15 minutes on my system.

*Nod*  My performance issue is more due to a bug than anything else; by
design, non-searching operations on dms should always be O(1).

What does the clipart-authority-control script do?  Is it something that
could be made faster by taking advantage of dms' state tracking?  I.e.,
do a query on dms for documents of state 'new', then do something, and
then tell dms to set the state to something different?

My hope is that the above algorithm can be used in a lot of our scripts,
so that each one can focus on doing its one thing, but share its results
back to the central repository for other scripts to use.
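
The query / process / set-state loop described above could look roughly
like this.  It is a sketch in Python for illustration only:
query_by_state, set_state, the InMemoryRepo class and the 'checked'
state name are all assumptions, not the actual dms interface.

```python
def process_new_documents(repo, handler, from_state="new", to_state="checked"):
    """Run handler on every document in from_state, then move it to to_state."""
    processed = []
    for doc_id in repo.query_by_state(from_state):
        handler(doc_id)                   # the script's one job
        repo.set_state(doc_id, to_state)  # report the result back to the repository
        processed.append(doc_id)
    return processed


class InMemoryRepo:
    """Hypothetical stand-in for the repository, just enough to run the loop."""

    def __init__(self, states):
        self.states = dict(states)

    def query_by_state(self, state):
        # Returns a new sorted list, so states can change while it is iterated.
        return sorted(d for d, s in self.states.items() if s == state)

    def set_state(self, doc_id, state):
        self.states[doc_id] = state
```

Because each document leaves the 'new' state once it is handled, running
the same script again only touches documents that arrived since the last
run, instead of the whole collection.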

> If searching operations are slow, that can be solved by keeping an
> index.
>
> But yes, to fully adopt the DMS, it needs to be able to do some
> operations quickly -- e.g., for browsing it needs to be able to
> return all the items in a given category, quickly.

*Nod*  Longer term I plan on putting copies of the metadata into a
database for faster querying. 
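
As a rough idea of what that might look like, here is a sketch using
SQLite from Python (for illustration only).  The table layout, the column
names and the two helper functions are assumptions; nothing here reflects
a schema that dms actually has.

```python
import sqlite3


def build_metadata_db(records):
    """Copy (doc_id, artist, category) tuples into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        "doc_id INTEGER PRIMARY KEY, artist TEXT, category TEXT)"
    )
    # Index the column that browsing filters on, so lookups avoid a full scan.
    conn.execute("CREATE INDEX idx_metadata_category ON metadata (category)")
    conn.executemany(
        "INSERT INTO metadata (doc_id, artist, category) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()
    return conn


def docs_in_category(conn, category):
    """Return the ids of all documents in one category, in id order."""
    rows = conn.execute(
        "SELECT doc_id FROM metadata WHERE category = ? ORDER BY doc_id",
        (category,),
    )
    return [row[0] for row in rows]
```

The same table could serve the per-artist and most-recent queries
mentioned earlier by adding an index on the relevant column.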

> Yes, but once that's working we can build a CGI/XHTML interface for it.

Yup.

> > Another important
> > thing I intend to begin working on in the near term is a CGI interface.
> > I've done some preliminary experimentation with Perl's CGI::Builder
> > module and think I can come up with a good system using it.
> 
> The CGI interface, in theory, should be the easy part, at least to get
> something basic working initially.

Agreed.  If I put together a really simple proof of concept cgi, would
you be interested in taking that on?

Bryce