[Clipart] Document::Manager 0.12

Mon May 23 12:55:15 PDT 2005

On Mon, May 23, 2005 at 02:05:43PM -0400, Nathan Eady wrote:
> Bryce Harrington wrote:
> >On Fri, May 20, 2005 at 01:52:35PM -0400, Nathan Eady wrote:
> >
> >>> * Tarball generation - I want to be able to provide the same
> >>>   functionality our current release scripts have, as a minimum.
> >>
> >>If there's checkout, we can get by without tarball/zipfile generation
> >>initially, doing them from the command-line (as we have been anyway)
> >>until they are implemented.
> >
> >Will it be enough if I can dump all .svg's of a given state into a
> >single dir?  Do we have other tools for organizing them into a
> >hierarchy?  That's the hard part here.
> 
> Yes, with the qualification that we'll need to write a quick tool,
> but the quick tool in question will be Grade-A Easy to write, given
> what we already have in our toolbox.  I can do it even if I'm fairly
> busy, because I can do it in the evening when my brain is not up to
> anything complicated.  In fact, basically, it's just this:
> 
> my %bighash = the one we already have in convert-release-to-browse;
> for my $file (<*.svg>) {
>    my $svg = SVG::Metadata->new();
>    $svg->parse($file);
>    my @destination = map { $bighash{$_} } $svg->keywords();
>    for my $dir (uniq sort @destination) {
>       copy $file, catfile($dir,$file);
>    }
> }
> 
> Some of that is pseudocode, but it's a matter of fifteen minutes
> to make it run and test it lightly.

Excellent.  
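
For reference, a minimal sketch of the copy-into-hierarchy step using
only core modules.  The %dir_for table and its entries are made up (the
real mapping is the one in convert-release-to-browse), and the keywords
are passed in as a plain list rather than read from the .svg, so treat
the names here as assumptions, not as the actual tool:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(copy);
use File::Path qw(mkpath);
use File::Spec::Functions qw(catfile);

# Hypothetical keyword-to-directory table, standing in for the one in
# convert-release-to-browse.
my %dir_for = (
    cat  => 'animals',
    dog  => 'animals',
    flag => 'signs_and_symbols',
);

# Return the sorted, de-duplicated directories for a list of keywords.
# Keywords with no entry in the table are skipped.
sub destinations {
    my @keywords = @_;
    my %seen;
    my @dirs = grep { !$seen{$_}++ }
               map  { exists $dir_for{$_} ? $dir_for{$_} : () } @keywords;
    @dirs = sort @dirs;
    return @dirs;
}

# Copy one file into every destination directory under $root,
# creating the directories as needed.
sub file_into_hierarchy {
    my ($root, $file, @keywords) = @_;
    for my $dir (destinations(@keywords)) {
        my $target = catfile($root, $dir);
        mkpath($target) unless -d $target;
        copy($file, catfile($target, basename($file)))
            or die "copy $file -> $target failed: $!";
    }
}

print join(',', destinations(qw(cat dog flag unknown))), "\n";
```

A file tagged with both "cat" and "dog" lands in animals/ only once,
which is the job the uniq call does in the quoted snippet.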

> >One of the causes seems to be that the daemon is calculating certain
> >things like "last document id" on a per-usage basis, rather than doing
> >it only once at start up.  I think the scalability problem will
> >disappear once I sort this out.  I'll put this high on the todo list for
> >this weekend.
> 
> Are you familiar with the Orcish Maneuver?  Using that, you can avoid
> the need to do something ahead of time at startup but still do it
> only once.

Well, the issue is not the speed of the algorithm but rather the ability
to cache between calls.  SOAP::Lite doesn't seem to make it trivial to
just have a static or global variable, so I just need to figure out a
sneaky way to cache stuff in the daemon.
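
One possible shape for that cache, as a sketch only: a file-scoped
lexical in the package that holds the handlers, filled in with the
Orcish Maneuver (||=) on first use.  The package name DocCache, the
_scan_for_last_id() routine and the value it returns are all
hypothetical, and this assumes the daemon is one long-running process
that keeps the package loaded between calls, which I have not verified
for SOAP::Lite:

```perl
use strict;
use warnings;

package DocCache;    # hypothetical package name

my %cache;           # file-scoped lexical; lives as long as the process
my $scans = 0;       # counts how often the expensive scan really runs

# Hypothetical stand-in for the expensive "find the last document id" scan.
sub _scan_for_last_id {
    $scans++;
    return 42;
}

sub last_document_id {
    # Orcish Maneuver: compute on the first call, reuse afterwards.
    # Note that ||= recomputes if the cached value is false (0, '', undef).
    return $cache{last_document_id} ||= _scan_for_last_id();
}

sub scan_count { return $scans }

package main;

print DocCache::last_document_id(), "\n" for 1 .. 3;
print "scans: ", DocCache::scan_count(), "\n";
```

The cached value would also need to be bumped or invalidated whenever a
new document is added, which the sketch leaves out.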

> >Also, I've noticed that the performance is much better when running
> >locally on my box than when running between my box and the fdo server,
> 
> Interesting.  Do you suppose this is a bandwidth issue, or latency,
> or something else?  (If it's bandwidth or latency, I could be in for
> some long waits, as I'm on dialup.)

It's possibly a latency issue; I've noticed some bad traceroutes to fdo
from my machine in the past (which is one reason why I do most
development locally).

> > Is it something that could be made faster by taking advantage
> > of dms' state tracking?
> 
> It could be made a good deal faster if the DMS keeps up-to-date
> indices of the metadata.  That's an optimization we presumably
> can do once we get the basic functionality working.  Actually,
> most or all of the functionality of this script is stuff that
> should become part of the DMS's functionality in the long term.

Okay, great.  This weekend I banged out much of the keyword handling.
There are two routines, keyword_add() and keyword_remove().  They take
one or more keywords and do the respective operation.  Currently they do
not actually modify the .svg itself, but that should be fairly simple.
I also created a new commandline tool that takes one or more document
IDs and prints all of the properties, including keywords.  I also have
a script ls_docs that prints out a list of documents, one line per doc.
It retrieves the keywords but doesn't print them out (since that'd make
the lines too long).

At this point it would not be difficult to create a script that works
like ls_docs but instead of printing out a line, calls keyword_remove()
and keyword_add() to rename the keyword.
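
Roughly, the rename loop could look like the sketch below.  The
in-memory %keywords_for table, the document IDs, and the
(doc id, keywords...) argument order are assumptions made for
illustration; the two routines here are stand-ins, not the real dms
calls, whose signatures may differ:

```perl
use strict;
use warnings;

# In-memory stand-in for the document store.  IDs and keywords are made up.
my %keywords_for = (
    1 => [qw(flag usa)],
    2 => [qw(cat)],
    3 => [qw(usa map)],
);

# Hypothetical stand-ins for the DMS routines of the same name.
sub keyword_add {
    my ($doc_id, @new) = @_;
    my %have = map { $_ => 1 } @{ $keywords_for{$doc_id} };
    push @{ $keywords_for{$doc_id} }, grep { !$have{$_}++ } @new;
}

sub keyword_remove {
    my ($doc_id, @gone) = @_;
    my %drop = map { $_ => 1 } @gone;
    @{ $keywords_for{$doc_id} } =
        grep { !$drop{$_} } @{ $keywords_for{$doc_id} };
}

# Rename a keyword on every document that carries it.
# Returns the IDs of the documents that were changed.
sub keyword_rename {
    my ($old, $new) = @_;
    my @touched;
    for my $doc_id (sort { $a <=> $b } keys %keywords_for) {
        next unless grep { $_ eq $old } @{ $keywords_for{$doc_id} };
        keyword_remove($doc_id, $old);
        keyword_add($doc_id, $new);
        push @touched, $doc_id;
    }
    return @touched;
}

my @touched = keyword_rename('usa', 'united_states');
print "renamed in: @touched\n";
```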

Both keyword_add() and keyword_remove() do some normalization.  They put
the keywords into lowercase and remove any leading or trailing spaces.
If there is more normalization that should be done (such as stripping
out invalid characters or whatever), we could add that in.  However, for
more aggressive normalization (such as fixing pluralization), I think
it's probably best to keep that logic out in cmdline scripts, since I
suspect those'll be a lot easier to hack on and maintain separately from
dms itself.
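
The basic normalization amounts to something like the following.  The
routine name normalize_keyword() is made up for this example and is not
necessarily what the code in dms is called:

```perl
use strict;
use warnings;

# Lowercase a keyword and strip leading and trailing whitespace.
sub normalize_keyword {
    my ($kw) = @_;
    $kw = lc $kw;
    $kw =~ s/^\s+//;
    $kw =~ s/\s+$//;
    return $kw;
}

print join('|', map { normalize_keyword($_) } '  Flag ', 'CAT', "dog\t"), "\n";
```

Anything beyond this, such as folding plurals, would sit in a separate
script that runs over the already-normalized keywords.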

> >>The CGI interface, in theory, should be the easy part, at least to get
> >>something basic working initially.
> >
> >Agreed.  If I put together a really simple proof of concept cgi, would
> >you be interested in taking that on?
> 
> Quite possibly.  Although I may be fairly busy in June.  It's also the
> sort of thing we might be able to share among several people, long term.

Okay, I started roughing together some code based on CGI::Builder.  I
can populate it with a few SOAP calls as examples.  I think it should be
pretty straightforward to build off of.  Which CVS repository should I
check it into?

Bryce