[Clipart] Metadata

Bryce Harrington bryce at bryceharrington.com
Wed Jun 23 11:11:53 PDT 2004


On Wed, 23 Jun 2004, Jonadab the Unsightly One wrote:
> > I wonder whether it is better to have it a single parse() function that
> > figures out what to do, or to break it into four separate parsing
> > functions and require the caller to know what type of input they want.
> > Got a preference?
> 
> The only caveat I have to add is that some XML-generating programs do
> not include any optional whitespace, including newlines.  OpenOffice
> does not by default, for example (though you can get it to do so by
> changing an option).  I don't know about any existing SVG editors'
> doing this, but it is entirely possible that in the future some SVG
> editor might make output that contains only one newline after the XML
> declaration.

Hmm, true...  We'll have to keep an eye out for that.  OOo doesn't
generate SVG currently afaik, but you're right that other apps may
behave differently.

> There are two reasonable ways I can think of to handle that:  let the
> calling code specify whether it's the data or a filename, or else
> be clever about figuring out which it is -- e.g., if it has an XML
> declaration assume it's the data, otherwise if a -e filetest returns
> true assume it's the filename, otherwise try to parse it as the data,
> and if you can't do so, throw an error.  This will miss some extreme
> pathological cases such as where a user has a filename that includes
> an XML declaration, if you can imagine anyone doing such a thing.
> (Actually, that's only even possible on certain systems; Windows for
> example does not allow filenames to contain some of those characters,
> as well it should not IMO.)

*Nod*  Sticking all of the options into the one parse() routine was the
least-work way to go, but separate functions might be better, since they
would avoid all these odd corner cases.
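
A rough sketch of the detection order quoted above, written in Python
rather than Perl purely for illustration. The name classify_input is
hypothetical and not part of the actual module, and os.path.exists
stands in for the -e filetest:

```python
import os
import xml.etree.ElementTree as ET


def classify_input(arg):
    """Return 'data' or 'filename' for arg, using the quoted order of checks."""
    # 1. A leading XML declaration means we were handed the document itself.
    if arg.lstrip().startswith("<?xml"):
        return "data"
    # 2. Otherwise, an existing path is assumed to be a filename.
    if os.path.exists(arg):
        return "filename"
    # 3. Last resort: try to parse it as XML data, and give up if that fails.
    try:
        ET.fromstring(arg)
    except ET.ParseError as err:
        raise ValueError("neither XML data nor an existing file") from err
    return "data"
```

With separate functions the caller states the input type up front, so
none of these guesses (or their pathological cases) are needed.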

Also, I've noticed that processing a big collection of files (such as
all the SVGs submitted so far) takes a while, so separate function
calls could allow for better optimization, since I wouldn't need the
regexps to check for newlines, etc.

> Also you should probably add https to the list of protocols, or at
> least print a good error message explaining that that scheme isn't
> supported.  (Does LWP support https transparently?  I'd be surprised
> if not, though I don't actually know.)

Sorry, I should have said http* and ftp*, so https should be covered
too. 
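
As an illustration only (Python, not the module's Perl), a prefix test
on the URL scheme behaves like the http* and ftp* patterns mentioned
above. The helper name is_supported_url is hypothetical. Note that a
prefix match this loose also accepts schemes such as ftps:

```python
from urllib.parse import urlsplit


def is_supported_url(arg):
    """True if arg's URL scheme starts with 'http' or 'ftp' (so https matches)."""
    scheme = urlsplit(arg).scheme.lower()
    return scheme.startswith(("http", "ftp"))
```

Anything that fails this test can then be reported with a clear
"scheme not supported" message instead of being treated as a filename.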

Thanks,
Bryce



