Jonadab the Unsightly One
jonadab at bright.net
Wed Jun 23 05:57:39 PDT 2004
> * If the string passed in has two or more newlines, it's assumed to
> be a text string and parsed as that.
> * If it's an open IO::Handle, it's parsed thataway
> * If it begins with 'http' or 'ftp', it's assumed to be a URL, and
> downloaded and parsed.
> * Otherwise, it's assumed to be a filename and loaded from the file
> I wonder whether it is better to have it a single parse() function that
> figures out what to do, or to break it into four separate parsing
> functions and require the caller to know what type of input they want.
> Got a preference?
The only caveat I have to add is that some XML-generating programs do
not include any optional whitespace, including newlines. OpenOffice
does not by default, for example (though you can get it to do so by
changing an option). I don't know about any existing SVG editors'
doing this, but it is entirely possible that in the future some SVG
editor might make output that contains only one newline after the XML
declaration.
There are two reasonable ways I can think of to handle that: let the
calling code specify whether it's the data or a filename, or else
be clever about figuring out which it is -- e.g., if it has an XML
declaration assume it's the data, otherwise if a -e filetest returns
true assume it's the filename, otherwise try to parse it as the data,
and if you can't do so, throw an error. This will miss some extreme
pathological cases such as where a user has a filename that includes
an XML declaration, if you can imagine anyone doing such a thing.
(Actually, that's only even possible on certain systems; Windows for
example does not allow filenames to contain some of those characters,
as well it should not IMO.)
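
The detection order described above can be sketched roughly as follows.
This is only an illustration, not code from any existing module: the
function name classify_input and its return values are made up for the
example, and the handle and URL checks are taken from the quoted
proposal rather than from this reply. It assumes that Scalar::Util
(a core module) is available.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical helper: guess what kind of input the caller passed in.
# Returns 'handle', 'url', 'data', or 'file'.
sub classify_input {
    my ($input) = @_;
    die "classify_input: no input given\n" unless defined $input;

    # An open filehandle is read as-is.
    return 'handle' if openhandle($input);

    # Anything that starts with a known scheme is treated as a URL.
    return 'url' if $input =~ m{\A(?:https?|ftp)://}i;

    # An XML declaration means the string is the document itself,
    # no matter how many newlines it contains.
    return 'data' if $input =~ /\A\s*<\?xml\b/;

    # Otherwise, an existing path is taken to be a filename.
    # (Silence the warning perl gives when a failed filetest is run
    # on a string that contains a newline.)
    {
        no warnings 'newline';
        return 'file' if -e $input;
    }

    # Last resort: treat it as data and let the parser throw the
    # error if it turns out not to be well-formed.
    return 'data';
}
```

A caller would then dispatch on the returned string. The pathological
case mentioned above (a file whose name starts with an XML declaration)
is classified as data by this sketch.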
Also you should probably add https to the list of protocols, or at
least print a good error message explaining that that scheme isn't
supported. (Does LWP support https transparently? I'd be surprised
if not, though I don't actually know.)
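
For the "good error message" option, a minimal sketch might look like
the following. The function name and the list of supported schemes are
assumptions made for the example; whether https can actually be fetched
depends on how LWP is installed on the machine in question, which this
sketch does not check.

```perl
use strict;
use warnings;

# Assumed list of schemes the (hypothetical) module is willing to fetch.
my %SUPPORTED_SCHEME = map { $_ => 1 } qw(http https ftp);

# Returns the lowercased scheme for a supported URL, returns nothing
# if the string does not look like a URL at all, and dies with an
# explanatory message for a URL whose scheme is not supported.
sub url_scheme_or_die {
    my ($input) = @_;
    my ($scheme) = $input =~ m{\A([A-Za-z][A-Za-z0-9+.\-]*)://};
    return unless defined $scheme;    # not a URL; caller tries something else

    $scheme = lc $scheme;
    die "Cannot load '$input': the '$scheme' scheme is not supported "
      . "(supported schemes: "
      . join(', ', sort keys %SUPPORTED_SCHEME) . ")\n"
      unless $SUPPORTED_SCHEME{$scheme};

    return $scheme;
}
```

Checking the scheme before anything is downloaded means the user sees
which scheme was rejected instead of a confusing file-not-found error.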
As for whether to use one clever function or a separate function
for each possible scenario (which presumably all call one other function
to do the actual parsing), I'm ambivalent. HTML::Tree
uses separate functions, and that works fine, but if you can do
the detection well, the one-function approach will also be fine.
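
The separate-functions layout could be sketched like this. All of the
names here are hypothetical, and _parse_text is only a stub standing in
for whatever does the real XML parsing; a URL variant is left out
because it would depend on LWP.

```perl
use strict;
use warnings;

# Hypothetical core routine: every entry point ends up here.
# A real module would hand $text to an XML parser; this stub just
# records the text and where it came from.
sub _parse_text {
    my ($text, $source) = @_;
    die "empty input from $source\n"
        unless defined $text && length $text;
    return { source => $source, text => $text };
}

# The caller already has the document in a string.
sub parse_string {
    my ($text) = @_;
    return _parse_text($text, 'string');
}

# The caller has an open filehandle.
sub parse_handle {
    my ($fh) = @_;
    my $text = do { local $/; <$fh> };    # slurp everything
    return _parse_text($text, 'handle');
}

# The caller has a filename.
sub parse_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open '$path': $!\n";
    my $doc = parse_handle($fh);
    close $fh;
    $doc->{source} = "file $path";
    return $doc;
}
```

With this layout a single guessing parse() can still be offered on
top, as a thin wrapper that picks one of these functions, so callers
who know what they have never depend on the detection.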