[Clipart] upload script suggestion

Jonadab the Unsightly One jonadab at bright.net
Tue Mar 22 06:30:10 PST 2005


"Jonadab the Unsightly One" <jonadab at bright.net> writes:

>> The only correct thing to do here is to use an XML parser.  
>
> If we have to, we have to, but I suspect it's not necessary.  I can
> certainly fix it so it doesn't match tags that end in />, much more
> easily than walking an XML tree.

Although, if there were an XML equivalent of HTML::Tree (with its
nifty look_down method), the parser approach would be more reasonably
doable.  However, I have searched the CPAN in the past for such a
module and did not find one.

> (A tag could theoretically have right brokets inside quoted
> attribute data, causing it to not be stripped out, but I suspect
> that the tags in question will not, in practice.)

Although it has occurred to me that nested rdf or metadata elements
would cause the opposite sort of breakage, wherein stuff that should
be removed is left in.  Something like this:

<rdf>
  <foo>
    <bar quux="wibble" />
  </foo>
  <rdf>
    <baz>Stuff that is not stripped out, and should be</baz>
  </rdf>
</rdf>

That seems pathological, and I'm not sure meta-metadata will ever
occur in practice, but it's worth considering.  It belongs to a class
of problems (matching nested, balanced delimiters) that is infamous in
the Perl community, because Perl 5 regular expressions have no easy
solution for it.  (Perl 6 regular expressions can handle it, but we
don't want to wait for that, I think.)  If it turns out that we need
to handle this case, we will indeed have to switch to an XML parser.
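
To make the nested case concrete, here is a minimal sketch in Python
(not the upload script's own language, and not its actual regex, which
is not shown in this message).  It assumes a lazy pattern that runs
from an opening rdf tag to the first closing one, and compares it with
a tree-based removal using the standard library's ElementTree.  The
sample input, the <qux> element, and the function names are all
hypothetical.  Under this assumed pattern the match stops at the inner
closing tag, so what survives is whatever follows it, including the
outer closing tag; a different pattern could fail differently.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: nested rdf elements inside an svg root.  The <qux>
# element after the inner rdf is added here purely for illustration.
SAMPLE = (
    '<svg>'
    '<rdf><foo><bar quux="wibble" /></foo>'
    '<rdf><baz>inner</baz></rdf>'
    '<qux>trailing</qux></rdf>'
    '<path d="M0 0" />'
    '</svg>'
)

# Assumed pattern: lazily match from an opening <rdf ...> tag to the
# first </rdf> that follows.  It has no notion of nesting depth.
STRIP_RDF = re.compile(r'<rdf\b[^>]*>.*?</rdf>', re.DOTALL)


def strip_with_regex(text):
    """Remove rdf elements with the assumed lazy pattern."""
    return STRIP_RDF.sub('', text)


def strip_with_parser(text, tag='rdf'):
    """Remove every <tag> element below the root, however deeply nested.

    Assumes plain, un-namespaced tag names as in the example above.
    Removing an element also drops any text that directly follows it,
    and the root element itself is never removed.
    """
    root = ET.fromstring(text)
    # ElementTree has no parent links, so visit each element as a
    # potential parent and drop its matching children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding='unicode')


by_regex = strip_with_regex(SAMPLE)
by_parser = strip_with_parser(SAMPLE)
print(by_regex)   # still contains <qux> and a stray </rdf>
print(by_parser)  # no rdf material left
```

The regex result is no longer well-formed, which is the practical
argument for the parser if nested elements ever turn up.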

Note, however, that an rdf element inside a metadata element (which
does occur, I think) is not a problem, since the three types of
elements are handled one at a time, not with foo|bar expressions.
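
The difference between the two strategies can be sketched as follows,
again in Python and again with assumed patterns rather than the
script's real ones.  Only rdf and metadata are used, since the third
element type is not named here; the sample string and function names
are hypothetical.  With one pattern per element name, the closing tag
has to match the opening one, so an rdf inside a metadata is swallowed
whole.  With a single alternation, the opening and closing names are
matched independently and the match can end at the wrong tag.

```python
import re

# Hypothetical input: an rdf element nested inside a metadata element.
SAMPLE = '<svg><metadata><rdf>x</rdf>y</metadata><g /></svg>'


def strip_one(text, name):
    """Strip elements of a single name with a lazy open-to-close pattern."""
    pattern = re.compile(r'<%s\b[^>]*>.*?</%s>' % (name, name), re.DOTALL)
    return pattern.sub('', text)


def strip_each(text, names):
    """Handle each element name in its own pass, one at a time."""
    for name in names:
        text = strip_one(text, name)
    return text


# Assumed "foo|bar" style pattern: any listed opening tag, lazily up to
# any listed closing tag, with nothing tying the two names together.
COMBINED = re.compile(
    r'<(?:rdf|metadata)\b[^>]*>.*?</(?:rdf|metadata)>', re.DOTALL
)

one_at_a_time = strip_each(SAMPLE, ['metadata', 'rdf'])
combined = COMBINED.sub('', SAMPLE)
print(one_at_a_time)  # metadata and its nested rdf are both gone
print(combined)       # match ended at </rdf>, leaving a stray </metadata>
```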

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/



