[Clipart] SVG::Metadata re-engineering

Jonadab the Unsightly One jonadab at bright.net
Mon Apr 4 16:19:34 PDT 2005


Bryce Harrington <bryce at bryceharrington.com> writes:

>> >    return encode_entities($text);

> I don't have any pending changes, other than the patch that was
> posted here.  

Okay.  I'll work from that then.

> I'm figuring given that patch it'll be time to release a new version
> so if there are any other changes folks need, this'd be a good time
> to get 'em in.

I was hoping to look at that this week, if time allows.

> Btw, iirc, that encode_entities() bit was added in there at someone
> else's request (Jonadab?) to fix a different issue

That rings a bell.  I will grep my archives...

Relevant messages that I found in my outbox:

<y8e12cb6.fsf at jonadab.homeip.net>, in which, in response to an issue
   that Nicu raised, I explained the completely naive approach I had
   been taking to character set issues up to that point, and inquired
   whether encoding all non-ASCII characters as entities was a viable
   solution to the charset issues.

<wttj18io.fsf at jonadab.homeip.net>, in which, after Stephen responds
   that he doesn't think named entities will work in XML like they do
   in HTML, but he suggests numeric entities, I offer to test numeric
   entities to see if Inkscape can handle them okay.  I don't recall
   whether I ever actually did this testing.
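For reference, the difference Stephen pointed out can be seen with HTML::Entities, assuming that is where encode_entities comes from here. XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;). An HTML name such as &eacute; is therefore undefined in an SVG file unless a DTD declares it. A numeric reference like &#xE9; is always legal. This is a minimal sketch, assuming the title has already been decoded into a Perl character string:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities encode_entities_numeric);

# Assumption: $title is already a decoded Perl character string
# containing e-acute (U+00E9) plus some markup characters.
my $title = "Caf\x{e9} <sign>";

# Named entities: fine in HTML, but &eacute; is not one of XML's
# five predefined entities, so an XML parser may reject it.
my $named = encode_entities($title);

# Numeric character references (&#x...;) are valid in any XML document.
my $numeric = encode_entities_numeric($title);

print "$named\n";      # Caf&eacute; &lt;sign&gt;
print "$numeric\n";    # Caf&#xE9; &#x3C;sign&#x3E;
```

Either form only helps if the input was decoded correctly in the first place. A byte that was read in the wrong charset just becomes a well-formed but wrong entity.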

Hmmm...  was it maybe in that patch I sent you way back when?  Let me
see if I can find that...  Hmmm...  It doesn't seem to be there.

Well, I can't find the specific message at the moment, but no matter.
I'm pretty sure the problem we intended to solve with this was the
character set issue, and it fundamentally isn't solving that.

Note, however, that unless I'm missing something, we still do need to
encode angle brackets, ampersands, and quote marks.  This can be done
by passing encode_entities a second argument: a string consisting of
the characters that should be encoded, i.e.,

    encode_entities($text, q('"<>&));
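Here is a minimal sketch of that call, wrapped in a hypothetical helper, escape_xml_markup. It is not an existing SVG::Metadata function. The second argument replaces HTML::Entities' default set of "unsafe" characters, so non-ASCII text passes through unchanged and has to be dealt with separately:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical wrapper (not part of SVG::Metadata): escape only the
# characters that are significant in XML text and attribute values.
# Because the second argument overrides the default unsafe set,
# non-ASCII characters are returned exactly as they came in.
sub escape_xml_markup {
    my ($text) = @_;
    return encode_entities($text, q('"<>&));
}

my $escaped = escape_xml_markup(q(Tom & Jerry "<cat>"));
print "$escaped\n";    # Tom &amp; Jerry &quot;&lt;cat&gt;&quot;
```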

> (Maybe it had something to do with foreign characters in the
> metadata?)

I think it did.  But upon closer inspection, I think that problem
really needs to be solved in the calling code, or perhaps even in
other code that it calls.  I'm now thinking perhaps getforminput() is
really the right place to solve this.

> I figure it should be easy to check it by just re-running it over
> the entire clipart package and see what turns up.

I don't think images that are already in and correct will have
problems with this.  It's the images that are failing that are causing
all the problems.

The best solution in technical terms would be to make the whole world
switch to English and do away with non-ASCII characters altogether,
but I think for political reasons we will have to go with some other
solution, even if it's a suboptimal workaround ;-)

In all seriousness, what I really need to do is hunt down a module
that converts charsets to UTF8.  Umm...  lesse...  Unicode::MapUTF8
seems to have something to do with that.  I'm putting its
documentation on my reading list.
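For comparison, the same conversion can be sketched with Perl's core Encode module instead of Unicode::MapUTF8. The helper name to_utf8_octets and the default fallback charset (ISO-8859-1) are assumptions for illustration. The helper tries strict UTF-8 first and falls back to the legacy charset only when the bytes are not valid UTF-8. Something along these lines could be called from getforminput() before the text reaches encode_entities:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw input bytes into UTF-8 bytes.
# Assumption: the input is either valid UTF-8 or, failing that,
# in a single legacy charset given by $fallback (default ISO-8859-1).
sub to_utf8_octets {
    my ($bytes, $fallback) = @_;
    $fallback ||= 'ISO-8859-1';

    # FB_CROAK makes decode() die on malformed input instead of quietly
    # substituting U+FFFD, so invalid UTF-8 can be detected.  Decode a
    # copy, because with FB_CROAK decode() may modify its argument.
    my $copy  = $bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    $chars = decode($fallback, $bytes) unless defined $chars;

    # Re-encode the character string as UTF-8 octets.
    return encode('UTF-8', $chars);
}

my $converted = to_utf8_octets("Caf\xE9");    # Latin-1 input
printf "%vX\n", $converted;                    # 43.61.66.C3.A9
```

The fallback is a guess. ISO-8859-1 decoding accepts any byte sequence, so text that was really sent in some other charset would be converted without error but incorrectly. If the form submission declares its charset, that value is a better choice for $fallback.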

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/



