[Clipart] SVG Metadata

Jonadab the Unsightly One jonadab at bright.net
Thu Apr 7 19:15:56 PDT 2005


crschmidt at crschmidt.net (Christopher Schmidt) writes:

> Currently, the SVG metadata in SVG files on openclipart.org is
> broken RDF, as well as being slightly broken in a number of ways:
>
> First, there is a snippet of RDF like the following. This is invalid:
> <license rdf:resource="Public Domain">
>   <dc:date>28</dc:date>
> </license>
>
> This should be 
> <license rdf:resource="urloflicense" />
> <dc:date>date of copyright</dc:date>
> Which would cause the RDF to be valid. In fact, there is already an
> empty <dc:date> in the SVG file I'm currently looking at

I will look into that along with my other changes.  I'll try to get
parse to read it either way, and to_rdf to write it the correct way.
Actually, I think the dc:date field may have been empty because of an
unimplemented feature that I have already fixed (in my local copy)
while working on some of the TODO comments.  For now I'll set it so
that _date and _license_date default to one another, the way creator
and owner already do.
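
A minimal sketch of that mutual defaulting, assuming the fields sit in
a plain hash keyed by _date and _license_date.  The function name
fill_date_defaults is hypothetical, and this is not the module's
actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: fill each of the two date fields from the other
# when only one of them is set.  Operates on a plain hash reference.
sub fill_date_defaults {
    my ($meta) = @_;
    for my $pair (['_date', '_license_date'], ['_license_date', '_date']) {
        my ($field, $fallback) = @$pair;
        # Treat undef and the empty string as "not set".
        next if defined $meta->{$field} && length $meta->{$field};
        $meta->{$field} = $meta->{$fallback}
            if defined $meta->{$fallback} && length $meta->{$fallback};
    }
    return $meta;
}

my $meta = fill_date_defaults({ _date => '2005-04-07', _license_date => '' });
print "$meta->{_license_date}\n";   # prints 2005-04-07
```

If neither field is set, both are left alone, so an empty dc:date
would still be possible in that case.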

> Note that the URL for the PublicDomain license is:
> http://web.resource.org/cc/PublicDomain , as per

The license-related code in to_rdf recognizes that URI as being the
same as 'Public Domain', but I don't think it currently changes one to
the other.  Should it?  Hmmm...  It seems that at least the rdf:about
attribute of the License element should be the URI, since about
attributes elsewhere in the RDF are used for URIs.
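
If it did change one to the other, the conversion could be as small as
a lookup table.  This is only a sketch: the names %license_uri and
license_to_uri are hypothetical, and the only entry taken from this
thread is the Public Domain one.

```perl
use strict;
use warnings;

# Hypothetical lookup table: human-readable license names mapped to the
# URI that should appear in the rdf:about attribute.
my %license_uri = (
    'Public Domain' => 'http://web.resource.org/cc/PublicDomain',
);

# Return the URI for a known license name; pass anything else through
# unchanged (it may already be a URI).
sub license_to_uri {
    my ($license) = @_;
    return $license unless defined $license;
    return exists $license_uri{$license} ? $license_uri{$license} : $license;
}

print license_to_uri('Public Domain'), "\n";
# prints http://web.resource.org/cc/PublicDomain
```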

> Subjects right now seem to have some oddities. I see a subject on this
> SVG file is:
>
> <rdf:li>HASH(0x8989714)</rdf:li>

This has been a known problem for some time now, and it only happens
with some images.  We are not sure what causes it.  I am beginning to
have a speculative idea about it, but I could well be wrong at this
point and need to investigate further.  I am not even certain right
now whether it is strictly reproducible, in the sense of always
happening given certain inputs.
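
For reference, a string of that shape is what Perl produces whenever a
hash reference is used as a string.  Whether that is what happens
inside the module is not established here.  The sketch below uses
made-up data and a hypothetical plain_strings filter, only to show the
effect and one way to guard against it:

```perl
use strict;
use warnings;

# A hash reference used in string context yields "HASH(0x...)".
my $subject = { name => 'flower' };   # made-up data
my $as_text = "$subject";
print "$as_text\n";                   # something like HASH(0x8989714)

# Hypothetical guard: keep only defined, non-reference values.
sub plain_strings {
    return grep { defined $_ && !ref $_ } @_;
}

my @keywords = plain_strings('tree', $subject, 'leaf');
print "@keywords\n";                  # prints: tree leaf
```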

> I'm not sure if the metadata is generated using the SVG-Metadata perl
> module: 

Yes.  That module is normally maintained by Bryce, but right now I am
in the middle of making some changes to it, mostly to create a to_svg
method for inserting the metadata back into the XML::Twig object.
While I'm at it, I'm also killing off some TODO items.

> Personally, I'd probably not put the subjects in a bag: the current
> movement in RDF is to move away from explicit containers in favor of
> multiple predicates with the same subject, but I'm not versed in SVG
> metadata, so that might not be the common way to do it in the SVG
> world.

Without a concrete reason, I don't know that it's necessary to change
that.  Multiple subjects would work, but the bag works too, and I
think we are not the only project using the metadata in this format.
(In particular, I think Inkscape does also.)  If other projects were
using it the other way, of course, we would consider changing to unify
and standardize on one way of doing it, but otherwise, my feeling is,
if the way we're doing it is not invalid, let's not change it.

> Third, the creator and rights holder have an rdf:about="" attribute
> which is blank.

Right now those are always blank because of two of the TODO items,
which I'm fixing (so that they will no longer have to be blank), but...

> This should either point to a URL of the creator and rights holder,
> or the entire rdf:about="" should be removed. 

I could do that, too:
   ($about_url) ? " rdf:about=\"$about_url\"" : ''
Then if these data are not supplied, they'll be omitted entirely.
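
Wrapped up as a helper, that might look like the sketch below.  The
name about_attr is hypothetical, and the escaping step is an addition
of this sketch, not something to_rdf is known to do:

```perl
use strict;
use warnings;

# Hypothetical helper: build the rdf:about attribute, or return an empty
# string when no URL was supplied, so the attribute is omitted entirely.
sub about_attr {
    my ($about_url) = @_;
    return '' unless defined $about_url && length $about_url;
    # Escape the characters that are unsafe inside a double-quoted
    # XML attribute value.
    my %esc = ('&' => '&amp;', '<' => '&lt;', '"' => '&quot;');
    (my $safe = $about_url) =~ s/([&<"])/$esc{$1}/g;
    return qq( rdf:about="$safe");
}

print '<Agent', about_attr('http://www.openclipart.org/'), ">\n";
# prints <Agent rdf:about="http://www.openclipart.org/">
print '<Agent', about_attr(''), ">\n";
# prints <Agent>
```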

> If the rights are assigned to openclipart.org, this should be used
> instead. 

Currently we are defaulting _publisher and _publisher_url to
'Open Clip Art Library' and 'http://www.openclipart.org/'
respectively, but we are defaulting the owner to the creator.
Since the items are placed into the public domain, ownership
seems largely moot, though.  Still, our intention is to track
that information as it was submitted to us, as much as possible.

> The dc:title of the <dc:rights> agent should be the person who
> created the work, rather than "Public Domain" (I think, although I
> may be acting as a lawyer there unintentionally.)

The dc:title subelement of the Agent subelement of the dc:rights
element is set to _owner.  It is possible some images were submitted
with "Public Domain" as the owner, but the way I read to_rdf, it is
putting the owner there, if an owner is known.  (If no owner is
specified, it defaults to the creator, as noted above.)

> Apologies if I come off rather harsh in this email: just trying to
> help the site generate the best metadata possible, because I'm going
> to start sucking it into an RDF tool I run.

Indeed, having good metadata is something we view as important, for
the manageability of the collection if for no other reason.

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/



