[Clipart] Comma-separated keywords in metadata

Bryce Harrington bryce at bryceharrington.com
Mon Jun 28 10:48:34 PDT 2004


On Mon, 28 Jun 2004, Jonadab the Unsightly One wrote:
> > The proper way to achieve this in RDF is with a 'rdf:Bag':
> >
> >  <dc:subject>
> >    <rdf:Bag>
> >      <rdf:li>yield</rdf:li>
> >      <rdf:li>sign</rdf:li>
> >      <rdf:li>yield sign</rdf:li>
> >      <rdf:li>street sign</rdf:li>
> >      <rdf:li>traffic sign</rdf:li>
> >    </rdf:Bag>
> >  </dc:subject>
> 
> It is possible to automate this conversion, assuming we don't have to
> manually check for commas being used in other ways in the subject.

With Perl, all such things are possible.  ;-)

> Even then, a command-line tool could prompt for each file:
>
> Image: yieldsign.svg
> Subject: yield, sign, yield sign, street sign, traffic sign
> Convert to rdf:Bag? (Y/N):
> 
> Are there a lot of images with this issue, or just a couple? 

By and large, *most* people did not include metadata, so by definition
there are not a lot of images with this issue.  Of those that did
include metadata, most used either the tool on the CC site (which
doesn't have a field for subject) or my svg_annotate (which also lacks
the subject capability).  I don't know how many pieces of art included
the subject, but I did see that there are some.  I think this is good,
as it gives us a starting point for the categorization work in this
next release.

If we can get svg_annotate, the upload tool and Inkscape to support this
Bag-oriented approach, I expect we'll be able to take care of the
keywords for 99% of the contributions.  For that last 1%, a script may
be worthwhile, but we can probably leave it at a lower priority.  If we
start getting lots of clipart with keywords in the subject, we can look
into it then; it's likely that the submissions would use irregular
formats (commas, space-delimited, semicolons...), so they would need
special parser attention anyway.
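
For reference, here is a rough sketch of what such a conversion script
might look like.  It is written in Python rather than Perl purely for
illustration, and the function names (split_keywords, subject_to_bag)
are hypothetical; nothing like this exists in svg_annotate today.  It
assumes the subject is a single string, splits on commas or semicolons
when either is present, and only falls back to whitespace otherwise.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core 1.1 elements.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def split_keywords(subject):
    """Split a free-form subject string into a list of keywords.

    Assumption: commas and semicolons are delimiters whenever either
    appears; only when neither is present do we split on whitespace.
    """
    if re.search(r"[,;]", subject):
        parts = re.split(r"[,;]", subject)
    else:
        parts = subject.split()

    keywords = []
    for part in parts:
        word = " ".join(part.split())  # trim and collapse inner whitespace
        if word and word not in keywords:  # drop empties and duplicates
            keywords.append(word)
    return keywords


def subject_to_bag(subject):
    """Build a dc:subject element holding an rdf:Bag of keywords."""
    dc_subject = ET.Element(f"{{{DC_NS}}}subject")
    bag = ET.SubElement(dc_subject, f"{{{RDF_NS}}}Bag")
    for keyword in split_keywords(subject):
        li = ET.SubElement(bag, f"{{{RDF_NS}}}li")
        li.text = keyword
    return dc_subject


if __name__ == "__main__":
    element = subject_to_bag(
        "yield, sign, yield sign, street sign, traffic sign"
    )
    print(ET.tostring(element, encoding="unicode"))
```

Note that the whitespace fallback cannot tell "yield sign" apart from
"yield" and "sign", which is exactly the sort of case that would still
need a human to look at it.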

Bryce



