[Clipart] OpenClipart 0.04 browsable

Wed Jun 30 10:43:19 PDT 2004

On Wed, 30 Jun 2004, "Áki G. Karlsson" wrote:
> > By the way, remember that we're constructing this based on keywords
> > moreso than fixed categories.  So a given image can have both 'Dinosaur'
> > and 'Animal' keywords.  We would 'parent' the Dinosaur category to
> > Animal to indicate the relationship.  Then we'd just need to figure out
> > how to make the browser tool aware of parentage...
> 
> Of course, this is the right way to do things, but the browser would have 
> to have some limit as to the keywords it is sensitive to, right? Or else 
> the keywords would have to be restricted in the files themselves. Maybe 
> the browser tool could parse a simple xml file with a predefined keyword 
> hierarchy into an array and then match the keywords with the keywords in 
> the file? 

This is a good approach I think.  We should have a separate tool take
care of generating the categorization hierarchy.  The more we can
segment and modularize the tools and place less burden on any one, the
more maintainable and reusable our tools will be.  Huge tools that do
everything under the sun tend to have too much code in them to be easy
for a new person to learn, and can be tricky to add new features to.
"Each tool should do one thing, and do it well," as they say.

Anyway, so given that philosophy, it sounds like what we need is a
script that generates an XML file with a predefined keyword hierarchy
based on the contents of a set of SVG files.  Anyone feel like taking
this one on?
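
To make the output side of such a script a bit more concrete, here is a
minimal sketch in Python.  It assumes the keyword relationships have
already been collected into a nested dictionary; the element name
'keyword', the attribute 'name', and the root tag 'hierarchy' are made
up for illustration and are not an agreed-on format.

```python
import xml.etree.ElementTree as ET


def hierarchy_to_xml(tree, root_tag="hierarchy"):
    """Turn a nested dict {keyword: {child: {...}}} into an XML element.

    The tag and attribute names are hypothetical placeholders.
    """
    root = ET.Element(root_tag)

    def add_children(parent_elem, subtree):
        # Sort so that the generated file is stable between runs.
        for name in sorted(subtree):
            elem = ET.SubElement(parent_elem, "keyword", name=name)
            add_children(elem, subtree[name])

    add_children(root, tree)
    return root


# A few entries from the category list in this thread, as sample input.
example = {
    "Animals": {"insects": {}},
    "Countries": {"maps": {}, "flags": {}},
    "Signs": {"roadsigns": {"servicesigns": {}}},
}

xml_text = ET.tostring(hierarchy_to_xml(example), encoding="unicode")
print(xml_text)
```

A browser tool could then read the file back with ET.fromstring() or
ET.parse() without needing to know how the hierarchy was produced.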

An advantage to this approach is that we can generate several of these
hierarchies for different situations, such as filtering out certain
content, or prioritizing things that a user cares more about (e.g., an
artist creating a flyer for a band will have different clipart
priorities than an office worker making a presentation).  Theoretically,
this would also enable users to edit and customize their own hierarchies
if they wish (given a decent XML editor).
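
As a rough illustration of the filtering case, a variant hierarchy
could be derived from a full one by dropping unwanted keywords along
with everything beneath them.  The nested-dictionary representation
and the function name prune() are assumptions for this sketch only.

```python
def prune(tree, excluded):
    """Return a copy of a nested keyword dict without the excluded keywords.

    Removing a keyword also removes its whole subtree.  Matching is
    case-sensitive in this sketch.
    """
    return {
        name: prune(subtree, excluded)
        for name, subtree in tree.items()
        if name not in excluded
    }


full = {
    "Food": {"fruit": {}, "vegetables": {}},
    "Signs": {"smileys": {}, "roadsigns": {"servicesigns": {}}},
    "Things": {"office": {}},
}

# A hypothetical "office" profile that hides food and smileys.
office = prune(full, {"Food", "smileys"})
print(office)
```

The original dictionary is left untouched, so several profiles can be
generated from the same full hierarchy.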

The algorithm to generate this tree from the set of SVG files would
presumably work by looking for keywords that have no parents and listing
those first, then linking each keyword that does have a parent to that
parent, producing a tree data structure.  I wrote a Perl module that
does something akin to this (Sort::Tree on CPAN), which might work for
this too.
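
The steps above could be sketched roughly as follows.  This is Python
rather than Perl and does not use Sort::Tree; it also assumes each
keyword has at most one parent and that the keyword-to-parent mapping
has already been extracted from the files.  The name build_tree() is
invented for the example.

```python
def build_tree(parents):
    """Build a nested dict from a {keyword: parent-or-None} mapping.

    Assumes at most one parent per keyword.  A parent that is mentioned
    but never listed as a key is treated as a top-level keyword.
    """
    children = {}
    for keyword, parent in parents.items():
        children.setdefault(keyword, [])
        if parent is not None:
            children.setdefault(parent, []).append(keyword)

    # Keywords with no parent come first, as the top level of the tree.
    roots = sorted(k for k in children if parents.get(k) is None)

    placed = 0

    def subtree(keyword):
        nonlocal placed
        placed += 1
        return {child: subtree(child) for child in sorted(children[keyword])}

    tree = {root: subtree(root) for root in roots}

    # Anything not reachable from a root must be stuck in a parent cycle.
    if placed != len(children):
        raise ValueError("some keywords form a parent cycle")
    return tree


parents = {
    "Animals": None,
    "insects": "Animals",
    "Signs": None,
    "roadsigns": "Signs",
    "servicesigns": "roadsigns",
}
tree = build_tree(parents)
print(tree)
```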

> Anyway, here's an updated list. I also think maybe 'Countries' should be a 
> top-level category. I changed the clothes to costumes, intending images of 
> people wearing certain types of clothing, rather than clothing articles by 
> themselves.

> Animals
> 	insects
> Computer
> Countries
> 	maps
> 	flags
> Food
> 	fruit
> 	vegetables
> People
> 	costumes
> 	professions

We also need 'body parts' here for things like the intestines, hand,
etc.  

> Shapes
> 	jigsaw
> 	arrows
> 	callouts
> Signs
> 	icons
> 	roadsigns
> 		servicesigns

We'll also want to tag the road signs with 'Country:UK' and the
servicesigns with 'Country:Scandinavia'.  Hmm, although Scandinavia
isn't a country but more of a political region...
'Political:Scandinavia'?  I don't know...
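
If we did go with prefixed tags like these, a tool could separate the
prefix from the value fairly easily.  A small sketch, assuming a single
colon as the separator and treating the prefix as a "namespace" (both
the term and the function names are just for illustration):

```python
def split_tag(tag):
    """Split 'Country:UK' into ('Country', 'UK').

    Plain keywords without a colon get a namespace of None.
    """
    namespace, sep, value = tag.partition(":")
    if not sep:
        return (None, tag.strip())
    return (namespace.strip(), value.strip())


def tags_in_namespace(tags, namespace):
    """Return the values of all tags that carry the given prefix."""
    return [value for ns, value in map(split_tag, tags) if ns == namespace]


tags = ["roadsigns", "Country:UK", "Political:Scandinavia"]
print(tags_in_namespace(tags, "Country"))
```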

> 	smileys
> 	logos
> 	symbols
> 	weather
> 	media
> Structures
> 	homes
> 	cars
> Things
> 	clothing
> 	office

Looks good.

Bryce