[Clipart] Is anyone working on categorizing the existing images?

Jon Phillips jon at rejon.org
Thu Jun 10 10:34:55 PDT 2004

I agree. IMO the best approach is definitely this emergent keyword
system, but I think that part of our job as "librarians" is to encourage
the community to comment on the graphics and possibly add to the
metadata. We could do this by giving the pool of new clipart some form
of low-barrier reputation and comment system on submissions. Possibly,
users could even tweak the metadata fields further, beyond requiring the
initial submitter (author or group of authors) to fill in the required
fields for their submission.

If we treat one person's categorization of the clipart repository like
the basket system on this Microsoft site
(http://office.microsoft.com/clipart/ - which I think is brilliant),
then we could have Bob's clipart, Jim's clipart, etc., where the
emphasis is on different people's preferences (like iTunes playlists).
So if I were interested in logos, cities, and New York, then my
selections would represent those interests.

Therefore, if we wanted to release packages, we could have an official
Librarian (Open Clip Art Project) package that follows some standard
categorization scheme, which we could learn from an established project
like Yahoo or Wikipedia, but this should not diminish the importance of
letting users of the system chart their own way through it and identify
their own favorite selections.

I mean, what is cooler, Google or Yahoo's category system? I think
that empowering the user should be key, as this shift to user choice is
primed to happen.

As I posted before, I think our goal should be to tend to data and
encourage the growth of the community supporting the data, and not to
explicitly harvest and develop pre-defined categories. Also, our
releases are not standard releases like a software project, but rather
our releases should coincide with the development of the system of
dealing with submissions, which currently is web-only (but will expand).
I do think, though, that Bryce is totally right about release early,
release often, and also about us putting out official packages... I just
want us to consider the difference between running this as an Open
Source project and as an Open Content project.

Anyway, I need to pick up a shovel and do some development because I've
talked more than I've worked on this project in the last few.

Thoughts? I'm heading over to look at the roadmap. Time to sign up for
some tasks.


PS: So check this out. It seems I'm going to be on a panel in NOVEMBER
at sfmoma.org on Open Source Art. The panel is going to consist of
myself, Greg Niemeyer (a prof and artist at UC Berkeley), and someone
from Creative Commons. I talked up the project last night and I think we
are so on the right path with how we are dealing with this project.
Everyone I talk to about it is interested both in the metadata and in
the search systems and how we handle keywords. Also, I've talked to
several other people who want to adapt our system, once it is in place,
to other types of media like photos and video.

On Thu, 2004-06-10 at 04:46, "Áki G. Karlsson" wrote:
> ECMAScript tree widgets can surely be found with free licenses... In my 
> limited experience, they are a terror to write from scratch.
> The main thing IMHO is to separate image metadata keywords completely from 
> website/package categorization. The website categories would simply be 
> database options, possibly selectable by users at upload, but would still 
> require a bunch of moderators for classifying unclassified, badly 
> classified etc. images. Moderators are required in any case.
> I still think there should be a metadata recommendation somewhere that 
> would encourage authors to use keywords that facilitate searches. The site 
> should ideally provide a search form that looks at embedded metadata. It 
> could even list popular/saved/recent searches that would then form a sort 
> of ad-hoc "category".
> IMHO we need to keep our classification simple and straightforward. As it 
> is only a reality in the db it can be easily altered later. It may make 
> sense (given a sufficient number of images) when going into "Animals" to 
> be able to select whether you want "Birds", "Mammals", "Lizards" etc. but 
> going into "Birds" you *don't* want to have to select whether you want 
> "seabirds" "flightless birds" "birds with a half-eaten sardine in their 
> beak" etc - _even_ if the number of similar images might suggest it. That 
> is a borgesian nightmare to be avoided. A maximum of two nested levels 
> should be established I think, and the categories should be decided al 
> monte, and be traditional and of immediate understanding. For greater 
> flexibility images should be classified with more than one category 
> wherever appropriate.
> It might even make sense to have still-empty categories for stuff that the 
> developers think is lacking in existing packages, in order to encourage 
> artists to contribute such images. (a sort of fill-in-the-blanks 
> psychology - I'm a sucker for it myself always :).
> That's my opinion, anyway... :)
> Best regards
> Áki
> On 09 Jun 2004 23:31:33 -0400, Jonadab the Unsightly One 
> <jonadab at bright.net> wrote:
> > Bryce Harrington <bryce at bryceharrington.com> writes:
> >
> >> We need a mechanism (including web-based) that allows:
> >>    * Looking through tree of existing categories
> >>    * Browse uncategorized images
> >>    * Designate categories for images
> >>    * Generate XMP for an item
> >>    * View the categories for a given image
> >>    * For a given category get a list of images and subcategories
> >
> > Yes.  The exact details of the interface aren't deeply important.
> >
> >> > Are you talking about having the person who submits the image give
> >> > it a tentative category, or just marking it as uncategorized?  (I
> >> > can go either way on that...)
> >>
> >> Yes, I was thinking of allowing the submitter to designate a
> >> 'tentative category'.  Or categories.
> >>
> >> Perhaps the ideal would be to give them a list of checkboxes of
> >> available categories to choose from, with a "fill in the blank" at
> >> the bottom.  Or maybe that'd turn into too many checkboxes...  Maybe
> >> provide some sort of navigational system for assigning increasingly
> >> finer subcategories.  Hmm.  Ideas?
> >
> > If we can bring ourselves to use client-side scripting, selecting a
> > general category (e.g. by checking a checkbox) could cause a set of
> > subcategories to appear under it.  Users with scripts disabled could
> > still specify a general (top-level) category.  The client-side ECMA
> > script could be generated from the database (along with the rest of
> > the page) by the server-side stuff.
> >
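
A rough sketch of the "generated from the database" part, in Python
purely for illustration: the server could turn (category, parent) rows
into a nested structure and hand it to the client-side script as JSON.
The row layout, the function name, and the sample categories are all
assumptions on my part, not anything we have agreed on.

```python
import json

def category_tree(rows):
    """Turn (name, parent_name_or_None) rows into a nested dict."""
    children = {}
    for name, parent in rows:
        children.setdefault(parent, []).append(name)

    def build(parent):
        # Each category maps to a dict of its own subcategories.
        return {name: build(name) for name in sorted(children.get(parent, []))}

    return build(None)

# Hypothetical rows, as they might come back from a category query.
rows = [
    ("Animals", None),
    ("Birds", "Animals"),
    ("Mammals", "Animals"),
    ("Food", None),
]

# The JSON text would be embedded in the page for the script to read.
print(json.dumps(category_tree(rows), indent=2))
```

Users with scripts disabled would still just see the top-level keys as
plain checkboxes.
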
> > An alternative is to make a round-trip to the server for each level in
> > the category hierarchy, which would expend a lot of user time and so
> > seems like the greater evil, IMO.
> >
> > Or we could put a tree of category choices in one frame and let the
> > user fill in the blank with one of them -- or users with client-side
> > scripting enabled could click on one of them and have it automatically
> > filled in.  This would work decently well with the keyword approach,
> > but a notable thing about it is that users would be able to type in
> > things that don't match any of the extant categories.  (That could be
> > construed as a feature or a bug, depending on your perspective; if
> > it's a feature, then the other options could add a fill-in "other" as
> > one of the choices.)  But having the list of existing categories there
> > would hopefully cut down somewhat on the synonym-category problem.
> > Hopefully.
> >
> > There's probably another option if those are all too odious.
> >
> >> Oh, supercategories -- interesting idea.
> >>
> >> Use Case
> >>    1.  Several cooked turkey images are uploaded
> >>    2.  The turkey images are assigned various keywords:
> >>        2 turkeys have keywords = "Thanksgiving"
> >>        1 turkey has keywords = "Holiday"
> >>        3 turkeys have keywords = "Food"
> >>        2 turkeys have keywords = "foods"
> >>        2 turkeys have keywords = "Food" and "Holiday"
> >>    3.  User identifies "Holiday" as a supercategory of "Thanksgiving"
> >>        System adjusts:
> >>        3 turkeys have keywords = "Holiday::Thanksgiving"
> >>        3 turkeys have keywords = "Food"
> >>        2 turkeys have keywords = "foods"
> >>        2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
> >>    4.  User specifies "foods=>Food"
> >>        3 turkeys have keywords = "Holiday::Thanksgiving"
> >>        5 turkeys have keywords = "Food"
> >>        2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
> >>    5.  User adds "Food" and "Holiday::Thanksgiving" keywords for all
> >>        turkeys.  So:
> >>        10 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
> >
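
Here is a minimal sketch of the two rewrites in that use case, assuming
each image simply carries a set of keyword strings. The function names
are hypothetical. One deliberate difference: the sketch leaves an image
tagged only "Holiday" alone, because the system cannot know which
holiday it shows, whereas step 3 above folds it into
"Holiday::Thanksgiving".

```python
from collections import Counter

def apply_synonym(images, old, new):
    """Rewrite keyword `old` as `new` on every image (e.g. foods => Food)."""
    return [{new if kw == old else kw for kw in kws} for kws in images]

def apply_supercategory(images, parent, child):
    """Qualify `child` as `parent::child` wherever `child` appears.

    A bare `parent` on the same image is dropped, since the qualified
    keyword already implies it. Images tagged only `parent` are left alone.
    """
    qualified = f"{parent}::{child}"
    result = []
    for kws in images:
        if child in kws:
            kws = (kws - {child, parent}) | {qualified}
        result.append(set(kws))
    return result

def tally(images):
    """Count how many images carry each distinct keyword set."""
    return Counter(frozenset(kws) for kws in images)

# The ten turkeys from step 2 of the use case, one keyword set per image.
turkeys = (
    [{"Thanksgiving"}] * 2 + [{"Holiday"}] + [{"Food"}] * 3
    + [{"foods"}] * 2 + [{"Food", "Holiday"}] * 2
)
turkeys = apply_supercategory(turkeys, "Holiday", "Thanksgiving")
turkeys = apply_synonym(turkeys, "foods", "Food")
for keywords, count in tally(turkeys).items():
    print(count, sorted(keywords))
```

Both functions return new keyword sets rather than editing in place, so
a moderator could preview the effect of a change before committing it.
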
> > Something like that.  And then somebody comes along and decides that
> > there are a bunch of cooked turkeys and they deserve their own
> > category, so he makes a CookedTurkeys category, gives all the cooked
> > turkey images *that* keyword, and then marks both Food and
> > Thanksgiving as supercategories for it.
> >
> > Ideally, the act of marking Foo as a supercategory of Bar should mean
> > that any image already designated as both Foo and Bar would then only
> > be listed as Bar (and any other keywords it might have, but not Foo),
> > since Foo is automatic, being a supercategory of Bar.  We could start
> > out with a system where we do some of this manually and add some of
> > the automatic processing as we go along and as it becomes necessary.
> >
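
The "Foo is automatic" rule could look something like the sketch below,
assuming the supercategory relation is held as a mapping from each
category to its direct supercategories. The names and the mapping shape
are my assumptions for illustration only.

```python
def ancestors(category, parents):
    """Return every direct or indirect supercategory of `category`."""
    seen = set()
    stack = [category]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def prune_implied(keywords, parents):
    """Drop keywords already implied by a more specific keyword."""
    implied = set()
    for kw in keywords:
        implied |= ancestors(kw, parents)
    return set(keywords) - implied

# Hypothetical relation: CookedTurkeys sits under both Food and Thanksgiving.
parents = {
    "CookedTurkeys": {"Food", "Thanksgiving"},
    "Thanksgiving": {"Holiday"},
}
print(sorted(prune_implied({"Food", "Holiday", "CookedTurkeys"}, parents)))
```

A category is allowed more than one supercategory here, which matches
the CookedTurkeys example.
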
> > Alternately, even better, it might be convenient when looking at a
> > list of images in the Foo category to be able to check several of them
> > and split them into a subcategory, Foo::Bar.  In terms of the
> > keywords, this would create a Bar keyword, change the keywords on the
> > marked images to remove Foo and add Bar instead, and in the categories
> > table add the Foo keyword to the Bar category.  Again, this is a
> > feature that could be added later.
> >
> > So then when we want to get a list of all Foo images, we get the list
> > of all the _other_ Foo images, plus a Bar subcategory we can click on
> > to see those.
> >
> > Is that too confusing?
> >
> >> So sounds like something like this:
> >>
> >> CREATE TABLE image (
> >>     id              INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> >>     uri             VARCHAR(255),
> >>     author          VARCHAR(255),
> >>     source          VARCHAR(255),
> >>     format_id       INT
> >> );
> >
> > Do we want to keep date submitted in the db?  Or can we get that from
> > the uri e.g. by feeding the relative pathname to a filetest operator?
> > Or does it not even matter?
> >
> >> CREATE TABLE format (
> >>     id              INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> >>     name            VARCHAR(255)
> >> );
> >>
> >> INSERT INTO format (id, name) VALUES
> >> ( 1, "svg" ),
> >> ( 2, "jpg" ),
> >> ( 3, "wmf" );
> >
> > We might ought to add PNG to that list.  Reducing all bitmapped
> > clipart to JPG is IMO not the way to go.  Consider that clipart often
> > gets inserted into fliers and things and printed with a four-color
> > process at 600dpi or higher; you don't want to introduce lossy
> > compression if you can avoid it.  (If the source material is that way
> > already, c'est la vie.)  There's also the issue of the alpha channel,
> > which JPG doesn't have, and which is exceedingly useful for
> > anti-aliasing bitmapped clipart that might not end up on the same
> > color of background every time.
> >
> > I'm *tempted* to say add XCF too, because a properly layered image is
> > *much* easier to modify, but I suspect most users in the target
> > audience wouldn't know what to do with those anyway.  (I sure would,
> > but I'm an oddball maybe.)  And most users of clip-art don't want to
> > modify it.  So XCF is probably unnecessary bloat.
> >
> > But definitely add PNG; if you're going to support just one bitmapped
> > format, that's the one.  If the user's going to use it where quality
> > doesn't matter, they can JPEG compress it themselves, scale it down to
> > 16x16 pixels, reduce the color depth to 4 bits, whatever.  Sure, there
> > are a lot of users who don't know how to do these things, but they
> > also don't know or care how many bytes their images take up; people
> > who are likely to care about that know what to do about it.
> >
> >> CREATE TABLE category (
> >>     id              INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
> >>     name            VARCHAR(255),
> >>     description     TEXT
> >> );
> >>
> >> CREATE TABLE category_to_image (
> >>     category_id         INT,
> >>     image_id            INT
> >> );
> >>
> >> CREATE TABLE category_inheretance (
> >>     category_id         INT,
> >>     supercategory_id    INT
> >> );
> >>
> >> CREATE TABLE category_synonym (
> >>     category_id         INT,
> >>     synonym_category_id INT
> >> );
> >
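
To check that these tables answer the "images plus subcategories"
question, here is a small sketch using Python's sqlite3 module with an
in-memory database. SQLite is only a stand-in for the MySQL setup
discussed here, the columns are trimmed to what the query needs, and the
sample rows and function names are invented. The table names, including
the spelling of category_inheretance, follow the proposal above.

```python
import sqlite3

def build_db():
    """Create a trimmed-down copy of the proposed schema with sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE image (id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE category_to_image (category_id INT, image_id INT);
        CREATE TABLE category_inheretance (category_id INT, supercategory_id INT);
        INSERT INTO category (id, name) VALUES (1, 'Food'), (2, 'CookedTurkeys');
        INSERT INTO category_inheretance VALUES (2, 1);
        INSERT INTO image (id, uri) VALUES (1, 'apple.svg'), (2, 'turkey.svg');
        INSERT INTO category_to_image VALUES (1, 1), (2, 2);
    """)
    return db

def list_category(db, name):
    """Return (image URIs filed directly under `name`, its subcategory names)."""
    images = [row[0] for row in db.execute(
        """SELECT image.uri FROM image
           JOIN category_to_image ON category_to_image.image_id = image.id
           JOIN category ON category.id = category_to_image.category_id
           WHERE category.name = ? ORDER BY image.uri""", (name,))]
    subcategories = [row[0] for row in db.execute(
        """SELECT sub.name FROM category AS sub
           JOIN category_inheretance AS ci ON ci.category_id = sub.id
           JOIN category AS sup ON sup.id = ci.supercategory_id
           WHERE sup.name = ? ORDER BY sub.name""", (name,))]
    return images, subcategories

db = build_db()
print(list_category(db, "Food"))
```

Looking up "Food" gives the images filed directly under it plus the
subcategories to click through, which is the browsing behaviour
described earlier in the thread.
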
> > I would have designed this differently, but my design is almost
> > entirely functionally isomorphic to yours, so most of the differences
> > are not important.  Probably the biggest difference is that I would
> > have had fewer tables with more fields each, but it's not a big deal.
> >
> >> > With a simple db library like Class::DBI, this could probably be
> >> > tossed together well enough to start using it in a couple of hours.
> >>
> >> I've used DBI quite a bit but not Class::DBI.  If you got the ball
> >> rolling, though, I'm game to give it a go.
> >
> > I've actually not used Class::DBI yet either, but I keep meaning to.
> > I initially didn't know about it and so I rolled my own roughly
> > isomorphic (albeit not object-oriented) solution, but rather than
> > maintain that I've been meaning to migrate to Class::DBI.
> >
> >> > Given that there are only a couple of quite simple tables in the
> >> > database, it could even be flat files, or maybe DBD::SQLite.
> >>
> >> Nah, I've talked with the freedesktop admin about db's and they said
> >> a mysql db would be no prob to set up (in fact, it's on my todo list
> >> to give him the info to set up one for a bug tracker for us.)
> >
> > Ah, I've used MySQL at work for some things, and it's plenty good
> > enough for what we're doing.  It's almost overkill, even, but of
> > course you can never have too much overkill.
> >
Jon Phillips
Graduate Researcher
Visual Arts Department

PO BOX 948667

jon at rejon.org
