[Clipart] Is anyone working on categorizing the existing images?

Thu Jun 10 04:46:36 PDT 2004

ECMAScript tree widgets can surely be found with free licenses... In my 
limited experience, they are a terror to write from scratch.

The main thing IMHO is to separate image metadata keywords completely from 
website/package categorization. The website categories would simply be 
database options, possibly selectable by users at upload, but would still 
require a bunch of moderators for classifying unclassified, badly 
classified etc. images. Moderators are required in any case.

I still think there should be a metadata recommendation somewhere that 
would encourage authors to use keywords that facilitate searches. The site 
should ideally provide a search form that looks at embedded metadata. It 
could even list popular/saved/recent searches that would then form a sort 
of ad-hoc "category".

IMHO we need to keep our classification simple and straightforward. As it 
is only a reality in the db it can be easily altered later. It may make 
sense (given a sufficient number of images) when going into "Animals" to 
be able to select whether you want "Birds", "Mammals", "Lizards" etc. but
going into "Birds" you *don't* want to have to select whether you want
"seabirds", "flightless birds", "birds with a half-eaten sardine in their
beak" etc - _even_ if the number of similar images might suggest it. That
is a Borgesian nightmare to be avoided. A maximum of two nested levels
should be established I think, and the categories should be decided al 
monte, and be traditional and of immediate understanding. For greater 
flexibility images should be classified with more than one category 
wherever appropriate.
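
A minimal sketch of the two-level limit with multiple categories per image. The "::" separator is borrowed from the quoted use case below; the helper names and the example file name are hypothetical, not part of any existing code.

```python
MAX_DEPTH = 2  # the two-level limit suggested above


def parse_category(path):
    """Split 'Animals::Birds' into its levels, rejecting deeper nesting."""
    parts = [p.strip() for p in path.split("::")]
    if not all(parts):
        raise ValueError(f"empty level in category {path!r}")
    if len(parts) > MAX_DEPTH:
        raise ValueError(f"{path!r} is nested deeper than {MAX_DEPTH} levels")
    return tuple(parts)


def classify(catalog, image, *paths):
    """Attach one or more validated categories to an image."""
    for path in paths:
        catalog.setdefault(image, set()).add(parse_category(path))
    return catalog


catalog = {}
# One image, two categories (file name is made up for illustration).
classify(catalog, "rooster.svg", "Animals::Birds", "Farm")
```

Because the limit lives in one constant, it stays as easy to change later as the db-only classification itself.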

It might even make sense to have still-empty categories for stuff that the 
developers think is lacking in existing packages, in order to encourage
artists to contribute such images. (a sort of fill-in-the-blanks 
psychology - I'm a sucker for it myself always :).

That's my opinion, anyway... :)

Best regards

Áki

On 09 Jun 2004 23:31:33 -0400, Jonadab the Unsightly One 
<jonadab at bright.net> wrote:

> Bryce Harrington <bryce at bryceharrington.com> writes:
>
>> We need a mechanism (including web-based) that allows:
>>    * Looking through tree of existing categories
>>    * Browse uncategorized images
>>    * Designate categories for images
>>    * Generate XMP for an item
>>    * View the categories for a given image
>>    * For a given category get a list of images and subcategories
>
> Yes.  The exact details of the interface aren't deeply important.
>
>> > Are you talking about having the person who submits the image give
>> > it a tentative category, or just marking it as uncategorized?  (I
>> > can go either way on that...)
>>
>> Yes, I was thinking of allowing the submitter to designate a
>> 'tentative category'.  Or categories.
>>
>> Perhaps the ideal would be to give them a list of checkboxes of
>> available categories to choose from, with a "fill in the blank" at
>> the bottom.  Or maybe that'd turn into too many checkboxes...  Maybe
>> provide some sort of navigational system for assigning increasingly
>> finer subcategories.  Hmm.  Ideas?
>
> If we can bring ourselves to use client-side scripting, selecting a
> general category (e.g. by checking a checkbox) could cause a set of
> subcategories to appear under it.  Users with scripts disabled could
> still specify a general (top-level) category.  The client-side ECMA
> script could be generated from the database (along with the rest of
> the page) by the server-side stuff.
>
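As a rough illustration of generating the client-side data from the database, the server could serialize the category tree into JSON for the page's script to read. This is only a sketch: the function name and the row layout (subcategory, supercategory pairs) are assumptions, and the category names come from examples elsewhere in this thread.

```python
import json


def category_tree(names, inheritance):
    """Map each top-level category to a sorted list of its direct subcategories.

    `inheritance` is a list of (subcategory, supercategory) pairs.
    """
    children = {}
    for child, parent in inheritance:
        children.setdefault(parent, []).append(child)
    has_parent = {child for child, _ in inheritance}
    return {
        name: sorted(children.get(name, []))
        for name in names
        if name not in has_parent
    }


names = ["Animals", "Birds", "Mammals", "Food"]
inheritance = [("Birds", "Animals"), ("Mammals", "Animals")]

# The string the server would embed in the generated page.
payload = json.dumps(category_tree(names, inheritance), sort_keys=True)
```

Users with scripts disabled would still get the top-level names as plain checkboxes; only the subcategory reveal depends on the payload.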
> An alternative is to make a round-trip to the server for each level in
> the category hierarchy, which would expend a lot of user time and so
> seems like the greater evil, IMO.
>
> Or we could put a tree of category choices in one frame and let the
> user fill in the blank with one of them -- or users with client-side
> scripting enabled could click on one of them and have it automatically
> filled in.  This would work decently well with the keyword approach,
> but a notable thing about it is that users would be able to type in
> things that don't match any of the extant categories.  (That could be
> construed as a feature or a bug, depending on your perspective; if
> it's a feature, then the other options could add a fill-in "other" as
> one of the choices.)  But having the list of existing categories there
> would hopefully cut down somewhat on the synonym-category problem.
> Hopefully.
>
> There's probably another option if those are all too odious.
>
>> Oh, supercategories -- interesting idea.
>>
>> Use Case
>>    1.  Several cooked turkey images are uploaded
>>    2.  The turkey images are assigned various keywords:
>>        2 turkeys have keywords = "Thanksgiving"
>>        1 turkey has keywords = "Holiday"
>>        3 turkeys have keywords = "Food"
>>        2 turkeys have keywords = "foods"
>>        2 turkeys have keywords = "Food" and "Holiday"
>>    3.  User identifies "Holiday" as a supercategory of "Thanksgiving"
>>        System adjusts:
>>        3 turkeys have keywords = "Holiday::Thanksgiving"
>>        3 turkeys have keywords = "Food"
>>        2 turkeys have keywords = "foods"
>>        2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
>>    4.  User specifies "foods=>Food"
>>        3 turkeys have keywords = "Holiday::Thanksgiving"
>>        5 turkeys have keywords = "Food"
>>        2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
>>    5.  User adds "Food" and "Holiday::Thanksgiving" keywords for all
>>        turkeys.  So:
>>        10 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
>
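The use case above can be replayed as a small sketch in which each image is a set of keywords. The `rename` and `tally` helpers are hypothetical. The second rename in step 3 moves bare "Holiday" images into the subcategory only because the use case does so (all ten images are Thanksgiving turkeys); it is not a general rule.

```python
from collections import Counter


def rename(images, old, new):
    """Replace keyword `old` with `new` on every image that carries it."""
    return [(kw - {old}) | {new} if old in kw else kw for kw in images]


def tally(images):
    """Count images per distinct keyword set, keyed by a sorted tuple."""
    return Counter(tuple(sorted(kw)) for kw in images)


# Step 2: the ten turkeys as first tagged.
images = (
    [frozenset({"Thanksgiving"})] * 2
    + [frozenset({"Holiday"})] * 1
    + [frozenset({"Food"})] * 3
    + [frozenset({"foods"})] * 2
    + [frozenset({"Food", "Holiday"})] * 2
)

# Step 3: "Holiday" becomes a supercategory of "Thanksgiving".
images = rename(images, "Thanksgiving", "Holiday::Thanksgiving")
images = rename(images, "Holiday", "Holiday::Thanksgiving")  # as in the use case
step3 = tally(images)

# Step 4: "foods" is declared a synonym of "Food".
images = rename(images, "foods", "Food")
step4 = tally(images)

# Step 5: every turkey gets both keywords.
images = [kw | {"Food", "Holiday::Thanksgiving"} for kw in images]
step5 = tally(images)
```

The tallies after each step match the counts listed in the use case: 3/3/2/2, then 3/5/2, then all 10 under both keywords.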
> Something like that.  And then somebody comes along and decides that
> there are a bunch of cooked turkeys and they deserve their own
> category, so he makes a CookedTurkeys category, gives all the cooked
> turkey images *that* keyword, and then marks both Food and
> Thanksgiving as supercategories for it.
>
> Ideally, the act of marking Foo as a supercategory of Bar should mean
> that any image already designated as both Foo and Bar would then only
> be listed as Bar (and any other keywords it might have, but not Foo),
> since Foo is automatic, being a supercategory of Bar.  We could start
> out with a system where we do some of this manually and add some of
> the automatic processing as we go along and as it becomes necessary.
>
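One way to picture the "Foo is automatic" rule: walk up the supercategory links and drop any keyword that a more specific keyword on the same image already implies. This is a sketch only; `ancestors`, `prune_implied` and the `parents` mapping are illustrative names, and "Baz" is a made-up unrelated keyword.

```python
def ancestors(category, parents):
    """All supercategories reachable from `category`.

    `parents` maps a category to the set of its direct supercategories.
    """
    seen = set()
    stack = list(parents.get(category, ()))
    while stack:
        cat = stack.pop()
        if cat not in seen:  # also guards against accidental cycles
            seen.add(cat)
            stack.extend(parents.get(cat, ()))
    return seen


def prune_implied(keywords, parents):
    """Drop keywords already implied by a more specific keyword."""
    implied = set()
    for kw in keywords:
        implied |= ancestors(kw, parents)
    return set(keywords) - implied


# Foo is a supercategory of Bar, so an image tagged with both keeps only Bar.
simple = prune_implied({"Foo", "Bar", "Baz"}, {"Bar": {"Foo"}})

# The cooked-turkey case: two supercategories for one category.
turkey = prune_implied(
    {"Food", "Thanksgiving", "CookedTurkeys"},
    {"CookedTurkeys": {"Food", "Thanksgiving"}},
)
```

Running the pruning lazily, at display time, would fit the suggestion of starting manual and automating later, since stored keywords are never rewritten.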
> Alternately, even better, it might be convenient when looking at a
> list of images in the Foo category to be able to check several of them
> and split them into a subcategory, Foo::Bar.  In terms of the
> keywords, this would create a Bar keyword, change the keywords on the
> marked images to remove Foo and add Bar instead, and in the categories
> table add the Foo keyword to the Bar category.  Again, this is a
> feature that could be added later.
>
> So then when we want to get a list of all Foo images, we get the list
> of all the _other_ Foo images, plus a Bar subcategory we can click on
> to see those.
>
> Is that too confusing?
>
>> So sounds like something like this:
>>
>> CREATE TABLE image (
>>     id              INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>>     uri             VARCHAR(255),
>>     author          VARCHAR(255),
>>     source          VARCHAR(255),
>>     format_id       INT
>> );
>
> Do we want to keep date submitted in the db?  Or can we get that from
> the uri e.g. by feeding the relative pathname to a filetest operator?
> Or does it not even matter?
>
>> CREATE TABLE format (
>>     id              INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>>     name            VARCHAR(255)
>> );
>>
>> INSERT INTO format (id, name) VALUES
>> ( 1, "svg" ),
>> ( 2, "jpg" ),
>> ( 3, "wmf" );
>
> We might ought to add PNG to that list.  Reducing all bitmapped
> clipart to JPG is IMO not the way to go.  Consider that clipart often
> gets inserted into fliers and things and printed with a four-color
> process at 600dpi or higher; you don't want to introduce lossy
> compression if you can avoid it.  (If the source material is that way
> already, c'est la vie.)  There's also the issue of the alpha channel,
> which JPG doesn't have, and which is exceedingly useful for
> anti-aliasing bitmapped clipart that might not end up on the same
> color of background every time.
>
> I'm *tempted* to say add XCF too, because a properly layered image is
> *much* easier to modify, but I suspect most users in the target
> audience wouldn't know what to do with those anyway.  (I sure would,
> but I'm an oddball maybe.)  And most users of clip-art don't want to
> modify it.  So XCF is probably unnecessary bloat.
>
> But definitely add PNG; if you're going to support just one bitmapped
> format, that's the one.  If the user's going to use it where quality
> doesn't matter, they can JPEG compress it themselves, scale it down to
> 16x16 pixels, reduce the color depth to 4 bits, whatever.  Sure, there
> are a lot of users who don't know how to do these things, but they
> also don't know or care how many bytes their images take up; people
> who are likely to care about that know what to do about it.
>
>> CREATE TABLE category (
>>     id              INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>>     name            VARCHAR(255),
>>     description     TEXT
>> );
>>
>> CREATE TABLE category_to_image (
>>     category_id         INT,
>>     image_id            INT
>> );
>>
>> CREATE TABLE category_inheritance (
>>     category_id         INT,
>>     supercategory_id    INT
>> );
>>
>> CREATE TABLE category_synonym (
>>     category_id         INT,
>>     synonym_category_id INT
>> );
>
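To see how the schema answers "for a given category get a list of images and subcategories", here is a rough sketch using an in-memory SQLite database as a stand-in for the planned MySQL one. The tables are trimmed to the columns the query needs, and the sample rows, file names and the `browse` function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the MySQL server
conn.executescript("""
CREATE TABLE image (id INTEGER PRIMARY KEY, uri TEXT);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE category_to_image (category_id INT, image_id INT);
CREATE TABLE category_inheritance (category_id INT, supercategory_id INT);

INSERT INTO category (id, name)
    VALUES (1, 'Food'), (2, 'Thanksgiving'), (3, 'CookedTurkeys');
INSERT INTO category_inheritance VALUES (3, 1), (3, 2);
INSERT INTO image (id, uri)
    VALUES (1, 'apple.svg'), (2, 'turkey1.svg'), (3, 'turkey2.svg');
INSERT INTO category_to_image VALUES (1, 1), (3, 2), (3, 3);
""")


def browse(conn, name):
    """Return (images filed directly under `name`, its subcategory names)."""
    images = [row[0] for row in conn.execute(
        """SELECT image.uri FROM image
           JOIN category_to_image ON category_to_image.image_id = image.id
           JOIN category ON category.id = category_to_image.category_id
           WHERE category.name = ? ORDER BY image.uri""", (name,))]
    subcategories = [row[0] for row in conn.execute(
        """SELECT child.name FROM category AS child
           JOIN category_inheritance
                ON category_inheritance.category_id = child.id
           JOIN category AS parent
                ON parent.id = category_inheritance.supercategory_id
           WHERE parent.name = ? ORDER BY child.name""", (name,))]
    return images, subcategories


food_images, food_subcategories = browse(conn, "Food")
```

Browsing "Food" here returns only the images filed directly under it plus the "CookedTurkeys" subcategory to click through, which is the listing behaviour described earlier in the thread.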
> I would have designed this differently, but my design is almost
> entirely functionally isomorphic to yours, so most of the differences
> are not important.  Probably the biggest difference is that I would
> have had fewer tables with more fields each, but it's not a big deal.
>
>> > With a simple db library like Class::DBI, this could probably be
>> > tossed together well enough to start using it in a couple of hours.
>>
>> I've used DBI quite a bit but not Class::DBI.  If you got the ball
>> rolling, though, I'm game to give it a go.
>
> I've actually not used Class::DBI yet either, but I keep meaning to.
> I initially didn't know about it and so I rolled my own roughly
> isomorphic (albeit not object-oriented) solution, but rather than
> maintain that I've been meaning to migrate to Class::DBI.
>
>> > Given that there are only a couple of quite simple tables in the
>> > database, it could even be flat files, or maybe DBD::SQLite.
>>
>> Nah, I've talked with the freedesktop admin about db's and they said
>> a mysql db would be no prob to set up (in fact, it's on my todo list
>> to give him the info to set up one for a bug tracker for us.)
>
> Ah, I've used MySQL at work for some things, and it's plenty good
> enough for what we're doing.  It's almost overkill, even, but of
> course you can never have too much overkill.
>

-- 
http://www.hi.is/~akig/