[Clipart] Is anyone working on categorizing the existing images?

Jonadab the Unsightly One jonadab at bright.net
Wed Jun 9 20:31:33 PDT 2004


Bryce Harrington <bryce at bryceharrington.com> writes:

> We need a mechanism (including web-based) that allows:
>    * Looking through tree of existing categories
>    * Browse uncategorized images
>    * Designate categories for images
>    * Generate XMP for an item
>    * View the categories for a given image
>    * For a given category get a list of images and subcategories

Yes.  The exact details of the interface aren't deeply important.

> > Are you talking about having the person who submits the image give
> > it a tentative category, or just marking it as uncategorized?  (I
> > can go either way on that...)
> 
> Yes, I was thinking of allowing the submitter to designate a
> 'tentative category'.  Or categories.
> 
> Perhaps the ideal would be to give them a list of checkboxes of
> available categories to choose from, with a "fill in the blank" at
> the bottom.  Or maybe that'd turn into too many checkboxes...  Maybe
> provide some sort of navigational system for assigning increasingly
> finer subcategories.  Hmm.  Ideas?

If we can bring ourselves to use client-side scripting, selecting a
general category (e.g. by checking a checkbox) could cause a set of
subcategories to appear under it.  Users with scripts disabled could
still specify a general (top-level) category.  The client-side
ECMAScript could be generated from the database (along with the rest of
the page) by the server-side code.
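
As a rough illustration of generating the script's data from the
database, here is a minimal Python sketch.  It assumes the category
rows have already been fetched as (name, supercategory) pairs; the
function names and the sample rows are made up for the example and are
not project code.  The resulting JSON could be embedded in the page for
the client-side script to read.

```python
import json

# Hypothetical rows, shaped like (category name, supercategory name or None).
rows = [
    ("Food", None),
    ("Holiday", None),
    ("Thanksgiving", "Holiday"),
]

def category_tree(rows):
    """Map each category to the sorted list of its direct subcategories."""
    tree = {name: [] for name, _ in rows}
    for name, parent in rows:
        if parent is not None:
            tree.setdefault(parent, []).append(name)
    return {name: sorted(children) for name, children in tree.items()}

def script_payload(rows):
    """Serialize the tree as JSON for embedding in the generated page."""
    return json.dumps(category_tree(rows), sort_keys=True)

payload = script_payload(rows)
```

Users with scripts disabled would simply never see this data, and the
top-level checkboxes would still work as plain form fields.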

An alternative is to make a round-trip to the server for each level in
the category hierarchy, which would expend a lot of user time and so
seems like the greater evil, IMO.

Or we could put a tree of category choices in one frame and let the
user fill in the blank with one of them -- or users with client-side
scripting enabled could click on one of them and have it automatically
filled in.  This would work decently well with the keyword approach,
but a notable thing about it is that users would be able to type in
things that don't match any of the extant categories.  (That could be
construed as a feature or a bug, depending on your perspective; if
it's a feature, then the other options could add a fill-in "other" as
one of the choices.)  But having the list of existing categories there
would hopefully cut down somewhat on the synonym-category problem.
Hopefully.

There's probably another option if those are all too odious.

> Oh, supercategories -- interesting idea.
> 
> Use Case
>    1.  Several cooked turkey images are uploaded
>    2.  The turkey images are assigned various keywords:
>        2 turkeys have keywords = "Thanksgiving"
>        1 turkey have keywords = "Holiday"
>        3 turkeys have keywords = "Food"
>        2 turkeys have keywords = "foods"
>        2 turkeys have keywords = "Food" and "Holiday"
>    3.  User identifies "Holiday" as a supercategory of "Thanksgiving"
>        System adjusts:
>        3 turkeys have keywords = "Holiday::Thanksgiving"
>        3 turkeys have keywords = "Food"
>        2 turkeys have keywords = "foods"
>        2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
>    4.  User specifies "foods=>Food"
>        3 turkeys have keywords = "Holiday::Thanksgiving"
>        5 turkeys have keywords = "Food"
>        2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
>    5.  User adds "Food" and "Holiday::Thanksgiving" keywords for all
>        turkeys.  So:
>        10 turkeys have keywords = "Food" and "Holiday::Thanksgiving"

Something like that.  And then somebody comes along and decides that
there are a bunch of cooked turkeys and they deserve their own
category, so he makes a CookedTurkeys category, gives all the cooked
turkey images *that* keyword, and then marks both Food and
Thanksgiving as supercategories for it.

Ideally, the act of marking Foo as a supercategory of Bar should mean
that any image already designated as both Foo and Bar would then only
be listed as Bar (and any other keywords it might have, but not Foo),
since Foo is automatic, being a supercategory of Bar.  We could start
out with a system where we do some of this manually and add some of
the automatic processing as we go along and as it becomes necessary.
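
To make the rule concrete, here is a small Python sketch of it.  It
assumes keywords are held in memory as a set per image and that
supercategories are kept in a plain dictionary; both structures, the
function name, and the file names are invented for illustration and
say nothing about how the real tables would be queried.

```python
def mark_supercategory(image_keywords, supers, sub, sup):
    """Record sup as a supercategory of sub, then drop the now-redundant
    sup keyword from every image that already carries sub."""
    supers.setdefault(sub, set()).add(sup)
    for keywords in image_keywords.values():
        if sub in keywords:
            keywords.discard(sup)

# Hypothetical sample data.
images = {
    "turkey1.svg": {"Food", "Holiday", "Thanksgiving"},
    "turkey2.svg": {"Holiday"},
    "turkey3.svg": {"Thanksgiving"},
}
supers = {}
mark_supercategory(images, supers, "Thanksgiving", "Holiday")
```

Note that turkey2.svg keeps its Holiday keyword: only images that were
already marked with both keywords lose the redundant one.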

Alternatively, and even better, it might be convenient when looking at a
list of images in the Foo category to be able to check several of them
and split them into a subcategory, Foo::Bar.  In terms of the
keywords, this would create a Bar keyword, change the keywords on the
marked images to remove Foo and add Bar instead, and record Foo as a
supercategory of Bar in the category tables.  Again, this is a
feature that could be added later.
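
A sketch of that split operation in Python, under the same kind of
assumptions as before (a set of keywords per image and a dictionary of
supercategories, both hypothetical stand-ins for the real tables):

```python
def split_into_subcategory(image_keywords, supers, marked, parent, child):
    """Move the marked images from parent into a new child category and
    record parent as a supercategory of child."""
    supers.setdefault(child, set()).add(parent)
    for image in marked:
        keywords = image_keywords[image]
        keywords.discard(parent)
        keywords.add(child)

# Hypothetical sample data.
foo_images = {
    "a.svg": {"Foo"},
    "b.svg": {"Foo", "Other"},
    "c.svg": {"Foo"},
}
foo_supers = {}
split_into_subcategory(foo_images, foo_supers, ["a.svg", "b.svg"], "Foo", "Bar")
```

Unmarked images (c.svg here) are left alone, and unrelated keywords on
the marked images survive the split.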

So then when we want to get a list of all Foo images, we get the list
of all the _other_ Foo images, plus a Bar subcategory we can click on
to see those.
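
In code, the listing might look something like the following Python
sketch.  Again the in-memory structures and names are assumptions for
the sake of the example, not a proposal for the actual query.

```python
def list_category(image_keywords, supers, category):
    """Return (images tagged directly with category, direct subcategories)."""
    direct = sorted(
        image for image, keywords in image_keywords.items()
        if category in keywords
    )
    subcategories = sorted(
        sub for sub, parents in supers.items() if category in parents
    )
    return direct, subcategories

# Hypothetical sample data: b.svg has already been moved into Foo::Bar.
listing_images = {
    "a.svg": {"Foo"},
    "b.svg": {"Bar"},
    "c.svg": {"Foo", "Other"},
}
listing_supers = {"Bar": {"Foo"}}
direct, subcategories = list_category(listing_images, listing_supers, "Foo")
```

The images under Bar are not repeated in the Foo listing; they are
reached by following the Bar subcategory.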

Is that too confusing?

> So sounds like something like this:
> 
> CREATE TABLE image (
>     id              INT NOT NULL AUTO_INCREMENT,
>     uri             VARCHAR(255),
>     author          VARCHAR(255),
>     source          VARCHAR(255),
>     format_id       INT
> );

Do we want to keep date submitted in the db?  Or can we get that from
the uri e.g. by feeding the relative pathname to a filetest operator?
Or does it not even matter?
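
For what it's worth, the file-test approach would amount to something
like this Python sketch, which assumes the uri is a pathname relative
to some base directory on the server (the function name and that layout
are my assumptions).  One caveat: the modification time changes if the
file is later edited, or copied without preserving timestamps, so it is
only an approximation of the submission date.

```python
import os
from datetime import datetime, timezone

def submitted_at(base_dir, relative_uri):
    """Return the file's last-modification time as a UTC datetime."""
    path = os.path.join(base_dir, relative_uri)
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
```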

> CREATE TABLE format (
>     id              INT NOT NULL AUTO_INCREMENT,
>     name            VARCHAR(255)
> );
> 
> INSERT INTO format (id, name) VALUES 
> ( 1, "svg" ),
> ( 2, "jpg" ),
> ( 3, "wmf" );

We might ought to add PNG to that list.  Reducing all bitmapped
clipart to JPG is IMO not the way to go.  Consider that clipart often
gets inserted into fliers and things and printed with a four-color
process at 600dpi or higher; you don't want to introduce lossy
compression if you can avoid it.  (If the source material is that way
already, c'est la vie.)  There's also the issue of the alpha channel,
which JPG doesn't have, and which is exceedingly useful for
anti-aliasing bitmapped clipart that might not end up on the same
color of background every time.

I'm *tempted* to say add XCF too, because a properly layered image is
*much* easier to modify, but I suspect most users in the target
audience wouldn't know what to do with those anyway.  (I sure would,
but I'm an oddball maybe.)  And most users of clip-art don't want to
modify it.  So XCF is probably unnecessary bloat.

But definitely add PNG; if you're going to support just one bitmapped
format, that's the one.  If the user's going to use it where quality
doesn't matter, they can JPEG compress it themselves, scale it down to
16x16 pixels, reduce the color depth to 4 bits, whatever.  Sure, there
are a lot of users who don't know how to do these things, but they
also don't know or care how many bytes their images take up; people
who are likely to care about that know what to do about it.

> CREATE TABLE category (
>     id              INT NOT NULL AUTO_INCREMENT,
>     name            VARCHAR(255),
>     description     TEXT
> );
> 
> CREATE TABLE category_to_image (
>     category_id         INT,
>     image_id            INT
> );
> 
> CREATE TABLE category_inheretance (
>     category_id         INT,
>     supercategory_id    INT
> );
> 
> CREATE TABLE category_synonym (
>     category_id         INT,
>     synonym_category_id INT
> );

I would have designed this differently, but my design is almost
entirely functionally isomorphic to yours, so most of the differences
are not important.  Probably the biggest difference is that I would
have had fewer tables with more fields each, but it's not a big deal.

> > With a simple db library like Class::DBI, this could probably be
> > tossed together well enough to start using it in a couple of hours.
> 
> I've used DBI quite a bit but not Class::DBI.  If you got the ball
> rolling, though, I'm game to give it a go.

I've actually not used Class::DBI yet either, but I keep meaning to.
I initially didn't know about it and so I rolled my own roughly
isomorphic (albeit not object-oriented) solution, but rather than
maintain that I've been meaning to migrate to Class::DBI.

> > Given that there are only a couple of quite simple tables in the
> > database, it could even be flat files, or maybe DBD::SQLite.
> 
> Nah, I've talked with the freedesktop admin about db's and they said
> a mysql db would be no prob to set up (in fact, it's on my todo list
> to give him the info to set up one for a bug tracker for us.)

Ah, I've used MySQL at work for some things, and it's plenty good
enough for what we're doing.  It's almost overkill, even, but of
course you can never have too much overkill.

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/




