[Clipart] Is anyone working on categorizing the existing images?
Bryce Harrington
bryce at bryceharrington.com
Tue Jun 8 22:16:38 PDT 2004
On Tue, 8 Jun 2004, Jonadab the Unsightly One wrote:
> Bryce Harrington <bryce at bryceharrington.com> writes:
>
> > By 'framework' what do you think we need?
>
> Some mechanism (possibly a web-based thingy) whereby we can look at a
> tree of existing categories, browse through the uncategorized images,
> and designate categories for them.
>
> > There's three items we had identified as necessary in prior
> > discussions - one is a metadata format, and IIRC we identified XMP
> > (which builds on Dublin Core, RDF, etc.)
>
> To me, it doesn't matter how this information is stored, as long as
> it's possible to tell what categories any given image is in, construct
> a tree of the categories that contain images, and get a list of the
> images and subcategories in any given category.
Got it:
We need a mechanism (including web-based) that allows:
* Looking through the tree of existing categories
* Browsing uncategorized images
* Designating categories for images
* Generating XMP for an item
* Viewing the categories for a given image
* Getting a list of images and subcategories for a given category
> > The other is a way to ensure the appropriate XML snippets get
> > generated and added to the item during upload.
>
> Are you talking about having the person who submits the image give it
> a tentative category, or just marking it as uncategorized? (I can go
> either way on that...)
Yes, I was thinking of allowing the submitter to designate a 'tentative
category'. Or categories.
Perhaps the ideal would be to give them a list of checkboxes of
available categories to choose from, with a "fill in the blank" at the
bottom. Or maybe that'd turn into too many checkboxes... Maybe provide
some sort of navigational system for assigning increasingly finer
subcategories. Hmm. Ideas?
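One way to picture the navigational idea is a drill-down that only shows
the subcategories of whatever has been picked so far. A minimal sketch in
Python, assuming a hypothetical SUBCATEGORIES mapping (the names are just
examples from this thread, not an agreed category list):

```python
# Hypothetical category tree: each category maps to its direct subcategories.
SUBCATEGORIES = {
    "": ["Food", "Holidays"],            # "" stands for the top level
    "Food": ["Meat"],
    "Holidays": ["Thanksgiving"],
    "Meat": ["Turkeys-Cooked"],
    "Thanksgiving": ["Turkeys-Cooked"],  # same subcategory under two parents
}

def choices(path):
    """Return the options to show next, given the categories picked so far."""
    current = path[-1] if path else ""
    return sorted(SUBCATEGORIES.get(current, []))

def drill_down(picks):
    """Follow a sequence of picks, checking each one is a valid next choice."""
    path = []
    for pick in picks:
        if pick not in choices(path):
            parent = path[-1] if path else "top level"
            raise ValueError(f"{pick} is not listed under {parent}")
        path.append(pick)
    return "/".join(path)
```

The submitter would only ever see one short list at a time, and the same
subcategory can be reached through more than one parent.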
> I think we have to realize that categories are going to develop new
> subcategories as images are submitted. It also seems highly likely
> that some images and even subcategories will belong in multiple
> categories. It is not difficult to imagine ten or twenty images of
> cooked turkeys being put in a category together, and having that
> category (Turkeys-Cooked) listed under both Food/Meat and
> Holidays/Thanksgiving. Or whatever. I do think the index approach
> lends itself well to this; Yahoo has tons of crosslinking.
Yup, agreed.
> > So, in thinking about how their findings would apply to the Open
> > Clip Art Library, our categories would be more like "keywords" that
> > can be created and used as needed. The cost is that we would need
> > people to review keywords that are chosen and adjust the content as
> > needed so things "match up" (so we don't end up with categories of
> > "Fruit" "Fruits" "Fruits & Vegetables" "Vegetables and Fruits",
> > etc.)
>
> In libraries this process is called Authority Control, and yeah, it's
> definitely going to be necessary at some point, though probably not
> right away. If we can get a hierarchical tree of the existing
> categories, then that will make it easier to decide what to adjust.
Sounds good.
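As a rough illustration of what a reviewer's helper for this could do,
here is a sketch that groups keywords which probably mean the same thing.
The normalization rules (lowercasing, dropping "and"/"&", stripping a
trailing "s", ignoring word order) are my own crude assumptions; it only
suggests candidates, and a person would still decide what to merge.

```python
from collections import defaultdict

def normalize(keyword):
    """Crude normalization: lowercase, drop 'and'/'&', strip a plural 's', sort words."""
    words = keyword.lower().replace("&", " and ").split()
    parts = [
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in words
        if w != "and"
    ]
    return " ".join(sorted(parts))

def likely_duplicates(keywords):
    """Group keywords whose normalized forms collide; return groups of two or more."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[normalize(kw)].append(kw)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Fed the four fruit-and-vegetable variants above, this would pair "Fruit"
with "Fruits" and "Fruits & Vegetables" with "Vegetables and Fruits".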
> If each category is a keyword (say, for the example above,
> turkeys-cooked is a keyword that can be attached to images of cooked
> turkeys), then the other thing we need in order to construct a
> hierarchy is the ability to take an existing category keyword and
> attach supercategory keywords to it -- that is, we might take the
> turkeys-cooked category and attach both the thanksgiving keyword and
> the meat (or maybe maindish) keyword to it. Then we'd attach the
> holidays keyword to the thanksgiving category and the food keyword to
> the meat (or maindish) category. Am I making any sense?
Oh, supercategories -- interesting idea.
Use Case
1. Several cooked turkey images are uploaded
2. The turkey images are assigned various keywords:
     2 turkeys have keywords = "Thanksgiving"
     1 turkey has keywords = "Holiday"
     3 turkeys have keywords = "Food"
     2 turkeys have keywords = "foods"
     2 turkeys have keywords = "Food" and "Holiday"
3. User identifies "Holiday" as a supercategory of "Thanksgiving"
   System adjusts:
     3 turkeys have keywords = "Holiday::Thanksgiving"
     3 turkeys have keywords = "Food"
     2 turkeys have keywords = "foods"
     2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
4. User specifies "foods=>Food"
     3 turkeys have keywords = "Holiday::Thanksgiving"
     5 turkeys have keywords = "Food"
     2 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
5. User adds "Food" and "Holiday::Thanksgiving" keywords for all
   turkeys. So:
     10 turkeys have keywords = "Food" and "Holiday::Thanksgiving"
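The use case can be walked through in a few lines of Python. This is only
a sketch: the function names are made up, each image is just a set of
keyword strings, and to reproduce the counts in step 3 it assumes that a
bare "Holiday" gets folded into "Holiday::Thanksgiving" as well, which is
probably only safe when every image involved really is a Thanksgiving one.

```python
from collections import Counter

def make_images():
    """The ten turkeys from step 2, each represented as a set of keywords."""
    return (
        [{"Thanksgiving"} for _ in range(2)]
        + [{"Holiday"}]
        + [{"Food"} for _ in range(3)]
        + [{"foods"} for _ in range(2)]
        + [{"Food", "Holiday"} for _ in range(2)]
    )

def set_supercategory(images, child, parent):
    """Step 3: rewrite both `child` and a bare `parent` to "parent::child"."""
    qualified = f"{parent}::{child}"
    for keywords in images:
        if child in keywords or parent in keywords:
            keywords.difference_update({child, parent})
            keywords.add(qualified)

def merge_keyword(images, old, new):
    """Step 4: replace the keyword `old` with `new` ("foods=>Food")."""
    for keywords in images:
        if old in keywords:
            keywords.discard(old)
            keywords.add(new)

def add_keywords(images, *new_keywords):
    """Step 5: attach the given keywords to every image."""
    for keywords in images:
        keywords.update(new_keywords)

def tally(images):
    """Count images per distinct keyword set."""
    return Counter(frozenset(keywords) for keywords in images)
```

Calling set_supercategory(images, "Thanksgiving", "Holiday"), then
merge_keyword(images, "foods", "Food"), then add_keywords(images, "Food",
"Holiday::Thanksgiving") and checking tally(images) after each call gives
the same groupings as steps 3 through 5.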
> OTOH, this is not a very complicated wheel. We're talking about a
> database with two tables. The one table has records for all the
> images, with a field that uniquely identifies the image in the
> collection (a URI will do), whatever other metadata you want (author
> or source, image file format (e.g., SVG), bw/greyscale/indexed/color,
> whatever), and a categories field where one or more category keywords
> can be put. The other table you need is for the categories themselves
> and contains the unique identifier (keyword), metadata (description,
> synonyms, ...), and a categories field listing categories it belongs
> to. If you want to get fancy and go both ways you could also have a
> subcategories field listing categories that belong to it, but then any
> code that modifies the category keywords has to change both places.
> (Also, that's the sort of thing that can be retrofitted later. We
> don't need to link both ways just to get started.)
So sounds like something like this:
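-- MySQL requires an AUTO_INCREMENT column to be a key, hence PRIMARY KEY.
CREATE TABLE image (
    id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    uri       VARCHAR(255),
    author    VARCHAR(255),
    source    VARCHAR(255),
    format_id INT            -- refers to format.id
);

CREATE TABLE format (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255)
);

INSERT INTO format (id, name) VALUES
    ( 1, 'svg' ),
    ( 2, 'jpg' ),
    ( 3, 'wmf' );

CREATE TABLE category (
    id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255),
    description TEXT
);

-- Which images carry which category keywords (many-to-many).
CREATE TABLE category_to_image (
    category_id INT,
    image_id    INT
);

-- Which categories sit under which supercategories (many-to-many).
CREATE TABLE category_inheritance (
    category_id      INT,
    supercategory_id INT
);

-- Which category names are synonyms of each other.
CREATE TABLE category_synonym (
    category_id         INT,
    synonym_category_id INT
);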
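To check that tables like these can answer the questions listed earlier
(categories for an image, images and subcategories for a category), here
is a sketch using Python's built-in sqlite3 module. SQLite is swapped in
for MySQL only so it runs without a server, the tables are a cut-down
version of the ones above, and the sample rows and function names are
made up for illustration.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a cut-down version of the schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE image (id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE category_to_image (category_id INT, image_id INT);
        CREATE TABLE category_inheritance (category_id INT, supercategory_id INT);

        -- Made-up sample data based on the cooked-turkey example.
        INSERT INTO image (id, uri) VALUES (1, 'turkey1.svg'), (2, 'turkey2.svg');
        INSERT INTO category (id, name) VALUES
            (1, 'Food'), (2, 'Meat'), (3, 'Holidays'),
            (4, 'Thanksgiving'), (5, 'Turkeys-Cooked');
        INSERT INTO category_to_image VALUES (5, 1), (5, 2), (4, 2);
        INSERT INTO category_inheritance VALUES (2, 1), (4, 3), (5, 2), (5, 4);
    """)
    return db

def categories_of(db, uri):
    """Names of the categories attached directly to one image."""
    rows = db.execute("""
        SELECT c.name FROM category c
        JOIN category_to_image ci ON ci.category_id = c.id
        JOIN image i ON i.id = ci.image_id
        WHERE i.uri = ? ORDER BY c.name""", (uri,))
    return [row[0] for row in rows]

def images_in(db, name):
    """URIs of the images attached directly to one category."""
    rows = db.execute("""
        SELECT i.uri FROM image i
        JOIN category_to_image ci ON ci.image_id = i.id
        JOIN category c ON c.id = ci.category_id
        WHERE c.name = ? ORDER BY i.uri""", (name,))
    return [row[0] for row in rows]

def subcategories_of(db, name):
    """Names of the categories that list `name` as a supercategory."""
    rows = db.execute("""
        SELECT sub.name FROM category sub
        JOIN category_inheritance ch ON ch.category_id = sub.id
        JOIN category sup ON sup.id = ch.supercategory_id
        WHERE sup.name = ? ORDER BY sub.name""", (name,))
    return [row[0] for row in rows]
```

With the sample rows, Turkeys-Cooked shows up as a subcategory of both
Meat and Thanksgiving, which is the crosslinking discussed earlier.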
> With a simple db library like Class::DBI, this could probably be
> tossed together well enough to start using it in a couple of hours.
I've used DBI quite a bit but not Class::DBI. If you got the ball
rolling, though, I'm game to give it a go.
> The existing upload facility would need to be modified to create a
> record in the db for every item uploaded, and we'd need a list of the
> existing already-uploaded ones in order to create records for those.
> The records could be created initially with no category keywords, and
> the categories table could start empty, and the same script that adds
> a keyword to an image's record could also create the category record
> if it does not exist already.
Sounds good.
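The "create the category record if it does not exist" step might look
roughly like this. Again a sketch with Python's sqlite3 rather than the
real upload script; the function names are hypothetical, and INSERT OR
IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3

def open_db():
    """In-memory database with just the tables this step touches."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE image (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE category_to_image (
            category_id INT, image_id INT,
            UNIQUE (category_id, image_id)
        );
    """)
    return db

def register_upload(db, uri):
    """Record an uploaded item with no categories yet; return its id."""
    db.execute("INSERT OR IGNORE INTO image (uri) VALUES (?)", (uri,))
    return db.execute("SELECT id FROM image WHERE uri = ?", (uri,)).fetchone()[0]

def tag_image(db, uri, keyword):
    """Attach a keyword to an image, creating the category row on first use."""
    image_id = register_upload(db, uri)
    db.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (keyword,))
    category_id = db.execute(
        "SELECT id FROM category WHERE name = ?", (keyword,)
    ).fetchone()[0]
    db.execute(
        "INSERT OR IGNORE INTO category_to_image VALUES (?, ?)",
        (category_id, image_id),
    )

def uncategorized(db):
    """URIs of images that have no category keywords yet."""
    rows = db.execute("""
        SELECT uri FROM image
        WHERE id NOT IN (SELECT image_id FROM category_to_image)
        ORDER BY uri""")
    return [row[0] for row in rows]
```

Running register_upload over the list of already-uploaded items would
seed the image table, and uncategorized() then gives the browse list.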
> Given that there are only a couple of quite simple tables in the
> database, it could even be flat files, or maybe DBD::SQLite.
Nah, I've talked with the freedesktop admin about db's and they said a
mysql db would be no prob to set up (in fact, it's on my todo list to
give him the info to set up one for a bug tracker for us.)
Bryce