[Clipart] Contributors and moderation

Bryce Harrington bryce at bryceharrington.com
Sun Mar 28 10:49:56 PST 2004


On Wed, 24 Mar 2004, Ted Gould wrote:
> Basically, groups are always started with good intentions by a group of
> people who really want it to be successful.  Then other people join with
> different goals, and things go a little bit crazy.  A good example of
> this is Slashdot, who ended up implementing a rather sophisticated
> rating system to handle the 'junk' submissions.

It's definitely true that crap tends to leak in, even when everyone's
intentions are of the best sort.  One runs into this issue a lot in
large-scale game design.  The magic number appears to be about 200-300
people; below that, a group can self-regulate, but beyond it the issues
described in the article start to dominate.

As an example, in WorldForge we had an inherent trust that submitted
art and music was the original work of the submitter.  However, as the
number of media creators increased, we eventually found one contributor
plagiarizing music.  I imagine the clipart project may someday need to
deal with this issue as well; it is wise to identify a validation
mechanism and a means of addressing violations here at the outset, so
they can be employed when this occurs.

For Wikipedia we had similar concerns, but more specifically about
copyright infringement.  For this we opted to make the requirements for
submission overt (via a contract-like agreement at the time of
submission) and to include a vetting mechanism that lets the media
developers do self-vetting and validation.

Based on those experiences, an approach that might work better for us
would be to encourage discrete clipart 'collections' rather than a
single flat pool.  Each collection would belong to a specific person or
small group who can review and police the items within it for quality,
legality, offensiveness, etc.  Different collections could have
different rules.

Larger arch-collections can then be made by an individual selecting the
set of collections that meet a given set of criteria.  For instance, an
arch-collection of only 'G' rated art would include just those
collections with matching policies.  Essentially this is an indexing
mechanism, and 'indexing' is one of the proven strategies for sorting
crap from quality.  If there are a lot of items, the indexer may even
wish to assign scores indicating how well each item matches the index's
judgment criteria.
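
As a concrete illustration - a minimal sketch only, where the
Collection structure and the policy labels are hypothetical, not a
decided format - selecting collections for an arch-collection could be
as simple as a set comparison over each collection's declared policies:

    # Hypothetical sketch: build an 'arch-collection' by keeping only
    # collections whose declared policy labels satisfy the criteria.
    from dataclasses import dataclass, field

    @dataclass
    class Collection:
        name: str
        maintainer: str
        policies: set = field(default_factory=set)

    def build_arch_collection(collections, required_policies):
        # Keep a collection only if it carries every required label.
        return [c for c in collections if required_policies <= c.policies]

    collections = [
        Collection("animals", "alice", {"g-rated", "public-domain"}),
        Collection("horror",  "bob",   {"public-domain"}),
    ]
    print([c.name for c in build_arch_collection(
        collections, {"g-rated", "public-domain"})])  # ['animals']

A numeric score field could be added the same way if indexers want
ranked rather than binary inclusion.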

> I'm concerned that our clip-art repository may become victim to a
> similar issue.  Some will do it on purpose, they'll try to submit junk
> that is just entirely unacceptable (which is something we should define
> clearly).  But, others will just make crummy clip-art.  If the
> repository doesn't maintain some amount of quality it will be useless
> for everyone.
>
> So, I guess I'm asking two questions:
> 
> 1) Do we need an 'acceptable use' policy that specifies what is
> acceptable and what isn't?  What should it include?
> 
> My response:  Yes, we do.  I think it should include:
>   -- Graphics are Public domain
>   -- SVG
>   -- Not intended to harm or offend as determined by?

Yes, a policy is required for submission.  This must include an
affirmation either that the work is original art by the submitter, or
that the submitter has written permission from the original artist to
submit it to the repository as public domain.

I'm unsure we want requirements about offending people.  Something
that is a major offense in one culture may be no big deal in another,
so it is hard to know where to draw the line.  I think you're right
that _intent_ is the key here.  Hmm...

How about this:  Instead of restricting offensive material, we just
require it to be correctly labelled, e.g. 'contains some nudity',
'contains nazi symbolism', 'contains widespread profanity', etc.
This way, censorship is between the collection creator and users, not
the responsibility of the overall clipart project.  Further, it is
synergistic with our metadata/indexing approach, since it generates
additional information to search and filter on.  And it can be much
more finely grained than traditional rating systems - e.g., someone may
find violence and blood offensive, but nudity perfectly okay.
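
To show how that per-user filtering might look - a hedged sketch, with
the label vocabulary and field names invented for illustration - each
user would exclude exactly the labels they object to:

    # Hypothetical sketch: per-user filtering on fine-grained content
    # labels, instead of one coarse rating for everybody.
    items = [
        {"file": "battle.svg", "labels": {"violence", "blood"}},
        {"file": "statue.svg", "labels": {"nudity"}},
        {"file": "teapot.svg", "labels": set()},
    ]

    def acceptable(item, excluded):
        # An item passes if it carries none of this user's excluded labels.
        return not (item["labels"] & excluded)

    # A user who objects to violence and blood, but not to nudity:
    print([i["file"] for i in items
           if acceptable(i, {"violence", "blood"})])
    # -> ['statue.svg', 'teapot.svg']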

> 2) How do we police works that are in the repository?  Do we need a
> group that is in charge of this?  What mechanisms are required to be
> built in for this to occur?  Is being able to roll back malicious
> changes enough?  Do we have a different set of what appears on the
> webpage that what we expect distributions to ship?  Do graphics need
> ratings?

I really advocate self-policing approaches, sort of like the wiki model,
where the contributors are also the editors.  Essentially, the
philosophy is to give the users the tools and the power to manage their
community themselves.

There are two methods for doing this: positive control (indexing) and
negative control (censorship).  The former is akin to the wiki
approach: anyone can create an article of any sort, but it only becomes
directly accessible once it is included on an index page (which by
definition is heavily reviewed by others).  CPAN uses a similar
approach - anyone can submit Perl modules, but they're only included in
the index if they're Good Enough.  Negative control is a filtering
approach that excludes items failing to meet certain conditions:
Wikipedia includes provisions for admins to permanently delete abusive
data identified by editors; Slashdot's comment scoring/labeling system
doesn't destroy anything, but poor-quality items simply drop through
the floor by getting low scores.  Freshmeat uses a labeling system,
plus admins who review each submission to keep out the crap.

It's interesting to note that some systems have user-indexing with
admin-censorship, whereas others have admin-indexing with
user-censorship.  It seems it should be possible to couple one of the
user-indexing schemes with one of the user-censorship schemes to get a
pretty much admin-less system.  It would probably also be wise to limit
this participation to a 'core group', such as by requiring that only
people who have contributed acceptable items be allowed to create
indexes or check/assign labels.
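
That gate could be as small as a lookup against each person's record of
accepted contributions - a minimal sketch, where the record and the
threshold of one accepted item are assumptions, not a proposed policy:

    # Hypothetical sketch: restrict index/label editing to the 'core
    # group' of people with at least one accepted contribution.
    accepted = {"alice": 4, "bob": 0}

    def may_edit_indexes(user, min_accepted=1):
        # Only proven contributors may create indexes or assign labels.
        return accepted.get(user, 0) >= min_accepted

    for user in ("alice", "bob", "mallory"):
        print(user, may_edit_indexes(user))
    # alice True, bob False, mallory False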

Allowing contributors to set up indexes (a la CPAN or Wikipedia) gives
us a distributed means of separating quality from crap.  For censorship
we could use the labeling approach, so that policing/filtering is also
left to the contributors themselves.  Keeping things modularized into
collections lets most of the conversation and collaboration happen at
the small-group scale, avoiding the issues identified in that article.

Finally, I think we need to be careful to Keep It Simple here at the
start.  I can imagine some really sophisticated software to implement
the above ideas, but I think it would be a disservice to let the
technology development sidetrack us from actually getting the ball
rolling.  So to start with, the simplest approach I can think of would
be to put the collection tarballs on a file server, create indexes to
them via Wiki pages, and include inside each tarball a text file
listing the labels that apply to it.  The indexes can be maintained by
anyone; the label list is maintained by the collection maintainer.
When we surpass 100 collections we will probably want more
sophisticated software - a search/filter tool, a better upload manager,
a better way of writing indexes - but those can come later, as we need
them and as people have time and interest to make them.
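
To make that concrete - a hedged sketch, where the 'LABELS' filename
and the one-label-per-line format are my invention, not an agreed
standard - reading and filtering on the per-tarball label list takes
only a few lines:

    # Hypothetical sketch: each collection tarball carries a plain-text
    # LABELS file, one label per line; filter collections on it.
    import tarfile

    def read_labels(tarball_path, member="LABELS"):
        # Return the set of labels declared inside a collection tarball.
        with tarfile.open(tarball_path) as tar:
            text = tar.extractfile(member).read().decode("utf-8")
        return {line.strip() for line in text.splitlines() if line.strip()}

    def shippable(tarball_paths, excluded):
        # Keep the collections carrying none of the excluded labels.
        return [p for p in tarball_paths
                if not (read_labels(p) & excluded)]

    # e.g. a distribution shipping only collections free of nudity:
    # shippable(["animals.tar.gz", "statues.tar.gz"], {"nudity"})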

Bryce
