[Clipart] Positive submission reinforcement

Jonadab the Unsightly One jonadab at bright.net
Sun Oct 31 21:45:00 PST 2004


Nicu Buculei <nicu at apsro.com> writes:

> Jonadab the Unsightly One wrote:
>> Will it work?  I was under the impression that the browse script
>> needed something additional to work correctly, that had been manually
>> generated in the past.  I could be mistaken, but that was my
>> understanding.  Was it just the sorting into directories that was done
>> manually?
>
> no, we don't need anything in addition to the directory structure.
>
> after reviewing the candidate for 0.08 release i saw we have 280
> top-level directories - this is not nice for browsing

What about, as an interim solution (until we can get things reading
from the DMS), a one-off Perl script that rearranges the files
according to a fixed list of which keywords go in which directories,
in order to move from the flat directory structure to a fixed hierarchy?

Something like the script below might get us by...

It's not a perfect solution, and it's not a long-term solution; the
long-term solution is for the browse script to read directly from the
DMS using the language-dependent category hierarchy.  This interim
solution isn't even well tested, and it has known shortcomings: it
makes no attempt to detect duplicates, and it will merrily overwrite a
file if another file with the same filename gets put in the same
category.  However, it seems to mostly work.  It may be better than
several dozen toplevel directories, and it's significantly better than
having the browse tool be several releases out of date.
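
As an illustration of how the overwrite shortcoming might be worked
around, here is a minimal sketch.  It is not part of the script below
and is only lightly reasoned through; the helper name safe_destination
and the "-1", "-2" suffix scheme are assumptions made up for this
example.  The idea is to pick a destination filename that does not
exist yet instead of letting the move clobber an existing file.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# Hypothetical helper (not in the original script): return a path inside
# $dir for $name that does not collide with an existing file, by adding
# -1, -2, ... before the extension when needed.
sub safe_destination {
    my ($dir, $name) = @_;
    my $path = catfile($dir, $name);
    return $path unless -e $path;

    # Split "tux.svg" into stem "tux" and extension ".svg"; a name with
    # no dot keeps an empty extension.
    my ($stem, $ext) = $name =~ /^(.*?)(\.[^.]*)?$/;
    $ext = '' unless defined $ext;

    my $n = 1;
    $n++ while -e catfile($dir, "$stem-$n$ext");
    return catfile($dir, "$stem-$n$ext");
}
```

If something like this were used, the destination path in the script
could be computed with safe_destination(catfile($basedir, @$dest), $f)
rather than a plain catfile.  Note that this only avoids overwriting;
it still does nothing about detecting true duplicates.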

Don't use it without backing up the directory first, in case something
is wrong or we later change our minds about where to put some
keywords.  In particular, I believe there are some keywords that
should be listed in the %hierarchy hash that are currently not, and
anything not listed gets thrown willy-nilly into unsorted.
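
Since unlisted keywords end up in unsorted, it may help to see which
ones those are before moving anything.  The following is a rough
sketch only, under the assumption that a hash shaped like %hierarchy
and a list of top-level directory names are available; the function
name unlisted_keywords and the sample data are invented for the
example.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hash ref shaped like %hierarchy and a list
# of top-level directory names, return the names that have no entry and
# would therefore be moved into 'unsorted'.  Lookup is lowercased, as in
# the script's own $hierarchy{lc $d}.
sub unlisted_keywords {
    my ($hierarchy, @dirs) = @_;
    my @missing = grep { !exists $hierarchy->{lc $_} } @dirs;
    return sort { $a cmp $b } @missing;
}

# Tiny made-up sample; a real run would pass \%hierarchy and the
# directory names read from $basedir.
my %sample = (
    'animal' => ['animals'],
    'apple'  => ['food', 'fruit'],
);
# Prints "anchor" and then "zebra", one per line.
print "$_\n" for unlisted_keywords(\%sample, 'Animal', 'zebra', 'apple', 'anchor');
```

Anything this reports is a candidate for a new entry in the hash
before the real run.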

I've tested it with the contents of that 0.08 tarball, and it seems to
mostly do a pretty decent job.

#!/usr/bin/perl
# -*- cperl -*-

my $debug = 42; $|=1;
#my $basedir = '/projects/clipart/OpenClipart';
my $basedir = '/root/img/download/openclipart-0.08';
# You must already have the latest release unpacked there.

# Also it's useful to do the following preprocessing first:
# * Within Flags, move all the images from the subdirs up into Flags
#   itself, or else move the subdirs up out of Flags and list them
#   in the %hierarchy hash below.

my %hierarchy =
  (

   'action'         => ['computer', 'icons', 'action'],
   'actions'        => ['computer', 'icons', 'action'],
   'africa'         => ['geography'],
   'albert'         => ['people'],
   'america'        => ['geography'],
   'animal'         => ['animals'],
   'animals'        => ['animals'],
   'appicon'        => ['computer', 'icons', 'applications'],
   'apple'          => ['food', 'fruit'],
   'application'    => ['computer', 'icons', 'applications'],
   'apps'           => ['computer', 'icons', 'applications'],

   'aqua'           => ['computer', 'icons'],
   'architecture'   => ['buildings'],
   'arrow'          => ['shapes', 'arrows'],
   'asia'           => ['geography'],
   'automne'        => ['plants'],
   'automobiles'    => ['transportation', 'vehicles'],
   'bacon'          => ['food'],
   'ball'           => ['recreation', 'toys'],
   'beach'          => ['recreation', 'toys'],
   'beverages'      => ['food', 'beverages'],
   'bicycle'        => ['transportation', 'vehicles'],
   'bicycles'       => ['transportation', 'roadsigns'],
   'bill'           => ['people'],
   'bird'           => ['animals', 'birds'],
   'boat'           => ['transportation', 'vehicles'],
   'book'           => ['education', 'books'],
   'bouquet'        => ['plants', 'flowers'],
   'bowl'           => ['food'],

   'breizh'         => ['decorations'],
   'britain'        => ['geography'],
   'bsd'            => ['computer'],
   'bug'            => ['animals', 'bugs'],
   'building'       => ['buildings'],
   'bulma'          => ['logos', 'linux'],
   'bulmalug'       => ['logos', 'linux'],
   'bulma.net'      => ['logos', 'linux'],

   'business'       => ['office'],
   'button'         => ['computer', 'icons'],
   'cactus'         => ['plants'],
   'callout'        => ['shapes', 'callouts'],
   'candy'          => ['food', 'desserts'],
   'carrot'         => ['food', 'vegitables'],
   'caution'        => ['transportation', 'roadsigns'],
   'celebrity'      => ['people'],
   'celtic'         => ['decorations'],
   'che'            => ['people'],

   'chicken'        => ['animals'],
   'clipart'        => ['computer'], # Why do these images have this keyword?
   'clipboard'      => ['office'],
   'cochon'         => ['animals'], # Clearly, some keyword normalization or authority control is in order.
   'coffe'          => ['food', 'beverages'],
   'computer'       => ['computer'],
   'curve'          => ['transportation', 'roadsigns'],
   'curvy'          => ['transportation', 'roadsigns'],
   'daemon'         => ['logos'],
   'decoration'     => ['decorations'],
   'decorations'    => ['decorations'],

   'desert'         => ['plants'],
   'device'         => ['computer', 'icons', 'device'],
   'devices'        => ['computer', 'icons', 'device'],
   'do'             => ['transportation', 'roadsigns'],
   'dry'            => ['plants'],
   'duck'           => ['animals', 'birds'],

   'eagle'          => ['animals', 'birds'],
   'education'      => ['education'],
   'einstein'       => ['people'],
   'emblems'        => ['computer', 'icons', 'filetypes'],
   'enter'          => ['transportation', 'roadsigns'],
   'entertainment'  => ['recreation'],
   'envelope'       => ['office'],
   'europe'         => ['geography'],
   'evergreen'      => ['plants', 'trees'],
   'examples'       => ['special', 'examples'],
   'Examples'       => ['special', 'examples'],
   'face'           => ['people'],

   'farm'           => ['animals'],
   'festive'        => ['recreation', 'party'],
   'filesystem'     => ['computer', 'icons'],
   'filesystems'    => ['computer', 'icons'],
   'filetype'       => ['logos', 'OpenClipArtLibrary'], # WHY do these images have this keyword?

   'fish'           => ['animals'],
   'flag'           => ['geography', 'flags'],
   'flags'          => ['geography', 'flags'],
   'flight'         => ['animals', 'birds'],
   'flourish'       => ['decorations'],
   'flower'         => ['plants', 'flowers'],
   'fly'            => ['animals', 'birds'],
   'food'           => ['food'],
   'foul'           => ['animals', 'birds'],
   'fruit'          => ['food', 'fruit'],

   'game'           => ['recreation', 'games'],
   'geometry'       => ['shapes'],
   'gnu'            => ['logos', 'linux'],
   'gourami'        => ['animals', 'fish'],
   'grass'          => ['animals', 'bugs'], # Really strange keyword assignment here.
   'gradients'      => ['special', 'gradients'],
   'grape'          => ['food', 'fruit'],
   'great'          => ['geography'],
   'guevera'        => ['people'],

   'harvest'        => ['food'],
   'hen'            => ['animals'],
   'highway'        => ['transportation', 'roadsigns'],
   'holiday'        => ['recreation', 'holiday'],
   'homes'          => ['buildings', 'homes'],
   'hopper'         => ['animals', 'bugs'],
   'house'          => ['buildings', 'homes'],

   'icon'           => ['computer', 'icons'],
   'icons'          => ['computer', 'icons'],
   'imac'           => ['computer'],
   'images'         => ['logos', 'OpenClipArtLibrary'],
   'insect'         => ['animals', 'bugs'],

   'insects'        => ['animals', 'bugs'],
   'interface'      => ['computer', 'icons'],
   'kawai'          => ['animals'], # Yep, we'll need authority control.
   'ladybug'        => ['animals', 'bugs'],
   'laptop'         => ['computer'],
   'library'        => ['buildings'],

   'linux'          => ['logos', 'linux'],
   'logo'           => ['logos'],
   'logos'          => ['logos', 'linux'], # Umm, that's what's there, currently.
   'mammal'         => ['animals', 'mammals'],
   'mammals'        => ['animals', 'mammals'],
   'man'            => ['people'],
   'map'            => ['geography'],
   'maps'           => ['geography'],
   'map_symbols'    => ['geography', 'map_symbols'],
   'mascot'         => ['logos', 'linux'],
   'meat'           => ['food'],
   'milk'           => ['food', 'beverages'],

   'mime-types'     => ['computer', 'icons', 'filetypes'],
   'mousecursor'    => ['computer'],
   'mushroom'       => ['food'],
   'music'          => ['recreation', 'music'],
   'musicsym'       => ['recreation', 'music'],
   'navigation'     => ['computer', 'icons'],
   'nimbochromis'   => ['animals', 'fish'],
   'nicu'           => ['logos', 'OpenClipArtLibrary'],
   'no'             => ['transportation', 'roadsigns'],
   'not'            => ['transportation', 'roadsigns'],
   'note'           => ['recreation', 'music'],

   'ocal_logo'      => ['logos', 'OpenClipArtLibrary'],
   'oceania'        => ['geography'],
   'office'         => ['office'],
   'one'            => ['transportation', 'roadsigns'],
   'openclipart'    => ['logos', 'OpenClipArtLibrary'],
   'pear'           => ['food', 'fruit'],
   'penguin'        => ['animals'],

   'people'         => ['people'],
   'pig'            => ['animals'],
   'plane'          => ['transportation', 'vehicles'],
   'plant'          => ['plants'],
   'roadsign'       => ['transportation', 'roadsigns'],
   'scotland'       => ['geography'],

   'shape'          => ['shapes'],
   'shapes'         => ['shapes'],
   'sign'           => ['signs_and_symbols'],
   'signs'          => ['geography', 'map_symbols'], # Not sure if this is right?

   'soccer'         => ['recreation'],
   'soup'           => ['food'],
   'sport'          => ['recreation'],
   'sports'         => ['recreation'],
   'star'           => ['shapes'],
   'steaming'       => ['food'],
   'stop'           => ['transportation', 'roadsigns'],
   'study'          => ['education'],
   'symbol'         => ['signs_and_symbols'],

   'tea'            => ['food', 'beverages'],
   'toy'            => ['recreation', 'toys'],
   'toys'           => ['recreation', 'toys'],
   'thanksgiving'   => ['recreation', 'holiday', 'thanksgiving'],
   'transport'      => ['transportation', 'vehicles'],
   'transportation' => ['transportation', 'vehicles'],
   'tree'           => ['plants', 'trees'],
   'tropical'       => ['animals', 'fish'],

   'turn'           => ['transportation', 'roadsigns'],
   'tux'            => ['logos', 'linux'],
   'tuxx'           => ['logos', 'linux'],
   'u'              => ['transportation', 'roadsigns'],
   'usholiday'      => ['recreation', 'holiday'],
   'vegitable'      => ['food', 'vegitables'],
   'venustus'       => ['animals', 'fish'],
   'warning'        => ['signs_and_symbols'],
   'watch'          => ['transportation', 'roadsigns'],
   'way'            => ['transportation', 'roadsigns'],
   'wildlife'       => ['animals'],
   'wreath'         => ['recreation', 'holiday', 'Christmas'],
   'xmas'           => ['recreation', 'holiday', 'Christmas'],

   # ... and so on
  );

use File::Spec::Functions;
if (not -e catfile($basedir, 'unsorted')) {
  mkdir catfile($basedir, 'unsorted') or warn "Cannot create unsorted directory: $!\n";
}

opendir DIR, $basedir or die "Cannot opendir $basedir: $!";
my @d = readdir DIR; closedir DIR;
for my $d (@d) {
  print "=================== (Processing $d)\n" if $debug>1;
  # Test the entry under $basedir, not relative to the current directory.
  if (-d catfile($basedir, $d) and not $d =~ /[.]/) {
    my $dest = $hierarchy{lc $d} || ['unsorted'];
    if ($d ne catfile(@$dest)) {
      print "[Dest: @$dest]\n" if $debug >2;
      # pop @$dest while $$dest[-1] =~ /[.]/;
      for (0..(@$dest - 1)) {
        my $x = catfile($basedir, @$dest[0..$_]);
        if (not -e $x) {
          print "Creating dir: $x\n";
          mkdir $x;
          warn "Cannot create directory $x: $!\n" if not -d $x;
        }
      }
      opendir DIR, catfile($basedir, $d)
        or warn "Cannot opendir " . catfile($basedir, $d) . ": $!\n";
      my @f = readdir DIR; closedir DIR;
      for my $f (@f) {
        if (not $f =~ m/^[.]/) {
          print "[$f]" if $debug>3;
          my $sourcepath = catfile($basedir, $d,     $f);
          my $destpath   = catfile($basedir, @$dest, $f);
          if ($sourcepath eq $destpath) {
            warn "No need to move $sourcepath" if $debug>5;
          } else {
            # system() returns 0 on success, so nonzero means mv failed.
            # mv prints its own error message; $! is not set by its failure,
            # so the exit status is reported instead.
            if (system('mv', '-f', $sourcepath, $destpath) != 0) {
              warn "Unsuccessful move from $sourcepath to $destpath (mv exit status " . ($? >> 8) . ")\n";
            }
          }
        } else {
          warn "Not attempting to move $f\n" if $debug>3;
        }
      }
      rmdir catfile($basedir, $d) or warn "Directory Not Removed: $d: $!\n";
    }
  } else {
    warn "Not processing: $d (not a directory, or name contains a dot)\n";
  }
}

__END__

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/



