[Clipart] Positive submission reinforcement
Bryce Harrington
bryce at bryceharrington.com
Mon Nov 1 00:46:30 PST 2004
I couldn't get the script to work. It just gives me a bunch of
"Bad file descriptor" errors. I worked on it for a while but couldn't
get it to run, and now I'm out of time. I've attached the script.
Bryce
On Mon, 1 Nov 2004, Jonadab the Unsightly One wrote:
> Nicu Buculei <nicu at apsro.com> writes:
>
> > Jonadab the Unsightly One wrote:
> >> Will it work? I was under the impression that the browse script
> >> needed something additional to work correctly, that had been manually
> >> generated in the past. I could be mistaken, but that was my
> >> understanding. Was it just the sorting into directories that was done
> >> manually?
> >
> > no, we don't need anything in addition to the directory structure.
> >
> > after reviewing the candidate for 0.08 release i saw we have 280
> > top-level directories - this is not nice for browsing
>
> What about, as an interim solution (until we can get things reading
> from the DMS) a one-off Perl script that re-arranges the files
> according to a fixed list of what keywords go in what directories, in
> order to move from the flat directory structure to a fixed hierarchy?
>
> Something like the script below might get us by...
>
> It's not a perfect solution; it's not a long-term solution; the
> long-term solution is for the browse thingy to read directly from the
> DMS using the language-dependent category hierarchy thingydoo. This
> interim solution isn't even well tested, and there are even known
> shortcomings (e.g., it makes no attempt to detect duplicates but on
> the other hand will merrily overwrite a file if another file with the
> same filename gets put in the same category); however, it seems to
> mostly work and may be better than several dozen toplevel directories,
> and it's significantly better than having the browse tool be several
> releases out of date.
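
The overwrite problem mentioned above could probably be avoided by picking an unused name before each move. A minimal sketch, assuming a helper called unique_destpath (a made-up name, not part of the quoted script) and a -1, -2, ... suffix scheme that is only one possible convention:

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# Hypothetical helper (not part of the quoted script): return a path inside
# $dir for $name that does not exist yet, adding -1, -2, ... before the
# extension when the plain name is already taken.
sub unique_destpath {
    my ($dir, $name) = @_;
    my $candidate = catfile($dir, $name);
    return $candidate unless -e $candidate;

    # Split "tux.svg" into "tux" and ".svg"; names without a dot keep an
    # empty extension.
    my ($stem, $ext) = $name =~ /^(.*?)(\.[^.]*)?$/;
    $ext = '' unless defined $ext;

    my $n = 1;
    $n++ while -e catfile($dir, "$stem-$n$ext");
    return catfile($dir, "$stem-$n$ext");
}
```

The move would then target unique_destpath($destdir, $f) rather than the plain destination path. This does nothing about detecting duplicate images; it only keeps one file from clobbering another.
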
>
> Don't use it without backing up the directory first, in case something
> is wrong or we later change our minds about where to put some
> keywords. In particular, I believe there are some keywords that
> should be listed in the %hierarchy hash that are currently not, and
> anything not listed gets thrown willy-nilly into unsorted.
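
Since anything missing from the hash lands in unsorted, a dry run that only prints the mapping might help find the gaps before any files are moved. A rough sketch, assuming a two-entry excerpt of the hash and a helper named dest_for (both are illustrative, not taken from the script as posted):

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir);

# Tiny excerpt of the mapping; the real %hierarchy hash is much longer.
my %hierarchy = (
    'apple' => ['food', 'fruit'],
    'tux'   => ['logos', 'linux'],
);

# Hypothetical helper: map a top-level directory name to its destination
# (relative to the base directory), falling back to 'unsorted' when the
# keyword is not listed.
sub dest_for {
    my ($keyword) = @_;
    my $path = $hierarchy{lc $keyword} || ['unsorted'];
    return catdir(@$path);
}

# Dry run: print where each keyword would go without touching any files.
for my $keyword (qw(Apple tux zebra)) {
    my $dest = dest_for($keyword);
    print "$keyword -> $dest\n";
}
```

Keywords reported as going to unsorted are the candidates for new entries in the hash.
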
>
> I've tested it with the contents of that 0.08 tarball, and it seems to
> mostly do a pretty decent job.
>
> #!/usr/bin/perl
> # -*- cperl -*-
>
> my $debug = 42; $|=1;
> #my $basedir = '/projects/clipart/OpenClipart';
> my $basedir = '/root/img/download/openclipart-0.08';
> # You must already have the latest release unpacked there.
>
> # Also it's useful to do the following preprocessing first:
> # * Within Flags, move all the images from the subdirs up into Flags
> # itself, or else move the subdirs up out of Flags and list them
> # in the %hierarchy hash below.
>
> my %hierarchy =
> (
>
> 'action' => ['computer', 'icons', 'action'],
> 'actions' => ['computer', 'icons', 'action'],
> 'africa' => ['geography'],
> 'albert' => ['people'],
> 'america' => ['geography'],
> 'animal' => ['animals'],
> 'animals' => ['animals'],
> 'appicon' => ['computer', 'icons', 'applications'],
> 'apple' => ['food', 'fruit'],
> 'application' => ['computer', 'icons', 'applications'],
> 'apps' => ['computer', 'icons', 'applications'],
>
> 'aqua' => ['computer', 'icons'],
> 'architecture' => ['buildings'],
> 'arrow' => ['shapes', 'arrows'],
> 'asia' => ['geography'],
> 'automne' => ['plants'],
> 'automobiles' => ['transportation', 'vehicles'],
> 'bacon' => ['food'],
> 'ball' => ['recreation', 'toys'],
> 'beach' => ['recreation', 'toys'],
> 'beverages' => ['food', 'beverages'],
> 'bicycle' => ['transportation', 'vehicles'],
> 'bicycles' => ['transportation', 'roadsigns'],
> 'bill' => ['people'],
> 'bird' => ['animals', 'birds'],
> 'boat' => ['transportation', 'vehicles'],
> 'book' => ['education', 'books'],
> 'bouquet' => ['plants', 'flowers'],
> 'bowl' => ['food'],
>
> 'breizh' => ['decorations'],
> 'britain' => ['geography'],
> 'bsd' => ['computer'],
> 'bug' => ['animals', 'bugs'],
> 'building' => ['buildings'],
> 'bulma' => ['logos', 'linux'],
> 'bulmalug' => ['logos', 'linux'],
> 'bulma.net' => ['logos', 'linux'],
>
> 'business' => ['office'],
> 'button' => ['computer', 'icons'],
> 'cactus' => ['plants'],
> 'callout' => ['shapes', 'callouts'],
> 'candy' => ['food', 'desserts'],
> 'carrot' => ['food', 'vegitables'],
> 'caution' => ['transportation', 'roadsigns'],
> 'celebrity' => ['people'],
> 'celtic' => ['decorations'],
> 'che' => ['people'],
>
> 'chicken' => ['animals'],
> 'clipart' => ['computer'], # Why do these images have this keyword?
> 'clipboard' => ['office'],
> 'cochon' => ['animals'], # Clearly, some keyword normalization or authority control is in order.
> 'coffe' => ['food', 'beverages'],
> 'computer' => ['computer'],
> 'curve' => ['transportation', 'roadsigns'],
> 'curvy' => ['transportation', 'roadsigns'],
> 'daemon' => ['logos'],
> 'decoration' => ['decorations'],
> 'decorations' => ['decorations'],
>
> 'desert' => ['plants'],
> 'device' => ['computer', 'icons', 'device'],
> 'devices' => ['computer', 'icons', 'device'],
> 'do' => ['transportation', 'roadsigns'],
> 'dry' => ['plants'],
> 'duck' => ['animals', 'birds'],
>
> 'eagle' => ['animals', 'birds'],
> 'education' => ['education'],
> 'einstein' => ['people'],
> 'emblems' => ['computer', 'icons', 'filetypes'],
> 'enter' => ['transportation', 'roadsigns'],
> 'entertainment' => ['recreation'],
> 'envelope' => ['office'],
> 'europe' => ['geography'],
> 'evergreen' => ['plants', 'trees'],
> 'examples' => ['special', 'examples'],
> 'Examples' => ['special', 'examples'],
> 'face' => ['people'],
>
> 'farm' => ['animals'],
> 'festive' => ['recreation', 'party'],
> 'filesystem' => ['computer', 'icons'],
> 'filesystems' => ['computer', 'icons'],
> 'filetype' => ['logos', 'OpenClipArtLibrary'], # WHY do these images have this keyword?
>
> 'fish' => ['animals'],
> 'flag' => ['geography', 'flags'],
> 'flags' => ['geography', 'flags'],
> 'flight' => ['animals', 'birds'],
> 'flourish' => ['decorations'],
> 'flower' => ['plants', 'flowers'],
> 'fly' => ['animals', 'birds'],
> 'food' => ['food'],
> 'foul' => ['animals', 'birds'],
> 'fruit' => ['food', 'fruit'],
>
> 'game' => ['recreation', 'games'],
> 'geometry' => ['shapes'],
> 'gnu' => ['logos', 'linux'],
> 'gourami' => ['animals', 'fish'],
> 'grass' => ['animals', 'bugs'], # Really strange keyword assignment here.
> 'gradients' => ['special', 'gradients'],
> 'grape' => ['food', 'fruit'],
> 'great' => ['geography'],
> 'guevera' => ['people'],
>
> 'harvest' => ['food'],
> 'hen' => ['animals'],
> 'highway' => ['transportation', 'roadsigns'],
> 'holiday' => ['recreation', 'holiday'],
> 'homes' => ['buildings', 'homes'],
> 'hopper' => ['animals', 'bugs'],
> 'house' => ['buildings', 'homes'],
>
> 'icon' => ['computer', 'icons'],
> 'icons' => ['computer', 'icons'],
> 'imac' => ['computer'],
> 'images' => ['logos', 'OpenClipArtLibrary'],
> 'insect' => ['animals', 'bugs'],
>
> 'insects' => ['animals', 'bugs'],
> 'interface' => ['computer', 'icons'],
> 'kawai' => ['animals'], # Yep, we'll need authority control.
> 'ladybug' => ['animals', 'bugs'],
> 'laptop' => ['computer'],
> 'library' => ['buildings'],
>
> 'linux' => ['logos', 'linux'],
> 'logo' => ['logos'],
> 'logos' => ['logos', 'linux'], # Umm, that's what's there, currently.
> 'mammal' => ['animals', 'mammals'],
> 'mammals' => ['animals', 'mammals'],
> 'man' => ['people'],
> 'map' => ['geography'],
> 'maps' => ['geography'],
> 'map_symbols' => ['geography', 'map_symbols'],
> 'mascot' => ['logos', 'linux'],
> 'meat' => ['food'],
> 'milk' => ['food', 'beverages'],
>
> 'mime-types' => ['computer', 'icons', 'filetypes'],
> 'mousecursor' => ['computer'],
> 'mushroom' => ['food'],
> 'music' => ['recreation', 'music'],
> 'musicsym' => ['recreation', 'music'],
> 'navigation' => ['computer', 'icons'],
> 'nimbochromis' => ['animals', 'fish'],
> 'nicu' => ['logos', 'OpenClipArtLibrary'],
> 'no' => ['transportation', 'roadsigns'],
> 'not' => ['transportation', 'roadsigns'],
> 'note' => ['recreation', 'music'],
>
> 'ocal_logo' => ['logos', 'OpenClipArtLibrary'],
> 'oceania' => ['geography'],
> 'office' => ['office'],
> 'one' => ['transportation', 'roadsigns'],
> 'openclipart' => ['logos', 'OpenClipArtLibrary'],
> 'pear' => ['food', 'fruit'],
> 'penguin' => ['animals'],
>
> 'people' => ['people'],
> 'pig' => ['animals'],
> 'plane' => ['transportation', 'vehicles'],
> 'plant' => ['plants'],
> 'roadsign' => ['transportation', 'roadsigns'],
> 'scotland' => ['geography'],
>
> 'shape' => ['shapes'],
> 'shapes' => ['shapes'],
> 'sign' => ['signs_and_symbols'],
> 'signs' => ['geography', 'map_symbols'], # Not sure if this is right?
>
> 'soccer' => ['recreation'],
> 'soup' => ['food'],
> 'sport' => ['recreation'],
> 'sports' => ['recreation'],
> 'star' => ['shapes'],
> 'steaming' => ['food'],
> 'stop' => ['transportation', 'roadsigns'],
> 'study' => ['education'],
> 'symbol' => ['signs_and_symbols'],
>
> 'tea' => ['food', 'beverages'],
> 'toy' => ['recreation', 'toys'],
> 'toys' => ['recreation', 'toys'],
> 'thanksgiving' => ['recreation', 'holiday', 'thanksgiving'],
> 'transport' => ['transportation', 'vehicles'],
> 'transportation' => ['transportation', 'vehicles'],
> 'tree' => ['plants', 'trees'],
> 'tropical' => ['animals', 'fish'],
>
> 'turn' => ['transportation', 'roadsigns'],
> 'tux' => ['logos', 'linux'],
> 'tuxx' => ['logos', 'linux'],
> 'u' => ['transportation', 'roadsigns'],
> 'usholiday' => ['recreation', 'holiday'],
> 'vegitable' => ['food', 'vegitables'],
> 'venustus' => ['animals', 'fish'],
> 'warning' => ['signs_and_symbols'],
> 'watch' => ['transportation', 'roadsigns'],
> 'way' => ['transportation', 'roadsigns'],
> 'wildlife' => ['animals'],
> 'wreath' => ['recreation', 'holiday', 'Christmas'],
> 'xmas' => ['recreation', 'holiday', 'Christmas'],
>
> # ... and so on
> );
>
> use File::Spec::Functions;
> if (not -e catfile($basedir, 'unsorted')) {
> mkdir catfile($basedir, 'unsorted') or warn "Cannot create unsorted directory: $!\n";
> }
>
> opendir DIR, $basedir or die "Cannot opendir $basedir: $!";
> @d = readdir DIR; closedir DIR;
> for my $d (@d) {
> print "=================== (Processing $d)\n" if $debug>1;
> if (-d catdir($basedir, $d) and not $d =~ /^[.]/) {
> my $dest = $hierarchy{lc $d} || ['unsorted'];
> if ($d ne catfile(@$dest)) {
> print "[Dest: @$dest]\n" if $debug >2;
> # pop @$dest while $$dest[-1] =~ /[.]/;
> for (0..(@$dest - 1)) {
> my $x = catfile($basedir, @$dest[0..$_]);
> if (not -e $x) {
> print "Creating dir: $x\n";
> mkdir $x;
> warn "Cannot create directory $x: $!\n" if not -d $x;
> }
> }
> opendir DIR, catfile($basedir, $d);
> @f = readdir DIR; closedir DIR;
> for $f (@f) {
> if (not $f =~ m/^[.]/) {
> print "[$f]" if $debug>3;
> my $sourcepath = catfile($basedir, $d, $f);
> my $destpath = catfile($basedir, @$dest, $f);
> if ($sourcepath eq $destpath) {
> warn "No need to move $sourcepath" if $debug>5;
> } else {
> # system returns 0 on success, so a nonzero value means mv failed.
> # $! is not set by mv itself; report the exit status instead.
> if (system('mv', '-f', $sourcepath, $destpath) != 0) {
> warn "Unsuccessful move from $sourcepath to $destpath (exit status " . ($? >> 8) . ")\n";
> }
> }
> } else {
> warn "Not attempting to move $f\n" if $debug>3;
> }
> }
> rmdir catfile($basedir, $d) or warn "Directory Not Removed: $d\n";
> }
> } else {
> warn "Not processing: $d (not a directory)\n";
> }
> }
>
> __END__
>
>
-------------- next part --------------
#!/usr/bin/perl
# -*- cperl -*-
my $debug = 42; $|=1;
my $basedir = '/home/bryce/src/Clipart/openclipart-0.08';
# You must already have the latest release unpacked there.
# Also it's useful to do the following preprocessing first:
# * Within Flags, move all the images from the subdirs up into Flags
# itself, or else move the subdirs up out of Flags and list them
# in the %hierarchy hash below.
my %hierarchy =
(
'action' => ['computer', 'icons', 'action'],
'actions' => ['computer', 'icons', 'action'],
'africa' => ['geography'],
'albert' => ['people'],
'america' => ['geography'],
'animal' => ['animals'],
'animals' => ['animals'],
'appicon' => ['computer', 'icons', 'applications'],
'apple' => ['food', 'fruit'],
'application' => ['computer', 'icons', 'applications'],
'apps' => ['computer', 'icons', 'applications'],
'aqua' => ['computer', 'icons'],
'architecture' => ['buildings'],
'arrow' => ['shapes', 'arrows'],
'asia' => ['geography'],
'automne' => ['plants'],
'automobiles' => ['transportation', 'vehicles'],
'bacon' => ['food'],
'ball' => ['recreation', 'toys'],
'beach' => ['recreation', 'toys'],
'beverages' => ['food', 'beverages'],
'bicycle' => ['transportation', 'vehicles'],
'bicycles' => ['transportation', 'roadsigns'],
'bill' => ['people'],
'bird' => ['animals', 'birds'],
'boat' => ['transportation', 'vehicles'],
'book' => ['education', 'books'],
'bouquet' => ['plants', 'flowers'],
'bowl' => ['food'],
'breizh' => ['decorations'],
'britain' => ['geography'],
'bsd' => ['computer'],
'bug' => ['animals', 'bugs'],
'building' => ['buildings'],
'bulma' => ['logos', 'linux'],
'bulmalug' => ['logos', 'linux'],
'bulma.net' => ['logos', 'linux'],
'business' => ['office'],
'button' => ['computer', 'icons'],
'cactus' => ['plants'],
'callout' => ['shapes', 'callouts'],
'candy' => ['food', 'desserts'],
'carrot' => ['food', 'vegitables'],
'caution' => ['transportation', 'roadsigns'],
'celebrity' => ['people'],
'celtic' => ['decorations'],
'che' => ['people'],
'chicken' => ['animals'],
'clipart' => ['computer'], # Why do these images have this keyword?
'clipboard' => ['office'],
'cochon' => ['animals'], # Clearly, some keyword normalization or authority control is in order.
'coffe' => ['food', 'beverages'],
'computer' => ['computer'],
'curve' => ['transportation', 'roadsigns'],
'curvy' => ['transportation', 'roadsigns'],
'daemon' => ['logos'],
'decoration' => ['decorations'],
'decorations' => ['decorations'],
'desert' => ['plants'],
'device' => ['computer', 'icons', 'device'],
'devices' => ['computer', 'icons', 'device'],
'do' => ['transportation', 'roadsigns'],
'dry' => ['plants'],
'duck' => ['animals', 'birds'],
'eagle' => ['animals', 'birds'],
'education' => ['education'],
'einstein' => ['people'],
'emblems' => ['computer', 'icons', 'filetypes'],
'enter' => ['transportation', 'roadsigns'],
'entertainment' => ['recreation'],
'envelope' => ['office'],
'europe' => ['geography'],
'evergreen' => ['plants', 'trees'],
'examples' => ['special', 'examples'],
'Examples' => ['special', 'examples'],
'face' => ['people'],
'farm' => ['animals'],
'festive' => ['recreation', 'party'],
'filesystem' => ['computer', 'icons'],
'filesystems' => ['computer', 'icons'],
'filetype' => ['logos', 'OpenClipArtLibrary'], # WHY do these images have this keyword?
'fish' => ['animals'],
'flag' => ['geography', 'flags'],
'flags' => ['geography', 'flags'],
'flight' => ['animals', 'birds'],
'flourish' => ['decorations'],
'flower' => ['plants', 'flowers'],
'fly' => ['animals', 'birds'],
'food' => ['food'],
'foul' => ['animals', 'birds'],
'fruit' => ['food', 'fruit'],
'game' => ['recreation', 'games'],
'geometry' => ['shapes'],
'gnu' => ['logos', 'linux'],
'gourami' => ['animals', 'fish'],
'grass' => ['animals', 'bugs'], # Really strange keyword assignment here.
'gradients' => ['special', 'gradients'],
'grape' => ['food', 'fruit'],
'great' => ['geography'],
'guevera' => ['people'],
'harvest' => ['food'],
'hen' => ['animals'],
'highway' => ['transportation', 'roadsigns'],
'holiday' => ['recreation', 'holiday'],
'homes' => ['buildings', 'homes'],
'hopper' => ['animals', 'bugs'],
'house' => ['buildings', 'homes'],
'icon' => ['computer', 'icons'],
'icons' => ['computer', 'icons'],
'imac' => ['computer'],
'images' => ['logos', 'OpenClipArtLibrary'],
'insect' => ['animals', 'bugs'],
'insects' => ['animals', 'bugs'],
'interface' => ['computer', 'icons'],
'kawai' => ['animals'], # Yep, we'll need authority control.
'ladybug' => ['animals', 'bugs'],
'laptop' => ['computer'],
'library' => ['buildings'],
'linux' => ['logos', 'linux'],
'logo' => ['logos'],
'logos' => ['logos', 'linux'], # Umm, that's what's there, currently.
'mammal' => ['animals', 'mammals'],
'mammals' => ['animals', 'mammals'],
'man' => ['people'],
'map' => ['geography'],
'maps' => ['geography'],
'map_symbols' => ['geography', 'map_symbols'],
'mascot' => ['logos', 'linux'],
'meat' => ['food'],
'milk' => ['food', 'beverages'],
'mime-types' => ['computer', 'icons', 'filetypes'],
'mousecursor' => ['computer'],
'mushroom' => ['food'],
'music' => ['recreation', 'music'],
'musicsym' => ['recreation', 'music'],
'navigation' => ['computer', 'icons'],
'nimbochromis' => ['animals', 'fish'],
'nicu' => ['logos', 'OpenClipArtLibrary'],
'no' => ['transportation', 'roadsigns'],
'not' => ['transportation', 'roadsigns'],
'note' => ['recreation', 'music'],
'ocal_logo' => ['logos', 'OpenClipArtLibrary'],
'oceania' => ['geography'],
'office' => ['office'],
'one' => ['transportation', 'roadsigns'],
'openclipart' => ['logos', 'OpenClipArtLibrary'],
'pear' => ['food', 'fruit'],
'penguin' => ['animals'],
'people' => ['people'],
'pig' => ['animals'],
'plane' => ['transportation', 'vehicles'],
'plant' => ['plants'],
'roadsign' => ['transportation', 'roadsigns'],
'scotland' => ['geography'],
'shape' => ['shapes'],
'shapes' => ['shapes'],
'sign' => ['signs_and_symbols'],
'signs' => ['geography', 'map_symbols'], # Not sure if this is right?
'soccer' => ['recreation'],
'soup' => ['food'],
'sport' => ['recreation'],
'sports' => ['recreation'],
'star' => ['shapes'],
'steaming' => ['food'],
'stop' => ['transportation', 'roadsigns'],
'study' => ['education'],
'symbol' => ['signs_and_symbols'],
'tea' => ['food', 'beverages'],
'toy' => ['recreation', 'toys'],
'toys' => ['recreation', 'toys'],
'thanksgiving' => ['recreation', 'holiday', 'thanksgiving'],
'transport' => ['transportation', 'vehicles'],
'transportation' => ['transportation', 'vehicles'],
'tree' => ['plants', 'trees'],
'tropical' => ['animals', 'fish'],
'turn' => ['transportation', 'roadsigns'],
'tux' => ['logos', 'linux'],
'tuxx' => ['logos', 'linux'],
'u' => ['transportation', 'roadsigns'],
'usholiday' => ['recreation', 'holiday'],
'vegitable' => ['food', 'vegitables'],
'venustus' => ['animals', 'fish'],
'warning' => ['signs_and_symbols'],
'watch' => ['transportation', 'roadsigns'],
'way' => ['transportation', 'roadsigns'],
'wildlife' => ['animals'],
'wreath' => ['recreation', 'holiday', 'Christmas'],
'xmas' => ['recreation', 'holiday', 'Christmas'],
# ... and so on
);
use File::Spec::Functions;
use File::Path; # provides mkpath
if (not -e catfile($basedir, 'unsorted')) {
    mkdir catfile($basedir, 'unsorted') or warn "Cannot create unsorted directory: $!\n";
}
opendir DIR, $basedir or die "Cannot opendir $basedir: $!";
my @d = readdir DIR;
closedir DIR; # directory handles need closedir; close fails with "Bad file descriptor"
for my $d (@d) {
    print "=================== (Processing $d)\n" if $debug>1;
    if (-d catdir($basedir, $d) && $d !~ /^\./) {
        my $dest = catdir($basedir, 'unsorted');
        if (defined $hierarchy{lc $d}) {
            $dest = catdir($basedir, @{$hierarchy{lc $d}});
        }
        if (catdir($basedir,$d) ne $dest) {
            print "[Dest: $dest]\n" if $debug >2;
            mkpath($dest) unless -d $dest; # creates intermediate directories too
            opendir DIR, catfile($basedir, $d) or warn "Cannot opendir $d: $!\n";
            my @f = readdir DIR;
            closedir DIR;
            for my $f (@f) {
                if ($f !~ m/^\./) {
                    print "[$f]" if $debug>3;
                    my $sourcepath = catfile($basedir, $d, $f);
                    if ($sourcepath eq catfile($dest, $f)) {
                        warn "No need to move $sourcepath" if $debug>5;
                    } else {
                        print "Moving $sourcepath to $dest\n";
                        # system returns 0 on success, so a nonzero value means mv failed.
                        # $! is not set by mv itself; report the exit status instead.
                        if (system('mv', '-f', $sourcepath, $dest) != 0) {
                            warn "Unsuccessful move from $sourcepath to $dest (exit status " . ($? >> 8) . ")\n";
                        }
                    }
                } else {
                    warn "Not attempting to move $f\n" if $debug>3;
                }
            }
            rmdir catfile($basedir, $d) or warn "Directory Not Removed: $d\n";
        }
    } else {
        warn "Not processing: $d (not a directory)\n";
    }
}
__END__
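
On the "Bad file descriptor" messages: that text is the system's message for EBADF, and two Perl details can make it show up misleadingly. First, system returns 0 when the command succeeds, so a test written as "not system(...)" is true on success. Second, $! is only meaningful right after a failed system call made by Perl itself, so it may still hold an older error (for example from calling close on a directory handle). A minimal sketch of checking an external command, assuming a wrapper named run_ok (a made-up name for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command in list form (no shell is
# involved) and return a true value only when it exits with status 0.
sub run_ok {
    my @cmd = @_;
    my $status = system(@cmd);
    if ($status == -1) {
        # The command could not be started at all; only here is $! meaningful.
        warn "Cannot run $cmd[0]: $!\n";
        return 0;
    }
    if ($status != 0) {
        # The command ran but failed; its exit code is in the high byte of $?.
        warn "$cmd[0] exited with status " . ($? >> 8) . "\n";
        return 0;
    }
    return 1;
}
```

A move could then be written as run_ok('mv', '-f', $sourcepath, $dest) or warn "...", with any message from mv itself appearing on stderr rather than in $!.
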