[Clipart] Fix for file extension issue
Bryce Harrington
bryce at bryceharrington.com
Fri Jul 2 12:24:16 PDT 2004
On Fri, 2 Jul 2004, Jonadab the Unsightly One wrote:
> If you require the extension to have no periods in it, the most
> significant loss is the original extension for anything gzipped;
> though there are other cases as well, that's the one that will matter
> most. Of course the things most frequently gzipped are tarballs.
> Maybe .tar.gz should be special-cased?
Yup, I have .tar.gz special cased.
> Of course then some clown will
> gzip some other kind of file or use some other compressor with its own
> secondary extension (e.g., bzip2). If we wanted to be fancy we could
> attempt to list common secondary extensions (gz, bz2, Z, ...) and keep
> the preceding extension if the primary one is any of those. That
> might be more trouble than we really need to go to though.
Yeah, I'm figuring that each release I'll look and see what's been
uploaded, and if I spot something not already covered, I'll update the
script to account for it.
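For reference, the secondary-extension idea quoted above could be sketched roughly as follows. This is not the actual release script; the %SECONDARY list and the split_extension name are made up for illustration, and the list would need extending as new cases turn up.

```perl
use strict;
use warnings;

# Hypothetical list of "secondary" (compression) extensions; extend as needed.
my %SECONDARY = map { $_ => 1 } qw(gz bz2 Z);

# Split a filename into (base, extension).  If the last extension is a
# known compression suffix, keep the extension before it as well, so
# "flower.svg.gz" yields ("flower", "svg.gz") rather than ("flower.svg", "gz").
sub split_extension {
    my ($filename) = @_;

    # No period (or only a leading one, as in ".bashrc"): no extension.
    return ($filename, '') unless $filename =~ /^(.+)\.([^.\/]+)$/;
    my ($base, $ext) = ($1, $2);

    # Known compressor suffix: pull in the preceding extension too, if any.
    if ($SECONDARY{$ext} && $base =~ /^(.+)\.([^.\/]+)$/) {
        ($base, $ext) = ($1, "$2.$ext");
    }
    return ($base, $ext);
}

print join(' | ', split_extension($_)), "\n"
    for qw(flower.svg backdrops.tar.gz flower.svg.bz2 README);
```

With a table like this, .tar.gz stops being a special case; it is handled the same way as any other compressed file.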
Oh, btw, there is one zip file that was uploaded in PKZip 2 format,
which Linux's unzip doesn't grok. If anyone happens to have a Windows
box handy, would they mind repackaging it as a tgz or a zip v.1 file and
reuploading it? The file is Backdrops.zip in the submission area.
> I was already thinking about this issue, for the upload script, and
> the conclusion I came to is that I can mostly avoid it, by not parsing
> filenames. I'm assigning a new filename based on the title metadatum
> (as specified at upload time, or from the embedded metadata if one is
> not specified at upload time) and ultimately will be deciding the
> extension based on the user's choice from the filetype dropdown list.
>
> This leaves the case where the user inadvertently picks a filetype
> inconsistent with the actual file uploaded, in which case we could get
> a PNG image saved as foo_01.svg or somesuch, but no system is entirely
> foolproof, because there's always a better fool out there somewhere.
By the way, for docsys I ended up switching to the mimetype instead of
relying only on file extensions:
=head2 file_mimetype($filename)

Returns the mimetype for a given file.

=cut

sub file_mimetype {
    my $self     = shift;
    my $filename = shift;

    # TODO:  Investigate use of PApp::MimeType
    # http://theoryx5.uwinnipeg.ca/CPAN/data/PApp/PApp/MimeType.html

    # List-form pipe open keeps $filename away from the shell
    open(my $fh, '-|', 'file', '-bi', '--', $filename) or return;
    my $mimetype = <$fh>;
    close($fh);
    chomp $mimetype if defined $mimetype;
    return $mimetype;
}
I'd found that file extensions weren't reliable enough in all situations
(ran into some of the better fools you mentioned), and that the mimetype
was a bit more reliable.
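As a rough illustration of how the mimetype could catch the "PNG saved as foo_01.svg" case, here is a minimal sketch. The %EXPECTED_TYPE table and the mimetype_matches_extension name are hypothetical, and the type strings are assumptions: what file actually reports can vary between versions, so the table would need checking against real output.

```perl
use strict;
use warnings;

# Illustrative table only: the types `file` reports can vary by version.
my %EXPECTED_TYPE = (
    png => 'image/png',
    svg => 'image/svg+xml',
    jpg => 'image/jpeg',
);

# Returns 1 if the mimetype agrees with the extension, 0 if it does not,
# and undef (in scalar context) if the extension is not in the table.
sub mimetype_matches_extension {
    my ($mimetype, $ext) = @_;
    my $expected = $EXPECTED_TYPE{ lc $ext };
    return unless defined $expected;

    # Drop parameters such as "; charset=binary" and trailing whitespace.
    my $type = lc $mimetype;
    $type =~ s/;.*//s;
    $type =~ s/\s+$//;

    return $type eq $expected ? 1 : 0;
}

# A PNG uploaded with "svg" picked from the dropdown is flagged:
print "mismatch\n"
    unless mimetype_matches_extension('image/png; charset=binary', 'svg');
```

The string passed in would come from something like file_mimetype(); keeping the comparison separate from the call to file makes it easy to test without any files on disk.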
Bryce