[Clipart] Fix for file extension issue

Bryce Harrington bryce at bryceharrington.com
Fri Jul 2 12:24:16 PDT 2004


On Fri, 2 Jul 2004, Jonadab the Unsightly One wrote:
> If you require the extension to have no periods in it, the most
> significant loss is the original extension for anything gzipped;
> though there are other cases as well, that's the one that will matter
> most.  Of course the things most frequently gzipped are tarballs.
> Maybe .tar.gz should be special-cased?

Yup, I have .tar.gz special-cased.

> Of course then some clown will
> gzip some other kind of file or use some other compressor with its own
> secondary extension (e.g., bzip2).  If we wanted to be fancy we could
> attempt to list common secondary extensions (gz, bz2, Z, ...) and keep
> the preceding extension if the primary one is any of those.  That
> might be more trouble than we really need to go to though.

Yeah, I'm figuring that each release I'll look and see what's been
uploaded, and if I spot something not already covered, I'll update the
script to account for it.
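
For illustration, the secondary-extension idea quoted above could be sketched roughly like this. This is not the actual script: the split_extension name and the %SECONDARY list are hypothetical, and the list would grow as new cases turn up in uploads.

```perl
use strict;
use warnings;

# Hypothetical list of compression suffixes (case-sensitive, so "Z" is
# matched but "GZ" is not); extend it as new ones are spotted.
my %SECONDARY = map { $_ => 1 } qw(gz bz2 Z);

# Returns (base, extension).  The extension has no leading dot and may be
# compound, e.g. "tar.gz".  It is empty if the name has no extension.
sub split_extension {
    my ($filename) = @_;

    my @parts = split /\./, $filename, -1;

    # No dot at all, or a name starting with a dot: treat as no extension.
    return ($filename, '') if @parts < 2 || $parts[0] eq '';

    my $ext = pop @parts;

    # Keep the preceding extension when the last one is a compression suffix.
    if ($SECONDARY{$ext} && @parts > 1) {
        $ext = pop(@parts) . ".$ext";
    }

    return (join('.', @parts), $ext);
}
```

With this sketch, "clipart.tar.gz" splits into "clipart" and "tar.gz", while "my.file.svg" splits into "my.file" and "svg".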

Oh, btw, there is one zip file that was uploaded in PKZip 2 format,
which Linux's unzip doesn't grok.  If anyone happens to have a Windows
box handy, would they mind repackaging it as a tgz or a zip v.1 file and
reuploading it?  The file is Backdrops.zip in the submission area.
 
> I was already thinking about this issue, for the upload script, and
> the conclusion I came to is that I can mostly avoid it, by not parsing
> filenames.  I'm assigning a new filename based on the title metadatum
> (as specified at upload time, or from the embedded metadata if one is
> not specified at upload time) and ultimately will be deciding the
> extension based on the user's choice from the filetype dropdown list.
>
> This leaves the case where the user inadvertently picks a filetype
> inconsistent with the actual file uploaded, in which case we could get
> a PNG image saved as foo_01.svg or somesuch, but no system is entirely
> foolproof, because there's always a better fool out there somewhere.
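
A rough sketch of the approach described in the quoted text, assuming a name built from the title plus a two-digit counter (guessed from the foo_01.svg example). The filename_for function and the %EXT_FOR table are hypothetical and are not taken from the upload script.

```perl
use strict;
use warnings;

# Hypothetical mapping from the filetype dropdown value to an extension.
my %EXT_FOR = (svg => 'svg', png => 'png');

# Builds a filename from the title metadatum, a counter, and the chosen
# filetype.  Returns nothing if the filetype is not in the table.
sub filename_for {
    my ($title, $filetype, $n) = @_;

    my $ext = $EXT_FOR{$filetype} or return;

    # Lowercase the title and collapse anything non-alphanumeric to "_".
    (my $base = lc $title) =~ s/[^a-z0-9]+/_/g;
    $base =~ s/^_+|_+$//g;
    $base = 'untitled' if $base eq '';

    return sprintf('%s_%02d.%s', $base, $n, $ext);
}
```

Since the original filename is never parsed, the extension question does not come up at all; the remaining risk is the mismatched dropdown choice described above.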

By the way, for docsys I ended up switching to the mimetype instead of
relying only on file extensions:

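=head2 file_mimetype($filename)

Returns the mimetype for a given file, as reported by file(1).

=cut

sub file_mimetype {
    my $self = shift;
    my $filename = shift;

    # TODO:  Investigate use of PApp::MimeType
    #   http://theoryx5.uwinnipeg.ca/CPAN/data/PApp/PApp/MimeType.html

    # List-form open keeps $filename away from the shell, so spaces or
    # metacharacters in uploaded names can't break or hijack the command.
    open(my $fh, '-|', 'file', '-bi', '--', $filename) or return;
    my $mimetype = <$fh>;
    close($fh);
    chomp $mimetype if defined $mimetype;

    return $mimetype;
}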

I'd found that file extensions weren't reliable enough in all situations
(ran into some of the better fools you mentioned), and that the mimetype
was a bit more reliable.
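
As a sketch of how the two could be combined, the detected mimetype can be checked against the extension the file claims to have. The mimetype_matches name and the %EXPECTED table are hypothetical, and the strings file(1) reports vary between versions, so the table entries are assumptions that would need adjusting.

```perl
use strict;
use warnings;

# Hypothetical table of acceptable mimetypes per extension.  Several
# values are listed because file(1) output differs between versions.
my %EXPECTED = (
    png => ['image/png'],
    svg => ['image/svg+xml', 'text/xml', 'application/xml'],
    zip => ['application/zip', 'application/x-zip'],
);

# Returns 1 if the mimetype is acceptable for the extension, 0 if it is
# not, and undef if the extension is unknown or no mimetype was given.
sub mimetype_matches {
    my ($mimetype, $ext) = @_;
    return unless defined $mimetype && exists $EXPECTED{lc $ext};

    # file -bi may append parameters, e.g. "image/png; charset=binary".
    (my $bare = lc $mimetype) =~ s/\s*;.*//s;
    $bare =~ s/^\s+|\s+$//g;

    return scalar grep { $_ eq $bare } @{ $EXPECTED{lc $ext} };
}
```

A PNG saved as foo_01.svg would come back as 0 here, which is enough to flag the upload for a second look.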

Bryce

 



