Jonadab the Unsightly One
jonadab at bright.net
Tue Jul 20 20:14:25 PDT 2004
Nicu Buculei <nicu at apsro.com> writes:
> i still have one issue: don't like that the upload script renames
> the images uploaded,
I suspect we don't want to keep the filenames as they come from
CGI::Lite. (They have big long numbers in them.) It might be
possible to extract the original filenames with a regular expression,
but that would be brittle; changes in future versions of CGI::Lite
could potentially break it in unpredictable ways. There are also
other considerations; for example, users on some platforms don't use
filename extensions, but we definitely want them. Also, we want
to make sure the filenames are unique, so we don't write over extant
images. So I think it's necessary to change the filenames at least to
some extent.
It is not strictly necessary to change them wholesale; if there were
an easy way of determining the original filename, we could start with
that instead of the title as a base, and make the necessary
adjustments. However, I don't think we want CGI::Lite's filenames.
I *could* replace CGI::Lite with some other mechanism for parsing the
script's input, such as a handrolled routine...
> this way it makes hard for me to keep the track of images on my
> computer and the repository.
It names the file based on the title from the embedded metadata (if
present; otherwise it uses the title specified at upload time). It
converts characters that don't belong in filenames (e.g., spaces) to
underscores, and it appends a number to guarantee uniqueness, so we
don't have filename conflicts. Apart from that, it pretty much just
names them based on the title metadatum.
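As a rough sketch of that naming rule (in Python purely for
illustration, not the upload script's own code): the helper name
title_to_basename, the lowercasing, the exact set of characters kept,
and the "untitled" fallback are all assumptions, inferred from the
bowl_of_weiner_dogs example rather than taken from the script.

```python
import re

def title_to_basename(title):
    """Turn a title into a filename-safe base name (hypothetical helper)."""
    # Assumption: names are lowercased, as in bowl_of_weiner_dogs.
    base = title.strip().lower()
    # Replace each run of characters other than ASCII letters and digits
    # (spaces, slashes, punctuation, ...) with a single underscore.
    base = re.sub(r"[^a-z0-9]+", "_", base)
    # Drop underscores left over at either end.
    base = base.strip("_")
    # Assumption: fall back to a fixed name if nothing usable remains.
    return base or "untitled"

print(title_to_basename("Bowl of Weiner Dogs"))  # bowl_of_weiner_dogs
```

The number and the extension would then be added to this base.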
> OTOH it solves an old security breach: anyone was able to overwrite
> any existing image. but how about versioning? if i want to upload a
> new version for an already submitted image?
I don't know of any way to automatically detect that this is a new
version of the old file and should replace the old file. (Among other
things, sometimes the second version is a variation, and both versions
should be kept; other times, the second version obsoletes the first.
This is something a human is going to have to look at.)
However, assuming the title hasn't changed, the filenames of the two
versions will be remarkably similar; only the number at the end should
be different. So it should be relatively easy for someone to come
along and look at the two files and make a decision about keeping the
first one or marking it obsolete. This is an issue we'll have to
You can think of the number on the end of the filename as a primitive
form of versioning, a la VMS (but without the semicolon and before the
extension). Except that if someone uploads a different image with the
same title, that image will also get a filename from the same series. The
script isn't smart enough to know it's a completely different image.
For example, if you upload an image titled "Bowl of Weiner Dogs",
it'll be stored as bowl_of_weiner_dogs_01.svg. If you upload a second
version of it, it'll be bowl_of_weiner_dogs_02.svg. However, if
someone uploads a totally different image that just happens to have
the same title, it'll be bowl_of_weiner_dogs_03.svg even though it's
not a version of your image, because the script isn't smart enough to
know whether it's a version of the same image or not.
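To make the numbering concrete, here is a minimal sketch in Python
(for illustration only; it is not the script's actual code). The
function name next_filename, the two-digit zero padding, and the idea
of scanning a list of already-stored names are assumptions based on
the example above.

```python
import re

def next_filename(base, existing, ext="svg"):
    """Return base_NN.ext, with NN one higher than any file already in the series."""
    # Match only names belonging to this title's series, e.g. base_07.svg.
    pattern = re.compile(
        r"^" + re.escape(base) + r"_(\d+)\." + re.escape(ext) + r"$"
    )
    highest = 0
    for name in existing:
        m = pattern.match(name)
        if m:
            highest = max(highest, int(m.group(1)))
    # Assumption: numbers are zero-padded to two digits, as in _01, _02.
    return "%s_%02d.%s" % (base, highest + 1, ext)

# Three uploads with the same title, whoever made them and whatever they depict.
stored = []
for upload in range(3):
    stored.append(next_filename("bowl_of_weiner_dogs", stored))
print(stored)
# ['bowl_of_weiner_dogs_01.svg', 'bowl_of_weiner_dogs_02.svg', 'bowl_of_weiner_dogs_03.svg']
```

Note that nothing in this logic looks at the image content or the
author, which is exactly why an unrelated image ends up in the same
series.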
It would be very easy to modify the script to use the author as well
as the title in the filename, if that would help with this.
Committing the change into the CVS repository would take longer than
actually making the change. It would make for longer filenames,
though. I thought about it and just wasn't sure it was necessary, but
that's open to discussion. If the filenames did include the authors'
names, should the author's name come before or after the title?
Okay, so if someone does an SVG mimicking the painting The Persistence
of Weiner Dogs (Labrador Dali) and uploads it, what filename should
Currently we have this:
We _could_ do it one of these ways...
What's the best?
 Mac users don't always use extensions. Acorn/Archimedes users
never use them. There are probably others.
 Windows users need them, and Apache needs them also, in order
to know what Content-type to send to the browser.
 Apologies to Gary Larson.
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/