Trash specification, version 0.1

Thu Sep 2 10:28:29 EEST 2004

On Wed, 2004-09-01 at 16:24 +0200, David Faure wrote:
> Hello all,
> sorry for my silence the last 2 days, I was travelling back from aKademy 
> (KDE conference and developer meeting).

No problems. There really is no hurry.

> > > 2) I'm a little concerned by just how simple the info file format is. 
> > > Although a more complex file format wouldn't help the first two 
> > > lines, I'm more concerned by what might happen if we need to extend 
> > > this format. (No, I can't think of a reason why we'd need to 
> > > *yet*...) Given we have a standardish file format already, what's the 
> > > problem with using something similar to the Desktop file format? 
> > > Agreed, there'd be a slight performance hit when reading or writing 
> > > the info files, but it gains us extensibility easily.
> > 
> > An interesting issue is filenames with newlines in them.
> 
> Hmm :/
> OK how about we make it
> [Desktop Entry]
> Path=/foo/bar/doc.txt
> DeletionDate=....
> 
> Then we benefit from the .desktop format escaping, and we can easily add 
> other information into the file, e.g. for caching purposes.

Yes. One less file-type parser in the desktop is good. One less place to
define escaping rules etc. I'm totally for this.
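
To make the newline case concrete, here is a rough sketch in Python of how
an info file could be written and read back. It assumes the keys from
David's example (Path, DeletionDate) and the backslash escapes of .desktop
string values; the function names are made up for illustration and
nothing here is agreed spec text.

```python
# Sketch only: escape rules borrowed from .desktop string values.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "n": "\n", "t": "\t", "r": "\r", "s": " "}


def escape_value(value):
    """Escape backslash, newline, tab and carriage return."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_value(text):
    """Reverse escape_value(); unknown escapes are kept as written."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def format_info(path, deletion_date):
    """Build the info file text (group name as proposed in the thread)."""
    return "[Desktop Entry]\nPath=%s\nDeletionDate=%s\n" % (
        escape_value(path), deletion_date)


def parse_info(text):
    """Return the key/value pairs of an info file as a dict."""
    entries = {}
    for line in text.split("\n"):
        if "=" in line and not line.startswith(("#", "[")):
            key, _, value = line.partition("=")
            entries[key.strip()] = unescape_value(value)
    return entries
```

A path containing a newline then still occupies a single line in the
file, and parse_info() gives the original string back.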

> > > C) What gets written first? The info file or the actual trash file? 
> > > (I think the trash file, since that may simply be renamed into 
> > > position. Then the info file, so that if we run out of disk space, we 
> > > can gracefully continue. Some operating systems apparently flag a 
> > > device full error as a fatal error while deleting files. This is 
> > > embarrassingly stupid.)
> > 
> > Not only does one have to be written first; we also need to protect from
> > races by doing an atomic create with the O_EXCL on the file in info
> > first, before writing anything in files.
> 
> Yes. The reason why I chose that the info file must be created first,
> is that there's only one way to create it, in all cases, whereas the
> trashed file can be created in multiple ways (rename(), or copying data
> across partitions if the implementation wishes to support that, in which
> case it can be a new file, a new directory, or a symlink). It's much simpler
> to define that the critical part is to create the info file, which basically
> acts as a lockfile that must be acquired before trashing the data itself.

Makes total sense to me.
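
For the simple same-filesystem case the ordering could look roughly like
this in Python. The "info" and "files" directory names are the ones used
in this thread; trash_file() and the idea of giving the info file the same
name as the trashed file are only assumptions for the sketch.

```python
import os


def trash_file(src, trash_dir, name, info_text):
    """Claim info/<name> atomically, then move src to files/<name>."""
    info_path = os.path.join(trash_dir, "info", name)
    dest_path = os.path.join(trash_dir, "files", name)
    # O_CREAT | O_EXCL fails with EEXIST (FileExistsError) if someone
    # else already holds this name, so the info file acts as the lock.
    fd = os.open(info_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "w") as info:
            info.write(info_text)
        # Only the rename() case is handled here; copying across
        # partitions would go in this place instead.
        os.rename(src, dest_path)
    except Exception:
        # Release the "lock" if the data could not be trashed.
        os.unlink(info_path)
        raise
    return dest_path
```

If the exclusive create fails, nothing has been touched in files yet, so
the caller can simply retry with another name.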

> > Even rfc3339 is way more complicated than the YYYYMMMDD:HHMM or whatever
> > that was initially mentioned. All this complication is way unnecessary,
> > since this is all machine parsed anyway. I see little or no reason for
> > not just using an epoch number for the time, anything else will just
> > result in thousands of lines of wasted parsing code to convert the
> > string date to the internal representation which is likely an epoch. 
> 
> I'm surprised by the discussion on this, I thought the code for ISO8601 dates
> was pretty common - well, I'm spoiled by Qt's QDate, which offers support for it.
> Well how about a compromise: YYYY-MM-DDThh:mm:ssZ where Z is the
> literal character, i.e. this is always in UTC. (It conforms to ISO8601,
> but if we specify it explicitly then nobody needs to actually read ISO8601.)
> This is exactly what we did in the Kolab-2 XML format specification, which is
> now implemented on both Windows and Unix.
> If doing that, I don't see any point in writing out the datetime in the local 
> timezone here. If any software wants to *present* this data to the user, it 
> will convert the date to local timezone.
> 
> Epoch numbers will expire in 2038 if we're still using 32 bit code by then,
> so I'm not too keen on them. OTOH it's consistent with stat(2), time_t etc.,
> so why not.
> After reading all the scenarios where people talk about accessing the same
> filesystem from another timezone, I wonder: how do filesystems deal with
> the issue, both when changing the timezone on your laptop and when using
> NFS from a machine on another continent?
> Do they store UTC or local-timezone times, and if the latter, is the
> TZ explicitly stored too? If not (the good old broken context-dependent
> solutions) then I don't see the point in solving a problem at the trash level
> that isn't solved at the filesystem level.
> 
> But people pointed out "use from command-line tools" and "readability"
> in the thread, and the ISO-format-subset is obviously *much* more
> readable than a big number of seconds.

I like the YYYY-MM-DDThh:mm:ss approach. I'm not sure about the
timezone though. mtimes on e.g. NFS filesystems are in the local time of
the system that has the NFS server. Doing something different for the
trash time than mtime strikes me as a bit odd.
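
Whichever way the timezone question goes, the parsing cost looks small.
As a sketch of the UTC variant David proposed (literal trailing Z), using
only the Python standard library; the two function names are invented for
this example:

```python
import calendar
import time

# Fixed subset of ISO 8601: always UTC, "T" and "Z" are literal characters.
FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_deletion_date(epoch_seconds):
    """Turn a time_t style epoch value into the string form."""
    return time.strftime(FORMAT, time.gmtime(epoch_seconds))


def parse_deletion_date(text):
    """Turn the string form back into epoch seconds (UTC)."""
    return calendar.timegm(time.strptime(text, FORMAT))
```

So an implementation that keeps an epoch internally converts in one call
each way, and the file stays readable from the command line.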

> > > E) What happens when you delete a file /home/me/foo, then create a 
> > > new one, then undelete the first?
> > 
> > Undelete fails, in implementation-defined ways (i.e. this is outside
> > the scope of the spec).
> 
> In the case of KDE, a "do you want to overwrite this file?" dialog will appear.
> I believe this effectively puts an end to any "trojan device / whether to use full paths" 
> issue. Undeleting should never silently overwrite a file or directory.
> 
> I think saying it "fails" is a bit restrictive. If the user agrees to overwrite the target,
> then the undeletion will effectively succeed. But I'm OK with this being left as
> "implementation-defined", although an implementation that would silently
> overwrite the target wouldn't be very secure IMHO.

Sure. Of course any sane system would do that, but it's not necessary to
specify. Implementation-defined is good.
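
Just to illustrate what "never silently overwrite" could mean for one
implementation, here is a sketch in Python. The confirm_overwrite callback
stands in for something like the KDE dialog; it and the undelete() name
are assumptions, not anything the spec would mandate.

```python
import os


def undelete(trashed_path, original_path, confirm_overwrite=None):
    """Move a trashed file back; never replace a target without consent.

    Returns True if the file was restored, False if the restore was
    refused because the target exists and nobody agreed to overwrite it.
    """
    # Note: check-then-rename is not atomic, so a file created between
    # the two steps could still be replaced. Good enough for a sketch.
    if os.path.lexists(original_path):
        if confirm_overwrite is None or not confirm_overwrite(original_path):
            return False
    os.replace(trashed_path, original_path)
    return True
```

With no callback the restore is simply refused when the target exists;
with one, the user's answer decides.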

> > > F) Some suggestions for deriving a filename which is unlikely, or 
> > > impossible, to clash would be nice. (Or at least some suggestion that 
> > > this is needed.)
> > 
> > Yes. What was mentioned in the thread was to first use the real
> > filename, then use whatever duplicating system used already on the
> > desktop system in question (such as adding " (copy $n)", or ".$n" to the
> > string).
> 
> I don't see the need for specifying this at all, as a matter of fact.
> If I wanted to use 1, 2, 3 etc. for the "file ids", everything would still work
> fine - it would simply be a bit more confusing for people using command-line tools. 

I'd like to see a recommendation that the original filename is first
used though, as this helps console users.
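
Something along these lines is what I have in mind, sketched in Python:
try the original name first, then fall back to the ".$n" suffix mentioned
earlier, using the exclusive create on the info file to decide who gets
the name. claim_trash_name() and max_tries are made up for the example.

```python
import os


def claim_trash_name(info_dir, original_name, max_tries=1000):
    """Reserve a name in info_dir, preferring the original filename.

    Returns (name, fd); fd is the open, still empty info file and must
    be closed by the caller.
    """
    for n in range(max_tries):
        # First attempt is the plain name, later ones add ".1", ".2", ...
        candidate = original_name if n == 0 else "%s.%d" % (original_name, n)
        try:
            fd = os.open(os.path.join(info_dir, candidate),
                         os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # name already taken, try the next suffix
        return candidate, fd
    raise FileExistsError("no free name found for %r" % original_name)
```

Trashing doc.txt twice would then give doc.txt and doc.txt.1, which is
still easy to recognize from a shell.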

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
                   alexl at redhat.com    alla at lysator.liu.se 
He's a jaded voodoo paramedic on the run. She's a cold-hearted Bolivian safe 
cracker with only herself to blame. They fight crime!