Trash specification, version 0.1

Wed Sep 1 17:24:13 EEST 2004

Hello all,
sorry for my silence the last 2 days, I was travelling back from aKademy 
(KDE conference and developer meeting).

On Monday 30 August 2004 13:47, Alexander Larsson wrote:
> On Mon, 2004-08-30 at 12:20, Dave Cridland wrote:
> > On Sun Aug 29 23:42:33 2004, Mikhail Ramendik wrote:
> > > The Trash specification, version 0.1, is available here:
> > > 
> > > http://www.ramendik.ru/docs/trashspec.html
> > 
> > Things I'd like to see:
> > 
> > A) One of:
> > 
> > 1) Expansion space in the info files. So extra lines in the info file 
> > MUST be ignored unless defined by a future specification.
> 
> Makes sense.

Yes (that's what my implementation does already, obviously)

> > 2) I'm a little concerned by just how simple the info file format is. 
> > Although a more complex file format wouldn't help the first two 
> > lines, I'm more concerned by what might happen if we need to extend 
> > this format. (No, I can't think of a reason why we'd need to 
> > *yet*...) Given we have a standardish file format already, what's the 
> > problem with using  something similar to the Desktop file format? 
> > Agreed, there'd be a slight performance hit when reading or writing 
> > the info files, but it gains us extensibility easily.
> 
> An interesting issue is filenames with newlines in them.

Hmm :/
OK how about we make it
[Desktop Entry]
Path=/foo/bar/doc.txt
DeletionDate=....

Then we benefit from the .desktop format escaping, and we can easily add 
other information into the file, e.g. for caching purposes.
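
To make the escaping idea concrete, here is a rough Python sketch of writing and reading such an info file. The key names follow the example above; the escape set (\\, \n, \r, \t, \s) is my assumption, borrowed from the .desktop string rules, and the function names are made up for illustration.

```python
# Sketch only: not a reference implementation of the info file format.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t", "s": " "}


def escape_value(value: str) -> str:
    """Escape backslashes and control characters so the value fits on one line."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_value(value: str) -> str:
    """Reverse escape_value(); unknown escape sequences are kept verbatim."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def format_info(path: str, deletion_date: str) -> str:
    """Build the info file text for one trashed item."""
    return (
        "[Desktop Entry]\n"
        f"Path={escape_value(path)}\n"
        f"DeletionDate={deletion_date}\n"
    )


def parse_info(text: str) -> dict:
    """Return all key=value pairs; callers simply ignore keys they don't know."""
    fields = {}
    for line in text.split("\n"):
        if "=" not in line or line.startswith("#"):
            continue  # group header, comment, or blank line
        key, _, value = line.partition("=")
        fields[key.strip()] = unescape_value(value)
    return fields
```

A filename containing a newline then survives the round trip, and keys added later by a future version of the spec are carried along without breaking older readers.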

> > C) What gets written first? The info file or the actual trash file? 
> > (I think the trash file, since that may simply be renamed into 
> > position. Then the info file, so that if we run out of disk space, we 
> > can gracefully continue. Some operating systems apparently flag a 
> > device full error as a fatal error while deleting files. This is 
> > embarrassingly stupid.)
> 
> Not only does one have to be written first. We need to protect from
> races by doing an atomic create with the O_EXCL on the file in info
> first, before writing anything in files.

Yes. The reason why I chose that the info file must be created first,
is that there's only one way to create it, in all cases, whereas the
trashed file can be created in multiple ways (rename(), or copying data
across partitions if the implementation wishes to support that, in which
case it can be a new file, a new directory, or a symlink). It's much simpler
to define that the critical part is to create the info file, which basically
acts as a lockfile that must be acquired before trashing the data itself.
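
Roughly, the ordering could look like this in Python. This is a sketch under assumptions: the trash directory has "info" and "files" subdirectories, only the same-filesystem rename() case is handled, and trash_file() is a name I made up.

```python
import os


def trash_file(src: str, trash_dir: str, name: str, info_text: str) -> bool:
    """Move src into the trash under `name`; return False if the name is taken."""
    info_path = os.path.join(trash_dir, "info", name)
    try:
        # O_EXCL makes the creation fail atomically if the info file exists,
        # so two processes can never both "win" the same name.
        fd = os.open(info_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as info:
            info.write(info_text)
        # Only now touch the data itself; rename() works within one filesystem.
        os.rename(src, os.path.join(trash_dir, "files", name))
    except OSError:
        os.unlink(info_path)  # release the "lock" if anything went wrong
        raise
    return True
```

If the rename fails (e.g. the file lives on another partition), the info file is removed again, so no orphaned entry is left behind.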

> > D) Rather than ISO8601, can we use RFC3339 instead? It's the same 
> > thing, but the specification is simpler and cut-down. Moreover, I 
> > believe that ISO8601 isn't a freely available specification, is quite 
> > large, and thus there's a greater chance of misinterpretation. (Not 
> > that I think anyone is likely to write a date in there as 
> > "2004272T111027.22467Z" - I believe a valid ISO date time, but it's 
> > worth specifying that explicitly.)
> 
> Even rfc3339 is way more complicated than the YYYYMMMDD:HHMM or whatever
> that was initially mentioned. All this complication is way unnecessary,
> since this is all machine parsed anyway. I see little or no reason for
> not just using an epoch number for the time, anything else will just
> result in thousands of lines of wasted parsing code to convert the
> string date to the internal representation which is likely an epoch. 

I'm surprised by the discussion on this, I thought the code for ISO8601 dates
was pretty common - well, I'm spoiled by Qt's QDate, which offers support for it.
Well, how about a compromise: YYYY-MM-DDThh:mm:ssZ, where Z is the
literal character, i.e. the time is always in UTC. (It conforms to ISO8601,
but if we specify it explicitly then nobody actually needs to read ISO8601.)
This is exactly what we did in the Kolab-2 XML format specification, which is
now implemented on both Windows and Unix.
If doing that, I don't see any point in writing out the datetime in the local 
timezone here. If any software wants to *present* this data to the user, it 
will convert the date to local timezone.
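
For what it's worth, the fixed UTC format needs very little code. A sketch in Python, using only the standard datetime module (the function names are hypothetical):

```python
from datetime import datetime, timezone

# The "Z" is a literal character: stored values are always in UTC.
FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_deletion_date(when: datetime) -> str:
    """Convert a timezone-aware datetime to the fixed UTC string."""
    return when.astimezone(timezone.utc).strftime(FORMAT)


def parse_deletion_date(text: str) -> datetime:
    """Parse the fixed UTC string back into a timezone-aware datetime."""
    return datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
```

Software that presents the date to the user would call astimezone() on the parsed value to get local time; nothing about the local timezone needs to be stored.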

Epoch numbers will overflow in 2038 if we're still using a signed 32-bit time_t by then,
so I'm not too keen on them. OTOH it's consistent with stat(2), time_t etc.,
so why not.
After reading all the scenarios where people talk about accessing the same
filesystem from another timezone, I wonder: how do filesystems deal with
the issue, both when you change the timezone on your laptop and when you
use NFS from a machine on another continent?
Do they store UTC or local-timezone times, and if the latter, is the
TZ explicitly stored too? If not (the good old broken context-dependent
solutions) then I don't see the point in solving a problem at the trash level
that isn't solved at the filesystem level.

But people pointed out "use from command-line tools" and "readability"
in the thread, and the ISO-format-subset is obviously *much* more
readable than a big number of seconds.
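
To put the two representations side by side, a small Python sketch (epoch_to_text() is a made-up helper): it renders an epoch number in the ISO subset and shows where a signed 32-bit counter runs out.

```python
from datetime import datetime, timezone


def epoch_to_text(seconds: int) -> str:
    """Render seconds since 1970-01-01 UTC in the YYYY-MM-DDThh:mm:ssZ subset."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold

print(INT32_MAX, epoch_to_text(INT32_MAX))
# prints: 2147483647 2038-01-19T03:14:07Z
```

The conversion is a one-liner either way, so the choice is mostly about what someone sees when they cat the info file.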

> > E) What happens when you delete a file /home/me/foo, then create a 
> > new one, then undelete the first?
> 
> Undelete fails, in implementation-defined ways. (I.E. this is outside
> the scope of the spec).

In the case of KDE, a "do you want to overwrite this file?" dialog will appear.
I believe this effectively puts an end to any "trojan device / whether to use full paths" 
issue. Undeleting should never silently overwrite a file or directory.

I think saying it "fails" is a bit restrictive. If the user agrees to overwrite the target,
then the undeletion will effectively succeed. But I'm OK with this being left as
"implementation-defined", although an implementation that would silently
overwrite the target wouldn't be very secure IMHO.
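
As an illustration of "never silently overwrite", here is a minimal Python sketch. The restore() function and its confirm_overwrite callback are hypothetical (in KDE the callback would be the dialog); it handles plain files only.

```python
import os
from typing import Callable


def restore(trashed: str, original: str,
            confirm_overwrite: Callable[[str], bool]) -> bool:
    """Move a trashed file back; ask before replacing an existing target."""
    # lexists() also catches dangling symlinks sitting at the target path.
    if os.path.lexists(original) and not confirm_overwrite(original):
        return False  # user declined: the item stays in the trash
    os.replace(trashed, original)
    return True
```

Note that there is still a small window between the check and the move, so a careful implementation might want something stricter than this.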

> > F) Some suggestions for deriving a filename which is unlikely, or 
> > impossible, to clash would be nice. (Or at least some suggestion that 
> > this is needed.)
> 
> Yes. What was mentioned in the thread was to first use the real
> filename, then use whatever duplicating system used already on the
> desktop system in question (such as adding " (copy $n)", or ".$n" to the
> string).

I don't see the need for specifying this at all, as a matter of fact.
If I wanted to use 1, 2, 3 etc. for the "file ids", everything would still work
fine - it would simply be a bit more confusing for people using command-line tools. 
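
Whatever naming scheme an implementation picks, the loop is the same: try a candidate, and move on if it is taken. A Python sketch, assuming the ".$n" suffix mentioned in the thread and an "info" directory where names are reserved (reserve_name() is a made-up name):

```python
import os


def reserve_name(info_dir: str, wanted: str, max_tries: int = 1000) -> str:
    """Reserve a free name by atomically creating its (empty) info file."""
    for n in range(max_tries):
        # First try the real filename, then wanted.1, wanted.2, ...
        candidate = wanted if n == 0 else f"{wanted}.{n}"
        path = os.path.join(info_dir, candidate)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # taken, possibly by another process just now
        os.close(fd)
        return candidate
    raise FileExistsError(f"no free name for {wanted!r} after {max_tries} tries")
```

Replacing the candidate line with plain numbers (1, 2, 3...) would work just as well, which is why I don't think the spec needs to mandate any particular scheme.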

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).