Trash spec: directory size cache

Mon Apr 15 01:07:40 PDT 2013

On Sun, 2013-04-14 at 23:48 +0200, David Faure wrote:
> To implement a maximum size for the trash directory, one needs to check the 
> size every time a new item is being trashed. With the current spec, the only 
> solution is to do a recursive traversal, which is pretty expensive.
> To make this efficient, we need a cache.
> My initial idea of a global "total size" cache doesn't work well with older 
> implementations which don't update that value, so it gets out of date quickly.
> 
> Instead, Ryan Lortie and I came up with the following idea, which we would 
> like to standardize into the trash spec:
> 
> For files, we get the size from stat. For dirs, we use a cache:
> in every trash directory, a metadata file is created, with one entry per 
> directory (that was trashed by the user).
> That entry contains the total size in bytes of the directory, and the 
> modification time of the trashinfo file [*].
> 
> The metadata file uses desktop file syntax, where the key is the directory 
> name, and the value is a pair: size, and mtime.
> 
> However the desktop file standard restricts the available characters for keys, 
> so instead of just writing out the directory name, we write the sha1 of the 
> directory name (a bit like the thumbnail spec uses sha1s too).
> 
> In summary, it would look like this:
> 
> [Directories]
> # One entry per sub-directory of the "files" directory
> # key = sha1 of the directory name
> #  value = size in bytes, timestamp of the trashinfo file, in UTC
> cb58e5c11a6802db43fd82ca8d3c7393353c0eab=25383,2009-07-11T20:18:30
> f1d2d2f924e986ac86fdf7b36c94bcdf32beec15=2315,2012-04-12T10:05:20

In general this sounds good to me. I have two minor objections:

1: Using sha1 seems wrong to me. There is no need to get an even
distribution of the keys (as there is for thumbnail subdirectories), and
a sha1 is slow to calculate. Also, if you ever look at the file manually,
it tells you very little. I would much prefer a simple character escape
model: say you allow A-Za-z0-9 and escape everything else as "-" + the
hex digits (like "-2d" for "-"). This gives valid desktop file keys, is
cheap to calculate, and makes most files readable by humans.
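
A minimal sketch of that escaping, to make the idea concrete. It is an
illustration only, not proposed spec text: the function names are made
up, and it assumes the directory name is handled as raw bytes and that
each escaped byte is written as exactly two lowercase hex digits.

```python
# Bytes that may appear unescaped in a key (A-Za-z0-9).
_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789"
)


def escape_key(name: bytes) -> str:
    """Escape a directory name into a desktop-file key.

    Every byte outside A-Za-z0-9 becomes "-" followed by two hex digits
    (an assumption of this sketch; the mail only says "the hex digits").
    """
    return "".join(chr(b) if b in _SAFE else "-%02x" % b for b in name)


def unescape_key(key: str) -> bytes:
    """Reverse escape_key(); assumes the key was produced by it."""
    out = bytearray()
    i = 0
    while i < len(key):
        if key[i] == "-":
            # "-" is always followed by exactly two hex digits.
            out.append(int(key[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(key[i]))
            i += 1
    return bytes(out)
```

With this, escape_key(b"my-dir") gives "my-2ddir", and a name made up
only of letters and digits is stored unchanged, so the common case stays
readable.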

2: Don't store the mtime in a format that needs parsing. Time and date
parsing is a very complicated area that is easy to get wrong. And the
source is always a stat, which is in epoch format, so why not just save
it in the same format and avoid any day/month order issues, timezone
weirdnesses or whatnot?
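
As a rough sketch of what I mean (hypothetical helper names, and it
assumes whole seconds are precise enough and that the value keeps the
"size,mtime" layout from the proposal), writing and checking an entry
then needs nothing beyond integer conversion:

```python
import os
from typing import Tuple


def format_entry(size: int, trashinfo_path: str) -> str:
    """Build the cache value "size,mtime" with mtime as epoch seconds."""
    mtime = int(os.stat(trashinfo_path).st_mtime)
    return "%d,%d" % (size, mtime)


def parse_entry(value: str) -> Tuple[int, int]:
    """Split a cache value back into (size, mtime); no date parsing."""
    size, mtime = value.split(",")
    return int(size), int(mtime)


def entry_is_fresh(value: str, trashinfo_path: str) -> bool:
    """True if the cached mtime still matches the trashinfo file."""
    _, cached_mtime = parse_entry(value)
    return cached_mtime == int(os.stat(trashinfo_path).st_mtime)
```

The freshness check is a plain integer comparison against what stat
returns, so no timezone or locale handling is involved.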