thumbnail-spec: proposing nested thumbnail cache directories

Alexander Larsson alexl at redhat.com
Thu Aug 30 04:30:59 PDT 2007


On Thu, 2007-08-30 at 12:06 +0200, Christian Neumair wrote:
> Dear xdg list,
> 
> Over at GNOME land we have the problem that due to the huge size of the
> thumbnail directories, refreshing in-memory thumbnail readdir() caches
> takes very long.
> 
> The caches are required as thumbnails are looked up for *every* file one
> opens with the file manager, so the cumulative performance impact is
> significant.
> 
> http://blogs.gnome.org/cneumair/2007/04/29/thumbnail-followup/
> http://blogs.gnome.org/cneumair/2007/08/29/thumbnail-followup-2/
> 
> Alexander Larsson proposed to reduce this delay massively by using
> nested directories - git does that as well:
> 
> The first two digits of the MD5 hash "4f831f...89" would be split apart,
> and used to create a subdirectory "4f". The thumbnail file corresponding
> to the specified example in this directory would then be named
> "831f...89.png". So the entire hash maps to the file "4f/831f...89.png"
> rather than "4f831f...89.png".
> 
> This definitely allows better cache-refreshing.

It is also much nicer on the filesystem, as it avoids huge directories,
which filesystems tend to handle poorly. Furthermore, it means that if
you look for a particular thumbnail which doesn't exist, you have to scan
a lot less on the disk.

For example, say you have 10000 thumbnails. If you stat a thumbnail
md5 name that is not there, then you have to scan through all 10000
entries in the thumbnail directory. However, using a two-level approach
you generally only need to scan 256 + 10000/256 == 295 entries. Some
filesystems do better than the full scan (e.g. reiserfs and ext3 with
htrees), but it's still better to do the hashing manually.
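
The arithmetic above can be sketched roughly like this. It assumes a
filesystem that scans directories linearly and hashes that spread evenly
over the 256 possible two-hex-digit prefixes; the function names are
made up for illustration only.

```python
def entries_scanned_flat(n_thumbnails):
    """Worst case for a missing name in one flat directory: every entry."""
    return n_thumbnails


def entries_scanned_nested(n_thumbnails, n_subdirs=256):
    """Worst case for a missing name with a two-level layout.

    First the top-level directory (up to n_subdirs entries) is scanned,
    then the one subdirectory the hash prefix points at. Assumes the
    thumbnails are spread evenly over the subdirectories.
    """
    return n_subdirs + n_thumbnails // n_subdirs


if __name__ == "__main__":
    n = 10000
    print("flat:  ", entries_scanned_flat(n))
    print("nested:", entries_scanned_nested(n))
```

The gain grows with the number of thumbnails; for very small caches the
fixed cost of the 256 top-level entries dominates instead.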

This is nothing new: usenet news spools have been doing this for decades,
and git does it with its sha1 hash files.
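
As a minimal sketch of the split described in the proposal, something
like the following would map a hash to its nested file name. It assumes
the MD5 is taken over the file's URI, and both the function name and the
base_dir parameter are hypothetical.

```python
import hashlib
import os


def nested_thumbnail_path(uri, base_dir):
    """Return the nested path '<base_dir>/<2 hex digits>/<30 hex digits>.png'.

    Assumption: the thumbnail name is the MD5 of the file's URI, written
    as 32 lowercase hex digits.
    """
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    # The first two hex digits name the subdirectory, the rest the file.
    return os.path.join(base_dir, digest[:2], digest[2:] + ".png")


if __name__ == "__main__":
    print(nested_thumbnail_path("file:///home/user/photo.jpg", "thumbnails"))
```

Two hex digits give at most 256 subdirectories, which is the figure used
in the estimate above.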

> For backwards-compatibility, it may be a good idea to store this in a
> new location, maybe ~/.thumbnails/hashed-failed and
> ~/.thumbnails/hashed-normal rather than ~/.thumbnails/normal and
> ~/.thumbnails/failed.

Yeah, then apps can just switch over as they want and it'll keep
working. There will be some duplicate storage for a while, but
eventually that will be fixed.
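
One way an app could handle the transition is sketched below. The
directory names are the ones suggested above; falling back to the old
flat directory on lookup is only an assumption for illustration, not
part of the proposal, and find_thumbnail is a made-up name.

```python
import hashlib
import os


def find_thumbnail(uri, thumbs_root):
    """Look for a thumbnail in the nested layout first, then the flat one.

    Assumptions: the name is the MD5 of the URI, the nested cache lives
    in 'hashed-normal' and the old flat cache in 'normal' under
    thumbs_root (e.g. ~/.thumbnails).
    """
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    candidates = [
        # New layout: hashed-normal/4f/831f...89.png
        os.path.join(thumbs_root, "hashed-normal", digest[:2], digest[2:] + ".png"),
        # Old layout: normal/4f831f...89.png
        os.path.join(thumbs_root, "normal", digest + ".png"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

An app that only writes to the new location but still reads the old one
would not have to regenerate thumbnails it already has.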

Anyone against this?


