My problems with the thumb spec

Sun May 18 19:07:01 EEST 2003

As to Kai's post, check out my thumbnail spec. It's available now at:

http://www.mosfet.org/prothumbnails

A technical comparison between it and other thumbnail mechanisms, including 
the ones FreeDesktop has proposed, is here:

http://www.mosfet.org/prothumbnails/history.html

On Sunday 18 May 2003 08:01 am, Thomas Leonard wrote:
> On Sun, May 18, 2003 at 02:45:32AM +0200, Kai Wetzel wrote:
> [...]
>
> > ...sorry for joining the discussion so late. I've re-read some of the
> > earlier threads on the TMS in my mailbox and I'll comment on some
> > of those suggestions but I'll focus on the latest debate.
>
> [...]
>
> > #6 UNIX mtime vs. MTime attribute in PNG thumbnail file
> >
> >    This decision should be explained more clearly in the standard
> >    or else it will come up again and again. The reason given
> >    is that basically a younger mtime of an invalid thumbnail could
> >    result if the image is replaced with an older one, I think. (Hmm)
>
> I don't think Mosfet was suggesting that. He wanted to set the mtime of
> the thumbnail to that of the source, not the thumbnail's creation time.
> Both approaches are equivalent. Mosfet's is slightly faster in an unusual
> case; the current one is more consistent with the way the size is stored.
>

Calling Unix stat, which you are most likely doing anyway, is much faster 
than opening a PNG header, and it lets you check whether a thumbnail is 
obsolete before opening it. As you stated, both mechanisms are functionally 
equivalent. One is faster, requires no overhead beyond what people are already 
doing, and is standard Unix. The other is not ;-)
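To make the stat check concrete, here is a minimal sketch. It assumes the thumbnail's own mtime is set to the source file's mtime when the thumbnail is written, as described in the quoted reply. The function names are hypothetical, and comparing whole seconds is my assumption to allow for filesystems with coarse timestamps.

```python
import os


def sync_thumbnail_mtime(source_path, thumb_path):
    """Stamp the thumbnail with the source's mtime (call right after writing it)."""
    src = os.stat(source_path)
    os.utime(thumb_path, (src.st_atime, src.st_mtime))


def thumbnail_is_obsolete(source_path, thumb_path):
    """Return True if the thumbnail is missing or its mtime differs from the source's.

    Only stat() is used; the thumbnail file itself is never opened.
    """
    try:
        thumb = os.stat(thumb_path)
    except FileNotFoundError:
        return True
    # Compare whole seconds (assumption: the cache filesystem may store
    # coarser timestamps than the source filesystem).
    return int(os.stat(source_path).st_mtime) != int(thumb.st_mtime)
```

Because the test is for inequality rather than "source is newer", replacing the image with an older file is also caught.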

Besides, if you look at the libpng spec, it can already store modification 
timestamps in the standard png_time structure. Storing the timestamp as PNG 
text makes no sense whatsoever. It avoids two standard mechanisms that can 
accomplish the same thing: Unix stat and libpng.
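For reference, the PNG format defines a tIME chunk holding a last-modification time as seven bytes: a two-byte year, then month, day, hour, minute and second, in UTC. The sketch below walks the chunks by hand to show the layout. It is not libpng's API, the helper names are hypothetical, and the test data it is meant for is not a complete image.

```python
import struct
import zlib
from datetime import datetime, timezone

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_time_chunk(mtime):
    """Encode a Unix timestamp as a tIME chunk (UTC)."""
    t = datetime.fromtimestamp(mtime, tz=timezone.utc)
    body = struct.pack(">HBBBBB", t.year, t.month, t.day, t.hour, t.minute, t.second)
    return make_chunk(b"tIME", body)


def read_time_chunk(png_bytes):
    """Return the tIME timestamp as an aware datetime, or None if absent."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if chunk_type == b"tIME":
            year, month, day, hour, minute, second = struct.unpack(">HBBBBB", data)
            # PNG allows second == 60 for leap seconds; datetime does not.
            return datetime(year, month, day, hour, minute, min(second, 59),
                            tzinfo=timezone.utc)
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return None
```

Either way, reading this value means opening the thumbnail and parsing it, which the stat approach avoids.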

> > #6 One large folder with MD5 for each file vs. MD5 directories &
> >    individual files without MD5
> >
> >    Unfortunately, few _facts_ have been presented about this issue.
> >    The case about one large directory has been backed off with readings
> >    for NFS but I don't understand why it's actually faster (must be
> >    some weird NFS implementation detail). A few points:
>
> Note that my arguments against the original plan do not apply to Mosfet's
> proposal. I argued that splitting a logically unified directory (full of
> MD5 sums) arbitrarily into subdirectories was an ugly premature
> optimisation; something the kernel should, and probably would, do
> automatically and better.
>

Using one large directory is not an optimization, and it most certainly is 
ugly. I think most people would rather compute one MD5 path when entering a 
folder than compute a few hundred or a few thousand MD5 sums, one for each 
file. What little speed you gain (if any) by throwing everything into one 
folder is lost again by having to compute an MD5 for each file path and 
search through a huge inode list. Not to mention that one folder full of hex 
filenames ain't the most user-friendly way of doing things ;-) I have already 
written utilities such as "pwdthumb", which prints the path where thumbnails 
are stored, so users can manage their thumbnails manually. This is impossible 
with the proposed FreeDesktop spec.
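A rough sketch of the per-directory layout follows: one MD5 per source directory, with thumbnails kept under their original file names inside it. The cache root, the ".png" suffix, and hashing the absolute path (rather than a URI) are all assumptions for illustration. This is not the actual "pwdthumb" code.

```python
import hashlib
import os

# Hypothetical cache location, chosen only for this example.
CACHE_ROOT = os.path.expanduser("~/.thumbnails")


def thumbnail_dir(source_dir, cache_root=CACHE_ROOT):
    """Return the cache directory for one source directory (one MD5 per folder)."""
    abs_dir = os.path.abspath(source_dir)
    digest = hashlib.md5(os.fsencode(abs_dir)).hexdigest()
    return os.path.join(cache_root, digest)


def thumbnail_path(source_file, cache_root=CACHE_ROOT):
    """Return the thumbnail path for one file; no per-file hashing is needed."""
    directory, name = os.path.split(os.path.abspath(source_file))
    # Assumption: thumbnails keep the readable source name plus ".png".
    return os.path.join(thumbnail_dir(directory, cache_root), name + ".png")
```

A file manager entering a folder would call thumbnail_dir() once and then look up each file by name, so the hash is computed once per folder rather than once per file.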

> Mosfet's proposal organises the subdirectories by their sources. When
> showing thumbnails for a directory under the original plan, the thumbnails
> would be split over all the subdirectories, forcing lots of seeking.
> When subdirectories are the MD5 sums of the source directories (not the
> first two characters of the MD5 of the whole path, as before) all the
> requests come from a single directory, which may indeed give better
> performance. It also makes more sense for a user if they decide to browse
> the thumbnails directory.
>
> Possible arguments against:
>
> - It involves yet another change to the spec, just when it was starting to
>   look stable.
>
> - When thumbnailing remote resources, you could end up with a huge number
>   of subdirectories.
>

One for each remote hostname. Not bad.

> - When thumbnailing files on filesystems with long filenames, the
>   filesystem hosting the cache must support them too.
>

So no DOS or VAX support ;-)

> None of these arguments bother me particularly... it's too late for the
> next stable release of ROX-Filer to support a new system, but both could
> coexist, while wasting some space. It would be really good to declare the
> spec as stable at some point, though!