Reducing the memory overhead of the mime system
Matthias Clasen
mclasen at redhat.com
Mon Mar 28 23:11:48 EEST 2005
As things currently are, the xdgmime implementation of the shared mime
info spec uses ~50k of heap space, and does a fair bit of text-file
parsing to populate it with data that rarely changes and is the same
for all clients. To make things worse, xdgmime is shared by copying its
source code, so gtk+ and gnome-vfs each have their own copy. As a
consequence, applications using both gtk+ and gnome-vfs (with the new
filechooser, this means basically all gtk+ applications) pay the 50k
price twice.
An easy and well-established way to avoid both the parsing and the
memory overhead is to use an mmappable cache file. Below is a proposal
for such a format, which closely follows the in-memory data structures
currently used by xdgmime. Patches to make update-mime-database generate
such a cache file and to make xdgmime use the cache files are in
bugzilla (
https://bugs.freedesktop.org/show_bug.cgi?id=2804
https://bugs.freedesktop.org/show_bug.cgi?id=2805
)
To make sense of the specification below, some acquaintance with the
xdgmime data structures will probably be required...
Matthias
Header:
2 CARD16 MAJOR_VERSION 1
2 CARD16 MINOR_VERSION 0
4 CARD32 ALIAS_LIST_OFFSET
4 CARD32 PARENT_LIST_OFFSET
4 CARD32 LITERAL_LIST_OFFSET
4 CARD32 SUFFIX_LIST_OFFSET
4 CARD32 GLOB_LIST_OFFSET
4 CARD32 MAGIC_LIST_OFFSET
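As an illustration of the Header layout, here is a minimal sketch of how a
client might decode it. The names CacheHeader and parse_header are
hypothetical and not part of xdgmime; the sample bytes are made up for the
demonstration.

```python
import struct
from collections import namedtuple

# Two CARD16 fields followed by six CARD32 offsets; ">" selects
# big-endian (network) byte order, as the format requires.
HEADER_FORMAT = ">HH6I"

# Hypothetical container type; field order follows the Header table.
CacheHeader = namedtuple("CacheHeader", [
    "major_version", "minor_version",
    "alias_list_offset", "parent_list_offset", "literal_list_offset",
    "suffix_list_offset", "glob_list_offset", "magic_list_offset",
])

def parse_header(buf):
    """Decode the fixed-size header at the start of a cache buffer."""
    if len(buf) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("buffer too short for a cache header")
    return CacheHeader._make(struct.unpack_from(HEADER_FORMAT, buf, 0))

# Made-up sample header: version 1.0 and arbitrary offsets.
sample = struct.pack(HEADER_FORMAT, 1, 0, 28, 40, 52, 64, 76, 88)
header = parse_header(sample)
print(header.major_version, header.minor_version, header.alias_list_offset)
```

The same function works on an mmap object, since struct.unpack_from
accepts any buffer.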
AliasList:
4 CARD32 N_ALIASES
8*N_ALIASES AliasListEntry
AliasListEntry:
4 CARD32 ALIAS_OFFSET
4 CARD32 MIME_TYPE_OFFSET
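Because the alias list is sorted by alias, a client can resolve an alias
with a binary search directly on the mapped bytes. The following is only a
sketch: lookup_alias, read_cstring and build_alias_list are hypothetical
helpers, byte-wise (strcmp-style) ordering is an assumption, and the sample
pairs are illustrative data.

```python
import struct

def read_cstring(buf, offset):
    """Return the zero-terminated string starting at offset."""
    end = buf.find(b"\0", offset)
    if end < 0:
        raise ValueError("unterminated string")
    return bytes(buf[offset:end])

def lookup_alias(buf, alias_list_offset, alias):
    """Binary-search the AliasList; return the mime type or None."""
    (n_aliases,) = struct.unpack_from(">I", buf, alias_list_offset)
    lo, hi = 0, n_aliases
    while lo < hi:
        mid = (lo + hi) // 2
        entry = alias_list_offset + 4 + 8 * mid  # skip N_ALIASES
        alias_off, mime_off = struct.unpack_from(">II", buf, entry)
        candidate = read_cstring(buf, alias_off)
        if candidate == alias:
            return read_cstring(buf, mime_off)
        if candidate < alias:
            lo = mid + 1
        else:
            hi = mid
    return None

def build_alias_list(pairs):
    """Demo-only writer: AliasList at offset 0, strings placed after it."""
    pairs = sorted(pairs)  # assumption: entries sorted byte-wise by alias
    table_size = 4 + 8 * len(pairs)
    table = struct.pack(">I", len(pairs))
    strings = bytearray()
    for alias, mime in pairs:
        alias_off = table_size + len(strings)
        strings += alias + b"\0"
        mime_off = table_size + len(strings)
        strings += mime + b"\0"
        table += struct.pack(">II", alias_off, mime_off)
    return table + bytes(strings)

cache = build_alias_list([
    (b"text/xml", b"application/xml"),
    (b"application/x-gzip", b"application/gzip"),
])
print(lookup_alias(cache, 0, b"text/xml"))
```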
ParentList:
4 CARD32 N_ENTRIES
8*N_ENTRIES ParentListEntry
ParentListEntry:
4 CARD32 MIME_TYPE_OFFSET
4 CARD32 PARENTS_OFFSET
Parents:
4 CARD32 N_PARENTS
4*N_PARENTS CARD32 MIME_TYPE_OFFSET
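The two-level ParentList / Parents layout can be read as sketched below.
The notes do not say that the parent list is sorted, so this sketch scans
it linearly. lookup_parents and read_cstring are hypothetical names and
the mime types in the sample buffer are invented.

```python
import struct

def read_cstring(buf, offset):
    """Return the zero-terminated string starting at offset."""
    end = buf.find(b"\0", offset)
    if end < 0:
        raise ValueError("unterminated string")
    return bytes(buf[offset:end])

def lookup_parents(buf, parent_list_offset, mime_type):
    """Return the list of parent mime types, or [] if there is no entry."""
    (n_entries,) = struct.unpack_from(">I", buf, parent_list_offset)
    for i in range(n_entries):
        entry = parent_list_offset + 4 + 8 * i  # skip N_ENTRIES
        mime_off, parents_off = struct.unpack_from(">II", buf, entry)
        if read_cstring(buf, mime_off) != mime_type:
            continue
        # Parents block: N_PARENTS followed by that many string offsets.
        (n_parents,) = struct.unpack_from(">I", buf, parents_off)
        offsets = struct.unpack_from(">%dI" % n_parents, buf, parents_off + 4)
        return [read_cstring(buf, off) for off in offsets]
    return []

# Hand-built sample: one entry with two parents.
# Bytes 0..11 hold the ParentList, 12..23 the Parents block, strings follow.
strings = [b"application/x-child",
           b"application/x-parent-a",
           b"application/x-parent-b"]
offsets = []
pos = 24
for s in strings:
    offsets.append(pos)
    pos += len(s) + 1
sample = (struct.pack(">I", 1)
          + struct.pack(">II", offsets[0], 12)
          + struct.pack(">I", 2)
          + struct.pack(">II", offsets[1], offsets[2])
          + b"".join(s + b"\0" for s in strings))

print(lookup_parents(sample, 0, b"application/x-child"))
```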
LiteralList:
4 CARD32 N_LITERALS
8*N_LITERALS LiteralEntry
LiteralEntry:
4 CARD32 LITERAL_OFFSET
4 CARD32 MIME_TYPE_OFFSET
GlobList:
4 CARD32 N_GLOBS
8*N_GLOBS GlobEntry
GlobEntry:
4 CARD32 GLOB_OFFSET
4 CARD32 MIME_TYPE_OFFSET
SuffixTree:
4 CARD32 N_ROOTS
4 CARD32 FIRST_ROOT_OFFSET
SuffixTreeNode:
4 CARD32 CHARACTER
4 CARD32 MIME_TYPE_OFFSET
4 CARD32 N_CHILDREN
4 CARD32 FIRST_CHILD_OFFSET
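A possible way to walk the suffix tree is sketched below. It rests on
several assumptions that the layout above does not spell out: siblings
(roots or children) are stored as a contiguous array of 16-byte nodes,
CHARACTER is a Unicode code point, a MIME_TYPE_OFFSET of 0 means "no type
at this node", and the tree is walked from the first character of the
suffix. lookup_suffix and find_sibling are hypothetical names.

```python
import struct

NODE_SIZE = 16  # CHARACTER, MIME_TYPE_OFFSET, N_CHILDREN, FIRST_CHILD_OFFSET

def read_cstring(buf, offset):
    """Return the zero-terminated string starting at offset."""
    end = buf.find(b"\0", offset)
    if end < 0:
        raise ValueError("unterminated string")
    return bytes(buf[offset:end])

def find_sibling(buf, first_offset, count, char):
    """Binary-search a sorted sibling array for the given code point."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        node_off = first_offset + NODE_SIZE * mid
        (c,) = struct.unpack_from(">I", buf, node_off)
        if c == char:
            return node_off
        if c < char:
            lo = mid + 1
        else:
            hi = mid
    return None

def lookup_suffix(buf, tree_offset, suffix):
    """Return the mime type stored for suffix (a str), or None."""
    count, first = struct.unpack_from(">II", buf, tree_offset)
    mime_off = 0
    for ch in suffix:
        node_off = find_sibling(buf, first, count, ord(ch))
        if node_off is None:
            return None
        _, mime_off, count, first = struct.unpack_from(">4I", buf, node_off)
    return read_cstring(buf, mime_off) if mime_off else None

# Hand-built sample tree for ".c" and ".h":
# header at 0, root '.' at 8, children 'c' at 24 and 'h' at 40, strings at 56.
c_type, h_type = b"text/x-csrc\0", b"text/x-chdr\0"
sample = (struct.pack(">II", 1, 8)
          + struct.pack(">4I", ord("."), 0, 2, 24)
          + struct.pack(">4I", ord("c"), 56, 0, 0)
          + struct.pack(">4I", ord("h"), 56 + len(c_type), 0, 0)
          + c_type + h_type)

print(lookup_suffix(sample, 0, ".h"))
```

To match a whole file name, a caller would presumably try lookup_suffix at
each position of the name until one succeeds.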
MagicList:
4 CARD32 N_MATCHES
4 CARD32 MAX_EXTENT
4 CARD32 FIRST_MATCH_OFFSET
Match:
4 CARD32 PRIORITY
4 CARD32 MIME_TYPE_OFFSET
4 CARD32 N_MATCHLETS
4 CARD32 FIRST_MATCHLET_OFFSET
Matchlet:
4 CARD32 RANGE_START
4 CARD32 RANGE_LENGTH
4 CARD32 WORD_SIZE
4 CARD32 VALUE_LENGTH
4 CARD32 VALUE
4 CARD32 MASK
4 CARD32 N_CHILDREN
4 CARD32 FIRST_CHILD_OFFSET
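To show how a Matchlet might be evaluated against file contents, here is a
rough sketch. It assumes that VALUE and MASK are file offsets to
VALUE_LENGTH bytes each, that a MASK of 0 means "no mask", that children
are a contiguous array of 32-byte Matchlets, and that a matchlet with
children only matches if at least one child matches too. None of this is
stated explicitly above, and matchlet_matches is a hypothetical name.
WORD_SIZE is ignored here.

```python
import struct

MATCHLET_SIZE = 32  # eight CARD32 fields

def matchlet_matches(buf, offset, data):
    """Return True if the Matchlet at offset matches the bytes in data."""
    (range_start, range_length, _word_size, value_length,
     value_off, mask_off, n_children, first_child) = struct.unpack_from(
        ">8I", buf, offset)
    value = bytes(buf[value_off:value_off + value_length])
    # Assumption: a zero MASK field means the value is compared unmasked.
    mask = bytes(buf[mask_off:mask_off + value_length]) if mask_off else None

    for start in range(range_start, range_start + range_length):
        window = data[start:start + value_length]
        if len(window) < value_length:
            break  # ran past the end of the data
        if mask is None:
            hit = window == value
        else:
            hit = all((w & m) == (v & m)
                      for w, v, m in zip(window, value, mask))
        if hit:
            if n_children == 0:
                return True
            return any(
                matchlet_matches(buf, first_child + MATCHLET_SIZE * i, data)
                for i in range(n_children))
    return False

# Sample: a single matchlet looking for b"\x89PNG" at offset 0,
# with the value bytes stored right after the 32-byte record.
sample = struct.pack(">8I", 0, 1, 1, 4, 32, 0, 0, 0) + b"\x89PNG"

print(matchlet_matches(sample, 0, b"\x89PNG\r\n\x1a\n"))
```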
Notes:
* The list of aliases is sorted by alias; the list of
literal globs is sorted by the literal. The SuffixTreeNode
siblings are sorted by character.
* All offsets are in bytes from the beginning of the file
* Strings are zero-terminated
* All numbers are in network (big-endian) order. This is
necessary because the data will be stored in arch-independent
directories like /usr/share/mime or even in users'
home directories.
* Cache files have to be written atomically - write to a
temporary name, then move over the old file - so that
clients that have the old cache file open and mmap'ed
won't get corrupt data.
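The atomic-write rule from the last note can be sketched as follows. The
function name, the temporary-file prefix and the 0644 permission are
assumptions made for the example; update-mime-database itself may do this
differently.

```python
import os
import tempfile

def write_cache_atomically(path, data):
    """Write data to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory so that the
    # final rename stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".mime-cache-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file with mode 0600; assume the cache
        # should be readable by everyone.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Clients that still have the old file mapped keep seeing the old, complete
contents, because the rename only changes which file the name points to.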