libxdg

Tue Jul 3 10:03:01 PDT 2012

Hi,

Sorry for the delay - I've been a bit busy...

> This part is quite confusing:
>
> "What are the Global AVL trees mentioned above?  [...]
>
> @note Actually, this problem is solved a bit different way. There is no
> Global AVL trees [...]"
>
> Please describe the actual state of things, not what could have been done
> and was not done ;)

When I re-read these docs after a while I was amazed - it's just a stream
of consciousness :)
That was definitely not my day :)
I've updated them to be more or less understandable
(http://vilkov.github.com/libxdg).

> Because of this confusing documentation, I'm still not sure how one is
> supposed to see a "global tree" when using the caches directly (rather than
> when using the library API). What does "fixing the references" mean?
>
> This is about the case where a local mimeapps.list refers to a globally-
> installed .desktop file, right ?

Right. The thing is, when we have several directories, ".list" files from
one directory can have references to ".desktop" files from another
directory (this is usually the case). This means that after we load a cache
file or parse a ".desktop" file, we must check that the ".list" files from
the other directories have correct references (pointers) to the newly
loaded data.
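
To make the idea concrete, here is a rough sketch of that reference-fixing
step. It is in Python only for brevity (the library itself is C), and every
name in it (DesktopEntry, ListReference, Directory, fix_references) is made
up for illustration and is not the libxdg API. It also ignores precedence
between directories.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DesktopEntry:
    """Hypothetical stand-in for data loaded from a ".desktop" file."""
    desktop_id: str
    directory: str


@dataclass
class ListReference:
    """One entry of a ".list" file: a desktop id plus a resolved pointer."""
    desktop_id: str
    target: Optional[DesktopEntry] = None


@dataclass
class Directory:
    """One data directory: its ".desktop" entries and ".list" references."""
    path: str
    entries: dict = field(default_factory=dict)     # desktop id -> DesktopEntry
    references: list = field(default_factory=list)  # ListReference items


def fix_references(directories, loaded):
    """After `loaded` was (re)loaded, point matching ".list" references
    in every directory at the entries that were just loaded."""
    for directory in directories:
        for ref in directory.references:
            entry = loaded.entries.get(ref.desktop_id)
            if entry is not None:
                ref.target = entry


# A local mimeapps.list refers to a globally installed .desktop file.
local = Directory("~/.local/share/applications")
local.references.append(ListReference("firefox.desktop"))

system = Directory("/usr/share/applications")
system.entries["firefox.desktop"] = DesktopEntry("firefox.desktop", system.path)

fix_references([local, system], loaded=system)
```

After the call, the reference held by the local directory points at the
entry owned by the system directory, which is all "fixing the references"
means here.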

> There will never be so many mimetypes. On a modern linux system there are
> about 400 mimetypes in application, and 660 overall.
> In any case, cutting the mimetypes in two, and performing two AVL-tree
> lookups, doesn't sound faster than performing one AVL-tree lookup with the
> full mimetype name... But anyway. I don't have time to experiment with
> that, and neither do you apparently, so let's leave it as such.
> But let's not use "thousands or tens of thousands" as an argument, it's
> not a valid argument :)

Agree :)

> > > > Each value of this tree is an AVL tree of sub types (e.g. html),
> > > > which contains a list of pointers to XdgApp structures.
> > >
> > > What does this structure contain? The full contents of the desktop
> > > file, again?
> > > Or do I misunderstand that?
> > > Surely there's no need to duplicate the contents again, for each
> > > mimetype associated with the application. Wouldn't it be enough to
> > > write in this tree, the name of the desktop file, in order to then
> > > look it up in the other tree if one wants to get more details about it?
> > > Or is this space-optimized anyway, by pointing to the same data in the
> > > on-disk cache?
>
> These questions remain.

It's just an association of a mime type with ".desktop" files. In turn, a
".desktop" file can be loaded/parsed dynamically or can be part of a binary
cache file.

> > > With a tree for the associations coming from desktop files, and
> > > another tree for the contents of defaults.list/mimeapps.list, one has
> > > to look up into multiple trees to be able to answer that question.
> > > Wouldn't it be faster, and simpler (higher level) to have a single
> > > tree, combining these two?
> > > I.e. a simple tree with
> > >
> > >  key = mimetype -> value = list of desktop files
> > >
> > > ("merging" .list files into the desktop files) would give an immediate
> > > result, compared to three lookups (initial list, then added
> > > associations, then removed associations), all this multiplied by the
> > > number of paths in XDG_DATA_DIRS plus one local dir, of course.
> >
> > Everything just the way you describe it - the library returns merged
> > results from all XDG_DATA_DIRS directories and local directory too.
> > When one asks the library about, e.g. default associations the library
> > returns a list of apps from all ".list" files (system-wide) there is no
> > initial list and there is no need (and actually possibility too) to scan
> > XDG_DATA_DIRS. All this implementation details (like XDG_DATA_DIRS and
> > local dir) encapsulated in the library.
>
> Yes I know that the library does all this. My question is: could we make
> this more optimized and reuseable, by having this logic at the time of
> building up the binary cache, rather than at the time of reading into the
> cache?
> Otherwise we're really not gaining much by having this cache, if it only
> has raw unprocessed copy of the file contents.
>
> The binary cache only makes sense if it contains data in 'ready to use'
> form (as much as possible).

There are no extra operations on the cache, neither during loading nor
during use. The cache is a number of AVL trees mmap'ed into memory. So
there are only two necessary operations after the cache is mmap'ed:
 - updating pointers in the mmap'ed memory (to allow the AVL tree to
traverse its nodes);
 - updating pointers in other [cache files/parsed ".list" files] (to allow
".list" files from other directories to refer to the actual data).

The only way it could be more optimized is to have only one cache file for
the whole system.
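
A small illustration of the first of those two pointer updates, again in
Python rather than C. It assumes (my assumption for this sketch, not a
description of the real cache format) that each node on disk stores its
children as byte offsets from the start of the file, with -1 meaning "no
child". After loading, one pass turns those offsets into direct references;
in C this would correspond to adding the mmap base address to each stored
offset. The record layout and the names NODE, Node, load_tree and find are
hypothetical.

```python
import struct

# Hypothetical record layout: key, left offset, right offset (-1 = no child).
NODE = struct.Struct("<iii")


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def load_tree(buffer, root_offset):
    """Turn the offsets stored in `buffer` into direct object references."""
    nodes = {}
    # First pass: create one Node per fixed-size record.
    for offset in range(0, len(buffer), NODE.size):
        key, _, _ = NODE.unpack_from(buffer, offset)
        nodes[offset] = Node(key)
    # Second pass: the "pointer update" step, offsets become references.
    for offset, node in nodes.items():
        _, left, right = NODE.unpack_from(buffer, offset)
        node.left = nodes.get(left)    # -1 is not a key, so this yields None
        node.right = nodes.get(right)
    return nodes[root_offset]


def find(node, key):
    """Ordinary binary-search-tree lookup over the fixed-up nodes."""
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


# Three 12-byte records: root 20 at offset 0, children 10 and 30.
buffer = b"".join([
    NODE.pack(20, 12, 24),
    NODE.pack(10, -1, -1),
    NODE.pack(30, -1, -1),
])
root = load_tree(buffer, 0)
```

Once the pass is done, lookups walk plain references and never touch the
offsets again, so the cost is paid once per load.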

> > > Do you think this could be done? Or am I overlooking some difficulty
> > > here?
> >
> > It could easily be done by adding one more function which will group the
> > results from different groups ("Default Associations", "Added
> > associations", etc).
> > By the way, this functionality is implemented in example in online
> > documentation.
>
> I'm not asking for more functions, but for a different layout in the
> on-disk cache :-)
>
> But yes, this goes together. The most common use case for this library
> should require a single function (plus iteration loop) rather than three
> (plus iteration loop), and the on-disk cache should be optimized for this
> use case.
>
> If the user-preferences-handling code needs to explicitely access "added"
> and "removed" lists, it can just use the mimeapps.list files, as it does
> already.
> So if you want to keep this API in the library, it could just read these
> files directly. Or don't provide it.
> But the goal of the on-disk cache is really to pre-process the data in
> order to make the "which app(s) handle this mimetype" lookup as fast as
> possible, and as simple as possible for developers.

Yes, it definitely is. I will merge the trees representing associations of
mime types with the "added", "default", "all other" and "removed" lists in
the next release (within a couple of weeks).
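
Roughly, the merged result would behave like the sketch below: one table
with key = mimetype and value = list of desktop files, built once instead
of being assembled from several lookups. This is only an illustration in
Python. The function name and the ordering (defaults first, then added,
then everything else, with removed entries dropped) are assumptions for the
example, not the final design.

```python
def merge_associations(default, added, other, removed):
    """Build one mimetype -> ordered list of desktop ids table.

    Each argument maps a mimetype to a list of desktop ids.
    The ordering policy here is an assumption made for the sketch.
    """
    merged = {}
    mimetypes = set(default) | set(added) | set(other)
    for mimetype in mimetypes:
        blocked = set(removed.get(mimetype, []))
        result = []
        for source in (default, added, other):
            for app in source.get(mimetype, []):
                # Skip removed associations and avoid duplicates.
                if app not in blocked and app not in result:
                    result.append(app)
        merged[mimetype] = result
    return merged


merged = merge_associations(
    default={"text/html": ["firefox.desktop"]},
    added={"text/html": ["chromium.desktop"]},
    other={"text/html": ["firefox.desktop", "kate.desktop", "lynx.desktop"]},
    removed={"text/html": ["lynx.desktop"]},
)
```

With a table like this, "which app(s) handle this mimetype" becomes a
single lookup plus an iteration loop.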

--------------------------------
Best regards, Dmitriy.

2012/6/14 David Faure <faure at kde.org>

> On Monday 28 May 2012 02:23:49 DAV wrote:
> > Hi, all!
> >
> > I finally did it :) I have update libxdg documentation:
> >  - here is the small Wiki with basic info about the library:
> > https://github.com/vilkov/libxdg/wiki
> >  - here is updated online Doxygen documentation:
> > http://vilkov.github.com/libxdg
> >  - and code itself: https://github.com/vilkov/libxdg
> >
> > Could you possibly take a look at the documentation please, especially if
> > you are Stake Holder, like e.g. David :)
>
> This part is quite confusing:
>
> "What are the Global AVL trees mentioned above?  [...]
>
> @note Actually, this problem is solved a bit different way. There is no
> Global AVL trees [...]"
>
> Please describe the actual state of things, not what could have been done
> and was not done ;)
>
> Because of this confusing documentation, I'm still not sure how one is
> supposed to see a "global tree" when using the caches directly (rather than
> when using the library API). What does "fixing the references" mean?
>
> This is about the case where a local mimeapps.list refers to a globally-
> installed .desktop file, right ?
>
> > > > * List of XdgFileWatcher structures
> > It's just like you said - there is no any file watching facility. This
> > structures is used just for checking validity of cache files.
>
> OK, I see. I got confused by the naming because of the resemblance with
> QFileSystemWatcher (FAM/inotify stuff).
>
> > > "text" is not the name of a mime type. "text/html" is. Is there any
> > > reason for splitting text/html into two leaves in the tree? Is this for
> > > supporting mimetypes like text/* or image/*? Hmm, why not. But
> > > otherwise it's a bit pointless and confusing.
> >
> > Hmm... It could be so, but the reason of this is performance. When we
> > have only tens of mime types it won't be a problem, but when we are
> > speaking about thousands or even tens of thousands for each mime
> > group/sub type (i.e. application/ text/ etc) it will be a serious
> > problem.
>
> There will never be so many mimetypes. On a modern linux system there are
> about 400 mimetypes in application, and 660 overall.
> In any case, cutting the mimetypes in two, and performing two AVL-tree
> lookups, doesn't sound faster than performing one AVL-tree lookup with the
> full mimetype name... But anyway. I don't have time to experiment with
> that, and neither do you apparently, so let's leave it as such.
> But let's not use "thousands or tens of thousands" as an argument, it's
> not a valid argument :)
>
> > > > Each value of this tree is an AVL tree of sub types (e.g. html),
> > > > which contains a list of pointers to XdgApp structures.
> > >
> > > What does this structure contain? The full contents of the desktop
> > > file, again?
> > > Or do I misunderstand that?
> > > Surely there's no need to duplicate the contents again, for each
> > > mimetype associated with the application. Wouldn't it be enough to
> > > write in this tree, the name of the desktop file, in order to then
> > > look it up in the other tree if one wants to get more details about it?
> > > Or is this space-optimized anyway, by pointing to the same data in the
> > > on-disk cache?
>
> These questions remain.
>
> > Well as I mentioned before old documentation was a bit outdated and, lets
> > say, didn't describe whole things clearly enough... I hope new one will.
>
> Well you changed "structures" into "items", but that doesn't really answer
> my questions :-)
>
> Does the second tree point into the first one, or does it have its own
> items?
>
> > > With a tree for the associations coming from desktop files, and
> > > another tree for the contents of defaults.list/mimeapps.list, one has
> > > to look up into multiple trees to be able to answer that question.
> > > Wouldn't it be faster, and simpler (higher level) to have a single
> > > tree, combining these two?
> > > I.e. a simple tree with
> > >
> > >  key = mimetype -> value = list of desktop files
> > >
> > > ("merging" .list files into the desktop files) would give an immediate
> > > result, compared to three lookups (initial list, then added
> > > associations, then removed associations), all this multiplied by the
> > > number of paths in XDG_DATA_DIRS plus one local dir, of course.
> >
> > Everything just the way you describe it - the library returns merged
> > results from all XDG_DATA_DIRS directories and local directory too.
> > When one asks the library about, e.g. default associations the library
> > returns a list of apps from all ".list" files (system-wide) there is no
> > initial list and there is no need (and actually possibility too) to scan
> > XDG_DATA_DIRS. All this implementation details (like XDG_DATA_DIRS and
> > local dir) encapsulated in the library.
>
> Yes I know that the library does all this. My question is: could we make
> this more optimized and reuseable, by having this logic at the time of
> building up the binary cache, rather than at the time of reading into the
> cache?
> Otherwise we're really not gaining much by having this cache, if it only
> has raw unprocessed copy of the file contents.
>
> The binary cache only makes sense if it contains data in 'ready to use'
> form (as much as possible).
>
> > > Do you think this could be done? Or am I overlooking some difficulty
> > > here?
> >
> > It could easily be done by adding one more function which will group the
> > results from different groups ("Default Associations", "Added
> > associations", etc).
> > By the way, this functionality is implemented in example in online
> > documentation.
>
> I'm not asking for more functions, but for a different layout in the
> on-disk cache :-)
>
> But yes, this goes together. The most common use case for this library
> should require a single function (plus iteration loop) rather than three
> (plus iteration loop), and the on-disk cache should be optimized for this
> use case.
>
> If the user-preferences-handling code needs to explicitely access "added"
> and "removed" lists, it can just use the mimeapps.list files, as it does
> already.
> So if you want to keep this API in the library, it could just read these
> files directly. Or don't provide it.
> But the goal of the on-disk cache is really to pre-process the data in
> order to make the "which app(s) handle this mimetype" lookup as fast as
> possible, and as simple as possible for developers.
>
> --
> David Faure, faure at kde.org, http://www.davidfaure.fr
> Sponsored by Nokia to work on KDE, incl. KDE Frameworks 5
>
>