[packagekit] Inviting you to project "PackageMap"

Sebastian Pipping webmaster at hartwork.org
Fri Jun 12 00:42:18 PDT 2009


Hello!


Quick (re-)introduction:  My task for Gentoo/Google Summer of Code 2009
is to give Gentoo a Debian popcon equivalent, a tool to collect
statistics on "what package is installed how often".  To achieve this
goal I'm extending Smolt (a tool currently doing similar things with
hardware information) by fine-tunable software stats gathering.


The plan we have for Smolt is to make it cross-distro, not just fit
Gentoo or Fedora.  One point where the consequences and benefits of such
an approach can be seen clearly is with

  counting packages from different distros into the same buckets.

What do I mean by that?  Debian's Git counts for Gentoo's Git counts for
Fedora's, you know the list.  With packages counted across distros
we can suddenly answer questions that we currently cannot answer, among them:

 - What globally popular packages are missing in distro X?
   Let's say we don't have a package for product P.  Do other distros
   have one?  They do, maybe we need one, too?  They don't, maybe P is
   not that important then?

 - How many Linux users are approximately using program X in total?
   Not just on Ubuntu or Arch - all across Linux, BSD, Solaris!

 - Does distro X have 10 times the packages of Y or is it just
   different splitting?

To count into the same bucket we use global identifiers for the
"products" that fall out of a package.  Gentoo package "dev-util/git"
can produce product "cpe://a:git:git", and so can Debian's "git-core".
That string is a CPE URI [1], a concept close to package naming
in Java.  This "intermediate language" allows us to relate package names
from distro X to those of distro Y and answer various questions from
that data.
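
To make the bucket idea concrete, here is a minimal Python sketch.  It is
not code from PackageMap or Smolt.  The two Git mappings are the ones
mentioned above; the Fedora entry, the package "app-misc/example", the
function name and all report data are made up for illustration.

```python
from collections import Counter

# (distro, package name) -> product identifier.
# The Gentoo and Debian entries are taken from the text above;
# the Fedora entry is an assumption added for illustration.
PACKAGE_TO_PRODUCT = {
    ("gentoo", "dev-util/git"): "cpe://a:git:git",
    ("debian", "git-core"): "cpe://a:git:git",
    ("fedora", "git"): "cpe://a:git:git",
}


def count_products(reports):
    """Count installations per product instead of per package name.

    `reports` is an iterable of (distro, package) pairs, one pair per
    installed package per reporting machine (a hypothetical input format).
    """
    buckets = Counter()
    unmapped = Counter()
    for distro, package in reports:
        product = PACKAGE_TO_PRODUCT.get((distro, package))
        if product is None:
            # No mapping yet: keep track of it instead of dropping it.
            unmapped[(distro, package)] += 1
        else:
            buckets[product] += 1
    return buckets, unmapped


# Made-up reports from four machines.
REPORTS = [
    ("gentoo", "dev-util/git"),
    ("debian", "git-core"),
    ("debian", "git-core"),
    ("fedora", "git"),
    ("gentoo", "app-misc/example"),  # hypothetical package without a mapping
]

buckets, unmapped = count_products(REPORTS)
```

All four Git installations end up in the single bucket "cpe://a:git:git",
no matter which distro reported them.  Packages without a mapping are
collected separately so they can be mapped later.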

To do such mapping we need code (or a "service") that does the mapping
for us and a base of collected data that the service can operate on.
Together, these two make up project "PackageMap".
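
As a rough sketch of what such a service could do with the collected
data, consider the following Python.  This is an assumption on my part
about the shape of the problem, not the actual PackageMap code or data
format: the in-memory dictionary, the function names and the package
"example-tool" with its CPE are hypothetical.  Only the two Git entries
come from this mail.

```python
# (distro, package name) -> set of products the package can produce.
# A set is used on the assumption that one package may yield several products.
PRODUCTS_BY_PACKAGE = {
    ("gentoo", "dev-util/git"): {"cpe://a:git:git"},
    ("debian", "git-core"): {"cpe://a:git:git"},
    ("debian", "example-tool"): {"cpe://a:example:tool"},  # hypothetical
}


def translate(src_distro, package, dst_distro):
    """Return packages of dst_distro that share a product with the given package."""
    wanted = PRODUCTS_BY_PACKAGE.get((src_distro, package), set())
    return sorted(
        name
        for (distro, name), products in PRODUCTS_BY_PACKAGE.items()
        if distro == dst_distro and products & wanted
    )


def missing_products(distro):
    """Return products packaged by some other distro but not by `distro`."""
    have, elsewhere = set(), set()
    for (other, _name), products in PRODUCTS_BY_PACKAGE.items():
        (have if other == distro else elsewhere).update(products)
    return elsewhere - have


debian_git = translate("gentoo", "dev-util/git", "debian")
gentoo_gaps = missing_products("gentoo")
```

Here translate() relates a Gentoo package name to its Debian counterpart
through the shared product, and missing_products() answers the "what is
missing in distro X?" question against the same data.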

I have started populating the database with packages (currently 312
in number) made from information extracted from the Gentoo tree
and the National Vulnerability Database.  The latter holds many CPEs.
Let me state clearly that packagemap is not about Gentoo in particular.
Sure, the initial data has lots of Gentoo in it but the whole point of
the project is to get information and people from different distros
together.

To see what these 312 package maps look like at the moment, it is best
to click through the database folder yourself:
http://git.goodpoint.de/?p=packagemap.git;a=tree;f=database

Also, there are a Relax NG schema and a DTD for validation, more
documentation than I usually write, and a few scripts:
http://git.goodpoint.de/?p=packagemap.git;a=tree

  By now I hope you have gained interest in what this can become.
  Your active participation is highly appreciated.
  A few minutes from everyone can make a huge difference here.
  If you want write access to the repo - mail me: sebastian at pipping.org.

Please have a look at the Git repository linked above and ask questions.
I propose to keep the related Gentoo stuff on gentoo-dev and everything
else on the packagekit list.  I hope that works out well.

Thanks for reading up to this point.



Sebastian



PS: I'm aware "hartwork.org" might not make a good long-term location for
    DTDs, XML namespaces and such for a cross-distro project.  Any ideas
    where best to put them?

[1] http://cpe.mitre.org/




