Crash reporting for TDF builds

Sun May 29 11:11:29 UTC 2016

Hey,

so I finally managed to land the crash reporting that I presented during my
FOSDEM talk, in time for the 5.2 release. The code is already integrated
and available in 5.2.0.0.beta1. Below is some information for all
developers on how to make use of the crash reporting in the code and how to
use the reported crashes.

Please note that you should not upload crashes from your own builds as
there are no corresponding debug symbols on the server. Please only use it
for official TDF builds.

* How does it work?

We use the breakpad library, which registers signal handlers for a few
signals that indicate crashes and related problems (on Windows, SEH
exceptions).
During the start of LibreOffice we generate a file in the user profile
called dump.ini with key-value pairs of information (the URL of the server,
the version, the product name). During the runtime of LibreOffice we may
collect additional key-value pairs, e.g. currently the OpenGL device
and driver in use if we use OpenGL. All of this information is added to the
dump.ini.
If we now hit a crash, the crash handler will write a minidump of the
current stack (a Windows debug format) and add the path to the minidump
file to the dump.ini. Nothing more is done by the crash reporting code in
the crashing process (you should not do more than really necessary in
a process that might be corrupted).
During the next start-up we check if a dump.ini exists and if it contains a
path to a minidump (if there is no such path we just ignore it). If it
does, we ask the user whether they want to upload it.
Currently there is no feedback for the user with the ID or a URL for the
uploaded crash.
The processing of the crash into a human-readable stack trace happens
completely on the server with the symbols generated during the build. To
generate the symbols we now have two additional make targets (make symbols
and make upload-symbols). Note that you cannot use them to upload your own
symbols as that is restricted to registered users.
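
To make the dump.ini handshake between the crashing run and the next
start-up more concrete, here is a minimal standalone sketch. It is not the
actual LibreOffice code: the "key=value" line layout, the key name
"DumpFile" and all function names are assumptions made for illustration
only.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>

// Hypothetical key under which the crash handler stores the minidump path.
const std::string kDumpFileKey = "DumpFile";

// Append one "key=value" line to the ini file (assumed layout).
void appendKeyValue(const std::string& iniPath, const std::string& key,
                    const std::string& value)
{
    // A key containing '=' could not be parsed back unambiguously.
    assert(!key.empty() && key.find('=') == std::string::npos);
    std::ofstream out(iniPath, std::ios::app);
    out << key << '=' << value << '\n';
}

// Read all key-value pairs; a later entry overwrites an earlier one.
std::map<std::string, std::string> readKeyValues(const std::string& iniPath)
{
    std::map<std::string, std::string> pairs;
    std::ifstream in(iniPath);
    std::string line;
    while (std::getline(in, line))
    {
        const std::string::size_type pos = line.find('=');
        if (pos == std::string::npos)
            continue; // skip lines that are not key=value
        pairs[line.substr(0, pos)] = line.substr(pos + 1);
    }
    return pairs;
}

// Start-up check: returns the minidump path recorded by the previous run,
// or an empty string if no ini file or no minidump entry exists.
std::string pendingMinidump(const std::string& iniPath)
{
    const std::map<std::string, std::string> pairs = readKeyValues(iniPath);
    const auto it = pairs.find(kDumpFileKey);
    return it == pairs.end() ? std::string() : it->second;
}
```

In this sketch the normal run only calls appendKeyValue for metadata, the
crash handler adds a single extra line with the minidump path, and the next
start-up offers the upload only if pendingMinidump returns a non-empty path.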

* How can a developer make use of the crash reporting to provide more
information?

As mentioned above, we already upload additional information with the
crash report, like the OpenGL device and driver if we use them. This is
done through the CrashReporter::AddKeyValue method from
include/desktop/crashreport.hxx.
The idea is that you can add important information as key-value pairs and
it will be available together with the crash report on the server. Some
information that is already part of the minidump: the OS, the CPU, the
loaded modules. Keep in mind that you should only collect the information
that is really necessary.
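
A call site could look roughly like the sketch below. The CrashReporter
class here is only a stand-in so that the example compiles on its own: the
real method is declared in include/desktop/crashreport.hxx and works on
LibreOffice string types, so please check the header for the exact
signature. The key names, the collected() helper and reportOpenGLInfo are
hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the class declared in include/desktop/crashreport.hxx.
// std::string is used here only to keep the sketch self-contained.
class CrashReporter
{
public:
    // Remember one key-value pair to be sent along with a crash report.
    static void AddKeyValue(const std::string& rKey, const std::string& rValue)
    {
        assert(!rKey.empty());
        pairs()[rKey] = rValue;
    }

    // Hypothetical helper, only here so the sketch can be inspected.
    static const std::map<std::string, std::string>& collected()
    {
        return pairs();
    }

private:
    static std::map<std::string, std::string>& pairs()
    {
        static std::map<std::string, std::string> aPairs;
        return aPairs;
    }
};

// Hypothetical call site: record the OpenGL device and driver as soon as
// they are known, so that they are available if a crash happens later.
void reportOpenGLInfo(const std::string& rDevice, const std::string& rDriver)
{
    CrashReporter::AddKeyValue("OpenGLDevice", rDevice);
    CrashReporter::AddKeyValue("OpenGLDriver", rDriver);
}
```

The point of the pattern is to add the pair at the moment the information
becomes known, not at crash time, because the crashing process should do as
little as possible.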

Some information that we might want to collect in the future:

OpenCL info if OpenCL is used
did we crash inside of our OpenGL code
loaded extensions

The plan is to improve the dialog that asks the user to upload the crash so
that it shows the information that we will include.

* How can a developer use the uploaded crashes to fix bugs?

The server side of our crash reporting is at
http://crashreport.libreoffice.org/stats/ (yes, the main page is nearly
unusable right now and incredibly ugly). More important are pages
like:

http://crashreport.libreoffice.org/stats/version/5.2.0.0
http://crashreport.libreoffice.org/stats/signature/sclo.dll+0x64396a
http://crashreport.libreoffice.org/stats/crash_details/4a266638-29a6-41a9-bcec-cfccaffb1a6c

Taking the last one as an example, at the top we have all the meta
information that is part of the minidump, below that the stack trace of the
crashing thread (the symbols for sclo.dll are missing because I rebuilt the
library after generating the symbol information) and at the bottom a link
to open the stack traces for all the other threads.
At the top you have 4 tabs for different information: the main details
page, the metadata (which contains all the key-value pairs, empty in that
case), the loaded modules and finally the raw export of the minidump.

Hopefully that provides enough information for you to understand the crash.
In the future I also want to add a way to link crashes to Bugzilla.

* Open Items

As can be seen, we have a basic working concept, but for now this is just a
little more than a working prototype. We still need to make several big
changes on the client and server side. The following come to mind:

** Client

better feedback for the user
collect all relevant and important information
clean up my code
integrate it into the OSX build

** Server (code at https://github.com/mmohrhard/crash )

more beautiful and somewhat usable pages and UX
automate a few workflows (currently the processing is async and needs to
be done manually)
provide connection to Bugzilla (see issue#9)
integrate Windows library symbols (see issue#14)

many more smaller tasks, see for example other issues and TODO comments in
the code

If you want to help out, please talk to me. I'm especially looking for
people with experience in web development around JavaScript, CSS and Django.

Regards,
Markus