[systemd-devel] journalctl segfault in gcrypt code

Tue Oct 16 07:13:15 PDT 2012

On Tue, Oct 16, 2012 at 01:01:04AM +0200, Lennart Poettering wrote:
> On Mon, 15.10.12 23:02, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:
> 
> > 
> > On Mon, Oct 15, 2012 at 10:01:31PM +0200, Lennart Poettering wrote:
> > > On Sat, 13.10.12 17:59, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I'm having trouble debugging the problem below. Maybe somebody has an
> > > > idea...  When I run journalctl, on a specific (large) set of journal
> > > > logs, it segfaults. Always in the same place.
> > > 
> > > Hmm, I have not seen this so far. Have you tried valgrind on this?
> > Yeah, but it didn't say anything useful.
> > 
> > ==32666== Invalid write of size 1
> > ==32666==    at 0x5076B12: md_close (md.c:771)
> > ==32666==    by 0x41282D: journal_file_close (journal-file.c:109)
> > ==32666==    by 0x4110FE: sd_journal_close (sd-journal.c:1620)
> > ==32666==    by 0x406DEA: main (journalctl.c:990)
> > ==32666==  Address 0x402d000 is not stack'd, malloc'd or (recently) free'd
> > ==32666== 
> > ==32666== 
> > ==32666== Process terminating with default action of signal 11 (SIGSEGV)
> > ==32666==  Access not within mapped region at address 0x402D000
> > ==32666==    at 0x5076B12: md_close (md.c:771)
> > ==32666==    by 0x41282D: journal_file_close (journal-file.c:109)
> > ==32666==    by 0x4110FE: sd_journal_close (sd-journal.c:1620)
> > ==32666==    by 0x406DEA: main (journalctl.c:990)
> > ==32666==  If you believe this happened as a result of a stack
> > ==32666==  overflow in your program's main thread (unlikely but
> > ==32666==  possible), you can try to increase the size of the
> > ==32666==  main thread stack using the --main-stacksize= flag.
> > ==32666==  The main thread stack size used in this run was 8388608.
> > --32666-- Caught __NR_exit; running __libc_freeres()
> > --32666-- Discarding syms at 0x33981230-0x3398887c in /usr/lib64/libnss_files-2.16.90.so due to munmap()
> > ==32666== 
> > ==32666== HEAP SUMMARY:
> > ==32666==     in use at exit: 17,860,820 bytes in 57 blocks
> > ==32666==   total heap usage: 73,740 allocs, 73,683 frees, 33,511,230,790 bytes allocated
> > 
> > When I compile with --disable-gcrypt, everything seems to work fine
> > (no valgrind warnings). So the problem seems related to gcrypt,
> > but I can't see anything wrong by looking at the code.
> 
> I wonder if valgrind actually tracks mmap()s properly. I wonder if the
> mmap_cache is mistakenly unmapping a map it shouldn't. It might be
> worth looking for munmap() invocations in mmap-cache.c and printing the
> range unmapped and comparing that with the address valgrind mentions as
> freed.
It seems that the mmaps/munmaps are not the problem:

mmap 0x7f7c38c03000 2f5000
mmap 0x7f7c3890e000 2f5000
mmap 0x7f7c38618000 2f5000
...
mmap 0x7f7c0c53f000 2f5000
mmap 0x7f7c0bd3e000 800000
mmap 0x7f7c0b53d000 800000
mmap 0x7f7c0b247000 2f5000
mmap 0x7f7c0af51000 2f5000
munmap 0x7f7c38c03000 2f5000
...
munmap 0x7f7c0c53f000 2f5000
Segmentation fault (core dumped)

and in gdb:
a->ctx->macpads
$2 = (byte *) 0x7f7c3a3b2f88 ""

So there is no overlap, and everything is mmapped and munmapped in order.
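
The comparison above can also be done mechanically. What follows is only a
minimal sketch, not code from systemd or libgcrypt: the names map_range,
range_contains and find_containing_range are made up for illustration, and
the ranges are assumed to be the "address length" pairs printed by the debug
output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mapping as printed by the debug output: start address and length.
 * (Hypothetical helper type, not part of mmap-cache.c.) */
struct map_range {
    uint64_t start;
    uint64_t len;
};

/* True if addr lies inside the half-open interval [start, start + len). */
bool range_contains(const struct map_range *r, uint64_t addr) {
    return addr >= r->start && addr - r->start < r->len;
}

/* Index of the first range that contains addr, or -1 if none does. */
int find_containing_range(const struct map_range *ranges, size_t n, uint64_t addr) {
    for (size_t i = 0; i < n; i++)
        if (range_contains(&ranges[i], addr))
            return (int) i;
    return -1;
}
```

Fed with the ranges that are actually listed above (the elided ones are not
known here) and the macpads pointer 0x7f7c3a3b2f88, find_containing_range()
returns -1, i.e. the pointer lies in none of them.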

Zbyszek