[systemd-devel] journalctl segfault in gcrypt code

Tue Oct 16 07:19:34 PDT 2012

On Tue, Oct 16, 2012 at 04:13:15PM +0200, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Oct 16, 2012 at 01:01:04AM +0200, Lennart Poettering wrote:
> > On Mon, 15.10.12 23:02, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:
> > 
> > > 
> > > On Mon, Oct 15, 2012 at 10:01:31PM +0200, Lennart Poettering wrote:
> > > > On Sat, 13.10.12 17:59, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > I'm having trouble debugging the problem below. Maybe somebody has an
> > > > > idea...  When I run journalctl, on a specific (large) set of journal
> > > > > logs, it segfaults. Always in the same place.
> > > > 
> > > > Hmm, I have not seen this so far. Have you tried valgrind on this?
> > > Yeah, but it didn't say anything useful.
> > > 
> > > ==32666== Invalid write of size 1
> > > ==32666==    at 0x5076B12: md_close (md.c:771)
> > > ==32666==    by 0x41282D: journal_file_close (journal-file.c:109)
> > > ==32666==    by 0x4110FE: sd_journal_close (sd-journal.c:1620)
> > > ==32666==    by 0x406DEA: main (journalctl.c:990)
> > > ==32666==  Address 0x402d000 is not stack'd, malloc'd or (recently) free'd
> > > ==32666== 
> > > ==32666== 
> > > ==32666== Process terminating with default action of signal 11 (SIGSEGV)
> > > ==32666==  Access not within mapped region at address 0x402D000
> > > ==32666==    at 0x5076B12: md_close (md.c:771)
> > > ==32666==    by 0x41282D: journal_file_close (journal-file.c:109)
> > > ==32666==    by 0x4110FE: sd_journal_close (sd-journal.c:1620)
> > > ==32666==    by 0x406DEA: main (journalctl.c:990)
> > > ==32666==  If you believe this happened as a result of a stack
> > > ==32666==  overflow in your program's main thread (unlikely but
> > > ==32666==  possible), you can try to increase the size of the
> > > ==32666==  main thread stack using the --main-stacksize= flag.
> > > ==32666==  The main thread stack size used in this run was 8388608.
> > > --32666-- Caught __NR_exit; running __libc_freeres()
> > > --32666-- Discarding syms at 0x33981230-0x3398887c in /usr/lib64/libnss_files-2.16.90.so due to munmap()
> > > ==32666== 
> > > ==32666== HEAP SUMMARY:
> > > ==32666==     in use at exit: 17,860,820 bytes in 57 blocks
> > > ==32666==   total heap usage: 73,740 allocs, 73,683 frees, 33,511,230,790 bytes allocated
> > > 
> > > When I compile with --disable-gcrypt, everything seems to work fine
> > > (no valgrind warnings). So the problem seems related to gcrypt,
> > > but I can't see anything wrong by looking at the code.
> > 
> > I wonder if valgrind actually tracks mmap()s properly. I wonder if the
> > mmap_cache is mistakenly unmapping a map it shouldn't. It might be
> > worth looking for munmap() invocations in mmap-cache.c and printing the
> > range unmapped and comparing that with the address valgrind mentions as
> > freed.
> Seems that mmaps/munmaps are not the problem:
> 
> mmap 0x7f7c38c03000 2f5000
> mmap 0x7f7c3890e000 2f5000
> mmap 0x7f7c38618000 2f5000
> ...
> mmap 0x7f7c0c53f000 2f5000
> mmap 0x7f7c0bd3e000 800000
> mmap 0x7f7c0b53d000 800000
> mmap 0x7f7c0b247000 2f5000
> mmap 0x7f7c0af51000 2f5000
> munmap 0x7f7c38c03000 2f5000
> ...
> munmap 0x7f7c0c53f000 2f5000
> Segmentation fault (core dumped)
> 
> and 
> a->ctx->macpads
> $2 = (byte *) 0x7f7c3a3b2f88 ""
> 
> So, no overlap, everything mmapped and munmapped in order.
BTW, this is on top of Colin's patches: (journal: Set the last_unused pointer...
and journal: Properly track the number of allocated windows).
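
For reference, the "no overlap" check quoted above can be sketched as a
small range test. This is only an illustration, not code from systemd or
libgcrypt: struct map_range, range_contains() and find_containing_range()
are hypothetical names, and the table holds only the ranges visible in
the log (the entries elided by "..." are left out).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One logged mapping: start address and length in bytes (hypothetical type). */
struct map_range {
        uint64_t start;
        uint64_t len;
};

/* True if addr lies in the half-open interval [start, start + len). */
static bool range_contains(const struct map_range *r, uint64_t addr) {
        return addr >= r->start && addr - r->start < r->len;
}

/* Index of the first range containing addr, or -1 if none does. */
static int find_containing_range(const struct map_range *ranges, size_t n,
                                 uint64_t addr) {
        for (size_t i = 0; i < n; i++)
                if (range_contains(&ranges[i], addr))
                        return (int) i;
        return -1;
}

/* Ranges copied from the mmap lines of the log; elided entries are omitted. */
static const struct map_range logged_ranges[] = {
        { 0x7f7c38c03000ULL, 0x2f5000 },
        { 0x7f7c3890e000ULL, 0x2f5000 },
        { 0x7f7c38618000ULL, 0x2f5000 },
        { 0x7f7c0c53f000ULL, 0x2f5000 },
        { 0x7f7c0bd3e000ULL, 0x800000 },
        { 0x7f7c0b53d000ULL, 0x800000 },
        { 0x7f7c0b247000ULL, 0x2f5000 },
        { 0x7f7c0af51000ULL, 0x2f5000 },
};

#define N_LOGGED_RANGES (sizeof(logged_ranges) / sizeof(logged_ranges[0]))
```

Calling find_containing_range(logged_ranges, N_LOGGED_RANGES, addr) with
the macpads pointer value from gdb as addr tells whether that pointer
falls inside any of the listed mappings. A result of -1 only rules out
the ranges present in the table, not the elided ones.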

Zbyszek