[systemd-devel] kdbus performance regression by ~70% on 3.15 kernels ?

Djalal Harouni tixxdz at opendz.org
Fri Jun 27 17:28:09 PDT 2014


On Fri, Jun 27, 2014 at 04:55:30PM -0700, Steven Noonan wrote:
> On Fri, Jun 27, 2014 at 3:14 PM, Djalal Harouni <tixxdz at opendz.org> wrote:
> > On Fri, Jun 27, 2014 at 02:28:56PM -0700, Greg KH wrote:
> >> On Fri, Jun 27, 2014 at 10:19:03PM +0100, Djalal Harouni wrote:
> >> > On Fri, Jun 27, 2014 at 12:23:05PM +0100, Djalal Harouni wrote:
> >> > > On Fri, Jun 27, 2014 at 01:04:00PM +0200, Daniel Mack wrote:
> >> > > > On 06/27/2014 12:51 PM, Djalal Harouni wrote:
> >> > > > > Just to let you know that I noticed a regression of ~70% when running
> >> > > > > test-kdbus-benchmark on a kvm guest (that's what I have at hand now).
> >> > > > >
> >> > > > > I know, sorry, but kdbus on kvm is still a valid case. I don't know if
> >> > > > > this affects real machines or only kvm guests; I will be able to confirm
> >> > > > > it next week, unless someone else does first!
> >> > > > >
> >> > > > > If you are able to test it on a real machine and confirm whether it is
> >> > > > > affected too, thank you!
> >> > > > > I've managed to bisect this to:
> >> > > > > 3.15.0-rc1 good
> >> > > > > 3.15.0-rc5 bad
> >> > > > >
> >> > > > > I will continue later today!
> >> > > >
> >> > > > Please do. I'm not currently aware of such a regression. What about
> >> > > > 3.16-rc2?
> >> > A bit late, sorry!
> >> >
> >> > I was wrong about 3.15.0-rc5, sorry: that was a fedora rawhide kernel and I
> >> > got confused by the naming and the 'rc5'... But yes, fedora rawhide is
> >> > affected, so perhaps something was backported...
> >> >
> >> >
> >> > Anyway for upstream tests:
> >> >
> >> > 3.15.0-rc5 and 3.15.0-rc7 are good
> >> >
> >> > 3.16-rc1 and 3.16-rc2 are bad
> >> >
> >> > So I confirm there is a regression somewhere.
> >>
> >> Can you run 'git bisect' on the kernel tree to try to track down the
> >> problem commit?
> > Yes of course! I'm planning to do so
> >
> > Thanks!
> >
> 
> Was going to try to repro this perf regression as well, but instead got
> kdbus to oops (via test-kdbus-benchmark):
> 
> $ test/test-kdbus-benchmark 
> -- opening /dev/kdbus/control
> -- creating bus '1000-testbus'
> -- opening bus connection /dev/kdbus/1000-testbus/bus
> -- Our peer ID for /dev/kdbus/1000-testbus/bus: 1 -- bus uuid: 'b65bfdd23d3e4696aae2992a0857aa33'
> -- opening bus connection /dev/kdbus/1000-testbus/bus
> -- Our peer ID for /dev/kdbus/1000-testbus/bus: 2 -- bus uuid: 'b65bfdd23d3e4696aae2992a0857aa33'
> name_acquire(): flags after call: 0x0
> Killed
> $
> 
> [   32.853967] kdbus: initialized
> [   33.557785] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [   33.557819] IP: [<          (null)>]           (null)
> [   33.557837] PGD c58a5067 PUD c81cd067 PMD 0
> [   33.557856] Oops: 0010 [#1] SMP
> [   33.557870] Modules linked in: kdbus(O) snd_hda_codec_hdmi tun hid_generic snd_hda_codec_realtek snd_hda_codec_generic usbhid hid kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel snd_hda_controller radeon microcode snd_hda_codec snd_hwdep broadcom snd_pcm snd_timer serio_raw tg3 fam15h_power snd ttm soundcore libphy edac_core i2c_piix4 edac_mce_amd k10temp tpm_tis tpm acpi_cpufreq wmi evdev processor usbip_host(C) usbip_core(C) ext4 crc16 jbd2 mbcache sd_mod ata_generic pata_acpi crc_t10dif crct10dif_common ahci pata_jmicron libahci pata_atiixp crc32c_intel ehci_pci ehci_hcd xhci_hcd libata firewire_ohci usbcore scsi_mod usb_common firewire_core crc_itu_t i915 video intel_gtt i2c_algo_bit drm_kms_helper
> [   33.558231]  drm i2c_core e1000e ptp pps_core ipmi_poweroff ipmi_msghandler button
> [   33.558267] CPU: 1 PID: 1393 Comm: test-kdbus-benc Tainted: G         C O  3.16.0-rc2-ec2-00222-g3493860 #1
> [   33.558335] task: ffff8803e7811d80 ti: ffff8800c82cc000 task.ti: ffff8800c82cc000
> [   33.558364] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
> [   33.558398] RSP: 0018:ffff8800c82cfe40  EFLAGS: 00010246
> [   33.558419] RAX: ffffffff81636400 RBX: ffff880406dcfe40 RCX: 0000000000000000
> [   33.558447] RDX: 0000000000000001 RSI: ffff8800c82cfe88 RDI: ffff8800c82cfe98
> [   33.558475] RBP: ffff8800c82cfe78 R08: 00007fff80d23810 R09: ffff8803f9d5cc00
> [   33.558503] R10: ffff8803e7811d80 R11: 0000000000000246 R12: ffff8800c82cfe98
> [   33.558532] R13: ffff880406dcfe48 R14: ffff8800c82cfe88 R15: 0000000000000001
> [   33.558566] FS:  00007f97e69bf700(0000) GS:ffff88042dc40000(0000) knlGS:0000000000000000
> [   33.558595] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   33.558616] CR2: 0000000000000000 CR3: 0000000036e20000 CR4: 00000000000407e0
> [   33.558642] Stack:
> [   33.558650]  ffffffffa11f45d6 0000000000000000 ffff8800c82cff50 0000000000000010
> [   33.558681]  00007fff80d23800 ffff8800c82cff50 0000000000000000 ffff8800c82cfef8
> [   33.558712]  ffffffff811d386a 00007fff80d23800 0000000000000010 ffff8803f9d5d400
> [   33.558743] Call Trace:
> [   33.558757]  [<ffffffffa11f45d6>] ? kdbus_memfd_writev+0x66/0xa0 [kdbus]
> [   33.558785]  [<ffffffff811d386a>] do_sync_write+0x5a/0x90
> [   33.558806]  [<ffffffff811d4071>] vfs_write+0x151/0x200
> [   33.558827]  [<ffffffff811d4bb6>] SyS_write+0x46/0xc0
> [   33.558847]  [<ffffffff81105eb6>] ? __audit_syscall_exit+0x236/0x2e0
> [   33.558872]  [<ffffffff8152faed>] system_call_fastpath+0x1a/0x1f
> [   33.558894] Code:  Bad RIP value.
> [   33.558910] RIP  [<          (null)>]           (null)
> [   33.558930]  RSP <ffff8800c82cfe40>
> [   33.558943] CR2: 0000000000000000
> [   33.569387] ---[ end trace e8d6c50c5ef168aa ]---
> 
> Any ideas?
Hmm, you seem to be running an old kdbus?

kdbus_memfd_writev() was removed in commit 7da2745eb5d9c
https://code.google.com/p/d-bus/source/detail?r=7da2745eb5d9c41e29df53de614b8872a24e759f

Please pull from this repo, it should work! (Not sure, though; perhaps that
commit itself causes something...)

Daniel, is the github repo not synced?


And now a strange thing with 3.16.0-rc1: when I compile and run, I see a
40% performance regression, and after I reboot I hit a ~70% performance
regression. This is on a fedora kvm guest with an old systemd (208, I will
perhaps update it), booted into multi-user.target.

I will try to look into it tomorrow...
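
For the 'git bisect' suggested above, here is a rough sketch of the
workflow on a throwaway repository. Everything in it is illustrative: the
"speed" file, the ten toy commits and the grep check are stand-ins, not
part of kdbus or the kernel. On a real kernel tree one would presumably
start from the tags reported above (bad v3.16-rc1, good v3.15-rc7) and
mark each step by hand after rebooting and rerunning
test/test-kdbus-benchmark.

```shell
#!/bin/sh
# Toy 'git bisect run' demo: find the first commit where a check fails.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "bisect@example.invalid"
git config user.name "bisect demo"

# Ten commits; the hypothetical regression lands in commit 7.
for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -ge 7 ]; then
        echo slow > speed
    else
        echo fast > speed
    fi
    echo "$i" > counter
    git add speed counter
    git commit -q -m "commit $i"
done

# Newest commit is known bad, the root commit is known good.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null

# The check exits 0 for "good" and 1 for "bad"; git drives the search.
git bisect run sh -c 'grep -q fast speed' >/dev/null

# After a finished bisect, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"

git bisect reset >/dev/null 2>&1
```

With a benchmark that needs a reboot per step, 'git bisect run' does not
apply directly; the same search is driven manually with 'git bisect good'
and 'git bisect bad' after each test boot.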

-- 
Djalal Harouni
http://opendz.org

