[systemd-devel] kdbus performance regression by ~70% on 3.15 kernels ?

Steven Noonan steven at uplinklabs.net
Fri Jun 27 17:31:56 PDT 2014


Ah, I was running the GitHub version from:

https://github.com/gregkh/kdbus

On Fri, Jun 27, 2014 at 5:28 PM, Djalal Harouni <tixxdz at opendz.org> wrote:
> On Fri, Jun 27, 2014 at 04:55:30PM -0700, Steven Noonan wrote:
>> On Fri, Jun 27, 2014 at 3:14 PM, Djalal Harouni <tixxdz at opendz.org> wrote:
>> > On Fri, Jun 27, 2014 at 02:28:56PM -0700, Greg KH wrote:
>> >> On Fri, Jun 27, 2014 at 10:19:03PM +0100, Djalal Harouni wrote:
>> >> > On Fri, Jun 27, 2014 at 12:23:05PM +0100, Djalal Harouni wrote:
>> >> > > On Fri, Jun 27, 2014 at 01:04:00PM +0200, Daniel Mack wrote:
>> >> > > > On 06/27/2014 12:51 PM, Djalal Harouni wrote:
>> >> > > > > Just to let you know that I noticed a ~70% regression when running
>> >> > > > > test-kdbus-benchmark on a kvm guest (that's what I have at hand right now).
>> >> > > > >
>> >> > > > > Sorry, I know, but kdbus on kvm is still a valid case. I don't know whether
>> >> > > > > this affects real machines or only kvm guests; I will be able to confirm
>> >> > > > > next week unless someone else does!
>> >> > > > >
>> >> > > > > If you are able to test it on a real machine and confirm that it is
>> >> > > > > affected too, thank you!
>> >> > > > > I've managed to narrow this down to:
>> >> > > > > 3.15.0-rc1 good
>> >> > > > > 3.15.0-rc5 bad
>> >> > > > >
>> >> > > > > I will continue later today!
>> >> > > >
>> >> > > > Please do. I'm not currently aware of such a regression. What about
>> >> > > > 3.16-rc2?
>> >> > A bit late, sorry!
>> >> >
>> >> > I was wrong about 3.15.0-rc5, sorry: that was a fedora rawhide kernel and I
>> >> > got confused by the 'rc5' in its name. But yes, fedora rawhide is affected,
>> >> > so perhaps something was backported...
>> >> >
>> >> >
>> >> > Anyway for upstream tests:
>> >> >
>> >> > 3.15.0-rc5 and 3.15.0-rc7 are good
>> >> >
>> >> > 3.16-rc1 and 3.16-rc2 are bad
>> >> >
>> >> > So I confirm there is a regression somewhere.
>> >>
>> >> Can you run 'git bisect' on the kernel tree to try to track down the
>> >> problem commit?
>> > Yes of course! I'm planning to do so
>> >
>> > Thanks!
>> >
>>
>> Was going to try to repro this perf regression as well, but instead got
>> kdbus to oops (via test-kdbus-benchmark):
>>
>> $ test/test-kdbus-benchmark
>> -- opening /dev/kdbus/control
>> -- creating bus '1000-testbus'
>> -- opening bus connection /dev/kdbus/1000-testbus/bus
>> -- Our peer ID for /dev/kdbus/1000-testbus/bus: 1 -- bus uuid:
>> 'b65bfdd23d3e4696aae2992a0857aa33'
>> -- opening bus connection /dev/kdbus/1000-testbus/bus
>> -- Our peer ID for /dev/kdbus/1000-testbus/bus: 2 -- bus uuid:
>> 'b65bfdd23d3e4696aae2992a0857aa33'
>> name_acquire(): flags after call: 0x0
>> Killed
>> $
>>
>> [   32.853967] kdbus: initialized
>> [   33.557785] BUG: unable to handle kernel NULL pointer dereference at           (null)
>> [   33.557819] IP: [<          (null)>]           (null)
>> [   33.557837] PGD c58a5067 PUD c81cd067 PMD 0
>> [   33.557856] Oops: 0010 [#1] SMP
>> [   33.557870] Modules linked in: kdbus(O) snd_hda_codec_hdmi tun hid_generic snd_hda_codec_realtek snd_hda_codec_generic usbhid hid kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel snd_hda_controller radeon microcode snd_hda_codec snd_hwdep broadcom snd_pcm snd_timer serio_raw tg3 fam15h_power snd ttm soundcore libphy edac_core i2c_piix4 edac_mce_amd k10temp tpm_tis tpm acpi_cpufreq wmi evdev processor usbip_host(C) usbip_core(C) ext4 crc16 jbd2 mbcache sd_mod ata_generic pata_acpi crc_t10dif crct10dif_common ahci pata_jmicron libahci pata_atiixp crc32c_intel ehci_pci ehci_hcd xhci_hcd libata firewire_ohci usbcore scsi_mod usb_common firewire_core crc_itu_t i915 video intel_gtt i2c_algo_bit drm_kms_helper
>> [   33.558231]  drm i2c_core e1000e ptp pps_core ipmi_poweroff ipmi_msghandler button
>> [   33.558267] CPU: 1 PID: 1393 Comm: test-kdbus-benc Tainted: G         C O  3.16.0-rc2-ec2-00222-g3493860 #1
>> [   33.558335] task: ffff8803e7811d80 ti: ffff8800c82cc000 task.ti: ffff8800c82cc000
>> [   33.558364] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
>> [   33.558398] RSP: 0018:ffff8800c82cfe40  EFLAGS: 00010246
>> [   33.558419] RAX: ffffffff81636400 RBX: ffff880406dcfe40 RCX: 0000000000000000
>> [   33.558447] RDX: 0000000000000001 RSI: ffff8800c82cfe88 RDI: ffff8800c82cfe98
>> [   33.558475] RBP: ffff8800c82cfe78 R08: 00007fff80d23810 R09: ffff8803f9d5cc00
>> [   33.558503] R10: ffff8803e7811d80 R11: 0000000000000246 R12: ffff8800c82cfe98
>> [   33.558532] R13: ffff880406dcfe48 R14: ffff8800c82cfe88 R15: 0000000000000001
>> [   33.558566] FS:  00007f97e69bf700(0000) GS:ffff88042dc40000(0000) knlGS:0000000000000000
>> [   33.558595] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [   33.558616] CR2: 0000000000000000 CR3: 0000000036e20000 CR4: 00000000000407e0
>> [   33.558642] Stack:
>> [   33.558650]  ffffffffa11f45d6 0000000000000000 ffff8800c82cff50 0000000000000010
>> [   33.558681]  00007fff80d23800 ffff8800c82cff50 0000000000000000 ffff8800c82cfef8
>> [   33.558712]  ffffffff811d386a 00007fff80d23800 0000000000000010 ffff8803f9d5d400
>> [   33.558743] Call Trace:
>> [   33.558757]  [<ffffffffa11f45d6>] ? kdbus_memfd_writev+0x66/0xa0 [kdbus]
>> [   33.558785]  [<ffffffff811d386a>] do_sync_write+0x5a/0x90
>> [   33.558806]  [<ffffffff811d4071>] vfs_write+0x151/0x200
>> [   33.558827]  [<ffffffff811d4bb6>] SyS_write+0x46/0xc0
>> [   33.558847]  [<ffffffff81105eb6>] ? __audit_syscall_exit+0x236/0x2e0
>> [   33.558872]  [<ffffffff8152faed>] system_call_fastpath+0x1a/0x1f
>> [   33.558894] Code:  Bad RIP value.
>> [   33.558910] RIP  [<          (null)>]           (null)
>> [   33.558930]  RSP <ffff8800c82cfe40>
>> [   33.558943] CR2: 0000000000000000
>> [   33.569387] ---[ end trace e8d6c50c5ef168aa ]---
>>
>> Any ideas?
> Hmm, you seem to be running an old kdbus?
>
> kdbus_memfd_writev() was removed in commit 7da2745eb5d9c
> https://code.google.com/p/d-bus/source/detail?r=7da2745eb5d9c41e29df53de614b8872a24e759f
>
> Please pull from this repo, it should work! (Though I'm not sure, perhaps
> that commit might cause something...)
>
> Daniel, is the github repo not synced?
>
>
> And now a strange thing with 3.16.0-rc1: when I compile and run I see a
> 40% performance regression, and after I reboot I hit a ~70% performance
> regression. This is on a fedora kvm guest with an old systemd (208, I will
> perhaps update it) booted into multi-user.target.
>
> I will try to look into it tomorrow...
>
> --
> Djalal Harouni
> http://opendz.org
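
For anyone following along, the 'git bisect' run Greg asks for above can be
sketched on a throwaway repository. This is only a toy illustration, not the
kernel bisect itself: the repository, the score.txt file, the 100/30 scores,
and the threshold of 90 are all made up for the example and stand in for a
real benchmark result.

```shell
#!/bin/sh
# Toy 'git bisect run' demo on a throwaway repository (not the kernel tree).
# All file names, scores, and thresholds here are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "bisect-demo@example.invalid"
git config user.name "bisect demo"

# Ten commits; the made-up benchmark score drops from 100 to 30 at commit 7.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then score=30; else score=100; fi
    echo "$i" > rev.txt
    echo "$score" > score.txt
    git add rev.txt score.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

good=$(git rev-list --max-parents=0 HEAD)   # commit 1, known good
bad=$(git rev-parse HEAD)                   # commit 10, known bad

# 'git bisect start <bad> <good>' marks both endpoints in one step.
git bisect start "$bad" "$good" >/dev/null 2>&1

# The test command exits 0 for a "good" revision and 1 for a "bad" one.
git bisect run sh -c 'test "$(cat score.txt)" -ge 90' >/dev/null 2>&1

# After a finished run, refs/bisect/bad points at the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad_subject"
```

On the real kernel tree the endpoints from this thread would be v3.15-rc7
(good) and v3.16-rc1 (bad), and each step needs a kernel build, a reboot, and
a test-kdbus-benchmark run, so it would more likely be driven by hand with
'git bisect good' and 'git bisect bad' than with 'git bisect run'.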

