[pulseaudio-discuss] [PATCH RFCv3 00/51] optimizations
David Henningsson
david.henningsson at canonical.com
Wed Nov 5 01:47:16 PST 2014
On 2014-11-05 00:25, Peter Meerwald wrote:
>
> preliminary benchmarking on Intel i5-2400S, 64-bit, Linux 3.13:
>
> running 'paplay --latency-msec=10 stereo_48KHz.wav', output on internal
> soundcard (Intel HDA), measuring the maximum CPU% in top for the pulseaudio
> and paplay
>
> code flags PA paplay
> master 6d1fd4d1 -O2 < 14.0% < 3.7%
> master 6d1fd4d1 -O2 -DNDEBUG < 13.3% < 3.3%
> proposed v3 -O2 < 8.3% < 1.3%
> proposed v3 -O2 -DNDEBUG < 7.6% < 1.3%
Cool stuff!
Seems we can get the low-latency story somewhat better, and perhaps even
more so if we can communicate directly with the I/O threads through
srbchannels, but that's a different project...
But to focus on 6.0 release; which of these patches have large enough
performance impact vs risk of regression to try to squeeze them into 6.0
rc1, and which ones can just be deferred to the -next branch?
Also, is v4 going to be 90 patches...? ;-)
>
> ARMv7 benchmarking soonish
>
>
> this patch series aims to save memory allocations and some system calls
> related to PA's client/server protocol implementation
>
> v3 adds inlining and saves a snd_pcm_avail(), v2 code is largely unchanged
> (minibuffers are increased and better used)
>
>
> patches 1 to 5 ('tagstruct:') introduce a new tagstruct type _APPENDED
> which can hold tagstruct data up to a certain size; tagstructs are now
> kept in a specific free-list -- this typically replaces two malloc()/free()s
> with one flist push()/pop()
>
> patches 6 to 9 ('packet:') make packets fixed-size (typically); packets are
> kept in a specific free-list -- this replaces one malloc()/free() with one
> flist push()/pop()
>
> patches 10 to 14 ('pstream:') allows to send tagstructs directly to a pstream
> without encapsulation in a packet -- this saves one flist push()/pop()
>
> patches 15 and 16 ('pstream') often save a read() call by reading more than
> just the descriptor (up to 40 bytes, e.g. description (20 bytes) + shm
> info (16 bytes)); the idea is similar to b4342845d, "Optimize write
> of smaller packages", but for read -- this trades some extra memcpy() for
> a read(); in v3 the buffer size has been increased to 256 bytes
>
> patch 17 ('iochannel') fixes a strange behaviour in iochannel/mainloop that
> deleted the input_event with every read which caused a rebuild of the pollfds
> for every read()!
>
> patches 18 to 20 ('queue', 'pstream') aim to combine two (v3: or more) write items
> into one minibuffer by peeking ahead in the send queue
>
> patch 21 stop calling mainloop's defer_enable() after queuing a SHMRELEASE; this
> increases the chance that items can be combined (i.e. by patch 20)
>
> patch 22 inlines pa_run_once() as this function came out high in profiling
>
> patches 23 and 24 ('rtpoll') are cleanup
>
> patch 25 ('mainloop') only clears the wakeup pipe when poll() indicates that
> the pipe is readable; if the only ready file descriptor is the wakeup pipe,
> searching io_events can be avoided
>
> patch 26 and 27 ('flish') removes the volatile annotation and makes flist_elem attributes
> non-atomic -- needed?
>
> v3 material:
>
> patches 28 to 31 annotates some branches in and saves two rtclock() calls
>
> patch 32 ('resampler') is cleanup
>
> patch 33 ('build-sys') adds --disable-statistics to configure
>
> patches 34 to 37 make several hot functions inlinable; API function in pulse/
> do lot of error checking which is unnecessary in the core; worse, checking does NOT
> go away with NDEBUG
>
> patch 38 ('resampler') precomputes the maximum block size in frames
>
> patches 39 to 42 ('mix) makes functions inlineable and cleanup
>
> patches 43 and 44 makes volume-related function inlineable
>
> patch 45 and 46 ('iochannel', 'asyncmsgq') drop dead code
>
> patch 47 fixes sink_input_pop_cb() to return the entire memchunk (as per specification)
>
> patch 48 saves one call to snd_pcm_avail() by computing left_to_play -- this patch
> has probably THE BIGGEST impact
>
> patches 49 to 51 are cleanup and refactoring
>
>
> summary:
>
> with these patches typical playback (i.e. after setup) runs without any malloc()/free()
> thanks to the use of free-lists; the number of memory management operations is reduced
>
> many hot function have been made inlineable, redundant checks can be dropped by
> compiling with NDEBUG=1
>
> read() and write() syscalls are saved by combining data into minibuffers
>
> one call to snd_pcm_avail() is saved per mmap_write()
>
>
> Peter Meerwald (51):
> tagstruct: Distinguish pa_tagstruct_new() use cases
> tagstruct: Replace dynamic flag with type
> tagstruct: Get rid of pa_tagstruct_free_data()
> tagstruct: Add type _APPENDED
> tagstruct: Use flist to potentially save calls to malloc()/free()
> packet: Hide internals of pa_packet, introduce pa_packet_data()
> packet: Make pa_packet_new() create fixed-size packets
> packet: Introduce pa_packet_new_data() to copy data into a newly
> created packet
> packet: Use flist to save calls to malloc()/free()
> pstream: Unionize item_info
> pstream: Add pa_pstream_send_tagstruct()
> pstream: #define PA_PSTREAM_SHM_SIZE
> pstream: Duplicate assignment, write.data is always NULL
> pstream: Only reset memchunk if it has been used
> pstream: Split up do_read()
> pstream: Use small minibuffer to combine several read()s if possible
> iochannel: Fix channel enable
> queue: Add pa_queue_peek() function
> pstream: Add helper functions reset_descriptor(), shm_descriptor()
> pstream: Peek into next item on send queue to see if it can be put
> into minibuffer together with current item
> pstream: Don't call defer_enable() on SHMRELEASE
> once: Inline functions
> rtpoll: Fix condition for DEBUG_TIMING output
> rtpoll: Drop extra wait_op argument to pa_rtpoll_run()
> mainloop: Clear wakeup pipe only when necessary
> flist: Don't use atomic operations to manipulate ptr, next
> flist: Don't make flist volatile
> rtpoll: Annotate branches with LIKELY
> mainloop: Annotate branches with LIKELY
> alsa: Make rtpoll_run() runtime measurement compile-time code, default
> off
> alsa: Annotate branches in ALSA sink/source thread_func() with LIKELY
> resampler: Drop pointless remix variable
> build-sys: Add --disable-statistics
> sample: Make pa_sample_size_table public
> sample: Make pa_channels_valid() inlineable
> sample-util: Add inlineable functions
> core: Make use of use inlineable macros
> resampler: Precompute maximum block size in frames
> mix: Make use of pa_cvolume_is_norm/muted() macros
> mix: Avoid redundant cvolume checks
> mix: pa_mix() is always called with more than one steam
> mix: Length over all chunk has already been computed by the caller
> core: Add volume-util.h
> core: Make use of volume macros
> iochannel: Remove unnecessary zero-initialization
> asyncmsgq: Drop weird assert
> protocol-native: Make sink_input_pop_cb() return entire chunk
> alsa-sink: Assume left_to_play can be computed, save one call to
> snd_pcm_avail()
> alsa: Refactor computation of sleep usec
> alsa: Precompute max_frames
> alsa: Remove redundant sample_spec parameter to reset_watermark()
> function
>
> configure.ac | 13 +-
> src/modules/alsa/alsa-mixer.c | 4 +-
> src/modules/alsa/alsa-sink.c | 187 +++----
> src/modules/alsa/alsa-source.c | 135 ++---
> src/modules/alsa/alsa-util.c | 32 +-
> src/modules/bluetooth/module-bluez4-device.c | 2 +-
> src/modules/bluetooth/module-bluez5-device.c | 2 +-
> src/modules/echo-cancel/module-echo-cancel.c | 42 +-
> src/modules/echo-cancel/webrtc.cc | 10 +-
> src/modules/module-card-restore.c | 4 +-
> src/modules/module-combine-sink.c | 2 +-
> src/modules/module-device-manager.c | 12 +-
> src/modules/module-device-restore.c | 16 +-
> src/modules/module-esound-sink.c | 2 +-
> src/modules/module-null-sink.c | 2 +-
> src/modules/module-null-source.c | 2 +-
> src/modules/module-pipe-sink.c | 2 +-
> src/modules/module-pipe-source.c | 2 +-
> src/modules/module-sine-source.c | 2 +-
> src/modules/module-stream-restore.c | 12 +-
> src/modules/module-tunnel.c | 54 +-
> src/modules/oss/module-oss.c | 2 +-
> src/modules/raop/module-raop-sink.c | 2 +-
> src/pulse/context.c | 29 +-
> src/pulse/ext-device-manager.c | 14 +-
> src/pulse/ext-device-restore.c | 10 +-
> src/pulse/ext-stream-restore.c | 10 +-
> src/pulse/introspect.c | 82 +--
> src/pulse/mainloop.c | 70 +--
> src/pulse/sample.c | 18 +-
> src/pulse/sample.h | 4 +-
> src/pulse/scache.c | 10 +-
> src/pulse/stream.c | 43 +-
> src/pulse/subscribe.c | 2 +-
> src/pulsecore/asyncmsgq.c | 2 -
> src/pulsecore/flist.c | 14 +-
> src/pulsecore/flist.h | 2 +-
> src/pulsecore/iochannel.c | 37 +-
> src/pulsecore/memblock.c | 15 +
> src/pulsecore/memblockq.c | 5 +-
> src/pulsecore/mix.c | 42 +-
> src/pulsecore/mix.h | 5 +
> src/pulsecore/once.c | 18 +-
> src/pulsecore/once.h | 25 +-
> src/pulsecore/packet.c | 55 +-
> src/pulsecore/packet.h | 20 +-
> src/pulsecore/pdispatch.c | 9 +-
> src/pulsecore/protocol-native.c | 162 +++---
> src/pulsecore/pstream-util.c | 33 +-
> src/pulsecore/pstream-util.h | 2 -
> src/pulsecore/pstream.c | 734 +++++++++++++++++----------
> src/pulsecore/pstream.h | 2 +
> src/pulsecore/queue.c | 11 +
> src/pulsecore/queue.h | 3 +
> src/pulsecore/resampler.c | 45 +-
> src/pulsecore/resampler.h | 3 +-
> src/pulsecore/rtpoll.c | 46 +-
> src/pulsecore/rtpoll.h | 5 +-
> src/pulsecore/sample-util.c | 8 +-
> src/pulsecore/sample-util.h | 53 ++
> src/pulsecore/sink-input.c | 13 +-
> src/pulsecore/sink.c | 23 +-
> src/pulsecore/source-output.c | 9 +-
> src/pulsecore/source.c | 13 +-
> src/pulsecore/tagstruct.c | 67 ++-
> src/pulsecore/tagstruct.h | 4 +-
> src/pulsecore/volume-util.h | 92 ++++
> src/tests/rtpoll-test.c | 4 +-
> src/tests/srbchannel-test.c | 21 +-
> 69 files changed, 1455 insertions(+), 982 deletions(-)
> create mode 100644 src/pulsecore/volume-util.h
>
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
More information about the pulseaudio-discuss
mailing list