[pulseaudio-discuss] [PATCH RFCv3 00/51] optimizations
Peter Meerwald
pmeerw at pmeerw.net
Tue Nov 4 15:25:55 PST 2014
preliminary benchmarking on Intel i5-2400S, 64-bit, Linux 3.13:
running 'paplay --latency-msec=10 stereo_48KHz.wav', output on internal
soundcard (Intel HDA), measuring the maximum CPU% in top for the pulseaudio
and paplay
code flags PA paplay
master 6d1fd4d1 -O2 < 14.0% < 3.7%
master 6d1fd4d1 -O2 -DNDEBUG < 13.3% < 3.3%
proposed v3 -O2 < 8.3% < 1.3%
proposed v3 -O2 -DNDEBUG < 7.6% < 1.3%
ARMv7 benchmarking soonish
this patch series aims to save memory allocations and some system calls
related to PA's client/server protocol implementation
v3 adds inlining and saves a snd_pcm_avail(), v2 code is largely unchanged
(minibuffers are increased and better used)
patches 1 to 5 ('tagstruct:') introduce a new tagstruct type _APPENDED
which can hold tagstruct data up to a certain size; tagstructs are now
kept in a specific free-list -- this typically replaces two malloc()/free()s
with one flist push()/pop()
patches 6 to 9 ('packet:') make packets fixed-size (typically); packets are
kept in a specific free-list -- this replaces one malloc()/free() with one
flist push()/pop()
patches 10 to 14 ('pstream:') allows to send tagstructs directly to a pstream
without encapsulation in a packet -- this saves one flist push()/pop()
patches 15 and 16 ('pstream') often save a read() call by reading more than
just the descriptor (up to 40 bytes, e.g. description (20 bytes) + shm
info (16 bytes)); the idea is similar to b4342845d, "Optimize write
of smaller packages", but for read -- this trades some extra memcpy() for
a read(); in v3 the buffer size has been increased to 256 bytes
patch 17 ('iochannel') fixes a strange behaviour in iochannel/mainloop that
deleted the input_event with every read which caused a rebuild of the pollfds
for every read()!
patches 18 to 20 ('queue', 'pstream') aim to combine two (v3: or more) write items
into one minibuffer by peeking ahead in the send queue
patch 21 stop calling mainloop's defer_enable() after queuing a SHMRELEASE; this
increases the chance that items can be combined (i.e. by patch 20)
patch 22 inlines pa_run_once() as this function came out high in profiling
patches 23 and 24 ('rtpoll') are cleanup
patch 25 ('mainloop') only clears the wakeup pipe when poll() indicates that
the pipe is readable; if the only ready file descriptor is the wakeup pipe,
searching io_events can be avoided
patch 26 and 27 ('flish') removes the volatile annotation and makes flist_elem attributes
non-atomic -- needed?
v3 material:
patches 28 to 31 annotates some branches in and saves two rtclock() calls
patch 32 ('resampler') is cleanup
patch 33 ('build-sys') adds --disable-statistics to configure
patches 34 to 37 make several hot functions inlinable; API function in pulse/
do lot of error checking which is unnecessary in the core; worse, checking does NOT
go away with NDEBUG
patch 38 ('resampler') precomputes the maximum block size in frames
patches 39 to 42 ('mix) makes functions inlineable and cleanup
patches 43 and 44 makes volume-related function inlineable
patch 45 and 46 ('iochannel', 'asyncmsgq') drop dead code
patch 47 fixes sink_input_pop_cb() to return the entire memchunk (as per specification)
patch 48 saves one call to snd_pcm_avail() by computing left_to_play -- this patch
has probably THE BIGGEST impact
patches 49 to 51 are cleanup and refactoring
summary:
with these patches typical playback (i.e. after setup) runs without any malloc()/free()
thanks to the use of free-lists; the number of memory management operations is reduced
many hot function have been made inlineable, redundant checks can be dropped by
compiling with NDEBUG=1
read() and write() syscalls are saved by combining data into minibuffers
one call to snd_pcm_avail() is saved per mmap_write()
Peter Meerwald (51):
tagstruct: Distinguish pa_tagstruct_new() use cases
tagstruct: Replace dynamic flag with type
tagstruct: Get rid of pa_tagstruct_free_data()
tagstruct: Add type _APPENDED
tagstruct: Use flist to potentially save calls to malloc()/free()
packet: Hide internals of pa_packet, introduce pa_packet_data()
packet: Make pa_packet_new() create fixed-size packets
packet: Introduce pa_packet_new_data() to copy data into a newly
created packet
packet: Use flist to save calls to malloc()/free()
pstream: Unionize item_info
pstream: Add pa_pstream_send_tagstruct()
pstream: #define PA_PSTREAM_SHM_SIZE
pstream: Duplicate assignment, write.data is always NULL
pstream: Only reset memchunk if it has been used
pstream: Split up do_read()
pstream: Use small minibuffer to combine several read()s if possible
iochannel: Fix channel enable
queue: Add pa_queue_peek() function
pstream: Add helper functions reset_descriptor(), shm_descriptor()
pstream: Peek into next item on send queue to see if it can be put
into minibuffer together with current item
pstream: Don't call defer_enable() on SHMRELEASE
once: Inline functions
rtpoll: Fix condition for DEBUG_TIMING output
rtpoll: Drop extra wait_op argument to pa_rtpoll_run()
mainloop: Clear wakeup pipe only when necessary
flist: Don't use atomic operations to manipulate ptr, next
flist: Don't make flist volatile
rtpoll: Annotate branches with LIKELY
mainloop: Annotate branches with LIKELY
alsa: Make rtpoll_run() runtime measurement compile-time code, default
off
alsa: Annotate branches in ALSA sink/source thread_func() with LIKELY
resampler: Drop pointless remix variable
build-sys: Add --disable-statistics
sample: Make pa_sample_size_table public
sample: Make pa_channels_valid() inlineable
sample-util: Add inlineable functions
core: Make use of use inlineable macros
resampler: Precompute maximum block size in frames
mix: Make use of pa_cvolume_is_norm/muted() macros
mix: Avoid redundant cvolume checks
mix: pa_mix() is always called with more than one steam
mix: Length over all chunk has already been computed by the caller
core: Add volume-util.h
core: Make use of volume macros
iochannel: Remove unnecessary zero-initialization
asyncmsgq: Drop weird assert
protocol-native: Make sink_input_pop_cb() return entire chunk
alsa-sink: Assume left_to_play can be computed, save one call to
snd_pcm_avail()
alsa: Refactor computation of sleep usec
alsa: Precompute max_frames
alsa: Remove redundant sample_spec parameter to reset_watermark()
function
configure.ac | 13 +-
src/modules/alsa/alsa-mixer.c | 4 +-
src/modules/alsa/alsa-sink.c | 187 +++----
src/modules/alsa/alsa-source.c | 135 ++---
src/modules/alsa/alsa-util.c | 32 +-
src/modules/bluetooth/module-bluez4-device.c | 2 +-
src/modules/bluetooth/module-bluez5-device.c | 2 +-
src/modules/echo-cancel/module-echo-cancel.c | 42 +-
src/modules/echo-cancel/webrtc.cc | 10 +-
src/modules/module-card-restore.c | 4 +-
src/modules/module-combine-sink.c | 2 +-
src/modules/module-device-manager.c | 12 +-
src/modules/module-device-restore.c | 16 +-
src/modules/module-esound-sink.c | 2 +-
src/modules/module-null-sink.c | 2 +-
src/modules/module-null-source.c | 2 +-
src/modules/module-pipe-sink.c | 2 +-
src/modules/module-pipe-source.c | 2 +-
src/modules/module-sine-source.c | 2 +-
src/modules/module-stream-restore.c | 12 +-
src/modules/module-tunnel.c | 54 +-
src/modules/oss/module-oss.c | 2 +-
src/modules/raop/module-raop-sink.c | 2 +-
src/pulse/context.c | 29 +-
src/pulse/ext-device-manager.c | 14 +-
src/pulse/ext-device-restore.c | 10 +-
src/pulse/ext-stream-restore.c | 10 +-
src/pulse/introspect.c | 82 +--
src/pulse/mainloop.c | 70 +--
src/pulse/sample.c | 18 +-
src/pulse/sample.h | 4 +-
src/pulse/scache.c | 10 +-
src/pulse/stream.c | 43 +-
src/pulse/subscribe.c | 2 +-
src/pulsecore/asyncmsgq.c | 2 -
src/pulsecore/flist.c | 14 +-
src/pulsecore/flist.h | 2 +-
src/pulsecore/iochannel.c | 37 +-
src/pulsecore/memblock.c | 15 +
src/pulsecore/memblockq.c | 5 +-
src/pulsecore/mix.c | 42 +-
src/pulsecore/mix.h | 5 +
src/pulsecore/once.c | 18 +-
src/pulsecore/once.h | 25 +-
src/pulsecore/packet.c | 55 +-
src/pulsecore/packet.h | 20 +-
src/pulsecore/pdispatch.c | 9 +-
src/pulsecore/protocol-native.c | 162 +++---
src/pulsecore/pstream-util.c | 33 +-
src/pulsecore/pstream-util.h | 2 -
src/pulsecore/pstream.c | 734 +++++++++++++++++----------
src/pulsecore/pstream.h | 2 +
src/pulsecore/queue.c | 11 +
src/pulsecore/queue.h | 3 +
src/pulsecore/resampler.c | 45 +-
src/pulsecore/resampler.h | 3 +-
src/pulsecore/rtpoll.c | 46 +-
src/pulsecore/rtpoll.h | 5 +-
src/pulsecore/sample-util.c | 8 +-
src/pulsecore/sample-util.h | 53 ++
src/pulsecore/sink-input.c | 13 +-
src/pulsecore/sink.c | 23 +-
src/pulsecore/source-output.c | 9 +-
src/pulsecore/source.c | 13 +-
src/pulsecore/tagstruct.c | 67 ++-
src/pulsecore/tagstruct.h | 4 +-
src/pulsecore/volume-util.h | 92 ++++
src/tests/rtpoll-test.c | 4 +-
src/tests/srbchannel-test.c | 21 +-
69 files changed, 1455 insertions(+), 982 deletions(-)
create mode 100644 src/pulsecore/volume-util.h
--
1.9.1
More information about the pulseaudio-discuss
mailing list