[pulseaudio-discuss] Design constraints for per-client mempools
David Henningsson
david.henningsson at canonical.com
Tue Mar 1 10:53:12 UTC 2016
On 2016-03-01 11:25, Ahmed S. Darwish wrote:
> Hi :-)
>
> On Tue, Feb 23, 2016 at 11:45:41AM +0200, Tanu Kaskinen wrote:
>> On Tue, 2016-02-23 at 11:19 +0200, Tanu Kaskinen wrote:
>>>> On Tue, 2016-02-23 at 07:55 +0200, Ahmed S. Darwish wrote:
>>>> Hello everyone,
>>>>
>>>> As you probably know by now, the second patch in the memfd series
>>>> was created to transform the global, shared by all clients,
>>>> srbchannel mempool to a "per-client" one. [1]
>>>>
>>>> The reason for this transition is to avoid data leaks between clients.
>>>> In a follow-up patch series, it's even hoped that the global core
>>>> mempool will also be transformed to the per-client model. [2]
>>>>
>>>>
>>>> ==> Current problem:
>>>> --------------------
>>>>
>>>> By injecting faults at the system-call level, it was discovered
>>>> that the current design of per-client mempools leads to a reproducible
>>>> memory fault on certain error-recovery paths.
>>>>
>>>> The fault is caused by a design issue, rather than a small coding
>>>> error.
>>>>
>>>>
>>>> ==> Why are you dedicating a full mail to this?
>>>> -----------------------------------------------
>>>>
>>>> The issue discovered implies that a bigger design change is
>>>> required; your input is very much needed :-)
>>>>
>>>>
>>>> ==> What was the current design?
>>>> --------------------------------
>>>>
>>>> To do the per-client mempool transformation, a simple decision
>>>> was taken: let 'pa_native_connection' own the per-client mempool
>>>> in question.
>>>>
>>>> Why? Because when a client connects, a 'native connection' object
>>>> is created for it by default. Inside this object are the pstream
>>>> pipe and other per-client resources. So this seemed like the best
>>>> place for a per-client mempool:
>>>>
>>>> /* protocol-native.c - Called for each client connection */
>>>> void pa_native_protocol_connect(...) {
>>>>     pa_native_connection *c;
>>>>
>>>>     ...
>>>>     c = pa_msgobject_new(pa_native_connection);
>>>>     c->pstream = pa_pstream_new(...);
>>>>     c->mempool = pa_mempool_new(...);   // per-client mempool
>>>>     ...
>>>> }
>>>>
>>>> And 'naturally', the pool should be deallocated when the
>>>> connection closes:
>>>>
>>>> /* protocol-native.c */
>>>> static void native_connection_free(pa_object *o) {
>>>>     ...
>>>>     pa_pdispatch_unref(c->pdispatch);
>>>>     pa_pstream_unref(c->pstream);
>>>>
>>>>     // Srbchannels allocate from this per-client pool,
>>>>     // so free it only after the pstream is freed
>>>>     pa_mempool_free(c->mempool);
>>>>     ...
>>>> }
>>>>
>>>> All looked fine and dandy ..
>>>>
>>>>
>>>> ==> And where is the problem exactly?
>>>> -------------------------------------
>>>>
>>>> By injecting failures in sendmsg(2), which propagate up through the
>>>> iochannel and pstream layers, the following leak shows up:
>>>>
>>>> E: Memory pool destroyed but not all mem blocks freed! 1 remain
>>>>
>>>> and a segfault is then produced due to a use-after-free operation
>>>> on the leaked memblock.
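>>>>
>>>> (Aside: one possible way to inject such a failure, not necessarily
>>>> the exact harness used here, is a small LD_PRELOAD shim that lets
>>>> the first FAIL_AFTER calls to sendmsg(2) through and then fails
>>>> every later one with EACCES:)
>>>>
>>>> /* fail_sendmsg.c - hypothetical test shim, not PulseAudio code
>>>>  * Build: gcc -shared -fPIC -o fail_sendmsg.so fail_sendmsg.c -ldl
>>>>  * Run:   LD_PRELOAD=./fail_sendmsg.so FAIL_AFTER=200 pulseaudio ...
>>>>  */
>>>> #define _GNU_SOURCE
>>>> #include <dlfcn.h>
>>>> #include <errno.h>
>>>> #include <stdlib.h>
>>>> #include <sys/socket.h>
>>>>
>>>> ssize_t sendmsg(int fd, const struct msghdr *msg, int flags) {
>>>>     static ssize_t (*real)(int, const struct msghdr *, int);
>>>>     static long left = -1;
>>>>
>>>>     if (!real)
>>>>         real = (ssize_t (*)(int, const struct msghdr *, int))
>>>>                dlsym(RTLD_NEXT, "sendmsg");
>>>>     if (left < 0)
>>>>         left = getenv("FAIL_AFTER") ? atol(getenv("FAIL_AFTER")) : 200;
>>>>
>>>>     if (left > 0) {         // let the first FAIL_AFTER calls through
>>>>         left--;
>>>>         return real(fd, msg, flags);
>>>>     }
>>>>
>>>>     errno = EACCES;         // injected failure (-EACCES)
>>>>     return -1;
>>>> }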
>>>>
>>>> The sequence of failure, bottom-up, is as follows:
>>>>
>>>> -> sendmsg(2) fails with -EACCES
>>>> -> pa_iochannel_write_with_creds() returns failure
>>>> -> pstream do_write() fails
>>>> -> do_pstream_read_write() fails, jumps to error recovery
>>>>
>>>> And the error recovery goes as this:
>>>>
>>>> -> pstream die callback (p->callback) is called
>>>> -> That's protocol-native.c pstream_die_callback()
>>>> -> native_connection_unlink // Stops srbchannel, etc.
>>>> -> pa_pstream_unlink // Reset all pstream callbacks
>>>> -> native_connection_free // Per-client mempool _freed!_
>>>> -> pa_pstream_unlink // Second time, does nothing..
>>>> -> pa_pstream_free
>>>> -> pa_queue_free(p->send_queue, item_free)
>>>> -> 1. PACKET item found .. safely removed
>>>> -> 2. MEMBLOCK item found .. from freed srbchannel mempool
>>>> BOOM!!! segfault
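>>>>
>>>> To make the lifetime hazard concrete, here is a minimal standalone
>>>> sketch of the same pattern (illustration only, not PulseAudio code):
>>>> a "pool" gets freed while an item allocated from it still sits in a
>>>> queue, and draining the queue then touches freed memory:
>>>>
>>>> #include <stdlib.h>
>>>>
>>>> struct block { int refcount; };             // lives inside the pool
>>>> struct pool  { struct block blocks[16]; };
>>>>
>>>> int main(void) {
>>>>     struct pool  *pool   = calloc(1, sizeof *pool);
>>>>     struct block *queued = &pool->blocks[0]; // "memblock" still queued
>>>>     queued->refcount = 1;
>>>>
>>>>     free(pool);           // native_connection_free(): pool destroyed
>>>>     queued->refcount--;   // pa_pstream_free() drains the send queue:
>>>>                           // use-after-free on the leaked block
>>>>     return 0;
>>>> }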
>>>>
>>>> SUMMARY: As seen above, a per-client mempool's lifetime must be
>>>> a superset of the pstream's lifetime. Putting the per-client pool
>>>> in the native_connection object provided only an _illusion_ of
>>>> such a superset. [*]
>>>>
>>>> During error recovery, stale memblocks remain in the pstream's send
>>>> queue long after __all the per-client objects__ have been removed.
>>>>
>>>> [*] Check native_connection_free() under "What was the current
>>>> design?" for why this illusion was maintained.
>>>>
>>>>
>>>> ==> Suggested solutions?
>>>> ------------------------
>>>>
>>>> The situation is a bit tricky.
>>>>
>>>> Where can we put the per-client pool while ensuring a correct
>>>> lifecycle, especially during error recovery?
>>>>
>>>> The safest option, it seems, is to let the per-client pool be
>>>> owned by the pstream object, so that the pool can be freed at the
>>>> very end of pstream_free() itself. Is such a solution acceptable?
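>>>>
>>>> Roughly, such ownership could look like this (hypothetical sketch
>>>> only, not a patch; the real structures and signatures differ):
>>>>
>>>> /* pstream.c - sketch of a pstream-owned per-client pool */
>>>> struct pa_pstream {
>>>>     ...
>>>>     pa_mempool *mempool;   // per-client pool, owned by the pstream
>>>> };
>>>>
>>>> static void pstream_free(pa_pstream *p) {
>>>>     ...
>>>>     pa_queue_free(p->send_queue, item_free); // may unref pool memblocks
>>>>     ...
>>>>     // Freed at the very end, after every queued memblock is gone,
>>>>     // so the pool's lifetime is a strict superset of the pstream's
>>>>     pa_mempool_free(p->mempool);
>>>>     pa_xfree(p);
>>>> }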
>>>
>>> My first reaction is: why is the pstream object reference counted?
>>>
>
> Seems this was just a regular convention rather than a conscious
> design decision. This is evidenced by there being only __two__
> pa_pstream_ref() calls in the entire tree: in pstream.c's
> do_pstream_read_write(), and in the same file at srb_callback().
> In both places it's just a ref/unref pair done in a local
> context.
I recently added the one in srb_callback for a good reason:
commit f277f2c5094fb32c5d879923960eb807b3b1c535
Author: David Henningsson <david.henningsson at canonical.com>
Date:   Fri Oct 16 22:12:32 2015 +0200

    pstream: Fix use-after-free in srb_callback

...please make sure this bug does not reappear if you change things
around :-)
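
For reference, the pattern in both of those places is essentially a
local ref held across code that may indirectly free the pstream
(simplified from pstream.c, details elided):

    static void srb_callback(pa_srbchannel *srb, void *userdata) {
        pa_pstream *p = userdata;

        /* The read/write below may end up running p->die_callback,
         * which can drop the last external reference to the pstream.
         * Hold our own reference so p stays valid until we're done. */
        pa_pstream_ref(p);

        do_pstream_read_write(p);

        pa_pstream_unref(p);   /* may be the final unref */
    }

So whichever object ends up owning the per-client pool, the pool has to
stay alive until that final unref has run.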
--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic