[pulseaudio-discuss] [PATCH] echo-cancel: Avoid segfaults due to invalid master sink or source

Georg Chini georg at chini.tk
Sat May 6 18:06:14 UTC 2017

On 06.05.2017 18:36, Tanu Kaskinen wrote:
> On Fri, 2017-05-05 at 16:21 +0200, Georg Chini wrote:
>> On 04.05.2017 20:11, Tanu Kaskinen wrote:
>>> On Wed, 2017-05-03 at 22:19 +0200, Georg Chini wrote:
>>>> On 03.05.2017 21:58, Tanu Kaskinen wrote:
>>>>> On Tue, 2017-05-02 at 07:12 +0200, Georg Chini wrote:
>>>>>> On 01.05.2017 22:10, Tanu Kaskinen wrote:
>>>>>>> On Mon, 2017-04-24 at 19:33 +0200, Georg Chini wrote:
>>>>>>>> There are several places in module-echo-cancel where a segfault is
>>>>>>>> possible when the master sink or source is invalid.
>>>>>>> I don't think the rewind, volume and mute callbacks are ever called
>>>>>>> during stream moves, at least with the current code base. However,
>>>>>>> adding the extra checks does no harm either, so I don't mind, but I
>>>>>>> think the commit message should be clarified on this point.
>>>>>> I do not mention a move at all in the commit message, I just
>>>>>> say there are several places where a segfault is possible.
>>>>> The patch adds sink_input->sink and source_output->source checks, and
>>>>> checking those pointers is only relevant during stream moves (or during
>>>>> stream initialization before the routing has been decided, but the
>>>>> callbacks aren't used that early). At all other times the sink and
>>>>> source pointers are non-NULL.
>>>>>> I think the rewind callback is called during a move, the
>>>>>> PA_SINK_MESSAGE_START_MOVE calls request_rewind().
>>>>> That happens before detaching the sink input, so the sink pointer is
>>>>> not NULL.
>>>> This seems not correct. At least after fixing the first bug, the next
>>>> crash occurred in pa_sink_invalidate_requested_latency() during
>>>> PA_SINK_MESSAGE_START_MOVE. The request_rewind is just a few
>>>> lines below.
>>>> There must be (at least for echo-cancel) a possible situation
>>>> where the sink-input is moved from (null) to a real sink, otherwise
>>>> the crash I fixed could not happen at all.
>>> Sorry, I mixed the request rewind and process rewind callbacks. When
>>> pa_sink_request_rewind() is called on the virtual sink, the request
>>> gets propagated to the master sink, but in this case the filter sink is
>>> not connected to the master sink.
>>> I remember fighting with this case before... (looking up old stuff...)
>>> Last time it was module-device-manager that was moving a stream from an
>>> echo-cancel sink to the auto null sink during an alsa card profile
>>> change. So the case was quite similar, just a different policy module
>>> (device-manager instead of switch-on-connect).
>>> The fix that was used last time is not really applicable here. Part of
>>> the fix was to set the DONT_MOVE flags to module-echo-cancel's streams,
>>> but those flags are only set when module-echo-cancel is autoloaded by
>>> module-filter-apply, and apparently in this case module-echo-cancel is
>>> manually loaded (otherwise this situation shouldn't happen).
>>> We haven't so far supported moving streams that are connected to an
>>> already-moving filter sink, and I don't think this patch is a good way
>>> to add that support. The crash happens in some IO thread, but since the
>>> filter sink is not attached to any master sink, it's not clear which IO
>>> thread that is. Probably it's the old IO thread (the sink asyncmsgq
>>> pointer gets updated in the moving callback), but my point is that if
>>> it seems to work, it works by accident. After the filter sink has been
>>> detached from its master, the old IO thread could be already destroyed,
>>> so relying on it still running seems like a bad idea. If we want to
>>> support moving streams while not connected to any IO thread, I think
>>> the system design should be thought through properly.
>> After going through the code for I while, I believe the only possible
>> situation where such a "move within a move" can happen is when
>> the alsa card is switching the profile. In that situation we know
>> that the old thread still exists, because the sink is destroyed after
>> calling start_move() for all connected inputs. In all other cases the
>> move is either completed or fails before another move can be started.
>>> I propose applying this patch (the commit message needs to be
>>> rewritten): https://patchwork.freedesktop.org/patch/68741/
>>> That patch was not applied, because it was not needed after all to
>>> solve the bug last time, and because Arun didn't like it, but I think
>>> it makes sense and it should fix this crash. It will prevent module-
>>> switch-on-connect from moving the application stream to the auto null
>>> sink, which is fine.
>> I agree with you that my patch is wrong, but I think there is a simpler
>> solution than your patch. The reason why the move is triggered at all
>> is because the virtual sink becomes the default sink when no other
>> sink is available. Then a null-sink turns up and the streams are moved
>> away from the default sink. This does not make sense anyway for a
>> virtual sink, we don't want the streams to move away from there.
>> So the simple fix is to prevent the virtual sink from being the default
>> sink when it is not connected to a master, then the move will not occur.
>> If you are OK with that approach, I would send a patch.
> By "the virtual sink", do you mean the null sink? Would you want it to
> be impossible to make any null sink the default sink? That doesn't seem
> like a good idea, so do you propose some more nuanced policy?

No, by virtual sink I mean the filter sink. I think you did not get what I
wanted to say. When there is no other sink, the filter sink becomes
the default before the null sink turns up. Then streams are moved
from the moving filter sink to the new null sink. If we prevent the
filter sink from becoming the default while its sink input is moving,
the problem will not happen in the first place.
I'll send a patch, maybe you understand better what I mean then.

> I believe preventing the move to the null sink isn't enough anyway,
> because module-switch-on-connect will again try to move all streams
> when the sink of the new alsa profile is created. That happens before
> the echo-cancel sink has been attached to the new alsa sink.
> In defence of my patch, I believe it makes sense regardless of whether
> we make also another patch that avoids this specific situation. Moving
> a stream from a moving filter sink is unsupported, and will stay
> unsupported for the foreseeable future. Preventing all such move
> attempts seems like a good idea to me.

My argument that the code does not allow this to happen anyway
with one exception when the alsa card is switching profile. Because
that situation is handled by my patch, your patch is not necessary.
On the other hand I would not resist to add it, it is just an additional

More information about the pulseaudio-discuss mailing list