[pulseaudio-discuss] [PATCH] echo-cancel: Avoid segfaults due to invalid master sink or source

Georg Chini georg at chini.tk
Fri May 5 14:21:28 UTC 2017


On 04.05.2017 20:11, Tanu Kaskinen wrote:
> On Wed, 2017-05-03 at 22:19 +0200, Georg Chini wrote:
>> On 03.05.2017 21:58, Tanu Kaskinen wrote:
>>> On Tue, 2017-05-02 at 07:12 +0200, Georg Chini wrote:
>>>> On 01.05.2017 22:10, Tanu Kaskinen wrote:
>>>>> On Mon, 2017-04-24 at 19:33 +0200, Georg Chini wrote:
>>>>>> There are several places in module-echo-cancel where a segfault is
>>>>>> possible when the master sink or source is invalid.
>>>>> I don't think the rewind, volume and mute callbacks are ever called
>>>>> during stream moves, at least with the current code base. However,
>>>>> adding the extra checks does no harm either, so I don't mind, but I
>>>>> think the commit message should be clarified on this point.
>>>> I do not mention a move at all in the commit message, I just
>>>> say there are several places where a segfault is possible.
>>> The patch adds sink_input->sink and source_output->source checks, and
>>> checking those pointers is only relevant during stream moves (or during
>>> stream initialization before the routing has been decided, but the
>>> callbacks aren't used that early). At all other times the sink and
>>> source pointers are non-NULL.
>>>
>>>> I think the rewind callback is called during a move, the
>>>> PA_SINK_MESSAGE_START_MOVE calls request_rewind().
>>> That happens before detaching the sink input, so the sink pointer is
>>> not NULL.
>> This seems not correct. At least after fixing the first bug, the next
>> crash occurred in pa_sink_invalidate_requested_latency() during
>> PA_SINK_MESSAGE_START_MOVE. The request_rewind is just a few
>> lines below.
>>
>> There must be (at least for echo-cancel) a possible situation
>> where the sink-input is moved from (null) to a real sink, otherwise
>> the crash I fixed could not happen at all.
> Sorry, I mixed the request rewind and process rewind callbacks. When
> pa_sink_request_rewind() is called on the virtual sink, the request
> gets propagated to the master sink, but in this case the filter sink is
> not connected to the master sink.
>
> I remember fighting with this case before... (looking up old stuff...)
> Last time it was module-device-manager that was moving a stream from an
> echo-cancel sink to the auto null sink during an alsa card profile
> change. So the case was quite similar, just a different policy module
> (device-manager instead of switch-on-connect).
>
> The fix that was used last time is not really applicable here. Part of
> the fix was to set the DONT_MOVE flags to module-echo-cancel's streams,
> but those flags are only set when module-echo-cancel is autoloaded by
> module-filter-apply, and apparently in this case module-echo-cancel is
> manually loaded (otherwise this situation shouldn't happen).
>
> We haven't so far supported moving streams that are connected to an
> already-moving filter sink, and I don't think this patch is a good way
> to add that support. The crash happens in some IO thread, but since the
> filter sink is not attached to any master sink, it's not clear which IO
> thread that is. Probably it's the old IO thread (the sink asyncmsgq
> pointer gets updated in the moving callback), but my point is that if
> it seems to work, it works by accident. After the filter sink has been
> detached from its master, the old IO thread could be already destroyed,
> so relying on it still running seems like a bad idea. If we want to
> support moving streams while not connected to any IO thread, I think
> the system design should be thought through properly.

After going through the code for I while, I believe the only possible
situation where such a "move within a move" can happen is when
the alsa card is switching the profile. In that situation we know
that the old thread still exists, because the sink is destroyed after
calling start_move() for all connected inputs. In all other cases the
move is either completed or fails before another move can be started.

>
> I propose applying this patch (the commit message needs to be
> rewritten): https://patchwork.freedesktop.org/patch/68741/
>
> That patch was not applied, because it was not needed after all to
> solve the bug last time, and because Arun didn't like it, but I think
> it makes sense and it should fix this crash. It will prevent module-
> switch-on-connect from moving the application stream to the auto null
> sink, which is fine.

I agree with you that my patch is wrong, but I think there is a simpler
solution than your patch. The reason why the move is triggered at all
is because the virtual sink becomes the default sink when no other
sink is available. Then a null-sink turns up and the streams are moved
away from the default sink. This does not make sense anyway for a
virtual sink, we don't want the streams to move away from there.
So the simple fix is to prevent the virtual sink from being the default
sink when it is not connected to a master, then the move will not occur.
If you are OK with that approach, I would send a patch.

BTW, we can then also revert my first commits that are already in the
tree.



More information about the pulseaudio-discuss mailing list