[Spice-devel] [spice-gtk 3/3] usb-device-manager: Fix USB event thread leak

Hans de Goede hdegoede at redhat.com
Thu Jun 30 11:08:10 UTC 2016


Hi,

On 30-06-16 09:08, Christophe Fergeau wrote:
> On Wed, Jun 29, 2016 at 07:29:18PM +0200, Hans de Goede wrote:
>> Erm no, it is the caller's responsibility to make sure that all
>> spice_usbredir_channel's are properly torn down before the
>> usbdevicemanager gets torn down.
>>
>> Otherwise you're not just having the problem of the thread continuing
>> to run, but also you still have open usbfs file-descriptors and any
>> redirected devices will not become available to the client-os again
>> (think e.g. virt-manager with multiple spice ontexts).
>>
>> The proper fix is to fix the channel already being set to NULL,
>> without first calling spice_usbredir_channel_disconnect_device().
>
> Yeah, I agree with most of this, however if the API is misused (and
> actually it has been for years...), it still much better to cleanup the
> thread rather than leaving it hanging around until it triggers a crash.
>
> However, I should have made this commit more explicit that something bad
> happened first, ie:
>
>         spice_usb_device_manager_stop_event_listening(self);
>
>         if (g_atomic_int_get(&priv->event_thread_run)) {
>             /* Force termination of the event thread even if there were some mismatched
>              * spice_usb_device_manager_{start,stop}_event_listening
>              * calls. This should not happen if the API is properly
>              * used, but if it's not, this will avoid a crash/thread
>              * leak */
>             g_warn_if_reached();
>             g_atomic_int_set(&priv->event_thread_run, FALSE);
>         }
> (I'll send a v2 doing that, and add more precisions to the commit log
> that it's not a real fix)
>
>
> In this case spice_usbredir_channel_disconnect_device() is called with a
> non-NULL channel:
>
> #0  0x00007fc80b449dc4 in spice_usbredir_channel_disconnect_device
> (channel=0x55aa2bc55060 [SpiceUsbredirChannel]) at
> channel-usbredir.c:459
> #1  0x00007fc80b449fb3 in _disconnect_device_thread (task=0x55aa2c7ad910
> [GTask], object=0x55aa2bc55060, task_data=0x0, cancellable=0x0) at
> channel-usbredir.c:508
> #2  0x00007fc83f0fab10 in g_task_thread_pool_thread
> (thread_data=0x55aa2c7ad910, pool_data=<optimized out>) at gtask.c:1288
> #3  0x00007fc83f875735 in g_thread_pool_thread_proxy (data=<optimized
> out>) at gthreadpool.c:307
> #4  0x00007fc83f874d38 in g_thread_proxy (data=0x55aa2c93fed0) at
> gthread.c:780
> #5  0x00007fc84244f5ca in start_thread (arg=0x7fc8193b3700) at
> pthread_create.c:333
> #6  0x00007fc841a77ead in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
>
> but SpiceChannel::session has already been set to NULL, so this specific
> code block is not going to run:
>
> SpiceSession *session = spice_channel_get_session(SPICE_CHANNEL(channel));
> if (session != NULL)
>     spice_usb_device_manager_stop_event_listening(spice_usb_device_manager_get(session, NULL));
>
> The session is set to NULL through:
>
> #0  0x00007fc80b4221ed in spice_session_channel_destroy (session=0x55aa2ba69fa0 [SpiceSession], channel=0x55aa2bc55060 [SpiceUsbredirChannel]) at spice-session.c:2280
> #1  0x00007fc80b41c70e in session_disconnect (self=0x55aa2ba69fa0 [SpiceSession], keep_main=0)
>     at spice-session.c:314
> #2  0x00007fc80b420a3b in session_disconnect_idle (self=0x55aa2ba69fa0 [SpiceSession])
>     at spice-session.c:1908
> #3  0x00007fc83f84e703 in g_main_context_dispatch (context=0x55aa2b09d320) at gmain.c:3154
>
>
> I"m a bit wary of touching the device disconnection/channel teardown code paths
> though as I'm not familiar with them :((

So it seems there are 2 problems:

1) The existence of the "if (session)" in:

    if (session != NULL)
       spice_usb_device_manager_stop_event_listening(spice_usb_device_manager_get(session, NULL));

git blame gives me:

https://cgit.freedesktop.org/spice/spice-gtk/commit/?id=e3932bfebbfec7637f3d03d90e8f9b75e3223236

Which says this was done to fix a segfault. Which clearly is a bad idea,
just adding if (pointer) checks to avoid segfaults is not the way to fix
bugs people!

So to fix this we need to root cause said segfault, fix it and then revert this
commit.

2) spice_usbredir_channel_disconnect_device now running in a thread, rather
then being a synchronous part of the tear-down sequence, this can either be
due to someone calling spice_usbredir_channel_disconnect_device_async() or
someone calling spice_usbredir_channel_reset(). Note that if either happens
just before session_disconnect() gets called we've a problem as then
the channel will be torn down while disconnect is running async, looking
at your backtraces that is exactly what is happening. This all seems to be
fallout from making usb device connect / disconnect async because windows
is oh so terribly slow with this.  Since you cannot fix windows, you at
least not to make sure that this all gets to run its course properly
on shutdown / session close.  Note that things being able to run their
course properly is also important to get the windows filter driver reconfigured
to give the redirected usb device back to windows proper / to rebind the
linux kernel driver to the usb device. Without this the client-os
will not be able to access the device even after closing virt-viewer /
whatever.

Regards,

Hans



More information about the Spice-devel mailing list