[Spice-devel] [spice-gtk 3/3] usb-device-manager: Fix USB event thread leak
Hans de Goede
hdegoede at redhat.com
Thu Jun 30 11:49:57 UTC 2016
Hi,
On 30-06-16 13:08, Hans de Goede wrote:
> Hi,
>
> On 30-06-16 09:08, Christophe Fergeau wrote:
>> On Wed, Jun 29, 2016 at 07:29:18PM +0200, Hans de Goede wrote:
>>> Erm no, it is the caller's responsibility to make sure that all
>>> spice_usbredir_channel's are properly torn down before the
>>> usbdevicemanager gets torn down.
>>>
>>> Otherwise you're not just having the problem of the thread continuing
>>> to run, but also you still have open usbfs file-descriptors and any
>>> redirected devices will not become available to the client-os again
>>> (think e.g. virt-manager with multiple spice ontexts).
>>>
>>> The proper fix is to fix the channel already being set to NULL,
>>> without first calling spice_usbredir_channel_disconnect_device().
>>
>> Yeah, I agree with most of this, however if the API is misused (and
>> actually it has been for years...), it still much better to cleanup the
>> thread rather than leaving it hanging around until it triggers a crash.
>>
>> However, I should have made this commit more explicit that something bad
>> happened first, ie:
>>
>> spice_usb_device_manager_stop_event_listening(self);
>>
>> if (g_atomic_int_get(&priv->event_thread_run)) {
>> /* Force termination of the event thread even if there were some mismatched
>> * spice_usb_device_manager_{start,stop}_event_listening
>> * calls. This should not happen if the API is properly
>> * used, but if it's not, this will avoid a crash/thread
>> * leak */
>> g_warn_if_reached();
>> g_atomic_int_set(&priv->event_thread_run, FALSE);
>> }
>> (I'll send a v2 doing that, and add more precisions to the commit log
>> that it's not a real fix)
>>
>>
>> In this case spice_usbredir_channel_disconnect_device() is called with a
>> non-NULL channel:
>>
>> #0 0x00007fc80b449dc4 in spice_usbredir_channel_disconnect_device
>> (channel=0x55aa2bc55060 [SpiceUsbredirChannel]) at
>> channel-usbredir.c:459
>> #1 0x00007fc80b449fb3 in _disconnect_device_thread (task=0x55aa2c7ad910
>> [GTask], object=0x55aa2bc55060, task_data=0x0, cancellable=0x0) at
>> channel-usbredir.c:508
>> #2 0x00007fc83f0fab10 in g_task_thread_pool_thread
>> (thread_data=0x55aa2c7ad910, pool_data=<optimized out>) at gtask.c:1288
>> #3 0x00007fc83f875735 in g_thread_pool_thread_proxy (data=<optimized
>> out>) at gthreadpool.c:307
>> #4 0x00007fc83f874d38 in g_thread_proxy (data=0x55aa2c93fed0) at
>> gthread.c:780
>> #5 0x00007fc84244f5ca in start_thread (arg=0x7fc8193b3700) at
>> pthread_create.c:333
>> #6 0x00007fc841a77ead in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>>
>> but SpiceChannel::session has already been set to NULL, so this specific
>> code block is not going to run:
>>
>> SpiceSession *session = spice_channel_get_session(SPICE_CHANNEL(channel));
>> if (session != NULL)
>> spice_usb_device_manager_stop_event_listening(spice_usb_device_manager_get(session, NULL));
>>
>> The session is set to NULL through:
>>
>> #0 0x00007fc80b4221ed in spice_session_channel_destroy (session=0x55aa2ba69fa0 [SpiceSession], channel=0x55aa2bc55060 [SpiceUsbredirChannel]) at spice-session.c:2280
>> #1 0x00007fc80b41c70e in session_disconnect (self=0x55aa2ba69fa0 [SpiceSession], keep_main=0)
>> at spice-session.c:314
>> #2 0x00007fc80b420a3b in session_disconnect_idle (self=0x55aa2ba69fa0 [SpiceSession])
>> at spice-session.c:1908
>> #3 0x00007fc83f84e703 in g_main_context_dispatch (context=0x55aa2b09d320) at gmain.c:3154
>>
>>
>> I"m a bit wary of touching the device disconnection/channel teardown code paths
>> though as I'm not familiar with them :((
>
> So it seems there are 2 problems:
>
> 1) The existence of the "if (session)" in:
>
> if (session != NULL)
> spice_usb_device_manager_stop_event_listening(spice_usb_device_manager_get(session, NULL));
>
> git blame gives me:
>
> https://cgit.freedesktop.org/spice/spice-gtk/commit/?id=e3932bfebbfec7637f3d03d90e8f9b75e3223236
>
> Which says this was done to fix a segfault. Which clearly is a bad idea,
> just adding if (pointer) checks to avoid segfaults is not the way to fix
> bugs people!
>
> So to fix this we need to root cause said segfault, fix it and then revert this
> commit.
>
> 2) spice_usbredir_channel_disconnect_device now running in a thread, rather
> then being a synchronous part of the tear-down sequence, this can either be
> due to someone calling spice_usbredir_channel_disconnect_device_async() or
> someone calling spice_usbredir_channel_reset(). Note that if either happens
> just before session_disconnect() gets called we've a problem as then
> the channel will be torn down while disconnect is running async, looking
> at your backtraces that is exactly what is happening. This all seems to be
> fallout from making usb device connect / disconnect async because windows
> is oh so terribly slow with this. Since you cannot fix windows, you at
> least not to make sure that this all gets to run its course properly
> on shutdown / session close. Note that things being able to run their
> course properly is also important to get the windows filter driver reconfigured
> to give the redirected usb device back to windows proper / to rebind the
> linux kernel driver to the usb device. Without this the client-os
> will not be able to access the device even after closing virt-viewer /
> whatever.
Thinking about this more, the easiest fix is probably for
channel-usbredir.c to take a ref on the usb_device_manager when it calls
spice_usb_device_manager_start_event_listening(), and use that ref when
it wants to call spice_usb_device_manager_stop_event_listening() and
then g_clear_object its ref.
This will allow removal of the if (session) and will allow async
disconnect-device calls to complete normally independently of
the spice session teardown.
Another potential issue I've noticed is the replacement of
GSimpleAsyncResult with GTask, the code was relying on GSimpleAsyncResult
taking a ref on the passed in gobject (the usbredir channel) and then
releasing that after the callback has completed. I'm not sure if
the GTask replacement code also does this, if not then we need to
do this explicitly (in ALL places where we use GTask).
Regards,
Hans
More information about the Spice-devel
mailing list