[Spice-devel] [spice-gtk v2 8/9] channel-main: avoid race around file-transfer flush
Victor Toso
victortoso at redhat.com
Wed Aug 3 11:57:15 UTC 2016
Hi,
On Wed, Aug 03, 2016 at 12:11:28PM +0200, Christophe Fergeau wrote:
> On Tue, Aug 02, 2016 at 11:48:49AM +0200, Victor Toso wrote:
> > This patch avoids a race condition. The race happens when we mark the
> > xfer-task as completed by receiving VD_AGENT_FILE_XFER_STATUS_SUCCESS
> > message while the flush callback was not yet called.
> >
> > The flush callback is file_xfer_data_flushed_cb() and it might not be
> > called immediately after data is flushed.
> >
> > The race can be verified while transferring several small files at
> > once. I can see it often with more then 50 files in one transfer
> > operation.
>
> I've also been able to reliably hit this running in valgrind.
>
> >
> > This fix implies that SpiceMainChannel should check in its async
> > callbacks if given SpiceFileTransferTask is completed.
> >
> > This patch introduces spice_file_transfer_task_is_completed (internal)
> > to help check if spice_file_transfer_task_completed() was called or
> > not.
>
> I wondered if we could cancel the pending async callbacks rather than
> doing things this way, but it's probably more complicated to do that.
Yes, It would be more complicated. I guess this 'completed' state could
be converted to a property and we can use g_object_notify() to trigger a
the cancellation ..? We also would need to change the GCancellable in
this flush_async() function as we are using the SpiceFileTransferTask.
Not sure if we would gain much from it but I could give it a try.
>
> > ---
> > src/channel-main.c | 17 +++++++++++------
> > src/spice-file-transfer-task-priv.h | 1 +
> > src/spice-file-transfer-task.c | 14 ++++++++++++++
> > 3 files changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/channel-main.c b/src/channel-main.c
> > index bc7349f..14ad6cf 100644
> > --- a/src/channel-main.c
> > +++ b/src/channel-main.c
> > @@ -1766,10 +1766,12 @@ static void file_xfer_data_flushed_cb(GObject *source_object,
> > return;
> > }
> >
> > - file_transfer_operation_send_progress(xfer_task);
> > -
> > - /* Read more data */
> > - spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL);
> > + /* task might be completed while on idle */
> > + if (!spice_file_transfer_task_is_completed(xfer_task)) {
> > + file_transfer_operation_send_progress(xfer_task);
> > + /* Read more data */
> > + spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL);
> > + }
> > }
> >
> > static void file_xfer_queue_msg_to_agent(SpiceMainChannel *channel,
> > @@ -1814,13 +1816,15 @@ static void file_xfer_read_async_cb(GObject *source_object,
> > }
> >
> > file_xfer_queue_msg_to_agent(channel, spice_file_transfer_task_get_id(xfer_task), buffer, count);
> > - if (count == 0) {
> > - /* on EOF just wait for VD_AGENT_FILE_XFER_STATUS from agent */
> > + if (count == 0 || spice_file_transfer_task_is_completed(xfer_task)) {
> > + /* on EOF just wait for VD_AGENT_FILE_XFER_STATUS from agent
> > + * in case the task was completed, nothing to do. */
> > return;
> > }
> >
> > task_id = spice_file_transfer_task_get_id(xfer_task);
> > xfer_op = g_hash_table_lookup(channel->priv->file_xfer_tasks, GUINT_TO_POINTER(task_id));
> > + g_return_if_fail(xfer_op != NULL);
> > xfer_op->stats.total_sent += count;
> >
> > cancellable = spice_file_transfer_task_get_cancellable(xfer_task);
> > @@ -1841,6 +1845,7 @@ static void main_agent_handle_xfer_status(SpiceMainChannel *channel,
> >
> > switch (msg->result) {
> > case VD_AGENT_FILE_XFER_STATUS_CAN_SEND_DATA:
> > + g_return_if_fail(spice_file_transfer_task_is_completed(xfer_task) == FALSE);
> > spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL);
> > return;
> > case VD_AGENT_FILE_XFER_STATUS_CANCELLED:
> > diff --git a/src/spice-file-transfer-task-priv.h b/src/spice-file-transfer-task-priv.h
> > index 1ceb392..a7b58d8 100644
> > --- a/src/spice-file-transfer-task-priv.h
> > +++ b/src/spice-file-transfer-task-priv.h
> > @@ -52,6 +52,7 @@ gssize spice_file_transfer_task_read_finish(SpiceFileTransferTask *self,
> > GError **error);
> > guint64 spice_file_transfer_task_get_file_size(SpiceFileTransferTask *self);
> > guint64 spice_file_transfer_task_get_bytes_read(SpiceFileTransferTask *self);
> > +gboolean spice_file_transfer_task_is_completed(SpiceFileTransferTask *self);
> >
> > G_END_DECLS
> >
> > diff --git a/src/spice-file-transfer-task.c b/src/spice-file-transfer-task.c
> > index eb943c4..4e51b7e 100644
> > --- a/src/spice-file-transfer-task.c
> > +++ b/src/spice-file-transfer-task.c
> > @@ -42,6 +42,7 @@ struct _SpiceFileTransferTask
> > GObject parent;
> >
> > uint32_t id;
> > + gboolean completed;
> > gboolean pending;
> > GFile *file;
> > SpiceMainChannel *channel;
> > @@ -270,6 +271,8 @@ G_GNUC_INTERNAL
> > void spice_file_transfer_task_completed(SpiceFileTransferTask *self,
> > GError *error)
> > {
> > + self->completed = TRUE;
> > +
>
> Maybe add a g_warn_if_fail(self->completed == FALSE); or
> g_return_if_fail() at the beginning of thee function? Or is it meant to
> be called multiple times?
It can be called multiple times, the code bellow is:
if (self->pending) {
/* Complete but pending is okay only if error is set */
if (self->error == NULL) {
self->error = g_error_new(SPICE_CLIENT_ERROR,
SPICE_CLIENT_ERROR_FAILED,
"Cannot complete task in pending state");
}
return;
}
In case we receive a message from agent that is going to cancel the
operation but xfer-task is in the middle of an async operation.
The self->error check in the async callback functions are related to
this.
>
> > /* In case of multiple errors we only report the first error */
> > if (self->error)
> > g_clear_error(&error);
> > @@ -478,6 +481,16 @@ gssize spice_file_transfer_task_read_finish(SpiceFileTransferTask *self,
> > return nbytes;
> > }
> >
> > +G_GNUC_INTERNAL
> > +gboolean spice_file_transfer_task_is_completed(SpiceFileTransferTask *self)
> > +{
> > + g_return_val_if_fail(self != NULL, FALSE);
> > +
> > + /* File transfer is considered over at the first time it is marked as
> > + * complete with spice_file_transfer_task_completed. */
> > + return self->completed;
>
> Just to be sure, the race we are talking about is not caused by
> multithtreading, but only by the order in which things happen with
> respect to network/main loop? (checking whether we need mutexes here or
> not, I guess not).
Yes, I don't think any of this is related to multithreading. I did not
follow exactly how we receive the message faster then the idle callback,
but I guess the coroutine can be faster then several callback functions
schedule to run in idle.
>
> > +}
> > +
> > /*******************************************************************************
> > * External API
> > ******************************************************************************/
> > @@ -735,4 +748,5 @@ static void
> > spice_file_transfer_task_init(SpiceFileTransferTask *self)
> > {
> > self->buffer = g_malloc0(FILE_XFER_CHUNK_SIZE);
> > + self->completed = FALSE;
> > }
>
> The instance will be cleared to 0 upon creation, I don't think you need
> the explicit FALSE assignment there.
Ah, true. I'll remove it.
>
> Acked-by: Christophe Fergeau <cfergeau at redhat.com>
Thanks!
toso
>
> Christophe
More information about the Spice-devel
mailing list