[Spice-devel] [spice-gtk v2 8/9] channel-main: avoid race around file-transfer flush

Victor Toso victortoso at redhat.com
Wed Aug 3 11:57:15 UTC 2016


Hi,

On Wed, Aug 03, 2016 at 12:11:28PM +0200, Christophe Fergeau wrote:
> On Tue, Aug 02, 2016 at 11:48:49AM +0200, Victor Toso wrote:
> > This patch avoids a race condition. The race happens when we mark the
> > xfer-task as completed by receiving VD_AGENT_FILE_XFER_STATUS_SUCCESS
> > message while the flush callback was not yet called.
> > 
> > The flush callback is file_xfer_data_flushed_cb() and it might not be
> > called immediately after data is flushed.
> > 
> > The race can be verified while transferring several small files at
> > once. I can see it often with more then 50 files in one transfer
> > operation.
> 
> I've also been able to reliably hit this running in valgrind.
> 
> > 
> > This fix implies that SpiceMainChannel should check in its async
> > callbacks if given SpiceFileTransferTask is completed.
> > 
> > This patch introduces spice_file_transfer_task_is_completed (internal)
> > to help check if spice_file_transfer_task_completed() was called or
> > not.
>
> I wondered if we could cancel the pending async callbacks rather than
> doing things this way, but it's probably more complicated to do that.

Yes, It would be more complicated. I guess this 'completed' state could
be converted to a property and we can use g_object_notify() to trigger a
the cancellation ..? We also would need to change the GCancellable in
this flush_async() function as we are using the SpiceFileTransferTask.

Not sure if we would gain much from it but I could give it a try.

>
> > ---
> >  src/channel-main.c                  | 17 +++++++++++------
> >  src/spice-file-transfer-task-priv.h |  1 +
> >  src/spice-file-transfer-task.c      | 14 ++++++++++++++
> >  3 files changed, 26 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/channel-main.c b/src/channel-main.c
> > index bc7349f..14ad6cf 100644
> > --- a/src/channel-main.c
> > +++ b/src/channel-main.c
> > @@ -1766,10 +1766,12 @@ static void file_xfer_data_flushed_cb(GObject *source_object,
> >          return;
> >      }
> >  
> > -    file_transfer_operation_send_progress(xfer_task);
> > -
> > -    /* Read more data */
> > -    spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL);
> > +    /* task might be completed while on idle */
> > +    if (!spice_file_transfer_task_is_completed(xfer_task)) {
> > +        file_transfer_operation_send_progress(xfer_task);
> > +        /* Read more data */
> > +        spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL);
> > +    }
> >  }
> >  
> >  static void file_xfer_queue_msg_to_agent(SpiceMainChannel *channel,
> > @@ -1814,13 +1816,15 @@ static void file_xfer_read_async_cb(GObject *source_object,
> >      }
> >  
> >      file_xfer_queue_msg_to_agent(channel, spice_file_transfer_task_get_id(xfer_task), buffer, count);
> > -    if (count == 0) {
> > -        /* on EOF just wait for VD_AGENT_FILE_XFER_STATUS from agent */
> > +    if (count == 0 || spice_file_transfer_task_is_completed(xfer_task)) {
> > +        /* on EOF just wait for VD_AGENT_FILE_XFER_STATUS from agent
> > +         * in case the task was completed, nothing to do. */
> >          return;
> >      }
> >  
> >      task_id = spice_file_transfer_task_get_id(xfer_task);
> >      xfer_op = g_hash_table_lookup(channel->priv->file_xfer_tasks, GUINT_TO_POINTER(task_id));
> > +    g_return_if_fail(xfer_op != NULL);
> >      xfer_op->stats.total_sent += count;
> >  
> >      cancellable = spice_file_transfer_task_get_cancellable(xfer_task);
> > @@ -1841,6 +1845,7 @@ static void main_agent_handle_xfer_status(SpiceMainChannel *channel,
> >  
> >      switch (msg->result) {
> >      case VD_AGENT_FILE_XFER_STATUS_CAN_SEND_DATA:
> > +        g_return_if_fail(spice_file_transfer_task_is_completed(xfer_task) == FALSE);
> >          spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL);
> >          return;
> >      case VD_AGENT_FILE_XFER_STATUS_CANCELLED:
> > diff --git a/src/spice-file-transfer-task-priv.h b/src/spice-file-transfer-task-priv.h
> > index 1ceb392..a7b58d8 100644
> > --- a/src/spice-file-transfer-task-priv.h
> > +++ b/src/spice-file-transfer-task-priv.h
> > @@ -52,6 +52,7 @@ gssize spice_file_transfer_task_read_finish(SpiceFileTransferTask *self,
> >                                              GError **error);
> >  guint64 spice_file_transfer_task_get_file_size(SpiceFileTransferTask *self);
> >  guint64 spice_file_transfer_task_get_bytes_read(SpiceFileTransferTask *self);
> > +gboolean spice_file_transfer_task_is_completed(SpiceFileTransferTask *self);
> >  
> >  G_END_DECLS
> >  
> > diff --git a/src/spice-file-transfer-task.c b/src/spice-file-transfer-task.c
> > index eb943c4..4e51b7e 100644
> > --- a/src/spice-file-transfer-task.c
> > +++ b/src/spice-file-transfer-task.c
> > @@ -42,6 +42,7 @@ struct _SpiceFileTransferTask
> >      GObject parent;
> >  
> >      uint32_t                       id;
> > +    gboolean                       completed;
> >      gboolean                       pending;
> >      GFile                          *file;
> >      SpiceMainChannel               *channel;
> > @@ -270,6 +271,8 @@ G_GNUC_INTERNAL
> >  void spice_file_transfer_task_completed(SpiceFileTransferTask *self,
> >                                          GError *error)
> >  {
> > +    self->completed = TRUE;
> > +
>
> Maybe add a g_warn_if_fail(self->completed == FALSE); or
> g_return_if_fail() at the beginning of thee function? Or is it meant to
> be called multiple times?

It can be called multiple times, the code bellow is:

 if (self->pending) {
     /* Complete but pending is okay only if error is set */
     if (self->error == NULL) {
         self->error = g_error_new(SPICE_CLIENT_ERROR,
                                   SPICE_CLIENT_ERROR_FAILED,
                                   "Cannot complete task in pending state");
     }
     return;
 }

In case we receive a message from agent that is going to cancel the
operation but xfer-task is in the middle of an async operation.

The self->error check in the async callback functions are related to
this.

>
> >      /* In case of multiple errors we only report the first error */
> >      if (self->error)
> >          g_clear_error(&error);
> > @@ -478,6 +481,16 @@ gssize spice_file_transfer_task_read_finish(SpiceFileTransferTask *self,
> >      return nbytes;
> >  }
> >
> > +G_GNUC_INTERNAL
> > +gboolean spice_file_transfer_task_is_completed(SpiceFileTransferTask *self)
> > +{
> > +    g_return_val_if_fail(self != NULL, FALSE);
> > +
> > +    /* File transfer is considered over at the first time it is marked as
> > +     * complete with spice_file_transfer_task_completed. */
> > +    return self->completed;
>
> Just to be sure, the race we are talking about is not caused by
> multithtreading, but only by the order in which things happen with
> respect to network/main loop? (checking whether we need mutexes here or
> not, I guess not).

Yes, I don't think any of this is related to multithreading. I did not
follow exactly how we receive the message faster then the idle callback,
but I guess the coroutine can be faster then several callback functions
schedule to run in idle.

>
> > +}
> > +
> >  /*******************************************************************************
> >   * External API
> >   ******************************************************************************/
> > @@ -735,4 +748,5 @@ static void
> >  spice_file_transfer_task_init(SpiceFileTransferTask *self)
> >  {
> >      self->buffer = g_malloc0(FILE_XFER_CHUNK_SIZE);
> > +    self->completed = FALSE;
> >  }
>
> The instance will be cleared to 0 upon creation, I don't think you need
> the explicit FALSE assignment there.

Ah, true. I'll remove it.

>
> Acked-by: Christophe Fergeau <cfergeau at redhat.com>

Thanks!
  toso

>
> Christophe




More information about the Spice-devel mailing list