[Spice-devel] [PATCH] migration: Don't assert() if MIGRATE_DATA comes before attaching the agent

Uri Lublin uril at redhat.com
Wed Apr 30 10:11:38 PDT 2014


Hi Christophe,

I don't understand what exactly goes wrong and how this
patch fixes the race.


On 04/01/2014 04:58 PM, Christophe Fergeau wrote:
> During seamless migration, after switching host, if a client was connected
> during the migration, it will have data to send back to the new
> qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
> SPICE char devices use such MIGRATE_DATA messages to restore their state.
>
> However, the MIGRATE_DATA message can arrive any time after the new qemu
> instance has started, this can happen before or after the SPICE char
> devices have been created. In order to handle this, if the migrate data
> arrives early, it's stored in reds->agent_state.mig_data, and
> attach_to_red_agent() will restore the agent state as appropriate.

Adding more information -- I think the race here is between qemu-kvm
migrating the virtio-device state (with main/migration thread) vs spice 
migrating
"agent" channel state.

The virtio-device state load function, calls (if agent service is 
running in the guest)
   virtio_serial_load ->
    qemu_char_fe_open ->
      spice_chr_guest_open ->
        vmc_register_interface ->
          qemu_spice_add_interface ->
            spice_server_add_interface ->
              spice_server_char_device_add_interface ->
                attach_to_red_agent

> Unfortunately this does not work as expected as expected. If
> attach_to_red_agent() is called before the MIGRATE_DATA message reaches the
> server, all goes well, but if MIGRATE_DATA reaches the server before
> attach_to_red_agent() gets called, then some assert() get triggered in
> spice_char_device_state_restore():
>
> ((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
> Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
> Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
> Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):

If we split this into 2 asserts, one for num_clients and the other to 
wait_for_migrate_data, we'll know which one's at fault.

> What happens is that dev->wait_for_migrate_data is unconditionally cleared when
> completing handling of the MIGRATE_DATA message, so it's FALSE when
> spice_char_device_state_restore() is called.

dev->wait_for_migrate_data is cleared upon
client_remove/device_reset/state_restore and in state_restore it's
only after it passed the assert mentioned above.

>   Moreover, dev->num_clients is
> also 0 as this is only increased by spice_char_device_start() which
> spice_server_char_device_add_interface() calls after
> attach_to_red_agent()/spice_char_device_state_restore() have been called.

dev->num_clients is set to 1 in spice_char_device_client_add() called
from  spicevmc_connect. So as long as the client did not disconnect
it should be good.
With seamless migration that happens even before (qemu-kvm)
migration starts.

> This commit changes the logic in spice_server_char_device_add_interface(),
> and when there is migrate data pending in reds->agent_state.mig_data, we
> only attempt to restore it after successfully initializing/starting the
> needed char device (which will ensure that dev->num_clients is not 0).

I think that if the client is connected we should have dev->num_clients 
set to 1.

>
> It also changes attach_to_red_agent() to handle the case when
> reds->agent_state.mig_data is non-NULL to be the same as the case when
> red_channel_waits_for_migrate_data() is TRUE. This ensures that
> spice_char_device_client_add() gets called and that 'wait_for_data' is set
> in the added device.

I think each handles a different scenario in that race:
reds->agent_state.mig_data -- migration data has already arrived.
spice_char_waits_for_migarte_data -- we are waiting for migration data
to arrive.

>
> This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1035184
> ---
> Changes since v1:
> - Move block calling reds_agent_state_restore() before calling
>    spice_char_device_start() as this should be safer. This also
>    groups all SUBTYPE_xxx special-casing together. I've tested that
>    things are still working as expected after this move.
>
>
>   server/reds.c | 25 ++++++++++++++-----------
>   1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/server/reds.c b/server/reds.c
> index 0390602..bd4fea1 100644
> --- a/server/reds.c
> +++ b/server/reds.c
> @@ -2863,16 +2863,7 @@ static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
>       state->read_filter.discard_all = FALSE;
>       reds->agent_state.plug_generation++;
>   
> -    if (reds->agent_state.mig_data) {
> -        spice_assert(reds->agent_state.plug_generation == 1);
> -        reds_agent_state_restore(reds->agent_state.mig_data);
> -        free(reds->agent_state.mig_data);
> -        reds->agent_state.mig_data = NULL;
> -    } else if (!red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
> -        /* we will assoicate the client with the char device, upon reds_on_main_agent_start,
> -         * in response to MSGC_AGENT_START */
> -        main_channel_push_agent_connected(reds->main_channel);
> -    } else {
> +    if (red_channel_waits_for_migrate_data(&reds->main_channel->base) || reds->agent_state.mig_data) {
>          spice_debug("waiting for migration data");
>           if (!spice_char_device_client_exists(reds->agent_state.base, reds_get_client())) {
>               int client_added;
> @@ -2889,9 +2880,13 @@ static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
>                   spice_warning("failed to add client to agent");
>                   reds_disconnect();
>               }
> -
>           }
> +    } else {
> +        /* we will associate the client with the char device, upon reds_on_main_agent_start,
> +         * in response to MSGC_AGENT_START */
> +        main_channel_push_agent_connected(reds->main_channel);
>       }
> +
>       return state->base;
>   }
>   
> @@ -2987,6 +2982,13 @@ static int spice_server_char_device_add_interface(SpiceServer *s,
>           } else {
>               dev_state = spicevmc_device_connect(char_device, SPICE_CHANNEL_PORT);
>           }
> +    } else if (strcmp(char_device->subtype, SUBTYPE_VDAGENT) == 0) {
> +        if (reds->agent_state.mig_data) {
> +               spice_assert(reds->agent_state.plug_generation == 1);
> +               reds_agent_state_restore(reds->agent_state.mig_data);
> +               free(reds->agent_state.mig_data);
> +               reds->agent_state.mig_data = NULL;
> +        }
>       }

Since this if statement started with
    if (strcmp(char_device->subtype, SUBTYPE_VDAGENT) == 0)
and was followed by else if statements the lines added here will not be 
reached.


Thanks,
     Uri.


More information about the Spice-devel mailing list