[Spice-devel] [PATCH] migration: Don't assert() if MIGRATE_DATA comes before attaching the agent
Uri Lublin
uril at redhat.com
Wed Apr 30 10:11:38 PDT 2014
Hi Christophe,
I don't understand what exactly goes wrong and how this
patch fixes the race.
On 04/01/2014 04:58 PM, Christophe Fergeau wrote:
> During seamless migration, after switching host, if a client was connected
> during the migration, it will have data to send back to the new
> qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
> SPICE char devices use such MIGRATE_DATA messages to restore their state.
>
> However, the MIGRATE_DATA message can arrive any time after the new qemu
> instance has started, this can happen before or after the SPICE char
> devices have been created. In order to handle this, if the migrate data
> arrives early, it's stored in reds->agent_state.mig_data, and
> attach_to_red_agent() will restore the agent state as appropriate.
Adding more information -- I think the race here is between qemu-kvm
migrating the virtio-device state (with main/migration thread) vs spice
migrating
"agent" channel state.
The virtio-device state load function, calls (if agent service is
running in the guest)
virtio_serial_load ->
qemu_char_fe_open ->
spice_chr_guest_open ->
vmc_register_interface ->
qemu_spice_add_interface ->
spice_server_add_interface ->
spice_server_char_device_add_interface ->
attach_to_red_agent
> Unfortunately this does not work as expected as expected. If
> attach_to_red_agent() is called before the MIGRATE_DATA message reaches the
> server, all goes well, but if MIGRATE_DATA reaches the server before
> attach_to_red_agent() gets called, then some assert() get triggered in
> spice_char_device_state_restore():
>
> ((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
> Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
> Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
> Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):
If we split this into 2 asserts, one for num_clients and the other to
wait_for_migrate_data, we'll know which one's at fault.
> What happens is that dev->wait_for_migrate_data is unconditionally cleared when
> completing handling of the MIGRATE_DATA message, so it's FALSE when
> spice_char_device_state_restore() is called.
dev->wait_for_migrate_data is cleared upon
client_remove/device_reset/state_restore and in state_restore it's
only after it passed the assert mentioned above.
> Moreover, dev->num_clients is
> also 0 as this is only increased by spice_char_device_start() which
> spice_server_char_device_add_interface() calls after
> attach_to_red_agent()/spice_char_device_state_restore() have been called.
dev->num_clients is set to 1 in spice_char_device_client_add() called
from spicevmc_connect. So as long as the client did not disconnect
it should be good.
With seamless migration that happens even before (qemu-kvm)
migration starts.
> This commit changes the logic in spice_server_char_device_add_interface(),
> and when there is migrate data pending in reds->agent_state.mig_data, we
> only attempt to restore it after successfully initializing/starting the
> needed char device (which will ensure that dev->num_clients is not 0).
I think that if the client is connected we should have dev->num_clients
set to 1.
>
> It also changes attach_to_red_agent() to handle the case when
> reds->agent_state.mig_data is non-NULL to be the same as the case when
> red_channel_waits_for_migrate_data() is TRUE. This ensures that
> spice_char_device_client_add() gets called and that 'wait_for_data' is set
> in the added device.
I think each handles a different scenario in that race:
reds->agent_state.mig_data -- migration data has already arrived.
spice_char_waits_for_migarte_data -- we are waiting for migration data
to arrive.
>
> This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1035184
> ---
> Changes since v1:
> - Move block calling reds_agent_state_restore() before calling
> spice_char_device_start() as this should be safer. This also
> groups all SUBTYPE_xxx special-casing together. I've tested that
> things are still working as expected after this move.
>
>
> server/reds.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/server/reds.c b/server/reds.c
> index 0390602..bd4fea1 100644
> --- a/server/reds.c
> +++ b/server/reds.c
> @@ -2863,16 +2863,7 @@ static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
> state->read_filter.discard_all = FALSE;
> reds->agent_state.plug_generation++;
>
> - if (reds->agent_state.mig_data) {
> - spice_assert(reds->agent_state.plug_generation == 1);
> - reds_agent_state_restore(reds->agent_state.mig_data);
> - free(reds->agent_state.mig_data);
> - reds->agent_state.mig_data = NULL;
> - } else if (!red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
> - /* we will assoicate the client with the char device, upon reds_on_main_agent_start,
> - * in response to MSGC_AGENT_START */
> - main_channel_push_agent_connected(reds->main_channel);
> - } else {
> + if (red_channel_waits_for_migrate_data(&reds->main_channel->base) || reds->agent_state.mig_data) {
> spice_debug("waiting for migration data");
> if (!spice_char_device_client_exists(reds->agent_state.base, reds_get_client())) {
> int client_added;
> @@ -2889,9 +2880,13 @@ static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
> spice_warning("failed to add client to agent");
> reds_disconnect();
> }
> -
> }
> + } else {
> + /* we will associate the client with the char device, upon reds_on_main_agent_start,
> + * in response to MSGC_AGENT_START */
> + main_channel_push_agent_connected(reds->main_channel);
> }
> +
> return state->base;
> }
>
> @@ -2987,6 +2982,13 @@ static int spice_server_char_device_add_interface(SpiceServer *s,
> } else {
> dev_state = spicevmc_device_connect(char_device, SPICE_CHANNEL_PORT);
> }
> + } else if (strcmp(char_device->subtype, SUBTYPE_VDAGENT) == 0) {
> + if (reds->agent_state.mig_data) {
> + spice_assert(reds->agent_state.plug_generation == 1);
> + reds_agent_state_restore(reds->agent_state.mig_data);
> + free(reds->agent_state.mig_data);
> + reds->agent_state.mig_data = NULL;
> + }
> }
Since this if statement started with
if (strcmp(char_device->subtype, SUBTYPE_VDAGENT) == 0)
and was followed by else if statements the lines added here will not be
reached.
Thanks,
Uri.
More information about the Spice-devel
mailing list