[Spice-devel] client_migrate_info - do we need a new command?

Tue Dec 13 10:19:14 PST 2011

In our call today, Avi asked that we evaluate whether the interface for 
client_migrate_info is the Right Interface before we introduce a new command to 
work around the fact that async commands are broken.

I looked into this today and here's what I came up with.

1) What are the failure scenarios?

The issue is qerror_report().  Roughly speaking, qerror_report either prints to 
stderr or it associates an error with the current monitor command.

The problem with this is that qerror_report() is used all over the code base 
today.  If an error occurs in a device that has nothing to do with the command 
while that command is current, the error is attached to the command instead of 
being printed to stderr, and the command fails with a bizarre error reason 
(even though it really succeeded).
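
To make the failure mode concrete, here is a toy model in Python.  It is only a 
sketch of the behaviour described above, not QEMU's code: Monitor, 
error_report(), run_command() and the "disk: read failed" message are all 
made-up names for illustration.

```python
import sys

# Toy model only: every name here is hypothetical, not QEMU's real code.
class Monitor:
    def __init__(self):
        self.error = None  # error attached to the command being processed


current_monitor = None  # set while a monitor command is current


def error_report(msg):
    """Attach msg to the current command if there is one, else print it."""
    if current_monitor is not None:
        current_monitor.error = msg
    else:
        print(msg, file=sys.stderr)


def run_command(mon, handler):
    """Run handler as a monitor command and build a QMP-style reply."""
    global current_monitor
    current_monitor = mon
    mon.error = None
    try:
        handler()
    finally:
        current_monitor = None
    if mon.error is not None:
        return {"error": {"desc": mon.error}}
    return {"return": {}}


def unrelated_device_activity():
    # A device that has nothing to do with the command hits an error.
    error_report("disk: read failed")


def migrate_info_handler():
    # The command's own work succeeds, but unrelated device code
    # happens to run while the command is still current.
    unrelated_device_activity()


reply = run_command(Monitor(), migrate_info_handler)
```

In this model the reply carries the disk error even though the handler itself 
did nothing wrong, which is the "bizarre error reason" case.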

2) Does the command have the right semantics?

The command has the following doc:

client_migrate_info
------------------

Set the spice/vnc connection info for the migration target.  The spice/vnc
server will ask the spice/vnc client to automatically reconnect using the
new parameters (if specified) once the vm migration finished successfully.

Arguments:

- "protocol":     protocol: "spice" or "vnc" (json-string)
- "hostname":     migration target hostname (json-string)
- "port":         spice/vnc tcp port for plaintext channels (json-int, optional)
- "tls-port":     spice tcp port for tls-secured channels (json-int, optional)
- "cert-subject": server certificate subject (json-string, optional)

Example:

-> { "execute": "client_migrate_info",
      "arguments": { "protocol": "spice",
                     "hostname": "virt42.lab.kraxel.org",
                     "port": 1234 } }
<- { "return": {} }

Originally, the command was a normal sync command and my understanding is that 
it simply posted a notification to the clients.  Apparently, users of the 
interface need to actually know when the client has Ack'd this operation, 
because otherwise it's racy: a disconnect may occur before the client processes 
the redirection.

OTOH, that means that what we really need is 1) a way to tell connected clients 
that they need to redirect and 2) a notification when/if connected clients are 
prepared to redirect.

The trouble with using an async command for this is that the time between (1) & 
(2) may be arbitrarily long.  Since most QMP clients today always use a NULL 
tag, that effectively means the monitor is blocked for an arbitrarily long time 
while this operation is in flight.

I don't know if libspice uses a timeout for this operation, but if it doesn't, 
this could block for an arbitrarily long time.  Even with tagging, we don't have 
a way to cancel in-flight commands, so blocking for arbitrary time periods is 
problematic.
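
As a rough illustration of the blocking problem, the sketch below models a 
monitor that handles untagged commands strictly one at a time.  SerialMonitor 
and its methods are hypothetical names for this example only; "query-status" 
just stands in for any other command a tool might send in the meantime.

```python
from collections import deque


class SerialMonitor:
    """Toy monitor: one untagged command may be in flight at a time."""

    def __init__(self):
        self.in_flight = None   # name of the async command still running
        self.pending = deque()  # commands queued behind it
        self.replies = []       # names of commands that have been answered

    def submit(self, name, is_async=False):
        self.pending.append((name, is_async))
        self._drain()

    def complete_async(self):
        # Called when the async operation finally finishes.
        self.replies.append(self.in_flight)
        self.in_flight = None
        self._drain()

    def _drain(self):
        # Answer queued commands until an async one goes in flight.
        while self.in_flight is None and self.pending:
            name, is_async = self.pending.popleft()
            if is_async:
                self.in_flight = name
            else:
                self.replies.append(name)


mon = SerialMonitor()
mon.submit("client_migrate_info", is_async=True)
mon.submit("query-status")
blocked_replies = list(mon.replies)  # nothing is answered yet
mon.complete_async()                 # the client finally Acks
```

Until complete_async() runs, the second command gets no reply, and nothing in 
this model (or, as far as I can tell, in QMP today) lets the tool cancel the 
first one.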

I think splitting this into two parts, a command that requests the clients to 
redirect and an event that lets a tool know that the clients are ready to 
migrate, ends up being nicer.  It means that we never end up with a blocked QMP 
session, and clients are more likely to properly deal with the fact that an 
event may take arbitrarily long to happen.

Clients can also implement their own cancel logic by choosing to stop waiting 
for an event to happen and then ignoring spurious events.
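
Here is a minimal sketch of what the tool side of the split could look like.  
The event name CLIENT_MIGRATE_READY is purely an assumption for this example 
(no event name has been proposed yet), as are request_redirect() and 
wait_for_ready(); the command arguments are taken from the doc example above.

```python
import queue
import time

READY_EVENT = "CLIENT_MIGRATE_READY"  # hypothetical event name


def request_redirect(send):
    """Step 1: issue the command; it returns without waiting for clients."""
    send({"execute": "client_migrate_info",
          "arguments": {"protocol": "spice",
                        "hostname": "virt42.lab.kraxel.org",
                        "port": 1234}})


def wait_for_ready(events, timeout):
    """Step 2: return True if READY_EVENT arrives within timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            msg = events.get(timeout=remaining)
        except queue.Empty:
            return False
        if msg.get("event") == READY_EVENT:
            return True
        # Any other event is unrelated to us; keep waiting.


sent = []
request_redirect(sent.append)

events = queue.Queue()
events.put({"event": "STOP"})       # unrelated event, skipped
events.put({"event": READY_EVENT})  # clients are ready to redirect
ready = wait_for_ready(events, timeout=1.0)
```

If wait_for_ready() returns False, the tool simply treats the operation as 
cancelled and drops any READY_EVENT that shows up later.  The monitor itself is 
never blocked in between.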

So regardless of the async issue, I think splitting this command is the right 
thing to do long term.

Regards,

Anthony Liguori