[Spice-commits] 12 commits - arch_init.c docs/xbzrle.txt hw/nvram include/migration include/qemu migration/fd.c migration/migration.c migration/qemu-file-buf.c migration/qemu-file-unix.c migration/qemu-file.c page_cache.c target-arm/crypto_helper.c tests/test-vmstate.c
Gerd Hoffmann
kraxel at kemper.freedesktop.org
Mon Jan 19 22:55:18 PST 2015
arch_init.c | 8 +-
docs/xbzrle.txt | 8 ++
hw/nvram/fw_cfg.c | 41 ++------------
include/migration/page_cache.h | 10 ++-
include/migration/qemu-file.h | 10 +++
include/migration/vmstate.h | 2
include/qemu/sockets.h | 7 ++
migration/fd.c | 24 +++++++-
migration/migration.c | 12 ++++
migration/qemu-file-buf.c | 10 ++-
migration/qemu-file-unix.c | 23 ++++++--
migration/qemu-file.c | 12 ++++
page_cache.c | 43 +++++++++------
target-arm/crypto_helper.c | 114 ++++++++++++++++++++++-------------------
tests/test-vmstate.c | 20 ++-----
15 files changed, 213 insertions(+), 131 deletions(-)
New commits:
commit 1e42c353469cb58ca4f3b450eea4211af7d0b147
Merge: e68cba3 36b62ae
Author: Peter Maydell <peter.maydell at linaro.org>
Date: Fri Jan 16 12:06:41 2015 +0000
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150116' into staging
target-arm queue:
* fix endianness handling in fwcfg wide registers
* fix broken crypto insn emulation on big endian hosts
# gpg: Signature made Fri 16 Jan 2015 12:04:08 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell at linaro.org>"
* remotes/pmaydell/tags/pull-target-arm-20150116:
fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write()
target-arm: crypto: fix BE host support
Signed-off-by: Peter Maydell <peter.maydell at linaro.org>
commit 36b62ae6a58f9a588fd33be9386e18a2b90103f5
Author: Laszlo Ersek <lersek at redhat.com>
Date: Fri Jan 16 11:54:30 2015 +0000
fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write()
(1) Let's contemplate what device endianness means, for a memory mapped
device register (independently of QEMU -- that is, on physical hardware).
It determines the byte order that the device will put on the data bus when
the device is producing a *numerical value* for the CPU. This byte order
may differ from the CPU's own byte order, therefore when software wants to
consume the *numerical value*, it may have to swap the byte order first.
For example, suppose we have a device that exposes in a 2-byte register
the number of sheep we have to count before falling asleep. If the value
is decimal 37 (0x0025), then a big endian register will produce [0x00,
0x25], while a little endian register will produce [0x25, 0x00].
If the device register is big endian, but the CPU is little endian, the
numerical value will read as 0x2500 (decimal 9472), which software has to
byte swap before use.
However... if we ask the device about who stole our herd of sheep, and it
answers "XY", then the byte representation coming out of the register must
be [0x58, 0x59], regardless of the device register's endianness for
numeric values. And, software needs to copy these bytes into a string
field regardless of the CPU's own endianness.
(2) QEMU's device register accessor functions work with *numerical values*
exclusively, not strings:
The emulated register's read accessor function returns the numerical value
(eg. 37 decimal, 0x0025) as a *host-encoded* uint64_t. QEMU translates
this value for the guest to the endianness of the emulated device register
(which is recorded in MemoryRegionOps.endianness). Then guest code must
translate the numerical value from device register to guest CPU
endianness, before including it in any computation (see (1)).
(3) However, the data register of the fw_cfg device shall transfer strings
*only* -- that is, opaque blobs. Interpretation of any given blob is
subject to further agreement -- it can be an integer in an independently
determined byte order, or a genuine string, or an array of structs of
integers (in some byte order) and fixed size strings, and so on.
Because register emulation in QEMU is integer-preserving, not
string-preserving (see (2)), we have to jump through a few hoops.
(3a) We defined the memory mapped fw_cfg data register as
DEVICE_BIG_ENDIAN.
The particular choice is not really relevant -- we picked BE only for
consistency with the control register, which *does* transfer integers --
but our choice affects how we must host-encode values from fw_cfg strings.
(3b) Since we want the fw_cfg string "XY" to appear as the [0x58, 0x59]
array on the data register, *and* we picked DEVICE_BIG_ENDIAN, we must
compose the host (== C language) value 0x5859 in the read accessor
function.
(3c) When the guest performs the read access, the immediate uint16_t value
will be 0x5958 (in LE guests) and 0x5859 (in BE guests). However, the
uint16_t value does not matter. The only thing that matters is the byte
pattern [0x58, 0x59], which the guest code must copy into the target
string *without* any byte-swapping.
(4) Now I get to explain where I screwed up. :(
When we decided for big endian *integer* representation in the MMIO data
register -- see (3a) --, I mindlessly added an indiscriminate
byte-swizzling step to the (little endian) guest firmware.
This was a grave error -- it violates (3c) --, but I didn't realize it. I
only saw that the code I otherwise intended for fw_cfg_data_mem_read():
value = 0;
for (i = 0; i < size; ++i) {
value = (value << 8) | fw_cfg_read(s);
}
didn't produce the expected result in the guest.
In true facepalm style, instead of blaming my guest code (which violated
(3c)), I blamed my host code (which was correct). Ultimately, I coded
ldX_he_p() into fw_cfg_data_mem_read(), because that happened to work.
Obviously (...in retrospect) that was wrong. Only because my host happened
to be LE, ldX_he_p() composed the (otherwise incorrect) host value 0x5958
from the fw_cfg string "XY". And that happened to compensate for the bogus
indiscriminate byte-swizzling in my guest code.
Clearly the current code leaks the host endianness through to the guest,
which is wrong. Any device should work the same regardless of host
endianness.
The solution is to compose the host-endian representation (2) of the big
endian interpretation (3a, 3b) of the fw_cfg string, and to drop the wrong
byte-swizzling in the guest (3c).
Brown paper bag time for me.
Signed-off-by: Laszlo Ersek <lersek at redhat.com>
Message-id: 1420024880-15416-1-git-send-email-lersek at redhat.com
Reviewed-by: Peter Maydell <peter.maydell at linaro.org>
Signed-off-by: Peter Maydell <peter.maydell at linaro.org>
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index fcdf821..78a37be 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -287,51 +287,24 @@ static uint64_t fw_cfg_data_mem_read(void *opaque, hwaddr addr,
unsigned size)
{
FWCfgState *s = opaque;
- uint8_t buf[8];
+ uint64_t value = 0;
unsigned i;
for (i = 0; i < size; ++i) {
- buf[i] = fw_cfg_read(s);
+ value = (value << 8) | fw_cfg_read(s);
}
- switch (size) {
- case 1:
- return buf[0];
- case 2:
- return lduw_he_p(buf);
- case 4:
- return (uint32_t)ldl_he_p(buf);
- case 8:
- return ldq_he_p(buf);
- }
- abort();
+ return value;
}
static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
uint64_t value, unsigned size)
{
FWCfgState *s = opaque;
- uint8_t buf[8];
- unsigned i;
+ unsigned i = size;
- switch (size) {
- case 1:
- buf[0] = value;
- break;
- case 2:
- stw_he_p(buf, value);
- break;
- case 4:
- stl_he_p(buf, value);
- break;
- case 8:
- stq_he_p(buf, value);
- break;
- default:
- abort();
- }
- for (i = 0; i < size; ++i) {
- fw_cfg_write(s, buf[i]);
- }
+ do {
+ fw_cfg_write(s, value >> (8 * --i));
+ } while (i);
}
static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
commit b449ca3c1874418d948878d5417a32fc0dbf9fea
Author: Ard Biesheuvel <ard.biesheuvel at linaro.org>
Date: Fri Jan 16 11:54:29 2015 +0000
target-arm: crypto: fix BE host support
The crypto emulation code in target-arm/crypto_helper.c never worked
correctly on big endian hosts, due to the fact that it uses a union
of array types to convert between the native VFP register size (64
bits) and the types used in the algorithms (bytes and 32 bit words)
We cannot just swab between LE and BE when reading and writing the
registers, as the SHA code performs word additions, so instead, add
array accessors for the CRYPTO_STATE type whose LE and BE specific
implementations ensure that the correct array elements are referenced.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
Acked-by: Laszlo Ersek <lersek at redhat.com>
Message-id: 1420208303-24111-1-git-send-email-ard.biesheuvel at linaro.org
Reviewed-by: Peter Maydell <peter.maydell at linaro.org>
Signed-off-by: Peter Maydell <peter.maydell at linaro.org>
diff --git a/target-arm/crypto_helper.c b/target-arm/crypto_helper.c
index dd60d0b..1fe975d 100644
--- a/target-arm/crypto_helper.c
+++ b/target-arm/crypto_helper.c
@@ -22,6 +22,14 @@ union CRYPTO_STATE {
uint64_t l[2];
};
+#ifdef HOST_WORDS_BIGENDIAN
+#define CR_ST_BYTE(state, i) (state.bytes[(15 - (i)) ^ 8])
+#define CR_ST_WORD(state, i) (state.words[(3 - (i)) ^ 2])
+#else
+#define CR_ST_BYTE(state, i) (state.bytes[i])
+#define CR_ST_WORD(state, i) (state.words[i])
+#endif
+
void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
uint32_t decrypt)
{
@@ -46,7 +54,7 @@ void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
/* combine ShiftRows operation and sbox substitution */
for (i = 0; i < 16; i++) {
- st.bytes[i] = sbox[decrypt][rk.bytes[shift[decrypt][i]]];
+ CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
}
env->vfp.regs[rd] = make_float64(st.l[0]);
@@ -198,11 +206,11 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
assert(decrypt < 2);
for (i = 0; i < 16; i += 4) {
- st.words[i >> 2] = cpu_to_le32(
- mc[decrypt][st.bytes[i]] ^
- rol32(mc[decrypt][st.bytes[i + 1]], 8) ^
- rol32(mc[decrypt][st.bytes[i + 2]], 16) ^
- rol32(mc[decrypt][st.bytes[i + 3]], 24));
+ CR_ST_WORD(st, i >> 2) =
+ mc[decrypt][CR_ST_BYTE(st, i)] ^
+ rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
+ rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
+ rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
}
env->vfp.regs[rd] = make_float64(st.l[0]);
@@ -255,24 +263,25 @@ void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn,
switch (op) {
case 0: /* sha1c */
- t = cho(d.words[1], d.words[2], d.words[3]);
+ t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
break;
case 1: /* sha1p */
- t = par(d.words[1], d.words[2], d.words[3]);
+ t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
break;
case 2: /* sha1m */
- t = maj(d.words[1], d.words[2], d.words[3]);
+ t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
break;
default:
g_assert_not_reached();
}
- t += rol32(d.words[0], 5) + n.words[0] + m.words[i];
-
- n.words[0] = d.words[3];
- d.words[3] = d.words[2];
- d.words[2] = ror32(d.words[1], 2);
- d.words[1] = d.words[0];
- d.words[0] = t;
+ t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0)
+ + CR_ST_WORD(m, i);
+
+ CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3);
+ CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+ CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2);
+ CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+ CR_ST_WORD(d, 0) = t;
}
}
env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -286,8 +295,8 @@ void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, uint32_t rm)
float64_val(env->vfp.regs[rm + 1])
} };
- m.words[0] = ror32(m.words[0], 2);
- m.words[1] = m.words[2] = m.words[3] = 0;
+ CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2);
+ CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0;
env->vfp.regs[rd] = make_float64(m.l[0]);
env->vfp.regs[rd + 1] = make_float64(m.l[1]);
@@ -304,10 +313,10 @@ void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t rd, uint32_t rm)
float64_val(env->vfp.regs[rm + 1])
} };
- d.words[0] = rol32(d.words[0] ^ m.words[1], 1);
- d.words[1] = rol32(d.words[1] ^ m.words[2], 1);
- d.words[2] = rol32(d.words[2] ^ m.words[3], 1);
- d.words[3] = rol32(d.words[3] ^ d.words[0], 1);
+ CR_ST_WORD(d, 0) = rol32(CR_ST_WORD(d, 0) ^ CR_ST_WORD(m, 1), 1);
+ CR_ST_WORD(d, 1) = rol32(CR_ST_WORD(d, 1) ^ CR_ST_WORD(m, 2), 1);
+ CR_ST_WORD(d, 2) = rol32(CR_ST_WORD(d, 2) ^ CR_ST_WORD(m, 3), 1);
+ CR_ST_WORD(d, 3) = rol32(CR_ST_WORD(d, 3) ^ CR_ST_WORD(d, 0), 1);
env->vfp.regs[rd] = make_float64(d.l[0]);
env->vfp.regs[rd + 1] = make_float64(d.l[1]);
@@ -356,20 +365,22 @@ void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t rd, uint32_t rn,
int i;
for (i = 0; i < 4; i++) {
- uint32_t t = cho(n.words[0], n.words[1], n.words[2]) + n.words[3]
- + S1(n.words[0]) + m.words[i];
-
- n.words[3] = n.words[2];
- n.words[2] = n.words[1];
- n.words[1] = n.words[0];
- n.words[0] = d.words[3] + t;
-
- t += maj(d.words[0], d.words[1], d.words[2]) + S0(d.words[0]);
-
- d.words[3] = d.words[2];
- d.words[2] = d.words[1];
- d.words[1] = d.words[0];
- d.words[0] = t;
+ uint32_t t = cho(CR_ST_WORD(n, 0), CR_ST_WORD(n, 1), CR_ST_WORD(n, 2))
+ + CR_ST_WORD(n, 3) + S1(CR_ST_WORD(n, 0))
+ + CR_ST_WORD(m, i);
+
+ CR_ST_WORD(n, 3) = CR_ST_WORD(n, 2);
+ CR_ST_WORD(n, 2) = CR_ST_WORD(n, 1);
+ CR_ST_WORD(n, 1) = CR_ST_WORD(n, 0);
+ CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3) + t;
+
+ t += maj(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
+ + S0(CR_ST_WORD(d, 0));
+
+ CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+ CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
+ CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+ CR_ST_WORD(d, 0) = t;
}
env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -394,13 +405,14 @@ void HELPER(crypto_sha256h2)(CPUARMState *env, uint32_t rd, uint32_t rn,
int i;
for (i = 0; i < 4; i++) {
- uint32_t t = cho(d.words[0], d.words[1], d.words[2]) + d.words[3]
- + S1(d.words[0]) + m.words[i];
-
- d.words[3] = d.words[2];
- d.words[2] = d.words[1];
- d.words[1] = d.words[0];
- d.words[0] = n.words[3 - i] + t;
+ uint32_t t = cho(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
+ + CR_ST_WORD(d, 3) + S1(CR_ST_WORD(d, 0))
+ + CR_ST_WORD(m, i);
+
+ CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+ CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
+ CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+ CR_ST_WORD(d, 0) = CR_ST_WORD(n, 3 - i) + t;
}
env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -418,10 +430,10 @@ void HELPER(crypto_sha256su0)(CPUARMState *env, uint32_t rd, uint32_t rm)
float64_val(env->vfp.regs[rm + 1])
} };
- d.words[0] += s0(d.words[1]);
- d.words[1] += s0(d.words[2]);
- d.words[2] += s0(d.words[3]);
- d.words[3] += s0(m.words[0]);
+ CR_ST_WORD(d, 0) += s0(CR_ST_WORD(d, 1));
+ CR_ST_WORD(d, 1) += s0(CR_ST_WORD(d, 2));
+ CR_ST_WORD(d, 2) += s0(CR_ST_WORD(d, 3));
+ CR_ST_WORD(d, 3) += s0(CR_ST_WORD(m, 0));
env->vfp.regs[rd] = make_float64(d.l[0]);
env->vfp.regs[rd + 1] = make_float64(d.l[1]);
@@ -443,10 +455,10 @@ void HELPER(crypto_sha256su1)(CPUARMState *env, uint32_t rd, uint32_t rn,
float64_val(env->vfp.regs[rm + 1])
} };
- d.words[0] += s1(m.words[2]) + n.words[1];
- d.words[1] += s1(m.words[3]) + n.words[2];
- d.words[2] += s1(d.words[0]) + n.words[3];
- d.words[3] += s1(d.words[1]) + m.words[0];
+ CR_ST_WORD(d, 0) += s1(CR_ST_WORD(m, 2)) + CR_ST_WORD(n, 1);
+ CR_ST_WORD(d, 1) += s1(CR_ST_WORD(m, 3)) + CR_ST_WORD(n, 2);
+ CR_ST_WORD(d, 2) += s1(CR_ST_WORD(d, 0)) + CR_ST_WORD(n, 3);
+ CR_ST_WORD(d, 3) += s1(CR_ST_WORD(d, 1)) + CR_ST_WORD(m, 0);
env->vfp.regs[rd] = make_float64(d.l[0]);
env->vfp.regs[rd + 1] = make_float64(d.l[1]);
commit e68cba36360a2ab5bf0576b66df4d0eb0d822f8d
Merge: df58887 ea987c2
Author: Peter Maydell <peter.maydell at linaro.org>
Date: Fri Jan 16 10:16:14 2015 +0000
Merge remote-tracking branch 'remotes/amit-migration/tags/mig-2.3-1' into staging
A set of patches collected over the holidays. Mix of optimizations and
fixes.
# gpg: Signature made Fri 16 Jan 2015 07:42:00 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <amit at amitshah.net>"
# gpg: aka "Amit Shah <amit at kernel.org>"
# gpg: aka "Amit Shah <amitshah at gmx.net>"
* remotes/amit-migration/tags/mig-2.3-1:
vmstate: type-check sub-arrays
migration_cancel: shutdown migration socket
Handle bi-directional communication for fd migration
socket shutdown
Tests: QEMUSizedBuffer/QEMUBuffer
QEMUSizedBuffer: only free qsb that qemu_bufopen allocated
xbzrle: rebuild the cache_is_cached function
xbzrle: optimize XBZRLE to decrease the cache misses
Signed-off-by: Peter Maydell <peter.maydell at linaro.org>
commit ea987c2c21d4326bb58ee28f6888fdcf8fbda067
Author: Paolo Bonzini <pbonzini at redhat.com>
Date: Wed Jan 7 15:12:13 2015 +0100
vmstate: type-check sub-arrays
While we cannot check against the type of the full array, we can check
against the type of the fields.
Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index e45fc49..d712a65 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -189,7 +189,7 @@ extern const VMStateInfo vmstate_info_bitmap;
type_check_2darray(_type, typeof_field(_state, _field), _n1, _n2))
#define vmstate_offset_sub_array(_state, _field, _type, _start) \
- (offsetof(_state, _field[_start]))
+ vmstate_offset_value(_state, _field[_start], _type)
#define vmstate_offset_buffer(_state, _field) \
vmstate_offset_array(_state, _field, uint8_t, \
commit a26ba26e214911dc879a23e797d2c269cdb38577
Author: Dr. David Alan Gilbert <dgilbert at redhat.com>
Date: Thu Jan 8 11:11:32 2015 +0000
migration_cancel: shutdown migration socket
Force shutdown on migration socket on cancel to cause the cancel
to complete even if the socket is blocked on a dead network.
Signed-off-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>
Reviewed-by: Amit Shah <amit.shah at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/migration/migration.c b/migration/migration.c
index c49a05a..b3adbc6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -330,6 +330,7 @@ void migrate_fd_error(MigrationState *s)
static void migrate_fd_cancel(MigrationState *s)
{
int old_state ;
+ QEMUFile *f = migrate_get_current()->file;
trace_migrate_fd_cancel();
do {
@@ -339,6 +340,17 @@ static void migrate_fd_cancel(MigrationState *s)
}
migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
} while (s->state != MIG_STATE_CANCELLING);
+
+ /*
+ * If we're unlucky the migration code might be stuck somewhere in a
+ * send/write while the network has failed and is waiting to timeout;
+ * if we've got shutdown(2) available then we can force it to quit.
+ * The outgoing qemu file gets closed in migrate_fd_cleanup that is
+ * called in a bh, so there is no race against this cancel.
+ */
+ if (s->state == MIG_STATE_CANCELLING && f) {
+ qemu_file_shutdown(f);
+ }
}
void add_migration_state_change_notifier(Notifier *notify)
commit 131fe9b843f9a1e55fcbf2457c9cb25c3711b9d8
Author: Cristian Klein <cristian.klein at cs.umu.se>
Date: Thu Jan 8 11:11:31 2015 +0000
Handle bi-directional communication for fd migration
libvirt prefers opening the TCP connection itself, for two reasons.
First, connection failed errors can be detected easier, without having
to parse qemu's error output.
Second, libvirt might be asked to secure the transfer by tunnelling the
communication through an TLS layer.
Therefore, libvirt opens the TCP connection itself and passes an FD to qemu
using QMP and a POSIX-specific mechanism.
Hence, in order to make the reverse-path work in such cases, qemu needs to
distinguish if the transmitted FD is a socket (reverse-path available)
or not (reverse-path might not be available) and use the corresponding
abstraction.
Signed-off-by: Cristian Klein <cristian.klein at cs.umu.se>
Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>
Reviewed-by: Amit Shah <amit.shah at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/migration/fd.c b/migration/fd.c
index d2e523a..129da99 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -31,13 +31,29 @@
do { } while (0)
#endif
+static bool fd_is_socket(int fd)
+{
+ struct stat stat;
+ int ret = fstat(fd, &stat);
+ if (ret == -1) {
+ /* When in doubt say no */
+ return false;
+ }
+ return S_ISSOCK(stat.st_mode);
+}
+
void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
{
int fd = monitor_get_fd(cur_mon, fdname, errp);
if (fd == -1) {
return;
}
- s->file = qemu_fdopen(fd, "wb");
+
+ if (fd_is_socket(fd)) {
+ s->file = qemu_fopen_socket(fd, "wb");
+ } else {
+ s->file = qemu_fdopen(fd, "wb");
+ }
migrate_fd_connect(s);
}
@@ -58,7 +74,11 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
DPRINTF("Attempting to start an incoming migration via fd\n");
fd = strtol(infd, NULL, 0);
- f = qemu_fdopen(fd, "rb");
+ if (fd_is_socket(fd)) {
+ f = qemu_fopen_socket(fd, "rb");
+ } else {
+ f = qemu_fdopen(fd, "rb");
+ }
if(f == NULL) {
error_setg_errno(errp, errno, "failed to open the source descriptor");
return;
commit e1a8c9b67fc97d293211773edcae9e8e2f3367ab
Author: Dr. David Alan Gilbert <dgilbert at redhat.com>
Date: Thu Jan 8 11:11:30 2015 +0000
socket shutdown
Add QEMUFile interface to allow a socket to be 'shut down' - i.e. any
reads/writes will fail (and any blocking read/write will be woken).
Signed-off-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>
Reviewed-by: Amit Shah <amit.shah at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 401676b..d843c00 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -84,6 +84,14 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
size_t size,
int *bytes_sent);
+/*
+ * Stop any read or write (depending on flags) on the underlying
+ * transport on the QEMUFile.
+ * Existing blocking reads/writes must be woken
+ * Returns 0 on success, -err on error
+ */
+typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr);
+
typedef struct QEMUFileOps {
QEMUFilePutBufferFunc *put_buffer;
QEMUFileGetBufferFunc *get_buffer;
@@ -94,6 +102,7 @@ typedef struct QEMUFileOps {
QEMURamHookFunc *after_ram_iterate;
QEMURamHookFunc *hook_ram_load;
QEMURamSaveFunc *save_page;
+ QEMUFileShutdownFunc *shut_down;
} QEMUFileOps;
struct QEMUSizedBuffer {
@@ -177,6 +186,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
int64_t qemu_file_get_rate_limit(QEMUFile *f);
int qemu_file_get_error(QEMUFile *f);
void qemu_file_set_error(QEMUFile *f, int ret);
+int qemu_file_shutdown(QEMUFile *f);
void qemu_fflush(QEMUFile *f);
static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index f47dae6..7992ece 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -44,6 +44,13 @@ int socket_set_fast_reuse(int fd);
int send_all(int fd, const void *buf, int len1);
int recv_all(int fd, void *buf, int len1, bool single_read);
+#ifdef WIN32
+/* Windows has different names for the same constants with the same values */
+#define SHUT_RD 0
+#define SHUT_WR 1
+#define SHUT_RDWR 2
+#endif
+
/* callback function for nonblocking connect
* valid fd on success, negative error code on failure
*/
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index 9682396..bfbc086 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -26,6 +26,7 @@
#include "qemu/sockets.h"
#include "block/coroutine.h"
#include "migration/qemu-file.h"
+#include "migration/qemu-file-internal.h"
typedef struct QEMUFileSocket {
int fd;
@@ -84,6 +85,17 @@ static int socket_close(void *opaque)
return 0;
}
+static int socket_shutdown(void *opaque, bool rd, bool wr)
+{
+ QEMUFileSocket *s = opaque;
+
+ if (shutdown(s->fd, rd ? (wr ? SHUT_RDWR : SHUT_RD) : SHUT_WR)) {
+ return -errno;
+ } else {
+ return 0;
+ }
+}
+
static ssize_t unix_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
int64_t pos)
{
@@ -192,15 +204,18 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
}
static const QEMUFileOps socket_read_ops = {
- .get_fd = socket_get_fd,
+ .get_fd = socket_get_fd,
.get_buffer = socket_get_buffer,
- .close = socket_close
+ .close = socket_close,
+ .shut_down = socket_shutdown
+
};
static const QEMUFileOps socket_write_ops = {
- .get_fd = socket_get_fd,
+ .get_fd = socket_get_fd,
.writev_buffer = socket_writev_buffer,
- .close = socket_close
+ .close = socket_close,
+ .shut_down = socket_shutdown
};
QEMUFile *qemu_fopen_socket(int fd, const char *mode)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index a7f2a34..edc2830 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,6 +30,18 @@
#include "migration/qemu-file-internal.h"
#include "trace.h"
+/*
+ * Stop a file from being read/written - not all backing files can do this
+ * typically only sockets can.
+ */
+int qemu_file_shutdown(QEMUFile *f)
+{
+ if (!f->ops->shut_down) {
+ return -ENOSYS;
+ }
+ return f->ops->shut_down(f->opaque, true, true);
+}
+
bool qemu_file_mode_is_not_valid(const char *mode)
{
if (mode == NULL ||
commit 8580b06498a5dffe554e7ac627726b1d7775c591
Author: Yang Hongyang <yanghy at cn.fujitsu.com>
Date: Fri Dec 19 11:38:06 2014 +0800
Tests: QEMUSizedBuffer/QEMUBuffer
Modify some of tests/test-vmstate.c due to qemu_bufopen() change.
If you create a QEMUSizedBuffer yourself, you have to explicitly
free it.
Signed-off-by: Yang Hongyang <yanghy at cn.fujitsu.com>
Cc: Dr. David Alan Gilbert <dgilbert at redhat.com>
Cc: Juan Quintela <quintela at redhat.com>
Cc: Amit Shah <amit.shah at redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index 5e0fd13..39b7b01 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -60,16 +60,6 @@ static QEMUFile *open_test_file(bool write)
return qemu_fdopen(fd, write ? "wb" : "rb");
}
-/* Open a read-only qemu-file from an existing memory block */
-static QEMUFile *open_mem_file_read(const void *data, size_t len)
-{
- /* The qsb gets freed by qemu_fclose */
- QEMUSizedBuffer *qsb = qsb_create(data, len);
- g_assert(qsb);
-
- return qemu_bufopen("r", qsb);
-}
-
/*
* Check that the contents of the memory-buffered file f match
* the given size/data.
@@ -450,7 +440,9 @@ static void test_load_noskip(void)
QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
};
- QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
+ QEMUSizedBuffer *qsb = qsb_create(buf, sizeof(buf));
+ g_assert(qsb);
+ QEMUFile *loading = qemu_bufopen("r", qsb);
TestStruct obj = { .skip_c_e = false };
vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
g_assert(!qemu_file_get_error(loading));
@@ -461,6 +453,7 @@ static void test_load_noskip(void)
g_assert_cmpint(obj.e, ==, 50);
g_assert_cmpint(obj.f, ==, 60);
qemu_fclose(loading);
+ qsb_free(qsb);
}
static void test_load_skip(void)
@@ -473,7 +466,9 @@ static void test_load_skip(void)
QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
};
- QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
+ QEMUSizedBuffer *qsb = qsb_create(buf, sizeof(buf));
+ g_assert(qsb);
+ QEMUFile *loading = qemu_bufopen("r", qsb);
TestStruct obj = { .skip_c_e = true, .c = 300, .e = 500 };
vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
g_assert(!qemu_file_get_error(loading));
@@ -484,6 +479,7 @@ static void test_load_skip(void)
g_assert_cmpint(obj.e, ==, 500);
g_assert_cmpint(obj.f, ==, 60);
qemu_fclose(loading);
+ qsb_free(qsb);
}
int main(int argc, char **argv)
commit f018d8cd2123f495300935d5019931abbee4e5d9
Author: Yang Hongyang <yanghy at cn.fujitsu.com>
Date: Fri Dec 19 11:38:05 2014 +0800
QEMUSizedBuffer: only free qsb that qemu_bufopen allocated
Only free qsb that qemu_bufopen allocated, and also allow
qemu_bufopen accept qsb as input for write operation. It
will make the API more logical:
1.If you create the QEMUSizedBuffer yourself, you need to
free it by using qsb_free() but not depends on other API
like qemu_fclose.
2.allow qemu_bufopen() accept QEMUSizedBuffer as input for
write operation, otherwise, it will be a little strange
for this API won't accept the second parameter.
This brings API change, since there are only 3
users of this API currently, this change only impact the
first one which will be fixed in patch 2 of this patchset,
so I think it is safe to do this change.
1 70 tests/test-vmstate.c <<open_mem_file_read>>
return qemu_bufopen("r", qsb);
2 404 tests/test-vmstate.c <<test_save_noskip>>
QEMUFile *fsave = qemu_bufopen("w", NULL);
3 424 tests/test-vmstate.c <<test_save_skip>>
QEMUFile *fsave = qemu_bufopen("w", NULL);
Signed-off-by: Yang Hongyang <yanghy at cn.fujitsu.com>
Cc: Dr. David Alan Gilbert <dgilbert at redhat.com>
Cc: Juan Quintela <quintela at redhat.com>
Cc: Amit Shah <amit.shah at redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index d33dd44..e97e0bd 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -395,6 +395,7 @@ QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb)
typedef struct QEMUBuffer {
QEMUSizedBuffer *qsb;
QEMUFile *file;
+ bool qsb_allocated;
} QEMUBuffer;
static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
@@ -424,7 +425,9 @@ static int buf_close(void *opaque)
{
QEMUBuffer *s = opaque;
- qsb_free(s->qsb);
+ if (s->qsb_allocated) {
+ qsb_free(s->qsb);
+ }
g_free(s);
@@ -463,12 +466,11 @@ QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input)
}
s = g_malloc0(sizeof(QEMUBuffer));
- if (mode[0] == 'r') {
- s->qsb = input;
- }
+ s->qsb = input;
if (s->qsb == NULL) {
s->qsb = qsb_create(NULL, 0);
+ s->qsb_allocated = true;
}
if (!s->qsb) {
g_free(s);
commit 1b826f277814dd9496fe3cc71cbe6ab7b203cadf
Author: ChenLiang <chenliang88 at huawei.com>
Date: Mon Nov 24 19:55:48 2014 +0800
xbzrle: rebuild the cache_is_cached function
Rebuild the cache_is_cached function by cache_get_by_addr. And
drops the asserts because the caller is also asserting the same
thing.
Signed-off-by: ChenLiang <chenliang88 at huawei.com>
Signed-off-by: Gonglei <arei.gonglei at huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/page_cache.c b/page_cache.c
index aa68192..cf8878d 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -125,24 +125,6 @@ static size_t cache_get_cache_pos(const PageCache *cache,
return pos;
}
-bool cache_is_cached(const PageCache *cache, uint64_t addr,
- uint64_t current_age)
-{
- size_t pos;
-
- g_assert(cache);
- g_assert(cache->page_cache);
-
- pos = cache_get_cache_pos(cache, addr);
-
- if (cache->page_cache[pos].it_addr == addr) {
- /* update the it_age when the cache hit */
- cache->page_cache[pos].it_age = current_age;
- return true;
- }
- return false;
-}
-
static CacheItem *cache_get_by_addr(const PageCache *cache, uint64_t addr)
{
size_t pos;
@@ -160,14 +142,26 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr)
return cache_get_by_addr(cache, addr)->it_data;
}
+bool cache_is_cached(const PageCache *cache, uint64_t addr,
+ uint64_t current_age)
+{
+ CacheItem *it;
+
+ it = cache_get_by_addr(cache, addr);
+
+ if (it->it_addr == addr) {
+ /* update the it_age when the cache hit */
+ it->it_age = current_age;
+ return true;
+ }
+ return false;
+}
+
int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata,
uint64_t current_age)
{
- CacheItem *it = NULL;
-
- g_assert(cache);
- g_assert(cache->page_cache);
+ CacheItem *it;
/* actual update of entry */
it = cache_get_by_addr(cache, addr);
commit 27af7d6ea5015e5ef1f7985eab94a8a218267a2b
Author: ChenLiang <chenliang88 at huawei.com>
Date: Mon Nov 24 19:55:47 2014 +0800
xbzrle: optimize XBZRLE to decrease the cache misses
Avoid hot pages being replaced by others to remarkably decrease cache
misses
Sample results with the test program which quote from xbzrle.txt ran in
vm:(migrate bandwidth:1GE and xbzrle cache size 8MB)
the test program:
include <stdlib.h>
include <stdio.h>
int main()
{
char *buf = (char *) calloc(4096, 4096);
while (1) {
int i;
for (i = 0; i < 4096 * 4; i++) {
buf[i * 4096 / 4]++;
}
printf(".");
}
}
before this patch:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
"cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
"cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
"ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
"transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
"dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
"normal":1403465}},"id":"libvirt-706"}
18k pages sent compressed in 52 seconds.
cache-miss-rate is 98.7%, totally miss.
after optimizing:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
"cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
"cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
"ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
"transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
"dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
"normal":387794}},"id":"libvirt-266"}
194k pages sent compressed in 18 seconds.
The value of cache-miss-rate decrease to 48.59%.
Signed-off-by: ChenLiang <chenliang88 at huawei.com>
Signed-off-by: Gonglei <arei.gonglei at huawei.com>
Reviewed-by: Eric Blake <eblake at redhat.com>
Signed-off-by: Amit Shah <amit.shah at redhat.com>
diff --git a/arch_init.c b/arch_init.c
index cfedbf0..89c8fa4 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -346,7 +346,8 @@ static void xbzrle_cache_zero_page(ram_addr_t current_addr)
/* We don't care if this fails to allocate a new cache page
* as long as it updated an old one */
- cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE);
+ cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE,
+ bitmap_sync_count);
}
#define ENCODING_FLAG_XBZRLE 0x1
@@ -358,10 +359,11 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
int encoded_len = 0, bytes_sent = -1;
uint8_t *prev_cached_page;
- if (!cache_is_cached(XBZRLE.cache, current_addr)) {
+ if (!cache_is_cached(XBZRLE.cache, current_addr, bitmap_sync_count)) {
acct_info.xbzrle_cache_miss++;
if (!last_stage) {
- if (cache_insert(XBZRLE.cache, current_addr, *current_data) == -1) {
+ if (cache_insert(XBZRLE.cache, current_addr, *current_data,
+ bitmap_sync_count) == -1) {
return -1;
} else {
/* update *current_data when the page has been
diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
index cc3a26a..52c8511 100644
--- a/docs/xbzrle.txt
+++ b/docs/xbzrle.txt
@@ -71,6 +71,14 @@ encoded buffer:
encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69
+Cache update strategy
+=====================
+Keeping the hot pages in the cache is effective for decreased cache
+misses. XBZRLE uses a counter as the age of each page. The counter will
+increase after each ram dirty bitmap sync. When a cache conflict is
+detected, XBZRLE will only evict pages in the cache that are older than
+a threshold.
+
Usage
======================
1. Verify the destination QEMU version is able to decode the new format.
diff --git a/include/migration/page_cache.h b/include/migration/page_cache.h
index 2d5ce2d..10ed532 100644
--- a/include/migration/page_cache.h
+++ b/include/migration/page_cache.h
@@ -43,8 +43,10 @@ void cache_fini(PageCache *cache);
*
* @cache pointer to the PageCache struct
* @addr: page addr
+ * @current_age: current bitmap generation
*/
-bool cache_is_cached(const PageCache *cache, uint64_t addr);
+bool cache_is_cached(const PageCache *cache, uint64_t addr,
+ uint64_t current_age);
/**
* get_cached_data: Get the data cached for an addr
@@ -60,13 +62,15 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr);
* cache_insert: insert the page into the cache. the page cache
* will dup the data on insert. the previous value will be overwritten
*
- * Returns -1 on error
+ * Returns -1 when the page isn't inserted into cache
*
* @cache pointer to the PageCache struct
* @addr: page address
* @pdata: pointer to the page
+ * @current_age: current bitmap generation
*/
-int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata);
+int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata,
+ uint64_t current_age);
/**
* cache_resize: resize the page cache. In case of size reduction the extra
diff --git a/page_cache.c b/page_cache.c
index 89bb1ec..aa68192 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -33,6 +33,9 @@
do { } while (0)
#endif
+/* the page in cache will not be replaced in two cycles */
+#define CACHED_PAGE_LIFETIME 2
+
typedef struct CacheItem CacheItem;
struct CacheItem {
@@ -122,7 +125,8 @@ static size_t cache_get_cache_pos(const PageCache *cache,
return pos;
}
-bool cache_is_cached(const PageCache *cache, uint64_t addr)
+bool cache_is_cached(const PageCache *cache, uint64_t addr,
+ uint64_t current_age)
{
size_t pos;
@@ -131,7 +135,12 @@ bool cache_is_cached(const PageCache *cache, uint64_t addr)
pos = cache_get_cache_pos(cache, addr);
- return (cache->page_cache[pos].it_addr == addr);
+ if (cache->page_cache[pos].it_addr == addr) {
+ /* update the it_age when the cache hit */
+ cache->page_cache[pos].it_age = current_age;
+ return true;
+ }
+ return false;
}
static CacheItem *cache_get_by_addr(const PageCache *cache, uint64_t addr)
@@ -151,7 +160,8 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr)
return cache_get_by_addr(cache, addr)->it_data;
}
-int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata)
+int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata,
+ uint64_t current_age)
{
CacheItem *it = NULL;
@@ -162,6 +172,11 @@ int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata)
/* actual update of entry */
it = cache_get_by_addr(cache, addr);
+ if (it->it_data && it->it_addr != addr &&
+ it->it_age + CACHED_PAGE_LIFETIME > current_age) {
+ /* the cache page is fresh, don't replace it */
+ return -1;
+ }
/* allocate page */
if (!it->it_data) {
it->it_data = g_try_malloc(cache->page_size);
@@ -174,7 +189,7 @@ int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata)
memcpy(it->it_data, pdata, cache->page_size);
- it->it_age = ++cache->max_item_age;
+ it->it_age = current_age;
it->it_addr = addr;
return 0;
More information about the Spice-commits
mailing list