[Spice-commits] 12 commits - arch_init.c docs/xbzrle.txt hw/nvram include/migration include/qemu migration/fd.c migration/migration.c migration/qemu-file-buf.c migration/qemu-file-unix.c migration/qemu-file.c page_cache.c target-arm/crypto_helper.c tests/test-vmstate.c

Mon Jan 19 22:55:18 PST 2015

arch_init.c                    |    8 +-
 docs/xbzrle.txt                |    8 ++
 hw/nvram/fw_cfg.c              |   41 ++------------
 include/migration/page_cache.h |   10 ++-
 include/migration/qemu-file.h  |   10 +++
 include/migration/vmstate.h    |    2 
 include/qemu/sockets.h         |    7 ++
 migration/fd.c                 |   24 +++++++-
 migration/migration.c          |   12 ++++
 migration/qemu-file-buf.c      |   10 ++-
 migration/qemu-file-unix.c     |   23 ++++++--
 migration/qemu-file.c          |   12 ++++
 page_cache.c                   |   43 +++++++++------
 target-arm/crypto_helper.c     |  114 ++++++++++++++++++++++-------------------
 tests/test-vmstate.c           |   20 ++-----
 15 files changed, 213 insertions(+), 131 deletions(-)

New commits:
commit 1e42c353469cb58ca4f3b450eea4211af7d0b147
Merge: e68cba3 36b62ae
Author: Peter Maydell <peter.maydell at linaro.org>
Date:   Fri Jan 16 12:06:41 2015 +0000

    Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150116' into staging
    
    target-arm queue:
     * fix endianness handling in fwcfg wide registers
     * fix broken crypto insn emulation on big endian hosts
    
    # gpg: Signature made Fri 16 Jan 2015 12:04:08 GMT using RSA key ID 14360CDE
    # gpg: Good signature from "Peter Maydell <peter.maydell at linaro.org>"
    
    * remotes/pmaydell/tags/pull-target-arm-20150116:
      fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write()
      target-arm: crypto: fix BE host support
    
    Signed-off-by: Peter Maydell <peter.maydell at linaro.org>

commit 36b62ae6a58f9a588fd33be9386e18a2b90103f5
Author: Laszlo Ersek <lersek at redhat.com>
Date:   Fri Jan 16 11:54:30 2015 +0000

    fw_cfg: fix endianness in fw_cfg_data_mem_read() / _write()
    
    (1) Let's contemplate what device endianness means, for a memory mapped
    device register (independently of QEMU -- that is, on physical hardware).
    
    It determines the byte order that the device will put on the data bus when
    the device is producing a *numerical value* for the CPU. This byte order
    may differ from the CPU's own byte order, therefore when software wants to
    consume the *numerical value*, it may have to swap the byte order first.
    
    For example, suppose we have a device that exposes in a 2-byte register
    the number of sheep we have to count before falling asleep. If the value
    is decimal 37 (0x0025), then a big endian register will produce [0x00,
    0x25], while a little endian register will produce [0x25, 0x00].
    
    If the device register is big endian, but the CPU is little endian, the
    numerical value will read as 0x2500 (decimal 9472), which software has to
    byte swap before use.
    
    However... if we ask the device about who stole our herd of sheep, and it
    answers "XY", then the byte representation coming out of the register must
    be [0x58, 0x59], regardless of the device register's endianness for
    numeric values. And, software needs to copy these bytes into a string
    field regardless of the CPU's own endianness.
    
    (2) QEMU's device register accessor functions work with *numerical values*
    exclusively, not strings:
    
    The emulated register's read accessor function returns the numerical value
    (eg. 37 decimal, 0x0025) as a *host-encoded* uint64_t. QEMU translates
    this value for the guest to the endianness of the emulated device register
    (which is recorded in MemoryRegionOps.endianness). Then guest code must
    translate the numerical value from device register to guest CPU
    endianness, before including it in any computation (see (1)).
    
    (3) However, the data register of the fw_cfg device shall transfer strings
    *only* -- that is, opaque blobs. Interpretation of any given blob is
    subject to further agreement -- it can be an integer in an independently
    determined byte order, or a genuine string, or an array of structs of
    integers (in some byte order) and fixed size strings, and so on.
    
    Because register emulation in QEMU is integer-preserving, not
    string-preserving (see (2)), we have to jump through a few hoops.
    
    (3a) We defined the memory mapped fw_cfg data register as
    DEVICE_BIG_ENDIAN.
    
    The particular choice is not really relevant -- we picked BE only for
    consistency with the control register, which *does* transfer integers --
    but our choice affects how we must host-encode values from fw_cfg strings.
    
    (3b) Since we want the fw_cfg string "XY" to appear as the [0x58, 0x59]
    array on the data register, *and* we picked DEVICE_BIG_ENDIAN, we must
    compose the host (== C language) value 0x5859 in the read accessor
    function.
    
    (3c) When the guest performs the read access, the immediate uint16_t value
    will be 0x5958 (in LE guests) and 0x5859 (in BE guests). However, the
    uint16_t value does not matter. The only thing that matters is the byte
    pattern [0x58, 0x59], which the guest code must copy into the target
    string *without* any byte-swapping.
    
    (4) Now I get to explain where I screwed up. :(
    
    When we decided for big endian *integer* representation in the MMIO data
    register -- see (3a) --, I mindlessly added an indiscriminate
    byte-swizzling step to the (little endian) guest firmware.
    
    This was a grave error -- it violates (3c) --, but I didn't realize it. I
    only saw that the code I otherwise intended for fw_cfg_data_mem_read():
    
        value = 0;
        for (i = 0; i < size; ++i) {
            value = (value << 8) | fw_cfg_read(s);
        }
    
    didn't produce the expected result in the guest.
    
    In true facepalm style, instead of blaming my guest code (which violated
    (3c)), I blamed my host code (which was correct). Ultimately, I coded
    ldX_he_p() into fw_cfg_data_mem_read(), because that happened to work.
    
    Obviously (...in retrospect) that was wrong. Only because my host happened
    to be LE, ldX_he_p() composed the (otherwise incorrect) host value 0x5958
    from the fw_cfg string "XY". And that happened to compensate for the bogus
    indiscriminate byte-swizzling in my guest code.
    
    Clearly the current code leaks the host endianness through to the guest,
    which is wrong. Any device should work the same regardless of host
    endianness.
    
    The solution is to compose the host-endian representation (2) of the big
    endian interpretation (3a, 3b) of the fw_cfg string, and to drop the wrong
    byte-swizzling in the guest (3c).
    
    Brown paper bag time for me.
    
    Signed-off-by: Laszlo Ersek <lersek at redhat.com>
    Message-id: 1420024880-15416-1-git-send-email-lersek at redhat.com
    Reviewed-by: Peter Maydell <peter.maydell at linaro.org>
    Signed-off-by: Peter Maydell <peter.maydell at linaro.org>

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index fcdf821..78a37be 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -287,51 +287,24 @@ static uint64_t fw_cfg_data_mem_read(void *opaque, hwaddr addr,
                                      unsigned size)
 {
     FWCfgState *s = opaque;
-    uint8_t buf[8];
+    uint64_t value = 0;
     unsigned i;
 
     for (i = 0; i < size; ++i) {
-        buf[i] = fw_cfg_read(s);
+        value = (value << 8) | fw_cfg_read(s);
     }
-    switch (size) {
-    case 1:
-        return buf[0];
-    case 2:
-        return lduw_he_p(buf);
-    case 4:
-        return (uint32_t)ldl_he_p(buf);
-    case 8:
-        return ldq_he_p(buf);
-    }
-    abort();
+    return value;
 }
 
 static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
                                   uint64_t value, unsigned size)
 {
     FWCfgState *s = opaque;
-    uint8_t buf[8];
-    unsigned i;
+    unsigned i = size;
 
-    switch (size) {
-    case 1:
-        buf[0] = value;
-        break;
-    case 2:
-        stw_he_p(buf, value);
-        break;
-    case 4:
-        stl_he_p(buf, value);
-        break;
-    case 8:
-        stq_he_p(buf, value);
-        break;
-    default:
-        abort();
-    }
-    for (i = 0; i < size; ++i) {
-        fw_cfg_write(s, buf[i]);
-    }
+    do {
+        fw_cfg_write(s, value >> (8 * --i));
+    } while (i);
 }
 
 static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
commit b449ca3c1874418d948878d5417a32fc0dbf9fea
Author: Ard Biesheuvel <ard.biesheuvel at linaro.org>
Date:   Fri Jan 16 11:54:29 2015 +0000

    target-arm: crypto: fix BE host support
    
    The crypto emulation code in target-arm/crypto_helper.c never worked
    correctly on big endian hosts, due to the fact that it uses a union
    of array types to convert between the native VFP register size (64
    bits) and the types used in the algorithms (bytes and 32 bit words)
    
    We cannot just swab between LE and BE when reading and writing the
    registers, as the SHA code performs word additions, so instead, add
    array accessors for the CRYPTO_STATE type whose LE and BE specific
    implementations ensure that the correct array elements are referenced.
    
    Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
    Acked-by: Laszlo Ersek <lersek at redhat.com>
    Message-id: 1420208303-24111-1-git-send-email-ard.biesheuvel at linaro.org
    Reviewed-by: Peter Maydell <peter.maydell at linaro.org>
    Signed-off-by: Peter Maydell <peter.maydell at linaro.org>

diff --git a/target-arm/crypto_helper.c b/target-arm/crypto_helper.c
index dd60d0b..1fe975d 100644
--- a/target-arm/crypto_helper.c
+++ b/target-arm/crypto_helper.c
@@ -22,6 +22,14 @@ union CRYPTO_STATE {
     uint64_t   l[2];
 };
 
+#ifdef HOST_WORDS_BIGENDIAN
+#define CR_ST_BYTE(state, i)   (state.bytes[(15 - (i)) ^ 8])
+#define CR_ST_WORD(state, i)   (state.words[(3 - (i)) ^ 2])
+#else
+#define CR_ST_BYTE(state, i)   (state.bytes[i])
+#define CR_ST_WORD(state, i)   (state.words[i])
+#endif
+
 void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
                          uint32_t decrypt)
 {
@@ -46,7 +54,7 @@ void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
 
     /* combine ShiftRows operation and sbox substitution */
     for (i = 0; i < 16; i++) {
-        st.bytes[i] = sbox[decrypt][rk.bytes[shift[decrypt][i]]];
+        CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
     }
 
     env->vfp.regs[rd] = make_float64(st.l[0]);
@@ -198,11 +206,11 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
     assert(decrypt < 2);
 
     for (i = 0; i < 16; i += 4) {
-        st.words[i >> 2] = cpu_to_le32(
-            mc[decrypt][st.bytes[i]] ^
-            rol32(mc[decrypt][st.bytes[i + 1]], 8) ^
-            rol32(mc[decrypt][st.bytes[i + 2]], 16) ^
-            rol32(mc[decrypt][st.bytes[i + 3]], 24));
+        CR_ST_WORD(st, i >> 2) =
+            mc[decrypt][CR_ST_BYTE(st, i)] ^
+            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
+            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
+            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
     }
 
     env->vfp.regs[rd] = make_float64(st.l[0]);
@@ -255,24 +263,25 @@ void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn,
 
             switch (op) {
             case 0: /* sha1c */
-                t = cho(d.words[1], d.words[2], d.words[3]);
+                t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
                 break;
             case 1: /* sha1p */
-                t = par(d.words[1], d.words[2], d.words[3]);
+                t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
                 break;
             case 2: /* sha1m */
-                t = maj(d.words[1], d.words[2], d.words[3]);
+                t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
                 break;
             default:
                 g_assert_not_reached();
             }
-            t += rol32(d.words[0], 5) + n.words[0] + m.words[i];
-
-            n.words[0] = d.words[3];
-            d.words[3] = d.words[2];
-            d.words[2] = ror32(d.words[1], 2);
-            d.words[1] = d.words[0];
-            d.words[0] = t;
+            t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0)
+                 + CR_ST_WORD(m, i);
+
+            CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3);
+            CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+            CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2);
+            CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+            CR_ST_WORD(d, 0) = t;
         }
     }
     env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -286,8 +295,8 @@ void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, uint32_t rm)
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    m.words[0] = ror32(m.words[0], 2);
-    m.words[1] = m.words[2] = m.words[3] = 0;
+    CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2);
+    CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0;
 
     env->vfp.regs[rd] = make_float64(m.l[0]);
     env->vfp.regs[rd + 1] = make_float64(m.l[1]);
@@ -304,10 +313,10 @@ void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t rd, uint32_t rm)
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    d.words[0] = rol32(d.words[0] ^ m.words[1], 1);
-    d.words[1] = rol32(d.words[1] ^ m.words[2], 1);
-    d.words[2] = rol32(d.words[2] ^ m.words[3], 1);
-    d.words[3] = rol32(d.words[3] ^ d.words[0], 1);
+    CR_ST_WORD(d, 0) = rol32(CR_ST_WORD(d, 0) ^ CR_ST_WORD(m, 1), 1);
+    CR_ST_WORD(d, 1) = rol32(CR_ST_WORD(d, 1) ^ CR_ST_WORD(m, 2), 1);
+    CR_ST_WORD(d, 2) = rol32(CR_ST_WORD(d, 2) ^ CR_ST_WORD(m, 3), 1);
+    CR_ST_WORD(d, 3) = rol32(CR_ST_WORD(d, 3) ^ CR_ST_WORD(d, 0), 1);
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
     env->vfp.regs[rd + 1] = make_float64(d.l[1]);
@@ -356,20 +365,22 @@ void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t rd, uint32_t rn,
     int i;
 
     for (i = 0; i < 4; i++) {
-        uint32_t t = cho(n.words[0], n.words[1], n.words[2]) + n.words[3]
-                     + S1(n.words[0]) + m.words[i];
-
-        n.words[3] = n.words[2];
-        n.words[2] = n.words[1];
-        n.words[1] = n.words[0];
-        n.words[0] = d.words[3] + t;
-
-        t += maj(d.words[0], d.words[1], d.words[2]) + S0(d.words[0]);
-
-        d.words[3] = d.words[2];
-        d.words[2] = d.words[1];
-        d.words[1] = d.words[0];
-        d.words[0] = t;
+        uint32_t t = cho(CR_ST_WORD(n, 0), CR_ST_WORD(n, 1), CR_ST_WORD(n, 2))
+                     + CR_ST_WORD(n, 3) + S1(CR_ST_WORD(n, 0))
+                     + CR_ST_WORD(m, i);
+
+        CR_ST_WORD(n, 3) = CR_ST_WORD(n, 2);
+        CR_ST_WORD(n, 2) = CR_ST_WORD(n, 1);
+        CR_ST_WORD(n, 1) = CR_ST_WORD(n, 0);
+        CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3) + t;
+
+        t += maj(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
+             + S0(CR_ST_WORD(d, 0));
+
+        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
+        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+        CR_ST_WORD(d, 0) = t;
     }
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -394,13 +405,14 @@ void HELPER(crypto_sha256h2)(CPUARMState *env, uint32_t rd, uint32_t rn,
     int i;
 
     for (i = 0; i < 4; i++) {
-        uint32_t t = cho(d.words[0], d.words[1], d.words[2]) + d.words[3]
-                     + S1(d.words[0]) + m.words[i];
-
-        d.words[3] = d.words[2];
-        d.words[2] = d.words[1];
-        d.words[1] = d.words[0];
-        d.words[0] = n.words[3 - i] + t;
+        uint32_t t = cho(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
+                     + CR_ST_WORD(d, 3) + S1(CR_ST_WORD(d, 0))
+                     + CR_ST_WORD(m, i);
+
+        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
+        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+        CR_ST_WORD(d, 0) = CR_ST_WORD(n, 3 - i) + t;
     }
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -418,10 +430,10 @@ void HELPER(crypto_sha256su0)(CPUARMState *env, uint32_t rd, uint32_t rm)
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    d.words[0] += s0(d.words[1]);
-    d.words[1] += s0(d.words[2]);
-    d.words[2] += s0(d.words[3]);
-    d.words[3] += s0(m.words[0]);
+    CR_ST_WORD(d, 0) += s0(CR_ST_WORD(d, 1));
+    CR_ST_WORD(d, 1) += s0(CR_ST_WORD(d, 2));
+    CR_ST_WORD(d, 2) += s0(CR_ST_WORD(d, 3));
+    CR_ST_WORD(d, 3) += s0(CR_ST_WORD(m, 0));
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
     env->vfp.regs[rd + 1] = make_float64(d.l[1]);
@@ -443,10 +455,10 @@ void HELPER(crypto_sha256su1)(CPUARMState *env, uint32_t rd, uint32_t rn,
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    d.words[0] += s1(m.words[2]) + n.words[1];
-    d.words[1] += s1(m.words[3]) + n.words[2];
-    d.words[2] += s1(d.words[0]) + n.words[3];
-    d.words[3] += s1(d.words[1]) + m.words[0];
+    CR_ST_WORD(d, 0) += s1(CR_ST_WORD(m, 2)) + CR_ST_WORD(n, 1);
+    CR_ST_WORD(d, 1) += s1(CR_ST_WORD(m, 3)) + CR_ST_WORD(n, 2);
+    CR_ST_WORD(d, 2) += s1(CR_ST_WORD(d, 0)) + CR_ST_WORD(n, 3);
+    CR_ST_WORD(d, 3) += s1(CR_ST_WORD(d, 1)) + CR_ST_WORD(m, 0);
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
     env->vfp.regs[rd + 1] = make_float64(d.l[1]);
commit e68cba36360a2ab5bf0576b66df4d0eb0d822f8d
Merge: df58887 ea987c2
Author: Peter Maydell <peter.maydell at linaro.org>
Date:   Fri Jan 16 10:16:14 2015 +0000

    Merge remote-tracking branch 'remotes/amit-migration/tags/mig-2.3-1' into staging
    
    A set of patches collected over the holidays.  Mix of optimizations and
    fixes.
    
    # gpg: Signature made Fri 16 Jan 2015 07:42:00 GMT using RSA key ID 854083B6
    # gpg: Good signature from "Amit Shah <amit at amitshah.net>"
    # gpg:                 aka "Amit Shah <amit at kernel.org>"
    # gpg:                 aka "Amit Shah <amitshah at gmx.net>"
    
    * remotes/amit-migration/tags/mig-2.3-1:
      vmstate: type-check sub-arrays
      migration_cancel: shutdown migration socket
      Handle bi-directional communication for fd migration
      socket shutdown
      Tests: QEMUSizedBuffer/QEMUBuffer
      QEMUSizedBuffer: only free qsb that qemu_bufopen allocated
      xbzrle: rebuild the cache_is_cached function
      xbzrle: optimize XBZRLE to decrease the cache misses
    
    Signed-off-by: Peter Maydell <peter.maydell at linaro.org>

commit ea987c2c21d4326bb58ee28f6888fdcf8fbda067
Author: Paolo Bonzini <pbonzini at redhat.com>
Date:   Wed Jan 7 15:12:13 2015 +0100

    vmstate: type-check sub-arrays
    
    While we cannot check against the type of the full array, we can check
    against the type of the fields.
    
    Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index e45fc49..d712a65 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -189,7 +189,7 @@ extern const VMStateInfo vmstate_info_bitmap;
      type_check_2darray(_type, typeof_field(_state, _field), _n1, _n2))
 
 #define vmstate_offset_sub_array(_state, _field, _type, _start)      \
-    (offsetof(_state, _field[_start]))
+    vmstate_offset_value(_state, _field[_start], _type)
 
 #define vmstate_offset_buffer(_state, _field)                        \
     vmstate_offset_array(_state, _field, uint8_t,                    \
commit a26ba26e214911dc879a23e797d2c269cdb38577
Author: Dr. David Alan Gilbert <dgilbert at redhat.com>
Date:   Thu Jan 8 11:11:32 2015 +0000

    migration_cancel: shutdown migration socket
    
    Force shutdown on migration socket on cancel to cause the cancel
    to complete even if the socket is blocked on a dead network.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
    Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>
    Reviewed-by: Amit Shah <amit.shah at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/migration/migration.c b/migration/migration.c
index c49a05a..b3adbc6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -330,6 +330,7 @@ void migrate_fd_error(MigrationState *s)
 static void migrate_fd_cancel(MigrationState *s)
 {
     int old_state ;
+    QEMUFile *f = migrate_get_current()->file;
     trace_migrate_fd_cancel();
 
     do {
@@ -339,6 +340,17 @@ static void migrate_fd_cancel(MigrationState *s)
         }
         migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
     } while (s->state != MIG_STATE_CANCELLING);
+
+    /*
+     * If we're unlucky the migration code might be stuck somewhere in a
+     * send/write while the network has failed and is waiting to timeout;
+     * if we've got shutdown(2) available then we can force it to quit.
+     * The outgoing qemu file gets closed in migrate_fd_cleanup that is
+     * called in a bh, so there is no race against this cancel.
+     */
+    if (s->state == MIG_STATE_CANCELLING && f) {
+        qemu_file_shutdown(f);
+    }
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
commit 131fe9b843f9a1e55fcbf2457c9cb25c3711b9d8
Author: Cristian Klein <cristian.klein at cs.umu.se>
Date:   Thu Jan 8 11:11:31 2015 +0000

    Handle bi-directional communication for fd migration
    
    libvirt prefers opening the TCP connection itself, for two reasons.
    First, connection failed errors can be detected easier, without having
    to parse qemu's error output.
    Second, libvirt might be asked to secure the transfer by tunnelling the
    communication through an TLS layer.
    Therefore, libvirt opens the TCP connection itself and passes an FD to qemu
    using QMP and a POSIX-specific mechanism.
    
    Hence, in order to make the reverse-path work in such cases, qemu needs to
    distinguish if the transmitted FD is a socket (reverse-path available)
    or not (reverse-path might not be available) and use the corresponding
    abstraction.
    
    Signed-off-by: Cristian Klein <cristian.klein at cs.umu.se>
    Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>
    Reviewed-by: Amit Shah <amit.shah at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/migration/fd.c b/migration/fd.c
index d2e523a..129da99 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -31,13 +31,29 @@
     do { } while (0)
 #endif
 
+static bool fd_is_socket(int fd)
+{
+    struct stat stat;
+    int ret = fstat(fd, &stat);
+    if (ret == -1) {
+        /* When in doubt say no */
+        return false;
+    }
+    return S_ISSOCK(stat.st_mode);
+}
+
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
     int fd = monitor_get_fd(cur_mon, fdname, errp);
     if (fd == -1) {
         return;
     }
-    s->file = qemu_fdopen(fd, "wb");
+
+    if (fd_is_socket(fd)) {
+        s->file = qemu_fopen_socket(fd, "wb");
+    } else {
+        s->file = qemu_fdopen(fd, "wb");
+    }
 
     migrate_fd_connect(s);
 }
@@ -58,7 +74,11 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
     DPRINTF("Attempting to start an incoming migration via fd\n");
 
     fd = strtol(infd, NULL, 0);
-    f = qemu_fdopen(fd, "rb");
+    if (fd_is_socket(fd)) {
+        f = qemu_fopen_socket(fd, "rb");
+    } else {
+        f = qemu_fdopen(fd, "rb");
+    }
     if(f == NULL) {
         error_setg_errno(errp, errno, "failed to open the source descriptor");
         return;
commit e1a8c9b67fc97d293211773edcae9e8e2f3367ab
Author: Dr. David Alan Gilbert <dgilbert at redhat.com>
Date:   Thu Jan 8 11:11:30 2015 +0000

    socket shutdown
    
    Add QEMUFile interface to allow a socket to be 'shut down' - i.e. any
    reads/writes will fail (and any blocking read/write will be woken).
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
    Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>
    Reviewed-by: Amit Shah <amit.shah at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 401676b..d843c00 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -84,6 +84,14 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                size_t size,
                                int *bytes_sent);
 
+/*
+ * Stop any read or write (depending on flags) on the underlying
+ * transport on the QEMUFile.
+ * Existing blocking reads/writes must be woken
+ * Returns 0 on success, -err on error
+ */
+typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr);
+
 typedef struct QEMUFileOps {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
@@ -94,6 +102,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
 struct QEMUSizedBuffer {
@@ -177,6 +186,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
+int qemu_file_shutdown(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index f47dae6..7992ece 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -44,6 +44,13 @@ int socket_set_fast_reuse(int fd);
 int send_all(int fd, const void *buf, int len1);
 int recv_all(int fd, void *buf, int len1, bool single_read);
 
+#ifdef WIN32
+/* Windows has different names for the same constants with the same values */
+#define SHUT_RD   0
+#define SHUT_WR   1
+#define SHUT_RDWR 2
+#endif
+
 /* callback function for nonblocking connect
  * valid fd on success, negative error code on failure
  */
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index 9682396..bfbc086 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -26,6 +26,7 @@
 #include "qemu/sockets.h"
 #include "block/coroutine.h"
 #include "migration/qemu-file.h"
+#include "migration/qemu-file-internal.h"
 
 typedef struct QEMUFileSocket {
     int fd;
@@ -84,6 +85,17 @@ static int socket_close(void *opaque)
     return 0;
 }
 
+static int socket_shutdown(void *opaque, bool rd, bool wr)
+{
+    QEMUFileSocket *s = opaque;
+
+    if (shutdown(s->fd, rd ? (wr ? SHUT_RDWR : SHUT_RD) : SHUT_WR)) {
+        return -errno;
+    } else {
+        return 0;
+    }
+}
+
 static ssize_t unix_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                   int64_t pos)
 {
@@ -192,15 +204,18 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }
 
 static const QEMUFileOps socket_read_ops = {
-    .get_fd =     socket_get_fd,
+    .get_fd     = socket_get_fd,
     .get_buffer = socket_get_buffer,
-    .close =      socket_close
+    .close      = socket_close,
+    .shut_down  = socket_shutdown
+
 };
 
 static const QEMUFileOps socket_write_ops = {
-    .get_fd =     socket_get_fd,
+    .get_fd        = socket_get_fd,
     .writev_buffer = socket_writev_buffer,
-    .close =      socket_close
+    .close         = socket_close,
+    .shut_down     = socket_shutdown
 };
 
 QEMUFile *qemu_fopen_socket(int fd, const char *mode)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index a7f2a34..edc2830 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,6 +30,18 @@
 #include "migration/qemu-file-internal.h"
 #include "trace.h"
 
+/*
+ * Stop a file from being read/written - not all backing files can do this
+ * typically only sockets can.
+ */
+int qemu_file_shutdown(QEMUFile *f)
+{
+    if (!f->ops->shut_down) {
+        return -ENOSYS;
+    }
+    return f->ops->shut_down(f->opaque, true, true);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
commit 8580b06498a5dffe554e7ac627726b1d7775c591
Author: Yang Hongyang <yanghy at cn.fujitsu.com>
Date:   Fri Dec 19 11:38:06 2014 +0800

    Tests: QEMUSizedBuffer/QEMUBuffer
    
    Modify some of tests/test-vmstate.c due to qemu_bufopen() change.
    If you create a QEMUSizedBuffer yourself, you have to explicitly
    free it.
    
    Signed-off-by: Yang Hongyang <yanghy at cn.fujitsu.com>
    Cc: Dr. David Alan Gilbert <dgilbert at redhat.com>
    Cc: Juan Quintela <quintela at redhat.com>
    Cc: Amit Shah <amit.shah at redhat.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index 5e0fd13..39b7b01 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -60,16 +60,6 @@ static QEMUFile *open_test_file(bool write)
     return qemu_fdopen(fd, write ? "wb" : "rb");
 }
 
-/* Open a read-only qemu-file from an existing memory block */
-static QEMUFile *open_mem_file_read(const void *data, size_t len)
-{
-    /* The qsb gets freed by qemu_fclose */
-    QEMUSizedBuffer *qsb = qsb_create(data, len);
-    g_assert(qsb);
-
-    return qemu_bufopen("r", qsb);
-}
-
 /*
  * Check that the contents of the memory-buffered file f match
  * the given size/data.
@@ -450,7 +440,9 @@ static void test_load_noskip(void)
         QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
     };
 
-    QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
+    QEMUSizedBuffer *qsb = qsb_create(buf, sizeof(buf));
+    g_assert(qsb);
+    QEMUFile *loading = qemu_bufopen("r", qsb);
     TestStruct obj = { .skip_c_e = false };
     vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
     g_assert(!qemu_file_get_error(loading));
@@ -461,6 +453,7 @@ static void test_load_noskip(void)
     g_assert_cmpint(obj.e, ==, 50);
     g_assert_cmpint(obj.f, ==, 60);
     qemu_fclose(loading);
+    qsb_free(qsb);
 }
 
 static void test_load_skip(void)
@@ -473,7 +466,9 @@ static void test_load_skip(void)
         QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
     };
 
-    QEMUFile *loading = open_mem_file_read(buf, sizeof(buf));
+    QEMUSizedBuffer *qsb = qsb_create(buf, sizeof(buf));
+    g_assert(qsb);
+    QEMUFile *loading = qemu_bufopen("r", qsb);
     TestStruct obj = { .skip_c_e = true, .c = 300, .e = 500 };
     vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
     g_assert(!qemu_file_get_error(loading));
@@ -484,6 +479,7 @@ static void test_load_skip(void)
     g_assert_cmpint(obj.e, ==, 500);
     g_assert_cmpint(obj.f, ==, 60);
     qemu_fclose(loading);
+    qsb_free(qsb);
 }
 
 int main(int argc, char **argv)
commit f018d8cd2123f495300935d5019931abbee4e5d9
Author: Yang Hongyang <yanghy at cn.fujitsu.com>
Date:   Fri Dec 19 11:38:05 2014 +0800

    QEMUSizedBuffer: only free qsb that qemu_bufopen allocated
    
    Only free qsb that qemu_bufopen allocated, and also allow
    qemu_bufopen accept qsb as input for write operation. It
    will make the API more logical:
    1.If you create the QEMUSizedBuffer yourself, you need to
      free it by using qsb_free() but not depends on other API
      like qemu_fclose.
    2.allow qemu_bufopen() accept QEMUSizedBuffer as input for
      write operation, otherwise, it will be a little strange
      for this API won't accept the second parameter.
    
    This brings API change, since there are only 3
    users of this API currently, this change only impact the
    first one which will be fixed in patch 2 of this patchset,
    so I think it is safe to do this change.
    
    1     70  tests/test-vmstate.c <<open_mem_file_read>>
                return qemu_bufopen("r", qsb);
    2    404  tests/test-vmstate.c <<test_save_noskip>>
                QEMUFile *fsave = qemu_bufopen("w", NULL);
    3    424  tests/test-vmstate.c <<test_save_skip>>
                QEMUFile *fsave = qemu_bufopen("w", NULL);
    
    Signed-off-by: Yang Hongyang <yanghy at cn.fujitsu.com>
    Cc: Dr. David Alan Gilbert <dgilbert at redhat.com>
    Cc: Juan Quintela <quintela at redhat.com>
    Cc: Amit Shah <amit.shah at redhat.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index d33dd44..e97e0bd 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -395,6 +395,7 @@ QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb)
 typedef struct QEMUBuffer {
     QEMUSizedBuffer *qsb;
     QEMUFile *file;
+    bool qsb_allocated;
 } QEMUBuffer;
 
 static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
@@ -424,7 +425,9 @@ static int buf_close(void *opaque)
 {
     QEMUBuffer *s = opaque;
 
-    qsb_free(s->qsb);
+    if (s->qsb_allocated) {
+        qsb_free(s->qsb);
+    }
 
     g_free(s);
 
@@ -463,12 +466,11 @@ QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input)
     }
 
     s = g_malloc0(sizeof(QEMUBuffer));
-    if (mode[0] == 'r') {
-        s->qsb = input;
-    }
+    s->qsb = input;
 
     if (s->qsb == NULL) {
         s->qsb = qsb_create(NULL, 0);
+        s->qsb_allocated = true;
     }
     if (!s->qsb) {
         g_free(s);
commit 1b826f277814dd9496fe3cc71cbe6ab7b203cadf
Author: ChenLiang <chenliang88 at huawei.com>
Date:   Mon Nov 24 19:55:48 2014 +0800

    xbzrle: rebuild the cache_is_cached function
    
    Rebuild the cache_is_cached function by cache_get_by_addr. And
    drops the asserts because the caller is also asserting the same
    thing.
    
    Signed-off-by: ChenLiang <chenliang88 at huawei.com>
    Signed-off-by: Gonglei <arei.gonglei at huawei.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/page_cache.c b/page_cache.c
index aa68192..cf8878d 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -125,24 +125,6 @@ static size_t cache_get_cache_pos(const PageCache *cache,
     return pos;
 }
 
-bool cache_is_cached(const PageCache *cache, uint64_t addr,
-                     uint64_t current_age)
-{
-    size_t pos;
-
-    g_assert(cache);
-    g_assert(cache->page_cache);
-
-    pos = cache_get_cache_pos(cache, addr);
-
-    if (cache->page_cache[pos].it_addr == addr) {
-        /* update the it_age when the cache hit */
-        cache->page_cache[pos].it_age = current_age;
-        return true;
-    }
-    return false;
-}
-
 static CacheItem *cache_get_by_addr(const PageCache *cache, uint64_t addr)
 {
     size_t pos;
@@ -160,14 +142,26 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr)
     return cache_get_by_addr(cache, addr)->it_data;
 }
 
+bool cache_is_cached(const PageCache *cache, uint64_t addr,
+                     uint64_t current_age)
+{
+    CacheItem *it;
+
+    it = cache_get_by_addr(cache, addr);
+
+    if (it->it_addr == addr) {
+        /* update the it_age when the cache hit */
+        it->it_age = current_age;
+        return true;
+    }
+    return false;
+}
+
 int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata,
                  uint64_t current_age)
 {
 
-    CacheItem *it = NULL;
-
-    g_assert(cache);
-    g_assert(cache->page_cache);
+    CacheItem *it;
 
     /* actual update of entry */
     it = cache_get_by_addr(cache, addr);
commit 27af7d6ea5015e5ef1f7985eab94a8a218267a2b
Author: ChenLiang <chenliang88 at huawei.com>
Date:   Mon Nov 24 19:55:47 2014 +0800

    xbzrle: optimize XBZRLE to decrease the cache misses
    
    Avoid hot pages being replaced by others to remarkably decrease cache
    misses
    
    Sample results with the test program which quote from xbzrle.txt ran in
    vm:(migrate bandwidth:1GE and xbzrle cache size 8MB)
    
    the test program:
    
    include <stdlib.h>
    include <stdio.h>
    int main()
     {
            char *buf = (char *) calloc(4096, 4096);
            while (1) {
                int i;
                for (i = 0; i < 4096 * 4; i++) {
                    buf[i * 4096 / 4]++;
                }
                printf(".");
            }
     }
    
    before this patch:
    virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
    {"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
    "cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
    "cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
    "ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
    "transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
    "dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
    "normal":1403465}},"id":"libvirt-706"}
    
    18k pages sent compressed in 52 seconds.
    cache-miss-rate is 98.7%, totally miss.
    
    after optimizing:
    virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
    {"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
    "cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
    "cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
    "ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
    "transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
    "dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
    "normal":387794}},"id":"libvirt-266"}
    
    194k pages sent compressed in 18 seconds.
    The value of cache-miss-rate decrease to 48.59%.
    
    Signed-off-by: ChenLiang <chenliang88 at huawei.com>
    Signed-off-by: Gonglei <arei.gonglei at huawei.com>
    Reviewed-by: Eric Blake <eblake at redhat.com>
    Signed-off-by: Amit Shah <amit.shah at redhat.com>

diff --git a/arch_init.c b/arch_init.c
index cfedbf0..89c8fa4 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -346,7 +346,8 @@ static void xbzrle_cache_zero_page(ram_addr_t current_addr)
 
     /* We don't care if this fails to allocate a new cache page
      * as long as it updated an old one */
-    cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE);
+    cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE,
+                 bitmap_sync_count);
 }
 
 #define ENCODING_FLAG_XBZRLE 0x1
@@ -358,10 +359,11 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
     int encoded_len = 0, bytes_sent = -1;
     uint8_t *prev_cached_page;
 
-    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
+    if (!cache_is_cached(XBZRLE.cache, current_addr, bitmap_sync_count)) {
         acct_info.xbzrle_cache_miss++;
         if (!last_stage) {
-            if (cache_insert(XBZRLE.cache, current_addr, *current_data) == -1) {
+            if (cache_insert(XBZRLE.cache, current_addr, *current_data,
+                             bitmap_sync_count) == -1) {
                 return -1;
             } else {
                 /* update *current_data when the page has been
diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
index cc3a26a..52c8511 100644
--- a/docs/xbzrle.txt
+++ b/docs/xbzrle.txt
@@ -71,6 +71,14 @@ encoded buffer:
 encoded length 24
 e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69
 
+Cache update strategy
+=====================
+Keeping the hot pages in the cache is effective for decreased cache
+misses. XBZRLE uses a counter as the age of each page. The counter will
+increase after each ram dirty bitmap sync. When a cache conflict is
+detected, XBZRLE will only evict pages in the cache that are older than
+a threshold.
+
 Usage
 ======================
 1. Verify the destination QEMU version is able to decode the new format.
diff --git a/include/migration/page_cache.h b/include/migration/page_cache.h
index 2d5ce2d..10ed532 100644
--- a/include/migration/page_cache.h
+++ b/include/migration/page_cache.h
@@ -43,8 +43,10 @@ void cache_fini(PageCache *cache);
  *
  * @cache pointer to the PageCache struct
  * @addr: page addr
+ * @current_age: current bitmap generation
  */
-bool cache_is_cached(const PageCache *cache, uint64_t addr);
+bool cache_is_cached(const PageCache *cache, uint64_t addr,
+                     uint64_t current_age);
 
 /**
  * get_cached_data: Get the data cached for an addr
@@ -60,13 +62,15 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr);
  * cache_insert: insert the page into the cache. the page cache
  * will dup the data on insert. the previous value will be overwritten
  *
- * Returns -1 on error
+ * Returns -1 when the page isn't inserted into cache
  *
  * @cache pointer to the PageCache struct
  * @addr: page address
  * @pdata: pointer to the page
+ * @current_age: current bitmap generation
  */
-int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata);
+int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata,
+                 uint64_t current_age);
 
 /**
  * cache_resize: resize the page cache. In case of size reduction the extra
diff --git a/page_cache.c b/page_cache.c
index 89bb1ec..aa68192 100644
--- a/page_cache.c
+++ b/page_cache.c
@@ -33,6 +33,9 @@
     do { } while (0)
 #endif
 
+/* the page in cache will not be replaced in two cycles */
+#define CACHED_PAGE_LIFETIME 2
+
 typedef struct CacheItem CacheItem;
 
 struct CacheItem {
@@ -122,7 +125,8 @@ static size_t cache_get_cache_pos(const PageCache *cache,
     return pos;
 }
 
-bool cache_is_cached(const PageCache *cache, uint64_t addr)
+bool cache_is_cached(const PageCache *cache, uint64_t addr,
+                     uint64_t current_age)
 {
     size_t pos;
 
@@ -131,7 +135,12 @@ bool cache_is_cached(const PageCache *cache, uint64_t addr)
 
     pos = cache_get_cache_pos(cache, addr);
 
-    return (cache->page_cache[pos].it_addr == addr);
+    if (cache->page_cache[pos].it_addr == addr) {
+        /* update the it_age when the cache hit */
+        cache->page_cache[pos].it_age = current_age;
+        return true;
+    }
+    return false;
 }
 
 static CacheItem *cache_get_by_addr(const PageCache *cache, uint64_t addr)
@@ -151,7 +160,8 @@ uint8_t *get_cached_data(const PageCache *cache, uint64_t addr)
     return cache_get_by_addr(cache, addr)->it_data;
 }
 
-int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata)
+int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata,
+                 uint64_t current_age)
 {
 
     CacheItem *it = NULL;
@@ -162,6 +172,11 @@ int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata)
     /* actual update of entry */
     it = cache_get_by_addr(cache, addr);
 
+    if (it->it_data && it->it_addr != addr &&
+        it->it_age + CACHED_PAGE_LIFETIME > current_age) {
+        /* the cache page is fresh, don't replace it */
+        return -1;
+    }
     /* allocate page */
     if (!it->it_data) {
         it->it_data = g_try_malloc(cache->page_size);
@@ -174,7 +189,7 @@ int cache_insert(PageCache *cache, uint64_t addr, const uint8_t *pdata)
 
     memcpy(it->it_data, pdata, cache->page_size);
 
-    it->it_age = ++cache->max_item_age;
+    it->it_age = current_age;
     it->it_addr = addr;
 
     return 0;