[PATCH v2 09/15] xen/gntdev: use mmu_range_notifier_insert

Fri Nov 1 17:48:27 UTC 2019

On Wed, Oct 30, 2019 at 12:55:37PM -0400, Boris Ostrovsky wrote:
> On 10/28/19 4:10 PM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg at mellanox.com>
> >
> > gntdev simply wants to monitor a specific VMA for any notifier events,
> > this can be done straightforwardly using mmu_range_notifier_insert() over
> > the VMA's VA range.
> >
> > The notifier should be attached until the original VMA is destroyed.
> >
> > It is unclear if any of this is even sane, but at least a lot of duplicate
> > code is removed.
> 
> I didn't have a chance to look at the patch itself yet but as a heads-up
> --- it crashes dom0.

Thanks Boris. I spent a bit of time and got a VM running with a xen
4.9 hypervisor and a kernel with this patch series. It a ubuntu bionic
VM with the distro's xen stuff.

Can you give some guidance how you made it crash? I see the VM
autoloaded gntdev:

Module                  Size  Used by
xen_gntdev             24576  2
xen_evtchn             16384  1
xenfs                  16384  1
xen_privcmd            24576  16 xenfs

And lsof says several xen processes have the chardev open:

xenstored  819                 root   13u      CHR              10,53      0t0      19595 /dev/xen/gntdev
xenconsol  857                 root    8u      CHR              10,53      0t0      19595 /dev/xen/gntdev
xenconsol  857 860             root    8u      CHR              10,53      0t0      19595 /dev/xen/gntdev

But no crashing..

However, I wasn't able to get my usual debug kernel .config to boot
with the xen hypervisor, it crashes on early boot with:

(XEN) Dom0 has maximum 8 VCPUs
(XEN) Scrubbing Free RAM on 1 nodes using 8 CPUs
(XEN) .done.
(XEN) Initial low memory virq threshold set at 0x1000 pages.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 468kB init memory
(XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0002]
(XEN) Pagetable walk from fffffbfff0480fbe:
(XEN)  L4[0x1f7] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S: fault at ffff82d080348a06 entry.o#create_bounce_frame+0x135/0x15f
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.9.2  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff82b9f731>]
(XEN) RFLAGS: 0000000000000296   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: fffffbfff0480fbe   rbx: 0000000000000000   rcx: 00000000c0000101
(XEN) rdx: 00000000ffffffff   rsi: ffffffff84026000   rdi: ffffffff82cb4a20
(XEN) rbp: ffffffff82407ff8   rsp: ffffffff82407da0   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 1ffffffff0480fbe   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000003506e0
(XEN) cr3: 0000000034027000   cr2: fffffbfff0480fbe
(XEN) fsb: 0000000000000000   gsb: ffffffff82b61000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033

Which is surely some .config issue, but I didn't figure out what.

Jason