[Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support

Shameerali Kolothum Thodi shameerali.kolothum.thodi at huawei.com
Tue Mar 14 11:38:11 UTC 2023



> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 08 March 2023 15:55
> To: 'Nicolin Chen' <nicolinc at nvidia.com>
> Cc: Xu, Terrence <terrence.xu at intel.com>; Liu, Yi L <yi.l.liu at intel.com>;
> Jason Gunthorpe <jgg at nvidia.com>; alex.williamson at redhat.com; Tian,
> Kevin <kevin.tian at intel.com>; joro at 8bytes.org; robin.murphy at arm.com;
> cohuck at redhat.com; eric.auger at redhat.com; kvm at vger.kernel.org;
> mjrosato at linux.ibm.com; chao.p.peng at linux.intel.com;
> yi.y.sun at linux.intel.com; peterx at redhat.com; jasowang at redhat.com;
> lulu at redhat.com; suravee.suthikulpanit at amd.com;
> intel-gvt-dev at lists.freedesktop.org; intel-gfx at lists.freedesktop.org;
> linux-s390 at vger.kernel.org; Hao, Xudong <xudong.hao at intel.com>; Zhao,
> Yan Y <yan.y.zhao at intel.com>
> Subject: RE: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 

[...]
> > > > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi wrote:
> > > >
> > > > > Hi Nicolin,
> > > > >
> > > > > Thanks for the latest ARM64 branch. Do you have a working QEMU
> > > > > branch corresponding to the above one?
> > > > >
> > > > > I tried the
> > > > > https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
> > > > > branch, but for some reason I am not able to launch the guest.
> > > > >
> > > > > Please let me know.
> > > >
> > > > I do use that branch. It might not be that robust though as it
> > > > went through a big rebase.
> > >
> > > OK. The issue seems to be quite random in nature and only happens
> > > when there are multiple vCPUs. It also doesn't look related to
> > > VFIO device assignment, as I can reproduce the guest hang without it,
> > > with only the nested-smmuv3 and iommufd objects.
> > >
> > > ./qemu-system-aarch64-iommuf \
> > >   -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> > >   -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 \
> > >   -object iommufd,id=iommufd0 \
> > >   -bios QEMU_EFI.fd -kernel Image-6.2-iommufd -initrd rootfs-iperf.cpio \
> > >   -net none -nographic \
> > >   -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> > >   -trace events=events -D trace_iommufd
> > >
> > > When the issue happens, there is no output on the terminal, as if
> > > QEMU is in a locked state.
> > >
> > > > Can you try with the following?
> > > >
> > > > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*"
> > > > --trace "msi_*" --trace "nvme_*"
> > >
> > > The only trace events with the above are these:
> > >
> > > iommufd_backend_connect fd=22 owned=1 users=1 (0)
> > > smmu_add_mr smmuv3-iommu-memory-region-0-0
> > >
> > > I haven't debugged this further. Please let me know if the issue is
> > > reproducible with multiple vCPUs at your end. For now I will focus on
> > > VFIO device-specific tests.
> >
> > Oh. My test environment has been using a single vCPU, so that doesn't
> > happen for me. Can you try the vanilla QEMU branch that our nesting
> > branch is rebased on? I took a branch from Yi as the baseline, while
> > he might have taken his from Eric for the rfcv3.
> >
> > I am guessing that it might be an issue in the common tree.
> 
> Yes, that looks like the case.
> I tried with:
>  commit 13356edb8750 ("Merge tag 'block-pull-request' of
>  https://gitlab.com/stefanha/qemu into staging")
> 
> And the issue is still there. So hopefully once we rebase everything it
> will go away.

Hi Nicolin,

I rebased your latest QEMU branch[1] on top of v7.2.0 and have not observed
the above issue so far. However, I noticed a couple of other issues when
hot-adding/removing devices.

(qemu) device_del net1
qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy

Ignoring the MMIO UNMAP errors, it looks like the objects are not freed
properly on the device removal path. I have a few quick fixes for this here:
https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting

With the above, the HWPT/IOAS objects seem to be destroyed properly
on the device detach path. However, when the device is added back, QEMU
segfaults, and so far I have no idea why that happens.

(qemu) device_add vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1
./qemu_run-iommufd-nested: line 13:  7041 Segmentation fault
(core dumped) ./qemu-system-aarch64-iommufd
-machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0
-enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 -object
iommufd,id=iommufd0 -bios QEMU_EFI_Dec2018.fd -kernel
Image-iommufd -initrd rootfs-iperf.cpio -device
ioh3420,id=rp1 -device
vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1 -append
"rdinit=init console=ttyAMA0 root=/dev/vda rw
earlycon=pl011,0x9000000" -net none -nographic -trace events=events -D
trace_iommufd

There is no kernel log or crash, and the traces are not very useful when
this happens. I understand these are early days and it is not robust in any
way, but please let me know if you suspect anything. I will continue
debugging and will post an update if I find anything.

Thanks,
Shameer

[1] https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3