Issue on unmap_grant_pages before releasing dmabuf

Oleksii Moisieiev Oleksii_Moisieiev at epam.com
Wed Oct 19 12:46:07 UTC 2022


On Wed, Oct 19, 2022 at 02:30:13PM +0200, Juergen Gross wrote:
> On 19.10.22 12:43, Oleksii Moisieiev wrote:
> > Greetings.
> > 
> > I need your advise about the problem I'm facing right now:
> > I'm working on the new type of dmabuf export implementation. This
> > is going to be new ioctl to the gntdev and it's purpose is to use
> > external buffer, provided by file descriptor as the backing storage
> > during export to grant refs.
> > Few words about the functionality I'm working on right now:
> > My setup is based on IMX8QM (please see PS below if you need
> > configuration details)
> > 
> > When using dma-buf exporter to create dma-buf with backing storage and
> > map it to the grant refs, provided from the domain, we've met a problem,
> > that several HW (i.MX8 gpu in our case) do not support external buffer
> > and requires backing storage to be created using it's native tools
> > (eglCreateImageKHR returns EGL_NO_IMAGE_KHR for buffers, which were not
> > created using gbm_bo_create).
> > That's why new ioctls were added to be able to pass existing dma-buffer
> > fd as input parameter and use it as backing storage to export to refs.
> > Kernel version on IMX8QM board is 5.10.72 and itworks fine on this kernel
> > version.
> > 
> > New ioctls source code can be found here:
> >   https://github.com/oleksiimoisieiev/linux/tree/gntdev_map_buf_upstr_for-linus-6.1_2
> > Now regarding the problem I've met when rebased those code on master:
> > On my test stand I use DRM_IOCTL_MODE_CREATE_DUMB/DRM_IOCTL_MODE_DESTROY_DUMB ioctls
> > to allocate buffer and I'm observing the following backtrace on DRM_IOCTL_MODE_DESTROY_DUMB:
> > 
> > Unable to handle kernel paging request at virtual address 0000000387000098
> > Mem abort info:
> >    ESR = 0x0000000096000005
> >    EC = 0x25: DABT (current EL), IL = 32 bits
> >    SET = 0, FnV = 0
> >    EA = 0, S1PTW = 0
> >    FSC = 0x05: level 1 translation fault
> > Data abort info:
> >    ISV = 0, ISS = 0x00000005
> >    CM = 0, WnR = 0
> > user pgtable: 4k pages, 48-bit VAs, pgdp=000000006df98000
> > [0000000387000098] pgd=0800000064f4f003, p4d=0800000064f4f003, pud=0000000000000000
> > Internal error: Oops: 96000005 [#1] PREEMPT SMP
> > Modules linked in: xen_pciback overlay crct10dif_ce ip_tables x_tables ipv6
> > PU: 0 PID: 34 Comm: kworker/0:1 Not tainted 6.0.0 #85
> > Hardware name: linux,dummy-virt (DT)
> > Workqueue: events virtio_gpu_dequeue_ctrl_func
> > pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : check_move_unevictable_folios+0xb8/0x4d0
> > lr : check_move_unevictable_folios+0xb4/0x4d0
> > sp : ffff8000081a3ad0
> > x29: ffff8000081a3ad0 x28: ffff03856ac98800 x27: 0000000000000000
> > x26: ffffde7b168ee9d8 x25: ffff03856ae26008 x24: 0000000000000000
> > x23: ffffde7b1758d6c0 x22: 0000000000000001 x21: ffff8000081a3b68
> > x20: 0000000000000001 x19: fffffc0e15935040 x18: ffffffffffffffff
> > x17: ffff250a68e3d000 x16: 0000000000000012 x15: ffff8000881a38d7
> > x14: 0000000000000000 x13: ffffde7b175a3150 x12: 0000000000002c55
> > x11: 0000000000000ec7 x10: ffffde7b176113f8 x9 : ffffde7b175a3150
> > x8 : 0000000100004ec7 x7 : ffffde7b175fb150 x6 : ffff8000081a3b70
> > x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffff03856ac98850
> > x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000387000000
> > Call trace:
> >   check_move_unevictable_folios+0xb8/0x4d0
> >   check_move_unevictable_pages+0x8c/0x110
> >   drm_gem_put_pages+0x118/0x198
> >   drm_gem_shmem_put_pages_locked+0x4c/0x70
> >   drm_gem_shmem_unpin+0x30/0x50
> >   virtio_gpu_cleanup_object+0x84/0x130
> >   virtio_gpu_cmd_unref_cb+0x18/0x2c
> >   virtio_gpu_dequeue_ctrl_func+0x124/0x290
> >   process_one_work+0x1d0/0x320
> >   worker_thread+0x14c/0x444
> >   kthread+0x10c/0x110
> >   ret_from_fork+0x10/0x20
> > Code: 97fc3fe1 aa1303e0 94003ac7 b4000080 (f9404c00)
> > ---[ end trace 0000000000000000 ]---
> > 
> > After some investigation I think I've found the cause of the problem:
> > This is the functionality, added in commit 3f9f1c67572f5e5e6dc84216d48d1480f3c4fcf6 which
> > introduces safe mechanism to unmap grant pages which is waiting until page_count(page) = 1
> > before doing unmap.
> > The problem is that DRM_IOCTL_MODE_CREATE_DUMB creates buffer where page_count(page) = 2.
> > 
> > On my QEMU test stand I'm using Xen 4.16 (aarch64) with debian based Dom0 + DomU on the latest
> > kernels.
> > I've created some apps for testing:
> > The first one is to allocate grant refs on DomU:
> > https://github.com/oleksiimoisieiev/linux/tree/gntdev_map_buf_upstr_for-linus-6.1_2
> > The name is test.ko and it can be built using command:
> > cd ./test; make
> > NOTE: makefile expects kernel build to be present on ../../test-build directory.
> > 
> > It should be run on DomU using command:
> > insmod test.ko; echo "create" > /sys/kernel/etx_sysfs/etx_value
> > 
> > Result will be the following:
> > [  126.104903] test: loading out-of-tree module taints kernel.
> > [  126.150586] Sysfs - Write!!!
> > [  126.150773] create
> > [  126.150773]
> > [  126.150888] Hello, World!
> > [  126.151203] Hello, World!
> > [  126.151324] gref 301
> > [  126.151376] addr ffff00000883d000
> > [  126.151431] gref 302
> > [  126.151454] addr ffff00000883e000
> > [  126.151478] gref 303
> > [  126.151497] addr ffff00000883f000
> > [  126.151525] gref 304
> > [  126.151546] addr ffff000008840000
> > [  126.151573] gref 305
> > [  126.151593] addr ffff000008841000
> > 
> > The second is for dom0 and can be found here:
> > https://github.com/oleksiimoisieiev/xen/tree/gntdev_fd
> > 
> > How to build:
> > make -C tools/console all
> > 
> > Result: ./tools/console/gnt_test should be uploaded to Dom0
> > 
> > Start: sudo ./gnt_test_map 1 301 302 303 304 305
> > Where 1 is DomU ID and 301 302 303 304 305 - grefs from test.ko output
> > 
> > This will create buffer using ioctls DRM_IOCTL_MODE_CREATE_DUMB them passes it as backing
> > storage to gntdev and then destroys it using DRM_IOCTL_MODE_DESTROY_DUMB.
> > The problem is that when dumb buffer is created we observe page_count(page) = 2. So
> > when before buffer release I'm trying to unmap grant refs using unmap_grant_pages it is calling
> > __gnttab_unmap_refs_async, which postpones actual unmapping to 5 ms because
> > page_count(page) > 1.
> > Which causes drm_gem_get_pages to try to free pages, which are still mapped.
> > Also if I change in the following line:
> > https://github.com/torvalds/linux/blob/bb1a1146467ad812bb65440696df0782e2bc63c8/drivers/xen/grant-table.c#L1313
> > change from page_count(item->pages[pc]) > 1 to page_count(item->pages[pc]) > 2 - everything works fine.
> > The obvious way for fix this issue I see is to make the expected page_count
> > for __gnttab_unmap_refs_async configurable for each buffer, but I'm now sure
> > if this is the
> > best solution.
> > 
> > I would be happy to hear your thoughts and advises about how to fix this situation.
> 
> My first thought would be to save the page_count() of each page when doing
> the map operation, and then compare to that value.
> 
> The natural place to store this count would be struct xen_page_foreign,
> but there are only 16 bits free for the 64-bit system case (it is using
> the struct page->private field for that purpose), so you'd need to bail
> out in case page_count() is > 65535.
> 
> 
> Juergen

Hello Juergen,

Thank you for the advise.
I think I'll use this approach if everybody fine with it.

Best regards,
Oleksii.


More information about the dri-devel mailing list