[RFC] drm/vgem: Don't use get_page in fault handler.
Chris Wilson
chris at chris-wilson.co.uk
Tue Jul 7 08:22:14 UTC 2020
Quoting Lepton Wu (2020-07-07 05:21:00)
> For pages which are allocated in ttm with transparent huge pages,
> tail pages have a reference count of zero. The current vgem code calls
> get_page() on them, which triggers a BUG when release_pages() is later
> called on those pages.
>
> Attached below is minimal code that triggers this bug on a qemu VM with
> both virtio gpu (card1) and vgem (card0) enabled. Note that since the
> upstream virtio gpu driver has moved from ttm to drm gem, an older
> kernel such as 5.4 is needed to reproduce it. But I guess the same
> thing could happen for a real GPU if it uses a similar code path to
> allocate pages from ttm. I am not sure about two things: first, does
> this need to be fixed? Will a real GPU hit this code path? Second,
> supposing it does need to be fixed, should the fix go on the ttm side
> or the vgem side? If we remove "huge_flags &= ~__GFP_COMP" from
> ttm_get_pages, then get_page() in vgem works fine, but that line is
> there to fix another bug:
> https://bugs.freedesktop.org/show_bug.cgi?id=103138
> The problem could also be "fixed" with this patch, but I am really not
> sure if this is the way to go.
>
> Here is the code to reproduce this bug:
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <drm/drm.h>
> #include <drm/drm_mode.h>
>
> #define WIDTH  1024
> #define HEIGHT 513
> #define SIZE   (WIDTH * HEIGHT * 4)
>
> int work(int vfd, int dfd, int handle)
> {
> 	int ret;
> 	struct drm_prime_handle hf = { .handle = handle };
>
> 	/* Export the virtio gpu buffer as a dma-buf fd ... */
> 	ret = ioctl(dfd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &hf);
> 	fprintf(stderr, "fd is %d\n", hf.fd);
>
> 	/* ... and import it into vgem. */
> 	hf.flags = DRM_CLOEXEC | DRM_RDWR;
> 	ret = ioctl(vfd, DRM_IOCTL_PRIME_FD_TO_HANDLE, &hf);
> 	fprintf(stderr, "new handle is %d\n", hf.handle);
>
> 	struct drm_mode_map_dumb map = { .handle = hf.handle };
> 	ret = ioctl(vfd, DRM_IOCTL_MODE_MAP_DUMB, &map);
> 	fprintf(stderr, "need map at offset %lld\n", (long long)map.offset);
>
> 	/* Fault in every page through vgem's fault handler. */
> 	unsigned char *ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> 				  MAP_SHARED, vfd, map.offset);
> 	memset(ptr, 2, SIZE);
> 	munmap(ptr, SIZE);
> 	return ret;
> }
>
> int main(void)
> {
> 	int vfd = open("/dev/dri/card0", O_RDWR); /* vgem */
> 	int dfd = open("/dev/dri/card1", O_RDWR); /* virtio gpu */
> 	int ret;
> 	struct drm_mode_create_dumb ct = { 0 };
>
> 	ct.height = HEIGHT;
> 	ct.width = WIDTH;
> 	ct.bpp = 32;
> 	ret = ioctl(dfd, DRM_IOCTL_MODE_CREATE_DUMB, &ct);
> 	work(vfd, dfd, ct.handle);
> 	fprintf(stderr, "done\n");
> 	return ret;
> }
>
> Signed-off-by: Lepton Wu <ytht.net at gmail.com>
> ---
> drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> index ec1a8ebb6f1b..be3d97e29804 100644
> --- a/drivers/gpu/drm/vgem/vgem_drv.c
> +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> @@ -87,9 +87,8 @@ static vm_fault_t vgem_gem_fault(struct vm_fault *vmf)
>
> mutex_lock(&obj->pages_lock);
> if (obj->pages) {
> - get_page(obj->pages[page_offset]);
> - vmf->page = obj->pages[page_offset];
> - ret = 0;
> + ret = vmf_insert_pfn(vmf->vma, vmf->address,
> + page_to_pfn(obj->pages[page_offset]));
The danger here is that you store a pointer to a physical page in the
CPU page tables and never remove it should the vgem bo be unpinned and
the obj->pages[] released [and the physical pages returned to the system
for reuse].

So this needs a judicious addition of

	drm_vma_node_unmap(&bo->base.vma_node, bo->base.anon_inode->i_mapping);

paired with drm_gem_put_pages. The kvfree in vgem_gem_free_object is
also suspect (for not being a drm_gem_put_pages).
-Chris
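
The release path Chris describes could be sketched roughly as below.
This is only an illustrative sketch, not the actual patch: the helper
name vgem_put_pages is hypothetical, and passing the device's
anon_inode mapping to drm_vma_node_unmap is an assumption based on how
other drm drivers zap user mappings; it only applies to shmem-backed
(non-imported) vgem objects.

	/*
	 * Hypothetical sketch: drop the pages backing a vgem bo, first
	 * zapping any user PTEs installed by the fault handler so the
	 * CPU page tables never keep pointing at pages that have been
	 * returned to the system.
	 */
	static void vgem_put_pages(struct drm_vgem_gem_object *bo)
	{
		mutex_lock(&bo->pages_lock);
		if (bo->pages) {
			/* Tear down all mmap PTEs referencing these pages. */
			drm_vma_node_unmap(&bo->base.vma_node,
					   bo->base.dev->anon_inode->i_mapping);
			/*
			 * Drop the page references via the matching GEM
			 * helper, not a bare kvfree of the array.
			 */
			drm_gem_put_pages(&bo->base, bo->pages, true, true);
			bo->pages = NULL;
		}
		mutex_unlock(&bo->pages_lock);
	}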