[RFC] drm/vgem: Don't use get_page in fault handler.

Chris Wilson chris at chris-wilson.co.uk
Tue Jul 7 08:22:14 UTC 2020


Quoting Lepton Wu (2020-07-07 05:21:00)
> For pages which are allocated in ttm with transparent huge pages, the
> tail pages have a reference count of zero. The current vgem code calls
> get_page on them, which triggers a BUG when release_pages is later
> called on those pages.
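> 
> To make that concrete, here is a minimal illustration (this is not the
> ttm code, just the underlying allocator behaviour): when a high-order
> page is allocated without __GFP_COMP, only the first struct page gets a
> reference count and the pages behind it stay at zero, e.g.
> 
>         struct page *p = alloc_pages(GFP_KERNEL, HPAGE_PMD_ORDER);
>         /* page_count(p) == 1, but page_count(p + 1) == 0, so calling
>          * get_page()/put_page() on those "tail" pages is invalid. */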
> 
> Attached below is minimal code to trigger this bug on a qemu VM with
> virtio gpu (card1) and vgem (card0) enabled. Note that since upstream
> virtio gpu has switched to drm gem and moved away from ttm, an older
> kernel such as 5.4 is needed to reproduce it. The same thing could
> happen for a real GPU, though, if it uses a similar code path to
> allocate pages from ttm. I am not sure about two things: first, does
> this need to be fixed at all, i.e. will a real GPU hit this code path?
> Second, assuming it does need fixing, should the fix be on the ttm side
> or the vgem side? If we remove "huge_flags &= ~__GFP_COMP" from
> ttm_get_pages, then get_page in vgem works fine, but that line is there
> to fix another bug:
> https://bugs.freedesktop.org/show_bug.cgi?id=103138
> It could also be "fixed" with this patch, but I am really not sure if
> this is the way to go.
> 
> Here is the code to reproduce this bug:
> 
> /* Headers needed to build the reproducer. */
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <drm/drm.h>
> 
> unsigned int WIDTH = 1024;
> unsigned int HEIGHT = 513;
> unsigned int size = WIDTH * HEIGHT * 4;
> 
> int work(int vfd, int dfd, int handle) {
>         int ret;
>         struct drm_prime_handle hf = {.handle =  handle };
>         ret = ioctl(dfd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &hf);
>         fprintf(stderr, "fd is %d\n", hf.fd);
>         hf.flags = DRM_CLOEXEC | DRM_RDWR;
>         ret = ioctl(vfd, DRM_IOCTL_PRIME_FD_TO_HANDLE, &hf);
>         fprintf(stderr, "new handle is %d\n", hf.handle);
>         struct drm_mode_map_dumb map = {.handle = hf.handle };
>         ret = ioctl(vfd, DRM_IOCTL_MODE_MAP_DUMB, &map);
>         fprintf(stderr, "need map at offset %lld\n", map.offset);
>         unsigned char * ptr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, vfd,
>                           map.offset);
>         memset(ptr, 2, size);
>         munmap(ptr, size);
>         return ret;
> }
> 
> int main()
> {
>         int vfd = open("/dev/dri/card0", O_RDWR); // vgem
>         int dfd = open("/dev/dri/card1", O_RDWR); // virtio gpu
> 
>         int ret;
>         struct drm_mode_create_dumb ct = {};
> 
>         ct.height = HEIGHT;
>         ct.width = WIDTH;
>         ct.bpp = 32;
>         ret = ioctl(dfd, DRM_IOCTL_MODE_CREATE_DUMB, &ct);
>         work(vfd, dfd, ct.handle);
>         fprintf(stderr, "done\n");
> }
> 
> Signed-off-by: Lepton Wu <ytht.net at gmail.com>
> ---
>  drivers/gpu/drm/vgem/vgem_drv.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> index ec1a8ebb6f1b..be3d97e29804 100644
> --- a/drivers/gpu/drm/vgem/vgem_drv.c
> +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> @@ -87,9 +87,8 @@ static vm_fault_t vgem_gem_fault(struct vm_fault *vmf)
>  
>         mutex_lock(&obj->pages_lock);
>         if (obj->pages) {
> -               get_page(obj->pages[page_offset]);
> -               vmf->page = obj->pages[page_offset];
> -               ret = 0;
> +               ret = vmf_insert_pfn(vmf->vma, vmf->address,
> +                                    page_to_pfn(obj->pages[page_offset]));

The danger here is that you store a pointer to a physical page in the
CPU page tables and never remove it should the vgem bo be unpinned and
the obj->pages[] released [and the physical pages returned to the system
for reuse].

So this needs judicious addition of
drm_vma_node_unmap(&bo->base.vma_node, bo->base.anon_inode->i_mapping);
alongside drm_gem_put_pages; and the kvfree in vgem_gem_free_object is
also suspect (for not being a drm_gem_put_pages).
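
Something along these lines would be the shape of it -- an untested
sketch against what I believe is the current vgem_unpin_pages(), with the
mapping argument being an assumption on my part (the device anon_inode
mapping that GEM mmap installs the VMAs under):

static void vgem_unpin_pages(struct drm_vgem_gem_object *bo)
{
	mutex_lock(&bo->pages_lock);
	if (--bo->pages_pin_count == 0) {
		/* Zap any PTEs installed by vmf_insert_pfn() so no stale
		 * CPU mappings outlive the backing pages. */
		drm_vma_node_unmap(&bo->base.vma_node,
				   bo->base.dev->anon_inode->i_mapping);
		drm_gem_put_pages(&bo->base, bo->pages, true, true);
		bo->pages = NULL;
	}
	mutex_unlock(&bo->pages_lock);
}

A similar unmap would be wanted on any other path that releases
obj->pages while userspace mappings may still exist.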
-Chris

