[RFC] deadlock in "drm/exynos: fix wrong pointer access at vm close"

Tue Sep 24 21:34:30 PDT 2013

> -----Original Message-----
> From: Inki Dae [mailto:inki.dae at samsung.com]
> Sent: Tuesday, September 24, 2013 1:41 PM
> To: 'Al Viro'
> Cc: 'YoungJun Cho'; 'dri-devel at lists.freedesktop.org'
> Subject: RE: [RFC] deadlock in "drm/exynos: fix wrong pointer access at vm
> close"
> 
> Thanks for your comment.
> 
> > -----Original Message-----
> > From: Al Viro [mailto:viro at ftp.linux.org.uk] On Behalf Of Al Viro
> > Sent: Monday, September 23, 2013 10:07 PM
> > To: Inki Dae
> > Cc: 'YoungJun Cho'; dri-devel at lists.freedesktop.org
> > Subject: Re: [RFC] deadlock in "drm/exynos: fix wrong pointer access at
> vm
> > close"
> >
> > On Mon, Sep 23, 2013 at 04:49:30PM +0900, Inki Dae wrote:
> > > Hi,
> > >
> > > > -----Original Message-----
> > > > From: Al Viro [mailto:viro at ftp.linux.org.uk] On Behalf Of Al Viro
> > > > Sent: Monday, September 23, 2013 6:29 AM
> > > > To: YoungJun Cho
> > > > Cc: dri-devel at lists.freedesktop.org; Inki Dae
> > > > Subject: [RFC] deadlock in "drm/exynos: fix wrong pointer access at
> vm
> > > > close"
> > > >
> > > > 	You have drm_dev->struct_mutex grabbed before ->mmap_sem in
> > > > exynos_drm_gem_mmap_ioctl() and after - in exynos_drm_gem_fault()
> > > > (since ->fault() is always called with ->mmap_sem held).  Looks like
> > > > a garden-variety AB-BA deadlock...
> > > >
> > >
> > > Right, if mmap system call is requested by another process between -
> > >f_op
> > > pointer changing and restoring, the deadlock can be incurred.
> > >
> > > For this, I think we can resolve this issue like below,
> > >
> > > At exynos_drm_gem_mmap_ioctl()
> > > 	down_write(&mm->mmap_sem);
> > > 	mutex_lock(&dev->struct_mutex);
> > > 	...
> >
> > Umm...  If you do it that way, why bother with changing ->private_data
> > at all?  You can as well stash obj in dev->dev_private->something after
> > you've grabbed the mutex and have ->mmap() pick it there...
> 
> I changed ->private_data to a gem object so that mmap_buffer function can
> get the gem object. But yes, we can use dev->dev_private->something
> instead, and that seems like better way. For this, I missed that we can
> get drm_device object from file pointer.
> 
> >
> > Said that, I really don't like that approach - both playing with ->f_op
> > and the games with private vmas;
> 
> Agree. But is there other better way to support direct mapping; not
> needing page fault handler to map, and indirect mapping interfaces;
> needing the page fault handler?
> 
> > exynos_gem_get_vma(), AFAICS, calls
> > find_vma() (without bothering to hold ->mmap_sem, BTW - there's nothing
> > to prevent the result of find_vma() being freed just as it's returned
> 
> I can't see to hold ->mmap_sem when it calls find_vma() anywhere else.
> 
> > to caller) and clones it manually, regardless of whether that vma allows
> > to clone itself or not.  Quite a few drivers rely on that not
> happening...
> >
> 
> I think that has no any problem because exynos_gem_get_vma() takes a
> reference to vma, and also v4l2 side is using same way. I and v4l2 guys
> might be missing something what you are concerning. So Could you give us
> more comments?
> 
> And, I found a bug that we never call exynos_gem_put_vma() when userptr is
> freed. That should be fixed.
> 
> > IOW, you are already digging deep inside VM guts and this only makes
> > it deeper ;-/
> >
> > And no, the deadlock doesn't depend on race between ioctl() and mmap()
> > from another process; all it takes is
> 
> Yes, there was my wrong comments. That is  a mmap
> callback(exynos_drm_gem_mmap_buffer) we don't want can be called if we
> called mmap system call when ->f_op points to exynos_drm_gem_fops.
> 
> > 1) thread A does clone(), creating thread B that shares address space
> with
> > it.
> > 2) thread A does that ioctl, creating a mapping
> > 3) thread A does that ioctl again, creating another mapping, while
> thread
> > B dereferences an address in the first mapping and triggers a page
fault.
> >
> > The only race is on step 3 in the above; the question about mmap() vs.
> > ioctl() was about mmap(2) _during_ that ioctl() hitting
> > exynos_drm_gem_mmap_buffer() instead of exynos_drm_gem_mmap() it
> would've
> > called before or after ioctl().  Here the interesting case is when
> > callers of mmap() and ioctl() do *not* share the address space, since
> > in that case mmap(2) won't notice ->mmap_sem held by you - it's on the
> > different mm_struct, so mmap(2) will get to calling the ->f_op->mmap()
> > without waiting for you to restore ->f_op...
> 
> Yes, it's a problem. When two more threads are running, and when thread A
> called mmap system call, exynos_drm_gem_mmap_buffer() will be called if
> thread B called  ioctl() first and f_op was not restored yet. I have no
> idea to resolve this issue yet. So do you have any good idea? At any rate,
> we should fix this issue.
> 
> >
> > For another bug in the same area, try building that driver modular and
> > watch what happens to use count after a round of this switching ->f_op
> and
> > restoring it back to original; fops_get() in there is wrong and AFAICS
> > pointless.
> 
> Yes, it would also be another bug. Will check it out. It seems that it's
> not good way to change and restore ->f_op, and this way makes mmap part to
> be complicated. I will try to find a way to resolve this issue or to find
> another better way to support direct and indirect mapping interfaces.
> 

It seems that we can use a new anon file instead of using drm file to
resolve the issue.

Thanks,
Inki Dae

> I'd appreciate all the advices I can get.
> 
> Thanks,
> Inki Dae