[Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2
Christian König
ckoenig.leichtzumerken at gmail.com
Thu Mar 22 09:37:55 UTC 2018
Am 22.03.2018 um 08:14 schrieb Daniel Vetter:
> On Wed, Mar 21, 2018 at 10:34:05AM +0100, Christian König wrote:
>> Am 21.03.2018 um 09:18 schrieb Daniel Vetter:
>>> [SNIP]
>> For correct operation you always need to implement invalidate_range_end as
>> well and add some lock/completion work Otherwise get_user_pages() can again
>> grab the reference to the wrong page.
> Is this really a problem?
Yes, and quite a big one.
> I figured that if a mmu_notifier invalidation is
> going on, a get_user_pages on that mm from anywhere else (whether i915 or
> anyone really) will serialize with the ongoing invalidate?
No, that isn't correct. Jerome can probably better explain that than I do.
> If that's not the case, then really any get_user_pages is racy, including all the
> DIRECT_IO ones.
The key point here is that get_user_pages() grabs a reference to the
page. So what you get is a bunch of pages which where mapped at that
location at a specific point in time.
There is no guarantee that after get_user_pages() return you still have
the same pages mapped at that point, you only guarantee that the pages
are not reused for something else.
That is perfectly sufficient for a task like DIRECT_IO where you can
only have block or network I/O, but unfortunately not really for GPUs
where you crunch of results, write them back to pages and actually count
on that the CPU sees the result in the right place.
>> [SNIP]
>> So no matter how you put it i915 is clearly doing something wrong here :)
> tbh I'm not entirely clear on the reasons why this works, but
> cross-release lockdep catches these things, and it did not complain.
> On a high-level we make sure that mm locks needed by get_user_pages do
> _not_ nest within dev->struct_mutex. We have massive back-off slowpaths to
> do anything that could fault outside of our own main gem locking.
I'm pretty sure that this doesn't work as intended and just hides the
real problem.
> That was (at least in the past) a major difference with amdgpu, which
> essentially has none of these paths. That would trivially deadlock with
> your own gem mmap fault handler, so you had (maybe that changed) a dumb
> retry loop, which did shut up lockdep but didn't fix any of the locking
> inversions.
Any lock you grab in an MMU callback can't be even held when you call
kmalloc() or get_free_page() (without GFP_NOIO).
Even simple things like drm_vm_open() violate that by using GFP_KERNEL.
So I can 100% ensure you that what you do here is not correct.
> So yeah, grabbing dev->struct_mutex is in principle totally fine while
> holding all kinds of struct mm/vma locks. I'm not entirely clear why we
> punt the actual unmapping to the worker though, maybe simply to not have a
> constrained stack.
I strongly disagree on that. As far as I can see what TTM does looks
actually like the right approach to the problem.
> This is re: your statement that you can't unamp sg tables from the
> shrinker. We can, because we've actually untangled the locking depencies
> so that you can fully operate on gem objects from within mm/vma locks.
> Maybe code has changed, but last time I looked at radeon/ttm a while back
> that was totally not the case, and if you don't do all this work then yes
> you'll deadlock.
>
> Doen't mean it's not impossible, because we've done it :-)
And I'm pretty sure you didn't do it correctly :D
> Well, it actually gets the job done. We'd need to at least get to
> per-object locking, and probably even then we'd need to rewrite the code a
> lot. But please note that this here is only to avoid the GFP_NOIO
> constraint, all the other bits I clarified around why we don't actually
> have circular locking (because the entire hierarchy is inverted for us)
> still hold even if you would only trylock here.
Well you reversed your allocation and mmap_sem lock which avoids the
lock inversion during page faults, but it doesn't help you at all with
the MMU notifier and shrinker because then there are a lot more locks
involved.
Regards,
Christian.
> Aside: Given that yesterday a bunch of folks complained on #dri-devel that
> amdgpu prematurely OOMs compared to i915, and that we've switched from a
> simple trylock to this nastiness to be able to recover from more low
> memory situation it's maybe not such a silly idea. Horrible, but not silly
> because actually necessary.
> -Daniel
More information about the dri-devel
mailing list