[Intel-gfx] [PATCH 0/5] dma-fence, i915: Stop allowing SLAB_TYPESAFE_BY_RCU for dma_fence

Fri Jun 11 06:55:34 UTC 2021

On 10.06.21 at 22:42, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 10:10 PM Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Thu, Jun 10, 2021 at 8:35 AM Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> On Thu, Jun 10, 2021 at 6:30 AM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>>>> On Thu, Jun 10, 2021 at 11:39 AM Christian König
>>>> <christian.koenig at amd.com> wrote:
>>>>> On 10.06.21 at 11:29, Tvrtko Ursulin wrote:
>>>>>> On 09/06/2021 22:29, Jason Ekstrand wrote:
>>>>>>> We've tried to keep it somewhat contained by doing most of the hard work
>>>>>>> to prevent access of recycled objects via dma_fence_get_rcu_safe().
>>>>>>> However, a quick grep of kernel sources says that, of the 30 instances
>>>>>>> of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
>>>>>>> It's likely there are bear traps in DRM and related subsystems just waiting
>>>>>>> for someone to accidentally step in them.
>>>>>> ...because dma_fence_get_rcu_safe appears to be about whether the
>>>>>> *pointer* to the fence itself is rcu protected, not about the fence
>>>>>> object itself.
>>>>> Yes, exactly that.
>>> The fact that both of you think this either means that I've completely
>>> missed what's going on with RCUs here (possible but, in this case, I
>>> think unlikely) or RCUs on dma fences should scare us all.
>> Taking a step back for a second and ignoring SLAB_TYPESAFE_BY_RCU as
>> such,  I'd like to ask a slightly different question:  What are the
>> rules about what is allowed to be done under the RCU read lock and
>> what guarantees does a driver need to provide?
>>
>> I think so far that we've all agreed on the following:
>>
>>   1. Freeing an unsignaled fence is ok as long as it doesn't have any
>> pending callbacks.  (Callbacks should hold a reference anyway).
>>
>>   2. The pointer race solved by dma_fence_get_rcu_safe is real and
>> requires the loop to sort out.
>>
>> But let's say I have a dma_fence pointer that I got from, say, calling
>> dma_resv_excl_fence() under rcu_read_lock().  What am I allowed to do
>> with it under the RCU lock?  What assumptions can I make?  Is this
>> code, for instance, ok?
>>
>> rcu_read_lock();
>> fence = dma_resv_excl_fence(obj);
>> idle = !fence || test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
>> rcu_read_unlock();
>>
>> This code very much looks correct under the following assumptions:
>>
>>   1. A valid fence pointer stays alive under the RCU read lock
>>   2. SIGNALED_BIT is set-once (it's never unset after being set).
>>
>> However, if it were, we wouldn't have dma_resv_test_signaled(), now
>> would we? :-)
>>
>> The moment you introduce ANY dma_fence recycling that recycles a
>> dma_fence within a single RCU grace period, all your assumptions break
>> down.  SLAB_TYPESAFE_BY_RCU is just one way that i915 does this.  We
>> also have a little i915_request recycler to try and help with memory
>> pressure scenarios in certain critical sections that also doesn't
>> respect RCU grace periods.  And, as mentioned multiple times, our
>> recycling leaks into every other driver because, thanks to i915's
>> choice, the above 4-line code snippet isn't valid ANYWHERE in the
>> kernel.
>>
>> So the question I'm raising isn't so much about the rules today.
>> Today, we live in the wild wild west where everything is YOLO.  But
>> where do we want to go?  Do we like this wild west world?  Do we want
>> more consistency under the RCU read lock?  If so, what do we want the
>> rules to be?
>>
>> One option would be to accept the wild-west world we live in and say
>> "The RCU read lock gains you nothing.  If you want to touch the guts
>> of a dma_fence, take a reference".  But, at that point, we're eating
>> two atomics for every time someone wants to look at a dma_fence.  Do
>> we want that?
>>
>> Alternatively, and this is what I think Daniel and I were trying to
>> propose here, is that we place some constraints on dma_fence
>> recycling.  Specifically that, under the RCU read lock, the fence
>> doesn't suddenly become a new fence.  All of the immutability and
>> once-mutability guarantees of various bits of dma_fence hold as long
>> as you have the RCU read lock.
> Yeah this is suboptimal. Too many potential bugs, not enough benefits.
>
> This entire __rcu business started so that there would be a lockless
> way to get at fences, or at least the exclusive one. That did not
> really pan out. I think we have a few options:
>
> - drop the idea of rcu/lockless dma-fence access outright. A quick
> sequence of grabbing the lock, acquiring the dma_fence and then
> dropping your lock again is probably plenty good. There's a lot of
> call_rcu and other stuff we could probably delete. I have no idea what
> the perf impact across all the drivers would be.

The question is maybe not the perf impact, but rather whether that is
possible at all.

IIRC we now have some cases in TTM where RCU is mandatory and we simply 
don't have any other choice than using it.

> - try to make all drivers follow some stricter rules. The trouble is
> that at least with radeon dma_fence callbacks aren't even very
> reliable (that's why it has its own dma_fence_wait implementation), so
> things are wobbly anyway.
>
> - live with the current situation, but radically delete all unsafe
> interfaces. I.e. nothing is allowed to directly deref an rcu fence
> pointer, everything goes through dma_fence_get_rcu_safe. The
> kref_get_unless_zero would become an internal implementation detail.
> Our "fast" and "lockless" dma_resv fence access stays a pile of
> seqlock, retry loop and a conditional atomic inc + atomic dec. The
> only thing that's slightly faster would be dma_resv_test_signaled().
>
> - I guess minimally we should rename dma_fence_get_rcu to
> dma_fence_tryget. It has nothing to do with rcu really, and the use is
> very, very limited.
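
To make the "tryget" and retry-loop points above concrete, here is a minimal
userspace sketch using C11 atomics. It is only a model: the toy_fence type and
the toy_fence_* helpers are hypothetical names, not the kernel API, and there
is no RCU or real freeing here. It shows a get-unless-zero helper and a loop
that re-checks the pointer slot after taking the reference, in the spirit of
dma_fence_get_rcu_safe().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted fence; not struct dma_fence. */
struct toy_fence {
	atomic_uint refcount;	/* 0 means the object is dying or recycled */
};

/* Take a reference only if the object is still alive (get-unless-zero). */
static bool toy_fence_tryget(struct toy_fence *f)
{
	unsigned int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_fence_put(struct toy_fence *f)
{
	/* A real put would free or recycle the object at zero. */
	atomic_fetch_sub(&f->refcount, 1);
}

/*
 * Acquire a reference through a shared pointer slot. After the tryget,
 * re-read the slot: if it changed, the object we pinned may be a recycled
 * one, so drop it and start over.
 */
static struct toy_fence *toy_fence_get_safe(struct toy_fence *_Atomic *slot)
{
	for (;;) {
		struct toy_fence *f = atomic_load(slot);

		if (!f)
			return NULL;
		if (!toy_fence_tryget(f))
			continue;	/* object died; reload the slot */
		if (f == atomic_load(slot))
			return f;	/* slot unchanged, reference is good */
		toy_fence_put(f);	/* slot was replaced; retry */
	}
}
```

Note that the tryget on its own only answers "is this memory still a live
object"; it is the second read of the slot that answers "is it still the
object I meant".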

I think what we should do is to use RCU internally in the dma_resv 
object but disallow drivers/frameworks to mess with that directly.

In other words drivers should use one of the following:
1. dma_resv_wait_timeout()
2. dma_resv_test_signaled()
3. dma_resv_copy_fences()
4. dma_resv_get_fences()
5. dma_resv_for_each_fence() <- to be implemented
6. dma_resv_for_each_fence_unlocked() <- to be implemented

Inside those functions we then make sure that we only use safe ways of
accessing the RCU protected data structures.

This way we only need to make sure that those accessor functions are 
sane and don't need to audit every driver individually.

I can tackle implementing dma_resv_for_each_fence()/_unlocked().
I already have a large part of that coded anyway.
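
As a rough illustration of the accessor idea, here is a userspace model in
C11. Everything in it is an assumption made for the sketch: the model_*
names, the fixed-size slot array and the callback signature are hypothetical
and say nothing about how the real iterators will look. It also leaves out
RCU, the seqcount retry and the slot re-check. The only point is that the
reference handling lives in one helper and callers never dereference a slot
themselves.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FENCES 4

/* Hypothetical stand-ins; not struct dma_fence or struct dma_resv. */
struct model_fence {
	atomic_uint refcount;
	atomic_bool signaled;
};

struct model_resv {
	struct model_fence *_Atomic slots[MODEL_MAX_FENCES];
};

/* Callback returns false to stop the walk early. */
typedef bool (*model_visit_fn)(struct model_fence *fence, void *data);

static bool model_fence_tryget(struct model_fence *f)
{
	unsigned int old = atomic_load(&f->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void model_fence_put(struct model_fence *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/*
 * Visit every live fence while holding a reference on it. Returns false
 * if the callback stopped the walk, true if all fences were visited.
 */
static bool model_resv_for_each_fence(struct model_resv *resv,
				      model_visit_fn visit, void *data)
{
	for (size_t i = 0; i < MODEL_MAX_FENCES; i++) {
		struct model_fence *f = atomic_load(&resv->slots[i]);
		bool keep_going;

		if (!f || !model_fence_tryget(f))
			continue;	/* empty slot or dying fence */
		keep_going = visit(f, data);
		model_fence_put(f);
		if (!keep_going)
			return false;
	}
	return true;
}

/* Stop at the first fence that has not signaled yet. */
static bool model_stop_if_unsignaled(struct model_fence *f, void *data)
{
	(void)data;
	return atomic_load(&f->signaled);
}

/* A test_signaled-style helper built purely on the iterator. */
static bool model_resv_test_signaled(struct model_resv *resv)
{
	return model_resv_for_each_fence(resv, model_stop_if_unsignaled, NULL);
}
```

With something along these lines, a caller only ever sees a fence it holds a
reference on, so the audit surface shrinks to the iterator itself.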

Regards,
Christian.

>
> Not sure what's a good idea here tbh.
> -Daniel