[Intel-gfx] [PATCH 5/5] DONOTMERGE: dma-buf: Get rid of dma_fence_get_rcu_safe

Thu Jun 10 18:12:19 UTC 2021

On 10.06.21 at 19:11, Daniel Vetter wrote:
> On Thu, Jun 10, 2021 at 06:54:13PM +0200, Christian König wrote:
>> On 10.06.21 at 18:37, Daniel Vetter wrote:
>>> On Thu, Jun 10, 2021 at 6:24 PM Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>> On Thu, Jun 10, 2021 at 10:13 AM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>>>>> On Thu, Jun 10, 2021 at 3:59 PM Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>>>> On Thu, Jun 10, 2021 at 1:51 AM Christian König
>>>>>> <christian.koenig at amd.com> wrote:
>>>>>>> On 09.06.21 at 23:29, Jason Ekstrand wrote:
>>>>>>>> This helper existed to handle the weird corner-cases caused by using
>>>>>>>> SLAB_TYPESAFE_BY_RCU for backing dma_fence.  Now that no one is using
>>>>>>>> that anymore (i915 was the only real user), dma_fence_get_rcu is
>>>>>>>> sufficient.  The one slightly annoying thing we have to deal with here
>>>>>>>> is that dma_fence_get_rcu_safe did an rcu_dereference as well as a
>>>>>>>> SLAB_TYPESAFE_BY_RCU-safe dma_fence_get_rcu.  This means each call site
>>>>>>>> ends up being 3 lines instead of 1.
>>>>>>> That's an outright NAK.
>>>>>>>
>>>>>>> The loop in dma_fence_get_rcu_safe is necessary because the underlying
>>>>>>> fence object can be replaced while taking the reference.
>>>>>> Right.  I had missed a bit of that when I first read through it.  I
>>>>>> see the need for the loop now.  But there are some other tricky bits
>>>>>> in there besides just the loop.
>>>>> I thought that's what the kref_get_unless_zero was for in
>>>>> dma_fence_get_rcu? Otherwise I guess I'm not seeing why we still have
>>>>> dma_fence_get_rcu around, since that should either be a kref_get or
>>>>> it's just unsafe to call it ...
>>>> AFAICT, dma_fence_get_rcu is unsafe unless you somehow know that it's
>>>> your fence and it's never recycled.
>>>>
>>>> Where the loop comes in is if you have someone come along, under the
>>>> RCU write lock or not, and swap out the pointer and unref it while
>>>> you're trying to fetch it.  In this case, if you just write the three
>>>> lines I duplicated throughout this patch, you'll end up with NULL if
>>>> you (partially) lose the race.  The loop exists to ensure that you get
>>>> either the old pointer or the new pointer and you only ever get NULL
>>>> if somewhere during the mess, the pointer actually gets set to NULL.
>>> It's not that easy. At least not for dma_resv.
>>>
>>> The thing is, you can't just go in and replace the write fence with
>>> something else. There's supposed to be some ordering here (how much we
>>> actually still follow that is another question, one that I'm
>>> trying to answer with an audit of lots of drivers), which means if you
>>> replace e.g. the exclusive fence, the previous fence will _not_ just
>>> get freed. Because the next exclusive fence needs to wait for that to
>>> finish first.
>>>
>>> Conceptually the refcount will _only_ go to 0 once all later
>>> dependencies have seen it get signalled, and once the fence itself has
>>> been signalled.
>> I think that's the point where it breaks.
>>
>> IIRC radeon, for example, doesn't keep unsignaled fences around when
>> nobody is interested in them. And I think nouveau does it that way as well.
>>
>> So for example you can have the following:
>> 1. Submission to 3D ring, this creates fence A.
>> 2. Fence A is put as an exclusive fence in a dma_resv object.
>> 3. Submission to 3D ring, this creates fence B.
>> 4. Fence B is replacing fence A as the exclusive fence in the dma_resv
>> object.
>>
>> Fence A is replaced and therefore destroyed while it is not even close to being
>> signaled. But the replacement is perfectly ok, since fence B is submitted to
>> the same ring.
>>
>> If somebody used dma_fence_get_rcu on the exclusive fence and got
>> NULL, it would fail to wait for the submissions. You don't really need the
>> SLAB_TYPESAFE_BY_RCU for this to blow up in your face.
> Uh that's wild ...
>
> I thought that's impossible, but in dma_fence_release() we only complain
> if there's both waiters and the fence isn't signalled yet. I had no idea.
>
>> We could change that rule of course; amdgpu, for example, always keeps
>> fences around until they are signaled. But IIRC that's how it has been for
>> radeon like forever.
> Yeah I think we could, but then we need to do a few things:
> - document that de facto only get_rcu_safe is ok to use
> - delete get_rcu, it's not really a safe thing to do anywhere

Well I would rename dma_fence_get_rcu to dma_fence_get_unless_zero.

And then we can rename dma_fence_get_rcu_safe() to dma_fence_get_rcu().
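
As a rough illustration of what the two helpers would do after such a rename,
here is a userspace sketch. It is not kernel code: C11 atomics stand in for
kref and RCU, memory is never freed (so the grace period is not modelled), and
the names fence_slot, race_hook, fence_get_from_slot and replace_a_with_b are
made up for this example. The hook simulates the radeon case above, where a
writer replaces fence A with fence B and drops the last reference to A between
the reader's pointer load and its refcount increment.

```c
/* Userspace sketch only: C11 atomics stand in for kref and RCU. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fence {
	atomic_int refcount;	/* 0 means the fence is being destroyed */
};

/* A slot such as the exclusive fence pointer of a reservation object. */
typedef _Atomic(struct fence *) fence_slot;

/* Hypothetical hook: runs between reading the slot and taking the ref. */
void (*race_hook)(fence_slot *slot);

/* Stand-in for the proposed dma_fence_get_unless_zero(). */
struct fence *fence_get_unless_zero(struct fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return f;
	}
	return NULL;
}

void fence_put(struct fence *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);	/* memory is never freed in this sketch */
}

/* Stand-in for the looping helper: retry until we hold what is in the slot. */
struct fence *fence_get_from_slot(fence_slot *slot)
{
	for (;;) {
		struct fence *f = atomic_load(slot);

		if (!f)
			return NULL;		/* the slot really is empty */
		if (race_hook)
			race_hook(slot);	/* simulated concurrent writer */
		if (!fence_get_unless_zero(f))
			continue;		/* fence died: reread the slot */
		if (f == atomic_load(slot))
			return f;		/* still current: ref is good */
		fence_put(f);			/* slot changed: drop and retry */
	}
}

struct fence fence_a = { 1 }, fence_b = { 1 };

/* Writer from the example above: replace A with B, drop A's last ref. */
void replace_a_with_b(fence_slot *slot)
{
	race_hook = NULL;	/* fire only once */
	atomic_store(slot, &fence_b);
	fence_put(&fence_a);
}
```

With race_hook set to replace_a_with_b and the slot initially pointing at
fence_a, fence_get_from_slot() returns fence_b, whereas a single
fence_get_unless_zero() on the stale fence_a pointer returns NULL.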

Christian.

>
> -Daniel
>
>> Regards,
>> Christian.
>>
>>>    A signalled fence might as well not exist, so if
>>> that's what happened in that tiny window, then yes a legal scenario
>>> is the following:
>>>
>>> thread A:
>>> - rcu_dereference(resv->exclusive_fence);
>>>
>>> thread B:
>>> - dma_fence signals, retires, drops refcount to 0
>>> - sets the exclusive fence to NULL
>>> - creates a new dma_fence
>>> - sets the exclusive fence to that new fence
>>>
>>> thread A:
>>> - kref_get_unless_zero fails, we report that the exclusive fence slot is NULL
>>>
>>> Ofc normally we're fully pipelined, and we lazily clear slots, so no
>>> one ever writes the fence ptr to NULL. But conceptually it's totally
>>> fine, and an indistinguishable sequence of events from the point of
>>> view of thread A.
>>>
>>> Ergo dma_fence_get_rcu is enough. If it's not, we've screwed up really
>>> big time. The only reason you need get_rcu_safe is if you have
>>> typesafe_by_rcu, or maybe if you yolo your fence ordering a bit much
>>> and break the DAG property in a few cases.
>>>
>>>> I agree with Christian that that part of dma_fence_get_rcu_safe needs
>>>> to stay.  I was missing that until I did my giant "let's walk through
>>>> the code" e-mail.
>>> Well if I'm wrong there's a _ton_ of broken code in upstream right
>>> now, even in dma-buf/dma-resv.c. We're using dma_fence_get_rcu a lot.
>>>
>>> Also the timing is all backwards: get_rcu_safe was added as a fix for
>>> when i915 made its dma_fence typesafe_by_rcu. We didn't have any need
>>> for this beforehand. So I'm really not quite buying the story
>>> you're all trying to sell me on yet.
>>> -Daniel
>>>
>>>> --Jason
>>>>
>>>>>>> This is completely unrelated to SLAB_TYPESAFE_BY_RCU. See the
>>>>>>> dma_fence_chain usage for reference.
>>>>>>>
>>>>>>> What you can remove is the sequence number handling in dma-buf. That
>>>>>>> should make adding fences quite a bit quicker.
>>>>>> I'll look at that and try to understand what's going on there.
>>>>> Hm I thought the seqlock was to make sure we have a consistent set of
>>>>> fences across the exclusive and all shared slots. Not to protect against
>>>>> the fence disappearing due to typesafe_by_rcu.
>>>>> -Daniel
>>>>>
>>>>>> --Jason
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
>>>>>>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>>>> Cc: Matthew Auld <matthew.auld at intel.com>
>>>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>>>>>>>> ---
>>>>>>>>     drivers/dma-buf/dma-fence-chain.c         |  8 ++--
>>>>>>>>     drivers/dma-buf/dma-resv.c                |  4 +-
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  4 +-
>>>>>>>>     drivers/gpu/drm/i915/i915_active.h        |  4 +-
>>>>>>>>     drivers/gpu/drm/i915/i915_vma.c           |  4 +-
>>>>>>>>     include/drm/drm_syncobj.h                 |  4 +-
>>>>>>>>     include/linux/dma-fence.h                 | 50 -----------------------
>>>>>>>>     include/linux/dma-resv.h                  |  4 +-
>>>>>>>>     8 files changed, 23 insertions(+), 59 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
>>>>>>>> index 7d129e68ac701..46dfc7d94d8ed 100644
>>>>>>>> --- a/drivers/dma-buf/dma-fence-chain.c
>>>>>>>> +++ b/drivers/dma-buf/dma-fence-chain.c
>>>>>>>> @@ -15,15 +15,17 @@ static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
>>>>>>>>      * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
>>>>>>>>      * @chain: chain node to get the previous node from
>>>>>>>>      *
>>>>>>>> - * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
>>>>>>>> - * chain node.
>>>>>>>> + * Use rcu_dereference and dma_fence_get_rcu to get a reference to the
>>>>>>>> + * previous fence of the chain node.
>>>>>>>>      */
>>>>>>>>     static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
>>>>>>>>     {
>>>>>>>>         struct dma_fence *prev;
>>>>>>>>
>>>>>>>>         rcu_read_lock();
>>>>>>>> -     prev = dma_fence_get_rcu_safe(&chain->prev);
>>>>>>>> +     prev = rcu_dereference(chain->prev);
>>>>>>>> +     if (prev)
>>>>>>>> +             prev = dma_fence_get_rcu(prev);
>>>>>>>>         rcu_read_unlock();
>>>>>>>>         return prev;
>>>>>>>>     }
>>>>>>>> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
>>>>>>>> index f26c71747d43a..cfe0db3cca292 100644
>>>>>>>> --- a/drivers/dma-buf/dma-resv.c
>>>>>>>> +++ b/drivers/dma-buf/dma-resv.c
>>>>>>>> @@ -376,7 +376,9 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
>>>>>>>>                 dst_list = NULL;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> -     new = dma_fence_get_rcu_safe(&src->fence_excl);
>>>>>>>> +     new = rcu_dereference(src->fence_excl);
>>>>>>>> +     if (new)
>>>>>>>> +             new = dma_fence_get_rcu(new);
>>>>>>>>         rcu_read_unlock();
>>>>>>>>
>>>>>>>>         src_list = dma_resv_shared_list(dst);
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> index 72d9b92b17547..0aeb6117f3893 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> @@ -161,7 +161,9 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
>>>>>>>>                 struct dma_fence *old;
>>>>>>>>
>>>>>>>>                 rcu_read_lock();
>>>>>>>> -             old = dma_fence_get_rcu_safe(ptr);
>>>>>>>> +             old = rcu_dereference(*ptr);
>>>>>>>> +             if (old)
>>>>>>>> +                     old = dma_fence_get_rcu(old);
>>>>>>>>                 rcu_read_unlock();
>>>>>>>>
>>>>>>>>                 if (old) {
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
>>>>>>>> index d0feda68b874f..bd89cfc806ca5 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_active.h
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_active.h
>>>>>>>> @@ -103,7 +103,9 @@ i915_active_fence_get(struct i915_active_fence *active)
>>>>>>>>         struct dma_fence *fence;
>>>>>>>>
>>>>>>>>         rcu_read_lock();
>>>>>>>> -     fence = dma_fence_get_rcu_safe(&active->fence);
>>>>>>>> +     fence = rcu_dereference(active->fence);
>>>>>>>> +     if (fence)
>>>>>>>> +             fence = dma_fence_get_rcu(fence);
>>>>>>>>         rcu_read_unlock();
>>>>>>>>
>>>>>>>>         return fence;
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>>>>>>> index 0f227f28b2802..ed0388d99197e 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>>>>> @@ -351,7 +351,9 @@ int i915_vma_wait_for_bind(struct i915_vma *vma)
>>>>>>>>                 struct dma_fence *fence;
>>>>>>>>
>>>>>>>>                 rcu_read_lock();
>>>>>>>> -             fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
>>>>>>>> +             fence = rcu_dereference(vma->active.excl.fence);
>>>>>>>> +             if (fence)
>>>>>>>> +                     fence = dma_fence_get_rcu(fence);
>>>>>>>>                 rcu_read_unlock();
>>>>>>>>                 if (fence) {
>>>>>>>>                         err = dma_fence_wait(fence, MAX_SCHEDULE_TIMEOUT);
>>>>>>>> diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
>>>>>>>> index 6cf7243a1dc5e..6c45d52988bcc 100644
>>>>>>>> --- a/include/drm/drm_syncobj.h
>>>>>>>> +++ b/include/drm/drm_syncobj.h
>>>>>>>> @@ -105,7 +105,9 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
>>>>>>>>         struct dma_fence *fence;
>>>>>>>>
>>>>>>>>         rcu_read_lock();
>>>>>>>> -     fence = dma_fence_get_rcu_safe(&syncobj->fence);
>>>>>>>> +     fence = rcu_dereference(syncobj->fence);
>>>>>>>> +     if (fence)
>>>>>>>> +             fence = dma_fence_get_rcu(fence);
>>>>>>>>         rcu_read_unlock();
>>>>>>>>
>>>>>>>>         return fence;
>>>>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>>>>>>> index 6ffb4b2c63715..f4a2ab2b1ae46 100644
>>>>>>>> --- a/include/linux/dma-fence.h
>>>>>>>> +++ b/include/linux/dma-fence.h
>>>>>>>> @@ -307,56 +307,6 @@ static inline struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence)
>>>>>>>>                 return NULL;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> -/**
>>>>>>>> - * dma_fence_get_rcu_safe  - acquire a reference to an RCU tracked fence
>>>>>>>> - * @fencep: pointer to fence to increase refcount of
>>>>>>>> - *
>>>>>>>> - * Function returns NULL if no refcount could be obtained, or the fence.
>>>>>>>> - * This function handles acquiring a reference to a fence that may be
>>>>>>>> - * reallocated within the RCU grace period (such as with SLAB_TYPESAFE_BY_RCU),
>>>>>>>> - * so long as the caller is using RCU on the pointer to the fence.
>>>>>>>> - *
>>>>>>>> - * An alternative mechanism is to employ a seqlock to protect a bunch of
>>>>>>>> - * fences, such as used by struct dma_resv. When using a seqlock,
>>>>>>>> - * the seqlock must be taken before and checked after a reference to the
>>>>>>>> - * fence is acquired (as shown here).
>>>>>>>> - *
>>>>>>>> - * The caller is required to hold the RCU read lock.
>>>>>>>> - */
>>>>>>>> -static inline struct dma_fence *
>>>>>>>> -dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
>>>>>>>> -{
>>>>>>>> -     do {
>>>>>>>> -             struct dma_fence *fence;
>>>>>>>> -
>>>>>>>> -             fence = rcu_dereference(*fencep);
>>>>>>>> -             if (!fence)
>>>>>>>> -                     return NULL;
>>>>>>>> -
>>>>>>>> -             if (!dma_fence_get_rcu(fence))
>>>>>>>> -                     continue;
>>>>>>>> -
>>>>>>>> -             /* The atomic_inc_not_zero() inside dma_fence_get_rcu()
>>>>>>>> -              * provides a full memory barrier upon success (such as now).
>>>>>>>> -              * This is paired with the write barrier from assigning
>>>>>>>> -              * to the __rcu protected fence pointer so that if that
>>>>>>>> -              * pointer still matches the current fence, we know we
>>>>>>>> -              * have successfully acquire a reference to it. If it no
>>>>>>>> -              * longer matches, we are holding a reference to some other
>>>>>>>> -              * reallocated pointer. This is possible if the allocator
>>>>>>>> -              * is using a freelist like SLAB_TYPESAFE_BY_RCU where the
>>>>>>>> -              * fence remains valid for the RCU grace period, but it
>>>>>>>> -              * may be reallocated. When using such allocators, we are
>>>>>>>> -              * responsible for ensuring the reference we get is to
>>>>>>>> -              * the right fence, as below.
>>>>>>>> -              */
>>>>>>>> -             if (fence == rcu_access_pointer(*fencep))
>>>>>>>> -                     return rcu_pointer_handoff(fence);
>>>>>>>> -
>>>>>>>> -             dma_fence_put(fence);
>>>>>>>> -     } while (1);
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>     #ifdef CONFIG_LOCKDEP
>>>>>>>>     bool dma_fence_begin_signalling(void);
>>>>>>>>     void dma_fence_end_signalling(bool cookie);
>>>>>>>> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
>>>>>>>> index 562b885cf9c3d..a38c021f379af 100644
>>>>>>>> --- a/include/linux/dma-resv.h
>>>>>>>> +++ b/include/linux/dma-resv.h
>>>>>>>> @@ -248,7 +248,9 @@ dma_resv_get_excl_unlocked(struct dma_resv *obj)
>>>>>>>>                 return NULL;
>>>>>>>>
>>>>>>>>         rcu_read_lock();
>>>>>>>> -     fence = dma_fence_get_rcu_safe(&obj->fence_excl);
>>>>>>>> +     fence = rcu_dereference(obj->fence_excl);
>>>>>>>> +     if (fence)
>>>>>>>> +             fence = dma_fence_get_rcu(fence);
>>>>>>>>         rcu_read_unlock();
>>>>>>>>
>>>>>>>>         return fence;
>>>>>
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch/
>>>