[PATCH] drm/amdgpu: fix amdgpu_amdkfd_remove_eviction_fence

Christian König ckoenig.leichtzumerken at gmail.com
Thu Aug 16 06:43:23 UTC 2018


On 15.08.2018 20:49, Felix Kuehling wrote:
> On 2018-08-15 02:27 PM, Christian König wrote:
>> On 15.08.2018 20:17, Felix Kuehling wrote:
>>> On 2018-08-15 03:02 AM, Christian König wrote:
>>>> Hi Felix,
>>>>
>>>> yeah, you pretty much nailed it.
>>>>
>>>> The problem is that the array itself is RCU protected. This means that
>>>> when you want to update it, you must copy and replace the whole structure.
>>>>
>>>> The exception is reservation_object_add_shared(), which only works
>>>> because it replaces either an already signaled fence or a fence from
>>>> the same context with a later sequence number.
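>>>>
>>>> Roughly like this (simplified sketch, not the exact
>>>> reservation_object_add_shared_inplace() code):
>>>>
>>>>     old = rcu_dereference_protected(fobj->shared[i],
>>>>                                     reservation_object_held(obj));
>>>>     if (old->context == fence->context ||
>>>>         dma_fence_is_signaled(old)) {
>>>>         /* For every reader old and new are interchangeable,
>>>>          * so the slot can be replaced in place.
>>>>          */
>>>>         RCU_INIT_POINTER(fobj->shared[i], fence);
>>>>         dma_fence_put(old);
>>>>     }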
>>>>
>>>> This also explains why this is only a band-aid and why the whole approach
>>>> of removing fences doesn't work in general. At any time somebody could
>>>> have taken an RCU reference to the old array, created a copy of it and
>>>> still be using that copy.
>>>>
>>>> The only 100% correct solution would be to mark the existing fence as
>>>> signaled and replace it everywhere else.
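>>>>
>>>> I.e. something like this sketch, where new_ef and the
>>>> process_info->eviction_fence pointer are only illustrative:
>>>>
>>>>     /* Everybody who still holds a reference now sees the old
>>>>      * fence as completed ...
>>>>      */
>>>>     dma_fence_signal(&old_ef->base);
>>>>     /* ... and all new users get a freshly allocated fence. */
>>>>     process_info->eviction_fence = new_ef;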
>>> Depends what you're trying to achieve. I think the problem you see is
>>> that some reader may still be using the old reservation_object_list copy
>>> with the fence still in it. But the remaining lifetime of the
>>> reservation_object_list copy is limited. If we wanted to be sure no more
>>> copies with the old fence exist, all we'd need to do is call
>>> synchronize_rcu. Maybe we need to do that before releasing the fence
>>> references, or release the fence references in an RCU callback to be
>>> safe.
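>>>
>>> Something like this (untested sketch; "removed" holds the fence
>>> references we took out of the list):
>>>
>>>     /* Make sure no RCU reader can still see the old shared list
>>>      * before we drop our fence references.
>>>      */
>>>     synchronize_rcu();
>>>     for (i = 0; i < removed_count; ++i)
>>>         dma_fence_put(removed[i]);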
>> The assumption that the fence would die with the array is what is
>> incorrect here.
> I'm making no such assumption. The point is not to destroy the fence.
> Only to remove the fence reference from the reservation object. That is,
> we want any BO with this reservation object to stop checking or waiting
> for our eviction fence.

Maybe I'm overthinking this, but let me explain the point once more.

See reservation_object_wait_timeout_rcu() for an example of the problem 
I see here:
>         seq = read_seqcount_begin(&obj->seq);
>         rcu_read_lock();
...
>                         if (!dma_fence_get_rcu(lfence))
>                                 goto unlock_retry;
...
>         rcu_read_unlock();
...
>                 if (read_seqcount_retry(&obj->seq, seq)) {
>                         dma_fence_put(fence);
>                         goto retry;
>                 }
...
>                 ret = dma_fence_wait_timeout(fence, intr, ret);

In other words, the RCU mechanism guarantees that we also wait for newly 
added fences, but it does not guarantee that we don't wait for a fence 
which has already been removed.
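
The interleaving I'm worried about is roughly this (not real code,
just the ordering):

    waiter                             remove_eviction_fence
    ------                             ---------------------
    seq = read_seqcount_begin()
    fence = dma_fence_get_rcu(...)
    read_seqcount_retry() -> no retry
                                       write_seqcount_begin()
                                       /* drop fence from shared[] */
                                       write_seqcount_end()
                                       dma_fence_put(fence)
    dma_fence_wait_timeout(fence)
    /* still waits for the removed fence */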

>> The lifetime of the RCUed array object is limited, but there is absolutely
>> no guarantee that somebody didn't make a copy of the fences.
>>
>> E.g. somebody could have called reservation_object_get_fences_rcu(),
>> reservation_object_copy_fences() or a concurrent
>> reservation_object_wait_timeout_rcu() is underway.
> That's fine. Then there will be additional fence references. When we
> remove a fence from a reservation object, we don't know and don't care
> who else is holding a reference to the fence anyway. This is no different.

So the KFD implementation doesn't care that, even after the fence is 
removed from a BO, somebody could still start to wait on it because of 
that BO?

I mean, in the worst case we could race and stop a KFD process for no 
good reason.

Regards,
Christian.

>
> Regards,
>    Felix
>
>> That's also the reason why fences live much longer than their
>> signaling, e.g. somebody can hold a reference to the fence object even
>> hours after it is signaled.
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>>     Felix
>>>
>>>> Going to fix the copy&paste error I made with the sequence number and
>>>> send it out again.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 14.08.2018 22:17, Felix Kuehling wrote:
>>>>> [+Harish]
>>>>>
>>>>> I think this looks good for the most part. See one comment inline
>>>>> below.
>>>>>
>>>>> But bear with me while I'm trying to understand what was wrong with
>>>>> the old code. Please correct me if I get this wrong or point out
>>>>> anything I'm missing.
>>>>>
>>>>> The reservation_object_list looks to be protected by a combination of
>>>>> three mechanisms:
>>>>>
>>>>>      * Holding the reservation object
>>>>>      * RCU
>>>>>      * seqcount
>>>>>
>>>>> Updating the fence list requires holding the reservation object. But
>>>>> there are some readers that can be called without holding that lock
>>>>> (reservation_object_copy_fences, reservation_object_get_fences_rcu,
>>>>> reservation_object_wait_timeout_rcu,
>>>>> reservation_object_test_signaled_rcu). They rely on RCU to work on a
>>>>> copy and on the seqcount to make sure they have the most up-to-date
>>>>> information. So any function updating the fence lists needs to handle
>>>>> RCU and the seqcount correctly to prevent breaking those readers.
>>>>>
>>>>> As I understand it, RCU with seqcount retry just means that readers
>>>>> will
>>>>> spin retrying while there are writers, and writers are never
>>>>> blocked by
>>>>> readers. Writers are blocked by each other, because they need to hold
>>>>> the reservation.
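>>>>>
>>>>> I.e. the lockless readers essentially all follow this pattern
>>>>> (simplified sketch):
>>>>>
>>>>>     unsigned int seq;
>>>>>     struct reservation_object_list *list;
>>>>>
>>>>>     do {
>>>>>         seq = read_seqcount_begin(&obj->seq);
>>>>>         rcu_read_lock();
>>>>>         list = rcu_dereference(obj->fence);
>>>>>         /* ... copy or inspect list->shared[] ... */
>>>>>         rcu_read_unlock();
>>>>>     } while (read_seqcount_retry(&obj->seq, seq));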
>>>>>
>>>>> The code you added looks a lot like
>>>>> reservation_object_add_shared_replace, which removes fences that have
>>>>> signaled, and atomically replaces obj->fence with a new
>>>>> reservation_object_list. That atomicity is important because each
>>>>> pointer in the obj->fence->shared array is separately protected by
>>>>> RCU, but not the array as a whole or the shared_count.
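>>>>>
>>>>> So the writer has to publish the new array with a single pointer
>>>>> update, roughly like this (simplified from
>>>>> reservation_object_add_shared_replace):
>>>>>
>>>>>     preempt_disable();
>>>>>     write_seqcount_begin(&obj->seq);
>>>>>     /* Readers see either the complete old or the complete new
>>>>>      * list, never a half-updated array.
>>>>>      */
>>>>>     RCU_INIT_POINTER(obj->fence, new);
>>>>>     write_seqcount_end(&obj->seq);
>>>>>     preempt_enable();
>>>>>
>>>>>     /* The old list can only be freed after a grace period. */
>>>>>     kfree_rcu(old, rcu);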
>>>>>
>>>>> See one comment inline.
>>>>>
>>>>> Regards,
>>>>>      Felix
>>>>>
>>>>> On 2018-08-14 03:39 AM, Christian König wrote:
>>>>>> Fix quite a number of bugs here. Unfortunately only compile tested.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 103 ++++++++++-------------
>>>>>>     1 file changed, 46 insertions(+), 57 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> index fa38a960ce00..dece31516dc4 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> @@ -206,11 +206,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>>>>>>                         struct amdgpu_amdkfd_fence ***ef_list,
>>>>>>                         unsigned int *ef_count)
>>>>>>     {
>>>>>> -    struct reservation_object_list *fobj;
>>>>>> -    struct reservation_object *resv;
>>>>>> -    unsigned int i = 0, j = 0, k = 0, shared_count;
>>>>>> -    unsigned int count = 0;
>>>>>> -    struct amdgpu_amdkfd_fence **fence_list;
>>>>>> +    struct reservation_object *resv = bo->tbo.resv;
>>>>>> +    struct reservation_object_list *old, *new;
>>>>>> +    unsigned int i, j, k;
>>>>>>           if (!ef && !ef_list)
>>>>>>             return -EINVAL;
>>>>>> @@ -220,76 +218,67 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>>>>>>             *ef_count = 0;
>>>>>>         }
>>>>>>     -    resv = bo->tbo.resv;
>>>>>> -    fobj = reservation_object_get_list(resv);
>>>>>> -
>>>>>> -    if (!fobj)
>>>>>> +    old = reservation_object_get_list(resv);
>>>>>> +    if (!old)
>>>>>>             return 0;
>>>>>>     -    preempt_disable();
>>>>>> -    write_seqcount_begin(&resv->seq);
>>>>>> +    new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
>>>>>> +              GFP_KERNEL);
>>>>>> +    if (!new)
>>>>>> +        return -ENOMEM;
>>>>>> -    /* Go through all the shared fences in the resevation object. If
>>>>>> -     * ef is specified and it exists in the list, remove it and reduce the
>>>>>> -     * count. If ef is not specified, then get the count of eviction fences
>>>>>> -     * present.
>>>>>> +    /* Go through all the shared fences in the reservation object and sort
>>>>>> +     * the interesting ones to the end of the list.
>>>>>>          */
>>>>>> -    shared_count = fobj->shared_count;
>>>>>> -    for (i = 0; i < shared_count; ++i) {
>>>>>> +    for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
>>>>>>             struct dma_fence *f;
>>>>>>     -        f = rcu_dereference_protected(fobj->shared[i],
>>>>>> +        f = rcu_dereference_protected(old->shared[i],
>>>>>>                               reservation_object_held(resv));
>>>>>>     -        if (ef) {
>>>>>> -            if (f->context == ef->base.context) {
>>>>>> -                dma_fence_put(f);
>>>>>> -                fobj->shared_count--;
>>>>>> -            } else {
>>>>>> -                RCU_INIT_POINTER(fobj->shared[j++], f);
>>>>>> -            }
>>>>>> -        } else if (to_amdgpu_amdkfd_fence(f))
>>>>>> -            count++;
>>>>>> +        if ((ef && f->context == ef->base.context) ||
>>>>>> +            (!ef && to_amdgpu_amdkfd_fence(f)))
>>>>>> +            RCU_INIT_POINTER(new->shared[--j], f);
>>>>>> +        else
>>>>>> +            RCU_INIT_POINTER(new->shared[k++], f);
>>>>>>         }
>>>>>> -    write_seqcount_end(&resv->seq);
>>>>>> -    preempt_enable();
>>>>>> +    new->shared_max = old->shared_max;
>>>>>> +    new->shared_count = k;
>>>>>>     -    if (ef || !count)
>>>>>> -        return 0;
>>>>>> +    if (!ef) {
>>>>>> +        unsigned int count = old->shared_count - j;
>>>>>> -    /* Alloc memory for count number of eviction fence pointers. Fill the
>>>>>> -     * ef_list array and ef_count
>>>>>> -     */
>>>>>> -    fence_list = kcalloc(count, sizeof(struct amdgpu_amdkfd_fence *),
>>>>>> -                 GFP_KERNEL);
>>>>>> -    if (!fence_list)
>>>>>> -        return -ENOMEM;
>>>>>> +        /* Alloc memory for count number of eviction fence pointers. Fill the
>>>>>> +         * ef_list array and ef_count
>>>>>> +         */
>>>>>> +        *ef_list = kcalloc(count, sizeof(**ef_list), GFP_KERNEL);
>>>>>> +        *ef_count = count;
>>>>>>     +        if (!*ef_list) {
>>>>>> +            kfree(new);
>>>>>> +            return -ENOMEM;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    /* Install the new fence list, seqcount provides the barriers */
>>>>>> +    preempt_disable();
>>>>>> +    write_seqcount_begin(&resv->seq);
>>>>>> +    RCU_INIT_POINTER(resv->fence, new);
>>>>>>         preempt_disable();
>>>>>>         write_seqcount_begin(&resv->seq);
>>>>> You're disabling preemption and calling write_seqcount_begin twice. I
>>>>> think this is meant to be
>>>>>
>>>>>       write_seqcount_end(&resv->seq);
>>>>>       preempt_enable();
>>>>>
>>>>>
>>>>>>     -    j = 0;
>>>>>> -    for (i = 0; i < shared_count; ++i) {
>>>>>> +    /* Drop the references to the removed fences or move them to ef_list */
>>>>>> +    for (i = j, k = 0; i < old->shared_count; ++i) {
>>>>>>             struct dma_fence *f;
>>>>>> -        struct amdgpu_amdkfd_fence *efence;
>>>>>>     -        f = rcu_dereference_protected(fobj->shared[i],
>>>>>> -            reservation_object_held(resv));
>>>>>> -
>>>>>> -        efence = to_amdgpu_amdkfd_fence(f);
>>>>>> -        if (efence) {
>>>>>> -            fence_list[k++] = efence;
>>>>>> -            fobj->shared_count--;
>>>>>> -        } else {
>>>>>> -            RCU_INIT_POINTER(fobj->shared[j++], f);
>>>>>> -        }
>>>>>> +        f = rcu_dereference_protected(new->shared[i],
>>>>>> +                          reservation_object_held(resv));
>>>>>> +        if (!ef)
>>>>>> +            (*ef_list)[k++] = to_amdgpu_amdkfd_fence(f);
>>>>>> +        else
>>>>>> +            dma_fence_put(f);
>>>>>>         }
>>>>>> -
>>>>>> -    write_seqcount_end(&resv->seq);
>>>>>> -    preempt_enable();
>>>>>> -
>>>>>> -    *ef_list = fence_list;
>>>>>> -    *ef_count = k;
>>>>>> +    kfree_rcu(old, rcu);
>>>>>>           return 0;
>>>>>>     }