[PATCH] drm/amdgpu: add ih waiter on process until checkpoint

Tue Mar 9 09:09:25 UTC 2021

Am 08.03.21 um 22:09 schrieb Kim, Jonathan:
> [SNIP]
>> First of all rptr/wptr are not 32bit, their maximum is 20 or 19 bits IIRC (and
>> they are dw, so 4M or 2M bytes).
>>
> Thanks Christian.  This makes sense now.  I can see how rptrs advance by dword sets in the iv decode helper.
> My apologies, but I'm still a bit confused on the pseudo code below and have a few questions before I give this another go ...
>
>> Then the purpose of the wait_event() is to wait for changes of the rptr, so
>> the matching wake_up() should be at the same place as calling
>> amdgpu_ih_set_rptr().
>>
>> My original idea of the wrap around counter assumes that the counter is
>> updated in amdgpu_ih_process(). That isn't strictly necessary, but it could be
>> better even if it adds some overhead to IH processing.
>>
>> If you want to implement it like below it should look something like this:
>>
>> uint32_t rptr, wptr, tmp;
>>
>> wptr = amdgpu_ih_get_wptr(adev, ih);
>> rmb();
>> rptr = READ_ONCE(ih->rptr);
>>
>> if (rptr > wptr)
>>       rptr += ih->ptr_mask + 1;
> So I think you're trying to be safe by assuming the rptr still needs to overtake the wptr and hasn't done so yet so this is looking ahead by saying the rptr will have to wrap?

I really need to wake up before writing stuff like that. No, the problem 
why you don't understand is that I mixed up rptr and wptr :)

>> wait_event(ig->processing, {
>>       tmp = amdgpu_ih_get_wptr(adev, ih);
>>       tmp |= wptr & ~ih->ptr_mask;
>>       if (tmp < wptr)
>>           tmp += ih->ptr_mask + 1;
>>       wptr = tmp;
> For short term use, I think the caller will try to guarantee that the checkpoint wptr is checked during a time when interrupts won't be generated anymore, but I guess for general use, we can't guarantee this and this is why we're using a moving wptr checkpoint?
>
>>       wptr >= rptr;
> Assuming you mean 'return wptr >= rptr' will unblock if true, I can see this for wptr == rptr, but why wptr > rptr?  What happens if the caller waits when nothing is wrapped but there are entries to be processed? (I'm assuming rptr < wptr can be true with entries that still require processing).  Won't the caller advance without waiting?

No as I wrote above I just completely mixed up things here. Let me try 
once more:

wptr = amdgpu_ih_get_wptr(adev, ih);
rmb();
rptr = READ_ONCE(ih->rptr);

if (rptr > wptr)
      wptr += ih->ptr_mask + 1;

wait_event(ig->processing, {
     tmp = ih->rptr;
     tmp |= rptr & ~ih->ptr_mask;
     if (tmp < rptr)
          tmp += ih->ptr_mask + 1;
     rptr = tmp;
     rptr >= wptr;
});

What I also wanted to note here is that using wait_event() makes things 
much easier since it provides the necessary memory barriers.

>> })
>>
>> I would put the condition of the wait_event into a separate function to make
>> it more readable and not use GCC extension, but essentially that's it.
>>
>> The only problem this could have is that the wptr wraps around multiple
>> times between wake ups. To handle this as well we would need to increment
>> the wrap around counter in amdgpu_ih_process() as well, but this means
>> some extra overhead during IH processing. And then I would also use 64bit
>> counters to make the stuff easier to handle.
> I'm still not fully clear on how 64-bit counters help.  It already looks like you're leveraging the ptr_mask limit of 19/20 bits to increment the unused top bits of the rptr/wptr by adding the rb size.

When you detect the rptr wrap around only on the waiters side there is 
the chance that you miss a wrap around and wait to long.

But I now think that is better than adding overhead to the processing side.

regards,
Christian.

>
> Thanks,
>
> Jon
>
>> Regards,
>> Christian.
>>
>>> Thanks,
>>>
>>> Jon
>>>
>>>> Christian.
>>>>
>>>>> +
>>>>> +     spin_begin();
>>>>> +     while (true) {
>>>>> +             bool rptr_wrapped = false, wptr_wrapped = false;
>>>>> +             u32 rptr, wptr;
>>>>> +
>>>>> +             spin_cpu_relax();
>>>>> +
>>>>> +             /* Update wptr checkpoint/wrap count compare. */
>>>>> +             wptr = amdgpu_ih_get_wptr(adev, ih);
>>>>> +             if (wptr < prev_wptr) {
>>>>> +                     wptr_wrap++;
>>>>> +                     wptr_check = wptr | (wptr_wrap << 32);
>>>>> +                     wptr_wrapped = true;
>>>>> +             }
>>>>> +             prev_wptr = wptr;
>>>>> +
>>>>> +             /* Order wptr with rptr. */
>>>>> +             rmb();
>>>>> +
>>>>> +             /* Update rptr/wrap count compare. */
>>>>> +             rptr = READ_ONCE(ih->rptr);
>>>>> +             if (rptr < prev_rptr) {
>>>>> +                     rptr_wrap++;
>>>>> +                     rptr_wrapped = true;
>>>>> +             }
>>>>> +             rptr_check = rptr | (rptr_wrap << 32);
>>>>> +             prev_rptr = rptr;
>>>>> +
>>>>> +             /* Wrapping occurred so restart. */
>>>>> +             if (rptr_wrapped || wptr_wrapped)
>>>>> +                     continue;
>>>>> +
>>>>> +             /* Exit on reaching or passing checkpoint. */
>>>>> +             if (rptr_check >= wptr_check &&
>>>>> +                                     rptr >= (wptr_check & ih->ptr_mask))
>>>>> +                     break;
>>>>> +     }
>>>>> +     spin_end();
>>>>> +
>>>>> +     return 0;
>>>>> +}
>>>>> +
>>>>>     /**
>>>>>      * amdgpu_ih_process - interrupt handler
>>>>>      *
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>> index 6ed4a85fc7c3..6817f0a812d2 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>> @@ -87,6 +87,8 @@ int amdgpu_ih_ring_init(struct amdgpu_device
>>>>> *adev,
>>>> struct amdgpu_ih_ring *ih,
>>>>>     void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct
>>>> amdgpu_ih_ring *ih);
>>>>>     void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t
>> *iv,
>>>>>                           unsigned int num_dw);
>>>>> +int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device
>>>> *adev,
>>>>> +                                     struct amdgpu_ih_ring *ih);
>>>>>     int amdgpu_ih_process(struct amdgpu_device *adev, struct
>>>> amdgpu_ih_ring *ih);
>>>>>     void amdgpu_ih_decode_iv_helper(struct amdgpu_device *adev,
>>>>>                                 struct amdgpu_ih_ring *ih,