[PATCH] drm/amdgpu: add ih waiter on process until checkpoint

Christian König christian.koenig at amd.com
Sat Mar 6 09:12:08 UTC 2021



On 05.03.21 at 22:34, Kim, Jonathan wrote:
>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Friday, March 5, 2021 3:18 PM
>> To: Kim, Jonathan <Jonathan.Kim at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Yang, Philip <Philip.Yang at amd.com>; Kuehling, Felix
>> <Felix.Kuehling at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: add ih waiter on process until checkpoint
>>
>> On 05.03.21 at 21:11, Jonathan Kim wrote:
>>> Add IH function to allow the caller to wait until ring entries are
>>> processed up to the checkpoint write pointer.
>>>
>>> This will be primarily used by HMM to drain pending page fault
>>> interrupts before memory unmap to prevent HMM from handling stale
>>> interrupts.
>>> v2: Update title and description to clarify use.
>>> Add rptr/wptr wrap counter checks to guarantee ring entries are
>>> processed until the checkpoint.
>> First of all as I said please use a wait_event, busy waiting is a clear NAK.
> Who would do the wake though?  Are you suggesting the wake be done in amdgpu_ih_process?  Or is the waiting done by the caller, and should this go somewhere higher (like amdgpu_amdkfd, for example)?
>
>>> Signed-off-by: Jonathan Kim <jonathan.kim at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 68 +++++++++++++++++++++++++-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h |  2 +
>>>    2 files changed, 69 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>> index dc852af4f3b7..954518b4fb79 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>> @@ -22,7 +22,7 @@
>>>     */
>>>
>>>    #include <linux/dma-mapping.h>
>>> -
>>> +#include <linux/processor.h>
>>>    #include "amdgpu.h"
>>>    #include "amdgpu_ih.h"
>>>
>>> @@ -160,6 +160,72 @@ void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
>>>        }
>>>    }
>>>
>>> +/**
>>> + * amdgpu_ih_wait_on_checkpoint_process - wait to process IVs up to checkpoint
>>> + *
>>> + * @adev: amdgpu_device pointer
>>> + * @ih: ih ring to process
>>> + *
>>> + * Used to ensure ring has processed IVs up to the checkpoint write pointer.
>>> + */
>>> +int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device *adev,
>>> +                                     struct amdgpu_ih_ring *ih)
>>> +{
>>> +     u64 rptr_check, wptr_check, rptr_wrap = 0, wptr_wrap = 0;
>>> +     u32 prev_rptr, prev_wptr;
>>> +
>>> +     if (!ih->enabled || adev->shutdown)
>>> +             return -ENODEV;
>>> +
>>> +     prev_wptr = amdgpu_ih_get_wptr(adev, ih);
>>> +     /* Order wptr with rptr. */
>>> +     rmb();
>>> +     prev_rptr = READ_ONCE(ih->rptr);
>>> +     rptr_check = prev_rptr | (rptr_wrap << 32);
>>> +     wptr_check = prev_wptr | (wptr_wrap << 32);
>> Huh, what? That check doesn't even remotely make sense to me.
> Can you clarify what you meant by creating a new 64 bit compare?
> Snip from your last response:
>
> "This way you do something like this:
> 1. get the wrap around counter.
> 2. get wptr
> 3. get rptr
> 4. compare the wrap around counter with the old one, if it has changed start over with #1
> 5. Use wrap around counter and rptr/wptr to come up with 64bit values.
> 6. Compare wptr with rptr/wrap around counter until we are sure the IHs are processed."
>
>  From a discussion with Felix, I interpreted this as a way to guarantee rptr/wptr ordering so that rptr monotonically follows wptr per check.
> I'm assuming rptr/wptr are 32 bits wide based on the use of ptr_mask in the read/write functions, so masking the rptr/wptr wrap counts into the top 32 bits would mark how far apart the rptr and wptr are per check.

Mhm, sounds like my description was a bit confusing. Let me try again.

First of all, rptr/wptr are not 32 bit; their maximum is 20 or 19 bits 
IIRC (and they count dw, so 4M or 2M bytes).

Then the purpose of the wait_event() is to wait for changes to the rptr, 
so the matching wake_up() belongs in the same place as the call to 
amdgpu_ih_set_rptr().
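To make the pairing concrete, here is a toy userspace model of that wait/wake relationship (pthread-based, with made-up names; the real code would of course use wait_event()/wake_up() on a wait queue):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Toy stand-in for the IH ring.  The condition variable plays the role
 * of the wait queue the wait_event() would sleep on. */
struct toy_ih {
	pthread_mutex_t lock;
	pthread_cond_t processing;	/* stand-in for the wait queue */
	uint32_t rptr;
};

/* The consumer side: updating rptr and waking waiters happen in the
 * same place, mirroring a wake_up() right next to amdgpu_ih_set_rptr(). */
static void toy_set_rptr(struct toy_ih *ih, uint32_t rptr)
{
	pthread_mutex_lock(&ih->lock);
	ih->rptr = rptr;
	pthread_cond_broadcast(&ih->processing);
	pthread_mutex_unlock(&ih->lock);
}

/* The waiter side: block until rptr has reached the checkpoint. */
static void toy_wait_checkpoint(struct toy_ih *ih, uint32_t checkpoint)
{
	pthread_mutex_lock(&ih->lock);
	while (ih->rptr < checkpoint)
		pthread_cond_wait(&ih->processing, &ih->lock);
	pthread_mutex_unlock(&ih->lock);
}
```

The point is only the placement: whoever advances rptr is the only sensible party to wake the waiters.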

My original idea of the wrap around counter assumes that the counter is 
updated in amdgpu_ih_process(). That isn't strictly necessary, but it 
could be better even if it adds some overhead to IH processing.

If you want to implement it that way, it should look something like this:

uint32_t checkpoint_wptr, rptr, tmp;

checkpoint_wptr = amdgpu_ih_get_wptr(adev, ih);
rmb();
rptr = READ_ONCE(ih->rptr);

/* wptr has wrapped past rptr, extend the checkpoint by one ring size */
if (rptr > checkpoint_wptr)
     checkpoint_wptr += ih->ptr_mask + 1;

wait_event(ih->processing, ({
     tmp = READ_ONCE(ih->rptr);
     tmp |= rptr & ~ih->ptr_mask;
     if (tmp < rptr)
          tmp += ih->ptr_mask + 1;
     rptr = tmp;
     rptr >= checkpoint_wptr;
}));

I would put the condition of the wait_event() into a separate function to 
make it more readable and to avoid the GCC statement-expression extension, 
but essentially that's it.
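For illustration, the pointer-extension arithmetic such a helper would do can be modeled in plain userspace C (all names and the 64K example mask here are made up, not the actual amdgpu code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTR_MASK 0xffffu	/* example: 64K dw ring, not the real size */

/* The condition the wait_event() would evaluate, as a separate function:
 * fold the freshly read (masked) rptr into the epoch of the previously
 * extended value, add one ring size if it wrapped, and compare against
 * the extended wptr checkpoint. */
static bool toy_checkpoint_processed(uint32_t hw_rptr, uint64_t *prev_rptr,
				     uint64_t checkpoint_wptr)
{
	uint64_t rptr = (*prev_rptr & ~(uint64_t)TOY_PTR_MASK) | hw_rptr;

	if (rptr < *prev_rptr)		/* rptr wrapped since the last check */
		rptr += TOY_PTR_MASK + 1;
	*prev_rptr = rptr;

	return rptr >= checkpoint_wptr;
}
```

This assumes at most one wrap between evaluations, which holds as long as the waiter is woken at least once per wrap.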

The only problem this could have is that the wptr wraps around multiple 
times between wake-ups. To handle that as well we would need to 
increment the wrap-around counter in amdgpu_ih_process(), which means 
some extra overhead during IH processing. And then I would also use 
64-bit counters to make the whole thing easier to handle.
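A userspace sketch of that 64-bit variant (hypothetical names; in the real code the wrap counter would be bumped by amdgpu_ih_process() whenever the rptr wraps):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PTR_MASK 0xffffu	/* example mask, not the real ring size */

/* State the IH processing loop would maintain: the masked rptr plus a
 * 64-bit count of completed wraps, bumped once per wrap. */
struct toy_ring {
	uint32_t rptr;
	uint64_t rptr_wraps;
};

/* Combine the wrap counter and the masked pointer into one monotonic
 * 64-bit value.  With this, any number of wraps between wake-ups is
 * harmless and the checkpoint test is a plain integer compare. */
static uint64_t toy_extended_rptr(const struct toy_ring *ring)
{
	return ring->rptr_wraps * ((uint64_t)TOY_PTR_MASK + 1) + ring->rptr;
}
```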

Regards,
Christian.

>
> Thanks,
>
> Jon
>
>> Christian.
>>
>>> +
>>> +     spin_begin();
>>> +     while (true) {
>>> +             bool rptr_wrapped = false, wptr_wrapped = false;
>>> +             u32 rptr, wptr;
>>> +
>>> +             spin_cpu_relax();
>>> +
>>> +             /* Update wptr checkpoint/wrap count compare. */
>>> +             wptr = amdgpu_ih_get_wptr(adev, ih);
>>> +             if (wptr < prev_wptr) {
>>> +                     wptr_wrap++;
>>> +                     wptr_check = wptr | (wptr_wrap << 32);
>>> +                     wptr_wrapped = true;
>>> +             }
>>> +             prev_wptr = wptr;
>>> +
>>> +             /* Order wptr with rptr. */
>>> +             rmb();
>>> +
>>> +             /* Update rptr/wrap count compare. */
>>> +             rptr = READ_ONCE(ih->rptr);
>>> +             if (rptr < prev_rptr) {
>>> +                     rptr_wrap++;
>>> +                     rptr_wrapped = true;
>>> +             }
>>> +             rptr_check = rptr | (rptr_wrap << 32);
>>> +             prev_rptr = rptr;
>>> +
>>> +             /* Wrapping occurred so restart. */
>>> +             if (rptr_wrapped || wptr_wrapped)
>>> +                     continue;
>>> +
>>> +             /* Exit on reaching or passing checkpoint. */
>>> +             if (rptr_check >= wptr_check &&
>>> +                                     rptr >= (wptr_check & ih->ptr_mask))
>>> +                     break;
>>> +     }
>>> +     spin_end();
>>> +
>>> +     return 0;
>>> +}
>>> +
>>>    /**
>>>     * amdgpu_ih_process - interrupt handler
>>>     *
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>> index 6ed4a85fc7c3..6817f0a812d2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>> @@ -87,6 +87,8 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih,
>>>    void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
>>>    void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv,
>>>                          unsigned int num_dw);
>>> +int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device *adev,
>>> +                                     struct amdgpu_ih_ring *ih);
>>>    int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
>>>    void amdgpu_ih_decode_iv_helper(struct amdgpu_device *adev,
>>>                                struct amdgpu_ih_ring *ih,


