[PATCH] drm/amdgpu: add ih waiter on process until checkpoint

Mon Mar 8 21:09:27 UTC 2021

[AMD Official Use Only - Internal Distribution Only]

> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Saturday, March 6, 2021 4:12 AM
> To: Kim, Jonathan <Jonathan.Kim at amd.com>; Christian König
> <ckoenig.leichtzumerken at gmail.com>; amd-gfx at lists.freedesktop.org
> Cc: Yang, Philip <Philip.Yang at amd.com>; Kuehling, Felix
> <Felix.Kuehling at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: add ih waiter on process until checkpoint
>
>
>
> Am 05.03.21 um 22:34 schrieb Kim, Jonathan:
> > [AMD Official Use Only - Internal Distribution Only]
> >
> >> -----Original Message-----
> >> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> >> Sent: Friday, March 5, 2021 3:18 PM
> >> To: Kim, Jonathan <Jonathan.Kim at amd.com>; amd-
> >> gfx at lists.freedesktop.org
> >> Cc: Yang, Philip <Philip.Yang at amd.com>; Kuehling, Felix
> >> <Felix.Kuehling at amd.com>; Koenig, Christian
> >> <Christian.Koenig at amd.com>
> >> Subject: Re: [PATCH] drm/amdgpu: add ih waiter on process until
> >> checkpoint
> >>
> >> [CAUTION: External Email]
> >>
> >> Am 05.03.21 um 21:11 schrieb Jonathan Kim:
> >>> Add IH function to allow caller to wait until ring entries are
> >>> processed until the checkpoint write pointer.
> >>>
> >>> This will be primarily used by HMM to drain pending page fault
> >>> interrupts before memory unmap to prevent HMM from handling stale
> >> interrupts.
> >>> v2: Update title and description to clarify use.
> >>> Add rptr/wptr wrap counter checks to guarantee ring entries are
> >>> processed until the checkpoint.
> >> First of all as I said please use a wait_event, busy waiting is a clear NAK.
> > Who would do the wake though?  Are you suggesting wake be done in
> amdgpu_ih_process?  Or is waiting happening by the caller and this should go
> somewhere higher (like amdgpu_amdkfd for example)?
> >
> >>> Signed-off-by: Jonathan Kim <jonathan.kim at amd.com>
> >>> ---
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 68
> >> +++++++++++++++++++++++++-
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h |  2 +
> >>>    2 files changed, 69 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> >>> index dc852af4f3b7..954518b4fb79 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> >>> @@ -22,7 +22,7 @@
> >>>     */
> >>>
> >>>    #include <linux/dma-mapping.h>
> >>> -
> >>> +#include <linux/processor.h>
> >>>    #include "amdgpu.h"
> >>>    #include "amdgpu_ih.h"
> >>>
> >>> @@ -160,6 +160,72 @@ void amdgpu_ih_ring_write(struct
> >> amdgpu_ih_ring *ih, const uint32_t *iv,
> >>>        }
> >>>    }
> >>>
> >>> +/**
> >>> + * amdgpu_ih_wait_on_checkpoint_process - wait to process IVs up to
> >>> +checkpoint
> >>> + *
> >>> + * @adev: amdgpu_device pointer
> >>> + * @ih: ih ring to process
> >>> + *
> >>> + * Used to ensure ring has processed IVs up to the checkpoint write
> >> pointer.
> >>> + */
> >>> +int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device
> >> *adev,
> >>> +                                     struct amdgpu_ih_ring *ih) {
> >>> +     u64 rptr_check, wptr_check, rptr_wrap = 0, wptr_wrap = 0;
> >>> +     u32 prev_rptr, prev_wptr;
> >>> +
> >>> +     if (!ih->enabled || adev->shutdown)
> >>> +             return -ENODEV;
> >>> +
> >>> +     prev_wptr = amdgpu_ih_get_wptr(adev, ih);
> >>> +     /* Order wptr with rptr. */
> >>> +     rmb();
> >>> +     prev_rptr = READ_ONCE(ih->rptr);
> >>> +     rptr_check = prev_rptr | (rptr_wrap << 32);
> >>> +     wptr_check = prev_wptr | (wptr_wrap << 32);
> >> Hui what? That check doesn't even make remotely sense to me.
> > Can you clarify what you meant by creating a new 64 bit compare?
> > Snip from your last response:
> >
> > "This way you do something like this:
> > 1. get the wrap around counter.
> > 2. get wptr
> > 3. get rptr
> > 4. compare the wrap around counter with the old one, if it has changed
> > start over with #1 5. Use wrap around counter and rptr/wptr to come up
> with 64bit values.
> > 6. Compare wptr with rptr/wrap around counter until we are sure the IHs
> are processed."
> >
> >  From a discussion with Felix, I interpreted this as a way to guarantee
> rptr/wtpr ordering so that rptr monotonically follows wptr per check.
> > I'm assuming rptr/wptrs are 32 bits wide by the use of ptr_mask on
> read/write functions so a respective mask of rptr/wptr wrap count to the top
> 32 bits would mark how far apart the rptr and wptr are per check.
>
> Mhm, sounds like my description was a bit confusing. Let me try again.
>
> First of all rptr/wptr are not 32bit, their maximum is 20 or 19 bits IIRC (and
> they are dw, so 4M or 2M bytes).
>

Thanks Christian.  This makes sense now.  I can see how rptrs advance by dword sets in the iv decode helper.
My apologies, but I'm still a bit confused on the pseudo code below and have a few questions before I give this another go ...

> Then the purpose of the wait_event() is to wait for changes of the rptr, so
> the matching wake_up() should be at the same place as calling
> amdgpu_ih_set_rptr().
>
> My original idea of the wrap around counter assumes that the counter is
> updated in amdgpu_ih_process(). That isn't strictly necessary, but it could be
> better even if it adds some overhead to IH processing.
>
> If you want to implement it like below it should look something like this:
>
> uint32_t rptr, wptr, tmp;
>
> wptr = amdgpu_ih_get_wptr(adev, ih);
> rmb();
> rptr = READ_ONCE(ih->rptr);
>
> if (rptr > wptr)
>      rptr += ih->ptr_mask + 1;

So I think you're trying to be safe by assuming the rptr still needs to overtake the wptr and hasn't done so yet so this is looking ahead by saying the rptr will have to wrap?

>
> wait_event(ig->processing, {
>      tmp = amdgpu_ih_get_wptr(adev, ih);
>      tmp |= wptr & ~ih->ptr_mask;
>      if (tmp < wptr)
>          tmp += ih->ptr_mask + 1;
>      wptr = tmp;

For short term use, I think the caller will try to guarantee that the checkpoint wptr is checked during a time when interrupts won't be generated anymore, but I guess for general use, we can't guarantee this and this is why we're using a moving wptr checkpoint?

>      wptr >= rptr;

Assuming you mean 'return wptr >= rptr' will unblock if true, I can see this for wptr == rptr, but why wptr > rptr?  What happens if the caller waits when nothing is wrapped but there are entries to be processed? (I'm assuming rptr < wptr can be true with entries that still require processing).  Won't the caller advance without waiting?

> })
>
> I would put the condition of the wait_event into a separate function to make
> it more readable and not use GCC extension, but essentially that's it.
>
> The only problem this could have is that the wptr wraps around multiple
> times between wake ups. To handle this as well we would need to increment
> the wrap around counter in amdgpu_ih_process() as well, but this means
> some extra overhead during IH processing. And then I would also use 64bit
> counters to make the stuff easier to handle.

I'm still not fully clear on how 64-bit counters help.  It already looks like you're leveraging the ptr_mask limit of 19/20 bits to increment the unused top bits of the rptr/wptr by adding the rb size.

Thanks,

Jon

>
> Regards,
> Christian.
>
> >
> > Thanks,
> >
> > Jon
> >
> >> Christian.
> >>
> >>> +
> >>> +     spin_begin();
> >>> +     while (true) {
> >>> +             bool rptr_wrapped = false, wptr_wrapped = false;
> >>> +             u32 rptr, wptr;
> >>> +
> >>> +             spin_cpu_relax();
> >>> +
> >>> +             /* Update wptr checkpoint/wrap count compare. */
> >>> +             wptr = amdgpu_ih_get_wptr(adev, ih);
> >>> +             if (wptr < prev_wptr) {
> >>> +                     wptr_wrap++;
> >>> +                     wptr_check = wptr | (wptr_wrap << 32);
> >>> +                     wptr_wrapped = true;
> >>> +             }
> >>> +             prev_wptr = wptr;
> >>> +
> >>> +             /* Order wptr with rptr. */
> >>> +             rmb();
> >>> +
> >>> +             /* Update rptr/wrap count compare. */
> >>> +             rptr = READ_ONCE(ih->rptr);
> >>> +             if (rptr < prev_rptr) {
> >>> +                     rptr_wrap++;
> >>> +                     rptr_wrapped = true;
> >>> +             }
> >>> +             rptr_check = rptr | (rptr_wrap << 32);
> >>> +             prev_rptr = rptr;
> >>> +
> >>> +             /* Wrapping occurred so restart. */
> >>> +             if (rptr_wrapped || wptr_wrapped)
> >>> +                     continue;
> >>> +
> >>> +             /* Exit on reaching or passing checkpoint. */
> >>> +             if (rptr_check >= wptr_check &&
> >>> +                                     rptr >= (wptr_check & ih->ptr_mask))
> >>> +                     break;
> >>> +     }
> >>> +     spin_end();
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>>    /**
> >>>     * amdgpu_ih_process - interrupt handler
> >>>     *
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> >>> index 6ed4a85fc7c3..6817f0a812d2 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> >>> @@ -87,6 +87,8 @@ int amdgpu_ih_ring_init(struct amdgpu_device
> >>> *adev,
> >> struct amdgpu_ih_ring *ih,
> >>>    void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct
> >> amdgpu_ih_ring *ih);
> >>>    void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t
> *iv,
> >>>                          unsigned int num_dw);
> >>> +int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device
> >> *adev,
> >>> +                                     struct amdgpu_ih_ring *ih);
> >>>    int amdgpu_ih_process(struct amdgpu_device *adev, struct
> >> amdgpu_ih_ring *ih);
> >>>    void amdgpu_ih_decode_iv_helper(struct amdgpu_device *adev,
> >>>                                struct amdgpu_ih_ring *ih,