[Intel-gfx] [PATCH 05/50] drm/i915: Introduce struct intel_wakeref

Fri Apr 12 13:12:26 UTC 2019

Quoting Tvrtko Ursulin (2019-04-12 13:59:30)
> 
> On 12/04/2019 09:53, Chris Wilson wrote:
> > For controlling runtime pm of the GT and engines, we would like to have
> > a callback to do extra work the first time we wake up and the last time
> > we drop the wakeref. This first/last access needs serialisation and so
> > we encompass a mutex with the regular intel_wakeref_t tracker.
> > 
> > v2: Drop the _once naming and report the errors.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc; Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> >   drivers/gpu/drm/i915/Makefile             |   1 +
> >   drivers/gpu/drm/i915/Makefile.header-test |   3 +-
> >   drivers/gpu/drm/i915/i915_drv.h           |   3 +-
> >   drivers/gpu/drm/i915/intel_wakeref.c      |  62 ++++++++++
> >   drivers/gpu/drm/i915/intel_wakeref.h      | 133 ++++++++++++++++++++++
> >   5 files changed, 199 insertions(+), 3 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/intel_wakeref.c
> >   create mode 100644 drivers/gpu/drm/i915/intel_wakeref.h
> > 
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 40130cf5c003..233bad5e361f 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -50,6 +50,7 @@ i915-y += i915_drv.o \
> >         intel_device_info.o \
> >         intel_pm.o \
> >         intel_runtime_pm.o \
> > +       intel_wakeref.o \
> >         intel_uncore.o
> >   
> >   # core library code
> > diff --git a/drivers/gpu/drm/i915/Makefile.header-test b/drivers/gpu/drm/i915/Makefile.header-test
> > index 96a5d90629ec..e6b3e7588860 100644
> > --- a/drivers/gpu/drm/i915/Makefile.header-test
> > +++ b/drivers/gpu/drm/i915/Makefile.header-test
> > @@ -31,7 +31,8 @@ header_test := \
> >       intel_psr.h \
> >       intel_sdvo.h \
> >       intel_sprite.h \
> > -     intel_tv.h
> > +     intel_tv.h \
> > +     intel_wakeref.h
> >   
> >   quiet_cmd_header_test = HDRTEST $@
> >         cmd_header_test = echo "\#include \"$(<F)\"" > $@
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index fad5306f07da..62a7e91acd7f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -74,6 +74,7 @@
> >   #include "intel_opregion.h"
> >   #include "intel_uc.h"
> >   #include "intel_uncore.h"
> > +#include "intel_wakeref.h"
> >   #include "intel_wopcm.h"
> >   
> >   #include "i915_gem.h"
> > @@ -134,8 +135,6 @@ bool i915_error_injected(void);
> >       __i915_printk(i915, i915_error_injected() ? KERN_DEBUG : KERN_ERR, \
> >                     fmt, ##__VA_ARGS__)
> >   
> > -typedef depot_stack_handle_t intel_wakeref_t;
> > -
> >   enum hpd_pin {
> >       HPD_NONE = 0,
> >       HPD_TV = HPD_NONE,     /* TV is known to be unreliable */
> > diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
> > new file mode 100644
> > index 000000000000..f4cfaa154303
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/intel_wakeref.c
> > @@ -0,0 +1,62 @@
> > +/*
> > + * SPDX-License-Identifier: MIT
> > + *
> > + * Copyright © 2019 Intel Corporation
> > + */
> > +
> > +#include "intel_drv.h"
> > +#include "intel_wakeref.h"
> > +
> > +int __intel_wakeref_get_first(struct drm_i915_private *i915,
> > +                           struct intel_wakeref *wf,
> > +                           int (*fn)(struct intel_wakeref *wf))
> > +{
> > +     /*
> > +      * Treat get/put as different subclasses, as we may need to run
> > +      * the put callback from under the shrinker and do not want to
> > +      * cross-contanimate that callback with any extra work performed
> > +      * upon acquiring the wakeref.
> > +      */
> 
> So you want to seralize the get/put callbacks but don't want lockdep to 
> see any chains resulting from that. Sounds worrying.
> 
> What if you moved them (callbacks) outside wf->mutex then and mandate 
> the callback pairs to serialize themselves?

Doesn't work for the purpose -- we do want the global transition of
first/last use serialised with all other users, not the callbacks
themselves, and that requires the mutex across the entire 0<->1
transition. For example, we use the knowledge that there is a
global/engine mutex held to insert a GPU context switch on parking,
outside of a context where we would be allow to take any of the mutexes
required for doing so.

It's exactly the same problem we have with context pinning and
elsewhere, intel_context_pin() is expensive and cannot allow
allocations; intel_context_unpin() is cheap and may be called from the
shrinker.

> > +     mutex_lock_nested(&wf->mutex, SINGLE_DEPTH_NESTING);
> > +     if (!atomic_read(&wf->count)) {
> > +             int err;
> > +
> > +             wf->wakeref = intel_runtime_pm_get(i915);
> > +
> > +             err = fn(wf);
> > +             if (unlikely(err)) {
> > +                     intel_runtime_pm_put(i915, wf->wakeref);
> > +                     mutex_unlock(&wf->mutex);
> > +                     return err;
> > +             }
> > +
> > +             smp_mb__before_atomic(); /* release wf->count */
> 
> These ones are like usually "What.." for me.
> 
> According to the docs this line would suggest you want some other memory 
> write to become visible before the atomic_inc below. Which one?

All the writes between atomic_read() and atomic_inc().

atomic_inc() broadcasts that you do not need the lock (or barrier) to
see the results, so therefore all the results must be visible beforehand.

We need to turn atomic_inc() into an atomic_inc_release/atomic_inc_unlock.
-Chris