[Intel-gfx] [PATCH 01/62] drm/i915: Only start retire worker when idle

Wed Jun 8 10:53:15 UTC 2016

On Tue, Jun 07, 2016 at 02:31:07PM +0300, Joonas Lahtinen wrote:
> On pe, 2016-06-03 at 17:36 +0100, Chris Wilson wrote:
> >  i915_gem_idle_work_handler(struct work_struct *work)
> >  {
> >  	struct drm_i915_private *dev_priv =
> > -		container_of(work, typeof(*dev_priv), mm.idle_work.work);
> > +		container_of(work, typeof(*dev_priv), gt.idle_work.work);
> >  	struct drm_device *dev = dev_priv->dev;
> >  	struct intel_engine_cs *engine;
> >  
> > -	for_each_engine(engine, dev_priv)
> > -		if (!list_empty(&engine->request_list))
> > -			return;
> > +	if (!READ_ONCE(dev_priv->gt.awake))
> > +		return;
> >  
> > -	/* we probably should sync with hangcheck here, using cancel_work_sync.
> > -	 * Also locking seems to be fubar here, engine->request_list is protected
> > -	 * by dev->struct_mutex. */
> > +	mutex_lock(&dev->struct_mutex);
> > +	if (dev_priv->gt.active_engines)
> > +		goto out;
> >  
> > -	intel_mark_idle(dev_priv);
> > +	for_each_engine(engine, dev_priv)
> > +		i915_gem_batch_pool_fini(&engine->batch_pool);
> >  
> > -	if (mutex_trylock(&dev->struct_mutex)) {
> > -		for_each_engine(engine, dev_priv)
> > -			i915_gem_batch_pool_fini(&engine->batch_pool);
> > +	GEM_BUG_ON(!dev_priv->gt.awake);
> > +	dev_priv->gt.awake = false;
> >  
> > -		mutex_unlock(&dev->struct_mutex);
> > +	if (INTEL_INFO(dev_priv)->gen >= 6)
> > +		gen6_rps_idle(dev_priv);
> > +	intel_runtime_pm_put(dev_priv);
> > +out:
> > +	mutex_unlock(&dev->struct_mutex);
> > +
> > +	if (!dev_priv->gt.awake &&
> 
> No READ_ONCE here, even we just unlocked the mutex. So lacks some
> consistency.
> 
> Also, this assumes we might be pre-empted between unlocking mutex and
> making this test, so I'm little bit confused. Do you want to optimize
> by avoiding calling cancel_delayed_work_sync?

General principle to never call work_sync functions with locks held. I
had actually thought I had fixed this up (but realized that I just
rewrote hangcheck later on instead ;)

Ok, what I think is safer here is

	bool hangcheck = cancel_delay_work_sync(hangcheck_work)

	mutex_lock()
	if (actually_idle()) {
		awake = false;
		missed_irq_rings |= intel_kick_waiters();
	}
	mutex_unlock();

	if (awake && hangcheck)
		queue_hangcheck()

So always kick the hangcheck and reeanble if we tried to idle too early.
This will potentially delay hangcheck by one full hangcheck period if we
do encounter that race. But we shouldn't be hitting this race that
often, or hanging the GPU for that mterr.

> > index 166f1a3829b0..d0cd9a1aa80e 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -372,13 +372,13 @@ struct intel_engine_cs {
> >  };
> >  
> >  static inline bool
> > -intel_engine_initialized(struct intel_engine_cs *engine)
> > +intel_engine_initialized(const struct intel_engine_cs *engine)
> >  {
> >  	return engine->i915 != NULL;
> >  }
> >  
> >  static inline unsigned
> > -intel_engine_flag(struct intel_engine_cs *engine)
> > +intel_engine_flag(const struct intel_engine_cs *engine)
> >  {
> >  	return 1 << engine->id;
> >  }
> 
> I think majority of our functions are not const-correct, I remember
> some grunting on the subject when I tried to change some to be. But I'm
> all for it myself.

Not yet, a few more gradual drive bys and we'll be in position for a
grander scheme of correctness ;)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre