[Intel-gfx] [PATCH] drm/i915: Use fence_write() from rpm resume

Mon Oct 10 13:31:10 UTC 2016

On Mon, Oct 10, 2016 at 04:13:26PM +0300, Ville Syrjälä wrote:
> On Mon, Oct 10, 2016 at 02:07:16PM +0100, Chris Wilson wrote:
> > On Mon, Oct 10, 2016 at 03:41:04PM +0300, Ville Syrjälä wrote:
> > > On Fri, Oct 07, 2016 at 08:31:50PM +0100, Chris Wilson wrote:
> > > > During rpm resume we restore the fences, but we do not have the
> > > > protection of struct_mutex. This rules out updating the activity
> > > > tracking on the fences, and requires us to rely on the rpm as the
> > > > serialisation barrier instead.
> > > > 
> > > > [  350.298052] [drm:intel_runtime_resume [i915]] Resuming device
> > > > [  350.308606]
> > > > [  350.310520] ===============================
> > > > [  350.315560] [ INFO: suspicious RCU usage. ]
> > > > [  350.320554] 4.8.0-rc8-bsw-rapl+ #3133 Tainted: G     U  W
> > > > [  350.327208] -------------------------------
> > > > [  350.331977] ../drivers/gpu/drm/i915/i915_gem_request.h:371 suspicious rcu_dereference_protected() usage!
> > > > [  350.342619]
> > > > [  350.342619] other info that might help us debug this:
> > > > [  350.342619]
> > > > [  350.351593]
> > > > [  350.351593] rcu_scheduler_active = 1, debug_locks = 0
> > > > [  350.358952] 3 locks held by Xorg/320:
> > > > [  350.363077]  #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa030589c>] drm_modeset_lock_all+0x3c/0xd0 [drm]
> > > > [  350.375162]  #1:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa03058a6>] drm_modeset_lock_all+0x46/0xd0 [drm]
> > > > [  350.387022]  #2:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffa0305056>] drm_modeset_lock+0x36/0x110 [drm]
> > > > [  350.398236]
> > > > [  350.398236] stack backtrace:
> > > > [  350.403196] CPU: 1 PID: 320 Comm: Xorg Tainted: G     U  W       4.8.0-rc8-bsw-rapl+ #3133
> > > > [  350.412457] Hardware name: Intel Corporation CHERRYVIEW C0 PLATFORM/Braswell CRB, BIOS BRAS.X64.X088.R00.1510270350 10/27/2015
> > > > [  350.425212]  0000000000000000 ffff8801680a78c8 ffffffff81332187 ffff88016c5c5000
> > > > [  350.433611]  0000000000000001 ffff8801680a78f8 ffffffff810ca6da ffff88016cc8b0f0
> > > > [  350.442012]  ffff88016cc80000 ffff88016cc80000 ffff880177ad0000 ffff8801680a7948
> > > > [  350.450409] Call Trace:
> > > > [  350.453165]  [<ffffffff81332187>] dump_stack+0x67/0x90
> > > > [  350.458931]  [<ffffffff810ca6da>] lockdep_rcu_suspicious+0xea/0x120
> > > > [  350.466002]  [<ffffffffa039e8dd>] fence_update+0xbd/0x670 [i915]
> > > > [  350.472766]  [<ffffffffa039efe2>] i915_gem_restore_fences+0x52/0x70 [i915]
> > > > [  350.480496]  [<ffffffffa0368f42>] vlv_resume_prepare+0x72/0x570 [i915]
> > > > [  350.487839]  [<ffffffffa0369802>] intel_runtime_resume+0x102/0x210 [i915]
> > > > [  350.495442]  [<ffffffff8137f26f>] pci_pm_runtime_resume+0x7f/0xb0
> > > > [  350.502274]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
> > > > [  350.509883]  [<ffffffff814401c5>] __rpm_callback+0x35/0x70
> > > > [  350.516037]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
> > > > [  350.523646]  [<ffffffff81440224>] rpm_callback+0x24/0x80
> > > > [  350.529604]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
> > > > [  350.537212]  [<ffffffff814417bd>] rpm_resume+0x4ad/0x740
> > > > [  350.543161]  [<ffffffff81441aa1>] __pm_runtime_resume+0x51/0x80
> > > > [  350.549824]  [<ffffffffa03889c8>] intel_runtime_pm_get+0x28/0x90 [i915]
> > > > [  350.557265]  [<ffffffffa0388a53>] intel_display_power_get+0x23/0x50 [i915]
> > > > [  350.565001]  [<ffffffffa03ef23d>] intel_atomic_commit_tail+0xdfd/0x10b0 [i915]
> > > > [  350.573106]  [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper]
> > > > [  350.582659]  [<ffffffff81615091>] ? _raw_spin_unlock+0x31/0x50
> > > > [  350.589205]  [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper]
> > > > [  350.598787]  [<ffffffffa03ef8a5>] intel_atomic_commit+0x3b5/0x500 [i915]
> > > > [  350.606319]  [<ffffffffa03061dc>] ? drm_atomic_set_crtc_for_connector+0xcc/0x100 [drm]
> > > > [  350.615209]  [<ffffffffa0306b49>] drm_atomic_commit+0x49/0x50 [drm]
> > > > [  350.622242]  [<ffffffffa034dee8>] drm_atomic_helper_set_config+0x88/0xc0 [drm_kms_helper]
> > > > [  350.631419]  [<ffffffffa02f94ac>] drm_mode_set_config_internal+0x6c/0x120 [drm]
> > > > [  350.639623]  [<ffffffffa02fa94c>] drm_mode_setcrtc+0x22c/0x4d0 [drm]
> > > > [  350.646760]  [<ffffffffa02f0f19>] drm_ioctl+0x209/0x460 [drm]
> > > > [  350.653217]  [<ffffffffa02fa720>] ? drm_mode_getcrtc+0x150/0x150 [drm]
> > > > [  350.660536]  [<ffffffff810c984a>] ? __lock_is_held+0x4a/0x70
> > > > [  350.666885]  [<ffffffff81202303>] do_vfs_ioctl+0x93/0x6b0
> > > > [  350.672939]  [<ffffffff8120f843>] ? __fget+0x113/0x200
> > > > [  350.678797]  [<ffffffff8120f735>] ? __fget+0x5/0x200
> > > > [  350.684361]  [<ffffffff81202964>] SyS_ioctl+0x44/0x80
> > > > [  350.690030]  [<ffffffff81001deb>] do_syscall_64+0x5b/0x120
> > > > [  350.696184]  [<ffffffff81615ada>] entry_SYSCALL64_slow_path+0x25/0x25
> > > > 
> > > > Note we also have to remember the lesson from commit 4fc788f5ee3d
> > > > ("drm/i915: Flush delayed fence releases after reset") where we have to
> > > > flush any changes to the fence on restore.
> > > > 
> > > > Fixes: 49ef5294cda2 ("drm/i915: Move fence tracking from object to vma")
> > > > Reported-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem_fence.c | 15 +++++++++++++--
> > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
> > > > index 834d649496f3..1d6d95cef7f5 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_fence.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_fence.c
> > > > @@ -371,6 +371,12 @@ void i915_gem_restore_fences(struct drm_device *dev)
> > > >  	struct drm_i915_private *dev_priv = to_i915(dev);
> > > >  	int i;
> > > >  
> > > > +	/* Note that this may be called outside of struct_mutex, by
> > > > +	 * runtime suspend/resume. The barrier we require is enforced by
> > > > +	 * rpm itself - all access to fences/GTT are only within an rpm
> > > > +	 * wakeref, and to acquire that wakeref you must pass through here.
> > > > +	 */
> > > > +
> > > >  	for (i = 0; i < dev_priv->num_fence_regs; i++) {
> > > >  		struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i];
> > > >  		struct i915_vma *vma = reg->vma;
> > > > @@ -379,10 +385,15 @@ void i915_gem_restore_fences(struct drm_device *dev)
> > > >  		 * Commit delayed tiling changes if we have an object still
> > > >  		 * attached to the fence, otherwise just clear the fence.
> > > >  		 */
> > > > -		if (vma && !i915_gem_object_is_tiled(vma->obj))
> > > > +		if (vma && !i915_gem_object_is_tiled(vma->obj)) {
> > > > +			list_move(&reg->link, &dev_priv->mm.fence_list);
> > > > +			i915_gem_release_mmap(vma->obj);
> > > 
> > > That guy still has a lockdep_assert_held(struct_mutex);
> > 
> > He isn't meant to for the reason above. The patches to remove that have
> > been delayed for a few years.
> 
> Actually, why do we even call that thing from here? Surely the mmaps
> have been nuked before we suspended the device?

This path is taken on GPU reset as well.... But if the fence is dirty
(as implied by the vma being associated with the fence, but now
untiled), then the object will also be unmapped.

Can replace with GEM_BUG_ON(vma->obj->fault_mappable) and
GEM_BUG_ON(fence->dirty).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre