[PATCH 1/2] drm: Mark up accesses of vblank->enabled outside of its spinlock

Fri Mar 17 11:25:32 UTC 2017

On Fri, Mar 17, 2017 at 10:19:51AM +0000, Chris Wilson wrote:
> On Fri, Mar 17, 2017 at 11:47:51AM +0200, Ville Syrjälä wrote:
> > On Thu, Mar 16, 2017 at 11:47:48PM +0000, Chris Wilson wrote:
> > > @@ -360,7 +358,7 @@ static void vblank_disable_fn(unsigned long arg)
> > >  	unsigned long irqflags;
> > >  
> > >  	spin_lock_irqsave(&dev->vbl_lock, irqflags);
> > > -	if (atomic_read(&vblank->refcount) == 0 && vblank->enabled) {
> > > +	if (atomic_read(&vblank->refcount) == 0 && READ_ONCE(vblank->enabled)) {
> > 
> > Hmm. Aren't most of these accesses inside the lock? Looks like you're
> > marking everything READ/WRITE_ONCE()?
> 
> There's like 3 different locks here. Afaict, the correct one for
> serialising vblank->enabled was dev->vblank_time_lock. Every access
> outside of that lock, I marked up as READ_ONCE.
> 
> Oh, you are using vbl_lock as the barrier? That's not as clear for
> disable:
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 53a526c7b24d..f447ed07ef95 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -325,6 +325,8 @@ static void vblank_disable_and_save(struct drm_device *dev, unsigned int pipe)
>         struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>         unsigned long irqflags;
>  
> +       assert_spin_locked(&dev->vbl_lock);
> +
>         /* Prevent vblank irq processing while disabling vblank irqs,
>          * so no updates of timestamps or count can happen after we've
>          * disabled. Needed to prevent races in case of delayed irq's.
> 
> 
> > > @@ -1714,6 +1717,9 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
> > >  	if (WARN_ON(pipe >= dev->num_crtcs))
> > >  		return false;
> > >  
> > > +	if (!READ_ONCE(vblank->enabled))
> > > +		return false;
> > 
> > This to me looks like it could theoretically cause us to
> > miss an interrupt.
> > 
> > 1. enable_irq()
> > 2. drm_update_vblank_count()
> > 3. irq fires
> > 4. drm_handle_vblank() doesn't do anything
> > 5. enabled=true
> 
> Sure. There's a danger you miss the irq anyway, and so the last action
> after enabling the interrupt should be to process any completed events -
> that's implicitly handled by enabling the interrupt in advance of adding
> the first event. In the scenario above, there should be nothing to do.

That's a slightly different scenario since you're only thinking about 
actually enabling the interrupt vs. calling drm_update_vblank_count().
That is fine as is. But with the extra 'enabled' check in the interrupt
handler you're effectively reversing that order to enable the interrupt
after drm_update_vblank_count(). Hence we can lose an interrupt now.

Of course we would eventually get another interrupt, and thanks to the
hw frame counter and whatnot we wouldn't actually lose our place entirely,
but we would see the counter jump by two frames when we actually handle
the interrupt.

-- 
Ville Syrjälä
Intel OTC