[PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
Daniel Vetter
daniel at ffwll.ch
Tue Feb 9 09:56:38 UTC 2016
On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> This fixes a regression introduced by the new drm_update_vblank_count()
> implementation in Linux 4.4:
>
> Restrict the bump of the software vblank counter in drm_update_vblank_count()
> to a safe maximum value of +1 whenever there is the possibility that
> concurrent readers of vblank timestamps could be active at the moment,
> as the current implementation of the timestamp caching and updating is
> not safe against concurrent readers for calls to store_vblank() with a
> bump of anything but +1. A bump != 1 would very likely return corrupted
> timestamps to userspace, because the same slot in the cache could
> be concurrently written by store_vblank() and read by one of those
> readers in a non-atomic fashion and without the read-retry logic
> detecting this collision.
>
> Concurrent readers can exist while drm_update_vblank_count() is called
> from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> irq callers. However, all those calls are happening with the vbl_lock
> locked thereby preventing a drm_vblank_get(), so the vblank refcount
> can't increase while drm_update_vblank_count() is executing. Therefore
> a zero vblank refcount during execution of that function signals that
> it is safe for arbitrary counter bumps if called from outside vblank irq,
> whereas a non-zero count is not safe.
>
> Whenever the function is called from vblank irq, we have to assume concurrent
> readers could show up any time during its execution, even if the refcount
> is currently zero, as vblank irqs are usually only enabled due to the
> presence of readers, and because when it is called from vblank irq it
> can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> Therefore also restrict bumps to +1 when the function is called from vblank
> irq.
>
> Such bumps of more than +1 can happen at other times than reenabling
> vblank irqs, e.g., when regular vblank interrupts get delayed by more
> than 1 frame due to long held locks, long irq off periods, realtime
> preemption on RT kernels, or system management interrupts.
>
> Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
> Cc: <stable at vger.kernel.org> # 4.4+
> Cc: michel at daenzer.net
> Cc: vbabka at suse.cz
> Cc: ville.syrjala at linux.intel.com
> Cc: daniel.vetter at ffwll.ch
> Cc: dri-devel at lists.freedesktop.org
> Cc: alexander.deucher at amd.com
> Cc: christian.koenig at amd.com
Imo this is duct-tape. If we want to fix this up properly I think we
should just use a full-blown seqlock instead of our hand-rolled one. And
that could handle any increment at all.
-Daniel
> ---
> drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index bcb8528..aa2c74b 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> }
>
> + /*
> + * Restrict the bump of the software vblank counter to a safe maximum
> + * value of +1 whenever there is the possibility that concurrent readers
> + * of vblank timestamps could be active at the moment, as the current
> + * implementation of the timestamp caching and updating is not safe
> + * against concurrent readers for calls to store_vblank() with a bump
> + * of anything but +1. A bump != 1 would very likely return corrupted
> + * timestamps to userspace, because the same slot in the cache could
> + * be concurrently written by store_vblank() and read by one of those
> + * readers without the read-retry logic detecting the collision.
> + *
> + * Concurrent readers can exist when we are called from the
> + * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> + * irq callers. However, all those calls to us are happening with the
> + * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> + * can't increase while we are executing. Therefore a zero refcount at
> + * this point is safe for arbitrary counter bumps if we are called
> + * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> + * we must also accept a refcount of 1, as whenever we are called from
> + * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> + * we must let that one pass through in order to not lose vblank counts
> + * during vblank irq off - which would completely defeat the whole
> + * point of this routine.
> + *
> + * Whenever we are called from vblank irq, we have to assume concurrent
> + * readers exist or can show up any time during our execution, even if
> + * the refcount is currently zero, as vblank irqs are usually only
> + * enabled due to the presence of readers, and because when we are called
> + * from vblank irq we can't hold the vbl_lock to protect us from sudden
> + * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> + * called from vblank irq.
> + */
> + if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> + (flags & DRM_CALLED_FROM_VBLIRQ))) {
> + DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diff=%u "
> + "refcount %u, vblirq %u\n", pipe, diff,
> + atomic_read(&vblank->refcount),
> + (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> + diff = 1;
> + }
> +
> DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> " current=%u, diff=%u, hw=%u hw_last=%u\n",
> pipe, vblank->count, diff, cur_vblank, vblank->last);
> --
> 1.9.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch