[RFC v7 03/11] drm/vblank: Add vblank works

Fri Jun 26 21:18:29 UTC 2020

On Wed, Jun 24, 2020 at 07:03:10PM -0400, Lyude Paul wrote:
> Add some kind of vblank workers. The interface is similar to regular
> delayed works, and is mostly based off kthread_work. It allows for
> scheduling delayed works that execute once a particular vblank sequence
> has passed. It also allows for accurate flushing of scheduled vblank
> works - in that flushing waits for both the vblank sequence and job
> execution to complete, or for the work to get cancelled - whichever
> comes first.
> 
> Whatever hardware programming we do in the work must be fast (must at
> least complete during the vblank or scanout period, sometimes during the
> first few scanlines of the vblank). As such we use a high-priority
> per-CRTC thread to accomplish this.
> 
> Changes since v6:
> * Get rid of ->pending and seqcounts, and implement flushing through
>   simpler means - danvet
> * Get rid of work_lock, just use drm_device->event_lock
> * Move drm_vblank_work item cleanup into drm_crtc_vblank_off() so that
>   we ensure that all vblank work has finished before disabling vblanks
> * Add checks into drm_crtc_vblank_reset() so we yell if it gets called
>   while there's vblank workers active
> * Grab event_lock in both drm_crtc_vblank_on()/drm_crtc_vblank_off(),
>   the main reason for this is so that other threads calling
>   drm_vblank_work_schedule() are blocked from attempting to schedule
>   while we're in the middle of enabling/disabling vblanks.
> * Move drm_handle_vblank_works() call below drm_handle_vblank_events()
> * Simplify drm_vblank_work_cancel_sync()
> * Fix drm_vblank_work_cancel_sync() documentation
> * Move wake_up_all() calls out of spinlock where we can. The only one I
>   left was the call to wake_up_all() in drm_vblank_handle_works() as
>   this seemed like it made more sense just living in that function
>   (which is all technically under lock)
> * Move drm_vblank_work related functions into their own source files
> * Add drm_vblank_internal.h so we can export some functions we don't
>   want drivers using, but that we do need to use in drm_vblank_work.c
> * Add a bunch of documentation
> Changes since v4:
> * Get rid of kthread interfaces we tried adding and move all of the
>   locking into drm_vblank.c. For implementing drm_vblank_work_flush(),
>   we now use a wait_queue and sequence counters in order to
>   differentiate between multiple work item executions.
> * Get rid of drm_vblank_work_cancel() - this would have been pretty
>   difficult to actually reimplement and it occurred to me that neither
>   nouveau or i915 are even planning to use this function. Since there's
>   also no async cancel function for most of the work interfaces in the
>   kernel, it seems a bit unnecessary anyway.
> * Get rid of to_drm_vblank_work() since we now are also able to just
>   pass the struct drm_vblank_work to work item callbacks anyway
> Changes since v3:
> * Use our own spinlocks, don't integrate so tightly with kthread_works
> Changes since v2:
> * Use kthread_workers instead of reinventing the wheel.
> 
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Tejun Heo <tj at kernel.org>
> Cc: dri-devel at lists.freedesktop.org
> Cc: nouveau at lists.freedesktop.org
> Co-developed-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Signed-off-by: Lyude Paul <lyude at redhat.com>

I found a bunch of tiny details below, but overall looks great and thanks
for polishing the kerneldoc.

With the details addressed one way or another:

Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>

But feel free to resend and poke me again if you want me to recheck the
details that needed changing.

Cheers, Daniel

> ---
>  Documentation/gpu/drm-kms.rst              |  15 ++
>  drivers/gpu/drm/Makefile                   |   2 +-
>  drivers/gpu/drm/drm_vblank.c               |  55 +++--
>  drivers/gpu/drm/drm_vblank_internal.h      |  19 ++
>  drivers/gpu/drm/drm_vblank_work.c          | 259 +++++++++++++++++++++
>  drivers/gpu/drm/drm_vblank_work_internal.h |  24 ++
>  include/drm/drm_vblank.h                   |  20 ++
>  include/drm/drm_vblank_work.h              |  71 ++++++
>  8 files changed, 447 insertions(+), 18 deletions(-)
>  create mode 100644 drivers/gpu/drm/drm_vblank_internal.h
>  create mode 100644 drivers/gpu/drm/drm_vblank_work.c
>  create mode 100644 drivers/gpu/drm/drm_vblank_work_internal.h
>  create mode 100644 include/drm/drm_vblank_work.h
> 
> diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> index 975cfeb8a3532..3c5ae4f6dfd23 100644
> --- a/Documentation/gpu/drm-kms.rst
> +++ b/Documentation/gpu/drm-kms.rst
> @@ -543,3 +543,18 @@ Vertical Blanking and Interrupt Handling Functions Reference
>  
>  .. kernel-doc:: drivers/gpu/drm/drm_vblank.c
>     :export:
> +
> +Vertical Blank Work
> +===================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_vblank_work.c
> +   :doc: vblank works
> +
> +Vertical Blank Work Functions Reference
> +---------------------------------------
> +
> +.. kernel-doc:: include/drm/drm_vblank_work.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_vblank_work.c
> +   :export:
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 2c0e5a7e59536..02ee5faf1a925 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -18,7 +18,7 @@ drm-y       :=	drm_auth.o drm_cache.o \
>  		drm_dumb_buffers.o drm_mode_config.o drm_vblank.o \
>  		drm_syncobj.o drm_lease.o drm_writeback.o drm_client.o \
>  		drm_client_modeset.o drm_atomic_uapi.o drm_hdcp.o \
> -		drm_managed.o
> +		drm_managed.o drm_vblank_work.o
>  
>  drm-$(CONFIG_DRM_LEGACY) += drm_legacy_misc.o drm_bufs.o drm_context.o drm_dma.o drm_scatter.o drm_lock.o
>  drm-$(CONFIG_DRM_LIB_RANDOM) += lib/drm_random.o
> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> index e895f5331fdb4..b353bc8328414 100644
> --- a/drivers/gpu/drm/drm_vblank.c
> +++ b/drivers/gpu/drm/drm_vblank.c
> @@ -25,6 +25,7 @@
>   */
>  
>  #include <linux/export.h>
> +#include <linux/kthread.h>
>  #include <linux/moduleparam.h>
>  
>  #include <drm/drm_crtc.h>
> @@ -37,6 +38,8 @@
>  
>  #include "drm_internal.h"
>  #include "drm_trace.h"
> +#include "drm_vblank_internal.h"
> +#include "drm_vblank_work_internal.h"

Feels mild overkill to have these files with 1-2 functions each, I'd stuff
them all into drm_internal.h. We do have other vblank stuff in there
already.

>  
>  /**
>   * DOC: vblank handling
> @@ -363,7 +366,7 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  	store_vblank(dev, pipe, diff, t_vblank, cur_vblank);
>  }
>  
> -static u64 drm_vblank_count(struct drm_device *dev, unsigned int pipe)
> +u64 drm_vblank_count(struct drm_device *dev, unsigned int pipe)
>  {
>  	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>  	u64 count;
> @@ -497,6 +500,7 @@ static void drm_vblank_init_release(struct drm_device *dev, void *ptr)
>  	drm_WARN_ON(dev, READ_ONCE(vblank->enabled) &&
>  		    drm_core_check_feature(dev, DRIVER_MODESET));
>  
> +	drm_vblank_destroy_worker(vblank);
>  	del_timer_sync(&vblank->disable_timer);
>  }
>  
> @@ -539,6 +543,10 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs)
>  					       vblank);
>  		if (ret)
>  			return ret;
> +
> +		ret = drm_vblank_worker_init(vblank);
> +		if (ret)
> +			return ret;
>  	}
>  
>  	return 0;
> @@ -1135,7 +1143,7 @@ static int drm_vblank_enable(struct drm_device *dev, unsigned int pipe)
>  	return ret;
>  }
>  
> -static int drm_vblank_get(struct drm_device *dev, unsigned int pipe)
> +int drm_vblank_get(struct drm_device *dev, unsigned int pipe)
>  {
>  	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>  	unsigned long irqflags;
> @@ -1178,7 +1186,7 @@ int drm_crtc_vblank_get(struct drm_crtc *crtc)
>  }
>  EXPORT_SYMBOL(drm_crtc_vblank_get);
>  
> -static void drm_vblank_put(struct drm_device *dev, unsigned int pipe)
> +void drm_vblank_put(struct drm_device *dev, unsigned int pipe)
>  {
>  	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>  
> @@ -1281,13 +1289,16 @@ void drm_crtc_vblank_off(struct drm_crtc *crtc)
>  	unsigned int pipe = drm_crtc_index(crtc);
>  	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>  	struct drm_pending_vblank_event *e, *t;
> -
>  	ktime_t now;
>  	u64 seq;
>  
>  	if (drm_WARN_ON(dev, pipe >= dev->num_crtcs))
>  		return;
>  
> +	/*
> +	 * Grab event_lock early to prevent vblank work from being scheduled
> +	 * while we're in the middle of shutting down vblank interrupts
> +	 */
>  	spin_lock_irq(&dev->event_lock);
>  
>  	spin_lock(&dev->vbl_lock);
> @@ -1324,11 +1335,18 @@ void drm_crtc_vblank_off(struct drm_crtc *crtc)
>  		drm_vblank_put(dev, pipe);
>  		send_vblank_event(dev, e, seq, now);
>  	}
> +
> +	/* Cancel any leftover pending vblank work */
> +	drm_vblank_cancel_pending_works(vblank);
> +
>  	spin_unlock_irq(&dev->event_lock);
>  
>  	/* Will be reset by the modeset helpers when re-enabling the crtc by
>  	 * calling drm_calc_timestamping_constants(). */
>  	vblank->hwmode.crtc_clock = 0;
> +
> +	/* Wait for any vblank work that's still executing to finish */
> +	drm_vblank_flush_worker(vblank);
>  }
>  EXPORT_SYMBOL(drm_crtc_vblank_off);
>  
> @@ -1363,6 +1381,7 @@ void drm_crtc_vblank_reset(struct drm_crtc *crtc)
>  	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>  
>  	drm_WARN_ON(dev, !list_empty(&dev->vblank_event_list));
> +	drm_WARN_ON(dev, !list_empty(&vblank->pending_work));
>  }
>  EXPORT_SYMBOL(drm_crtc_vblank_reset);
>  
> @@ -1417,7 +1436,10 @@ void drm_crtc_vblank_on(struct drm_crtc *crtc)
>  	if (drm_WARN_ON(dev, pipe >= dev->num_crtcs))
>  		return;
>  
> -	spin_lock_irqsave(&dev->vbl_lock, irqflags);
> +	/* So vblank works can't be scheduled until we've finished */
> +	spin_lock_irqsave(&dev->event_lock, irqflags);

This smells fishy, why do we need this? drm_enable_vblank takes the
->vblank_time_lock spinlock, which is the first thing drm_handle_vblank
takes, so there's absolute no way for a vblank event or worker to get
ahead of this here.

Except if I'm missing something this isn't needed.

> +
> +	spin_lock(&dev->vbl_lock);
>  	drm_dbg_vbl(dev, "crtc %d, vblank enabled %d, inmodeset %d\n",
>  		    pipe, vblank->enabled, vblank->inmodeset);
>  
> @@ -1435,7 +1457,9 @@ void drm_crtc_vblank_on(struct drm_crtc *crtc)
>  	 */
>  	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0)
>  		drm_WARN_ON(dev, drm_vblank_enable(dev, pipe));
> -	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> +
> +	spin_unlock(&dev->vbl_lock);
> +	spin_unlock_irqrestore(&dev->event_lock, irqflags);
>  }
>  EXPORT_SYMBOL(drm_crtc_vblank_on);
>  
> @@ -1589,11 +1613,6 @@ int drm_legacy_modeset_ctl_ioctl(struct drm_device *dev, void *data,
>  	return 0;
>  }
>  
> -static inline bool vblank_passed(u64 seq, u64 ref)
> -{
> -	return (seq - ref) <= (1 << 23);
> -}
> -
>  static int drm_queue_vblank_event(struct drm_device *dev, unsigned int pipe,
>  				  u64 req_seq,
>  				  union drm_wait_vblank *vblwait,
> @@ -1650,7 +1669,7 @@ static int drm_queue_vblank_event(struct drm_device *dev, unsigned int pipe,
>  	trace_drm_vblank_event_queued(file_priv, pipe, req_seq);
>  
>  	e->sequence = req_seq;
> -	if (vblank_passed(seq, req_seq)) {
> +	if (drm_vblank_passed(seq, req_seq)) {
>  		drm_vblank_put(dev, pipe);
>  		send_vblank_event(dev, e, seq, now);
>  		vblwait->reply.sequence = seq;
> @@ -1805,7 +1824,7 @@ int drm_wait_vblank_ioctl(struct drm_device *dev, void *data,
>  	}
>  
>  	if ((flags & _DRM_VBLANK_NEXTONMISS) &&
> -	    vblank_passed(seq, req_seq)) {
> +	    drm_vblank_passed(seq, req_seq)) {
>  		req_seq = seq + 1;
>  		vblwait->request.type &= ~_DRM_VBLANK_NEXTONMISS;
>  		vblwait->request.sequence = req_seq;
> @@ -1824,7 +1843,7 @@ int drm_wait_vblank_ioctl(struct drm_device *dev, void *data,
>  		drm_dbg_core(dev, "waiting on vblank count %llu, crtc %u\n",
>  			     req_seq, pipe);
>  		wait = wait_event_interruptible_timeout(vblank->queue,
> -			vblank_passed(drm_vblank_count(dev, pipe), req_seq) ||
> +			drm_vblank_passed(drm_vblank_count(dev, pipe), req_seq) ||
>  				      !READ_ONCE(vblank->enabled),
>  			msecs_to_jiffies(3000));
>  
> @@ -1873,7 +1892,7 @@ static void drm_handle_vblank_events(struct drm_device *dev, unsigned int pipe)
>  	list_for_each_entry_safe(e, t, &dev->vblank_event_list, base.link) {
>  		if (e->pipe != pipe)
>  			continue;
> -		if (!vblank_passed(seq, e->sequence))
> +		if (!drm_vblank_passed(seq, e->sequence))
>  			continue;
>  
>  		drm_dbg_core(dev, "vblank event on %llu, current %llu\n",
> @@ -1943,6 +1962,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
>  		       !atomic_read(&vblank->refcount));
>  
>  	drm_handle_vblank_events(dev, pipe);
> +	drm_handle_vblank_works(vblank);
>  
>  	spin_unlock_irqrestore(&dev->event_lock, irqflags);
>  
> @@ -2096,7 +2116,7 @@ int drm_crtc_queue_sequence_ioctl(struct drm_device *dev, void *data,
>  	if (flags & DRM_CRTC_SEQUENCE_RELATIVE)
>  		req_seq += seq;
>  
> -	if ((flags & DRM_CRTC_SEQUENCE_NEXT_ON_MISS) && vblank_passed(seq, req_seq))
> +	if ((flags & DRM_CRTC_SEQUENCE_NEXT_ON_MISS) && drm_vblank_passed(seq, req_seq))
>  		req_seq = seq + 1;
>  
>  	e->pipe = pipe;
> @@ -2125,7 +2145,7 @@ int drm_crtc_queue_sequence_ioctl(struct drm_device *dev, void *data,
>  
>  	e->sequence = req_seq;
>  
> -	if (vblank_passed(seq, req_seq)) {
> +	if (drm_vblank_passed(seq, req_seq)) {
>  		drm_crtc_vblank_put(crtc);
>  		send_vblank_event(dev, e, seq, now);
>  		queue_seq->sequence = seq;
> @@ -2145,3 +2165,4 @@ int drm_crtc_queue_sequence_ioctl(struct drm_device *dev, void *data,
>  	kfree(e);
>  	return ret;
>  }
> +
> diff --git a/drivers/gpu/drm/drm_vblank_internal.h b/drivers/gpu/drm/drm_vblank_internal.h
> new file mode 100644
> index 0000000000000..217ae5442ddce
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_vblank_internal.h
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: MIT
> +
> +#ifndef DRM_VBLANK_INTERNAL_H
> +#define DRM_VBLANK_INTERNAL_H
> +
> +#include <linux/types.h>
> +
> +#include <drm/drm_device.h>
> +
> +static inline bool drm_vblank_passed(u64 seq, u64 ref)
> +{
> +	return (seq - ref) <= (1 << 23);
> +}
> +
> +int drm_vblank_get(struct drm_device *dev, unsigned int pipe);
> +void drm_vblank_put(struct drm_device *dev, unsigned int pipe);
> +u64 drm_vblank_count(struct drm_device *dev, unsigned int pipe);
> +
> +#endif /* !DRM_VBLANK_INTERNAL_H */
> diff --git a/drivers/gpu/drm/drm_vblank_work.c b/drivers/gpu/drm/drm_vblank_work.c
> new file mode 100644
> index 0000000000000..0762ad34cdcc0
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_vblank_work.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: MIT
> +
> +#include <uapi/linux/sched/types.h>
> +
> +#include <drm/drm_print.h>
> +#include <drm/drm_vblank.h>
> +#include <drm/drm_vblank_work.h>
> +#include <drm/drm_crtc.h>
> +
> +#include "drm_vblank_internal.h"
> +#include "drm_vblank_work_internal.h"
> +
> +/**
> + * DOC: vblank works
> + *
> + * Many DRM drivers need to program hardware in a time-sensitive manner, many
> + * times with a deadline of starting and finishing within a certain region of
> + * the scanout. Most of the time the safest way to accomplish this is to
> + * simply do said time-sensitive programming in the driver's IRQ handler,
> + * which allows drivers to avoid being preempted during these critical
> + * regions. Or even better, the hardware may even handle applying such
> + * time-critical programming independently of the CPU.
> + *
> + * While there's a decent amount of hardware that's designed so that the CPU
> + * doesn't need to be concerned with extremely time-sensitive programming,
> + * there's a few situations where it can't be helped. Some unforgiving
> + * hardware may require that certain time-sensitive programming be handled
> + * completely by the CPU, and said programming may even take too long to
> + * handle in an IRQ handler. Another such situation would be where the driver
> + * needs to perform a task that needs to complete within a specific scanout
> + * period, but might possibly block and thus cannot be handled in an IRQ
> + * context. Both of these situations can't be solved perfectly in Linux since
> + * we're not a realtime kernel, and thus the scheduler may cause us to miss
> + * our deadline if it decides to preempt us. But for some drivers, it's good
> + * enough if we can lower our chance of being preempted to an absolute
> + * minimum.
> + *
> + * This is where &drm_vblank_work comes in. &drm_vblank_work provides a simple
> + * generic delayed work implementation which delays work execution until a
> + * particular vblank has passed, and then executes the work at realtime
> + * priority. This provides the best possible chance at performing
> + * time-sensitive hardware programming on time, even when the system is under
> + * heavy load. &drm_vblank_work also supports rescheduling, so that self
> + * re-arming work items can be easily implemented.
> + */
> +
> +void drm_handle_vblank_works(struct drm_vblank_crtc *vblank)
> +{
> +	struct drm_vblank_work *work, *next;
> +	u64 count = atomic64_read(&vblank->count);
> +	bool wake = false;
> +
> +	assert_spin_locked(&vblank->dev->event_lock);
> +
> +	list_for_each_entry_safe(work, next, &vblank->pending_work, node) {
> +		if (!drm_vblank_passed(count, work->count))
> +			continue;
> +
> +		list_del_init(&work->node);
> +		drm_vblank_put(vblank->dev, vblank->pipe);
> +		kthread_queue_work(vblank->worker, &work->base);
> +		wake = true;
> +	}
> +	if (wake)
> +		wake_up_all(&vblank->work_wait_queue);
> +}
> +
> +/* Handle cancelling any pending vblank work items and drop respective vblank
> + * references in response to vblank interrupts being disabled.
> + */
> +void drm_vblank_cancel_pending_works(struct drm_vblank_crtc *vblank)
> +{
> +	struct drm_vblank_work *work, *next;
> +
> +	assert_spin_locked(&vblank->dev->event_lock);
> +
> +	list_for_each_entry_safe(work, next, &vblank->pending_work, node) {
> +		list_del_init(&work->node);
> +		drm_vblank_put(vblank->dev, vblank->pipe);
> +	}
> +
> +	wake_up_all(&vblank->work_wait_queue);
> +}
> +
> +/**
> + * drm_vblank_work_schedule - schedule a vblank work
> + * @work: vblank work to schedule
> + * @count: target vblank count
> + * @nextonmiss: defer until the next vblank if target vblank was missed
> + *
> + * Schedule @work for execution once the crtc vblank count reaches @count.
> + *
> + * If the crtc vblank count has already reached @count and @nextonmiss is
> + * %false the work starts to execute immediately.
> + *
> + * If the crtc vblank count has already reached @count and @nextonmiss is
> + * %true the work is deferred until the next vblank (as if @count has been
> + * specified as crtc vblank count + 1).
> + *
> + * If @work is already scheduled, this function will reschedule said work
> + * using the new @count.

Maybe clarify here that "This can be use for self-rearming work items." or
something like that.

> + *
> + * Returns:
> + * 0 on success, error code on failure.
> + */
> +int drm_vblank_work_schedule(struct drm_vblank_work *work,
> +			     u64 count, bool nextonmiss)
> +{
> +	struct drm_vblank_crtc *vblank = work->vblank;
> +	struct drm_device *dev = vblank->dev;
> +	u64 cur_vbl;
> +	unsigned long irqflags;
> +	bool passed, rescheduling = false, wake = false;
> +	int ret = 0;
> +
> +	spin_lock_irqsave(&dev->event_lock, irqflags);
> +	if (!vblank->worker || vblank->inmodeset || work->cancelling)

Oh nice catch with ->inmodeset, I totally missed to check re-arming vs
drm_crtc_vblank_off races. Only problem I'm seeing is that we're holding
the wrong spinlock, this needs to be check under ->vbl_lock. But
->cancelling needs the event_lock, so I think you need to split this check
into two, and grab the ->vbl_lock around the ->inmodeset check.

The ->worker check otoh looks fishy, that should never happen. If you feel
like some defensive programming then I think that should be an

	if (WARN_ON(!vblank->worker))
		return;

> +		goto out;
> +
> +	if (list_empty(&work->node)) {
> +		ret = drm_vblank_get(dev, vblank->pipe);

Ok that kills the idea of converting the _irqsave to _irq in
drm_vblank_get. I do wonder whether it wouldn't be nicer to have the
vblank_get outside of the spinlock, and unconditional - would allow you to
drop the ->inmodeset check. But the end result in code flow cleanliness is
not any better, so not a good idea I think.

> +		if (ret < 0)
> +			goto out;
> +	} else if (work->count == count) {
> +		/* Already scheduled w/ same vbl count */
> +		goto out;
> +	} else {
> +		rescheduling = true;
> +	}
> +
> +	work->count = count;
> +	cur_vbl = drm_vblank_count(dev, vblank->pipe);
> +	passed = drm_vblank_passed(cur_vbl, count);
> +	if (passed)
> +		DRM_DEV_ERROR(dev->dev,
> +			      "crtc %d vblank %llu already passed (current %llu)\n",
> +			      vblank->pipe, count, cur_vbl);

This is a bit loud, I think that should be debug out most. You can't
really prevent races. I do wonder though whether we should do something
like 1 indicates that the work item has been scheduled, and 0 that it
hasn't been scheduled (aside from failure, which is negative).

> +
> +	if (!nextonmiss && passed) {
> +		drm_vblank_put(dev, vblank->pipe);
> +		kthread_queue_work(vblank->worker, &work->base);
> +
> +		if (rescheduling) {
> +			list_del_init(&work->node);
> +			wake = true;
> +		}
> +	} else if (!rescheduling) {
> +		list_add_tail(&work->node, &vblank->pending_work);
> +	}
> +
> +out:
> +	spin_unlock_irqrestore(&dev->event_lock, irqflags);
> +	if (wake)
> +		wake_up_all(&vblank->work_wait_queue);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_vblank_work_schedule);

I think the above control flow is all correct, but this is the kind of
stuff that's prime material for some selftests. But we don't have enough
ready-made mocking I think, so not going to ask for that. Just an idea.

> +
> +/**
> + * drm_vblank_work_cancel_sync - cancel a vblank work and wait for it to
> + * finish executing
> + * @work: vblank work to cancel
> + *
> + * Cancel an already scheduled vblank work and wait for its
> + * execution to finish.
> + *
> + * On return, @work is guaranteed to no longer be scheduled or running, even
> + * if it's self-arming.
> + *
> + * Returns:
> + * %True if the work was cancelled before it started to execute, %false
> + * otherwise.
> + */
> +bool drm_vblank_work_cancel_sync(struct drm_vblank_work *work)
> +{
> +	struct drm_vblank_crtc *vblank = work->vblank;
> +	struct drm_device *dev = vblank->dev;
> +	bool ret = false;
> +
> +	spin_lock_irq(&dev->event_lock);
> +	if (!list_empty(&work->node)) {
> +		list_del_init(&work->node);
> +		drm_vblank_put(vblank->dev, vblank->pipe);
> +		ret = true;
> +	}
> +
> +	work->cancelling++;
> +	spin_unlock_irq(&dev->event_lock);
> +
> +	wake_up_all(&vblank->work_wait_queue);
> +
> +	if (kthread_cancel_work_sync(&work->base))
> +		ret = true;
> +
> +	spin_lock_irq(&dev->event_lock);
> +	work->cancelling--;
> +	spin_unlock_irq(&dev->event_lock);

lgtm, everything looks ordered correctly to avoid a self-arming work
escaping.

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_vblank_work_cancel_sync);
> +
> +/**
> + * drm_vblank_work_flush - wait for a scheduled vblank work to finish
> + * executing
> + * @work: vblank work to flush
> + *
> + * Wait until @work has finished executing once.
> + */
> +void drm_vblank_work_flush(struct drm_vblank_work *work)
> +{
> +	struct drm_vblank_crtc *vblank = work->vblank;
> +	struct drm_device *dev = vblank->dev;
> +
> +	spin_lock_irq(&dev->event_lock);
> +	wait_event_lock_irq(vblank->work_wait_queue, list_empty(&work->node),
> +			    dev->event_lock);
> +	spin_unlock_irq(&dev->event_lock);
> +
> +	kthread_flush_work(&work->base);

So much less magic here, I like.

> +}
> +EXPORT_SYMBOL(drm_vblank_work_flush);
> +
> +/**
> + * drm_vblank_work_init - initialize a vblank work item
> + * @work: vblank work item
> + * @crtc: CRTC whose vblank will trigger the work execution
> + * @func: work function to be executed
> + *
> + * Initialize a vblank work item for a specific crtc.
> + */
> +void drm_vblank_work_init(struct drm_vblank_work *work, struct drm_crtc *crtc,
> +			  void (*func)(struct kthread_work *work))
> +{
> +	kthread_init_work(&work->base, func);
> +	INIT_LIST_HEAD(&work->node);
> +	work->vblank = &crtc->dev->vblank[drm_crtc_index(crtc)];
> +}
> +EXPORT_SYMBOL(drm_vblank_work_init);
> +
> +int drm_vblank_worker_init(struct drm_vblank_crtc *vblank)
> +{
> +	struct sched_param param = {
> +		.sched_priority = MAX_RT_PRIO - 1,
> +	};
> +	struct kthread_worker *worker;
> +
> +	INIT_LIST_HEAD(&vblank->pending_work);
> +	init_waitqueue_head(&vblank->work_wait_queue);
> +	worker = kthread_create_worker(0, "card%d-crtc%d",
> +				       vblank->dev->primary->index,
> +				       vblank->pipe);
> +	if (IS_ERR(worker))
> +		return PTR_ERR(worker);
> +
> +	vblank->worker = worker;
> +
> +	return sched_setscheduler(vblank->worker->task, SCHED_FIFO, &param);
> +}
> diff --git a/drivers/gpu/drm/drm_vblank_work_internal.h b/drivers/gpu/drm/drm_vblank_work_internal.h
> new file mode 100644
> index 0000000000000..0a4abbc4ab295
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_vblank_work_internal.h
> @@ -0,0 +1,24 @@
> +// SPDX-License-Identifier: MIT
> +
> +#ifndef _DRM_VBLANK_WORK_INTERNAL_H_
> +#define _DRM_VBLANK_WORK_INTERNAL_H_
> +
> +#include <drm/drm_vblank.h>
> +
> +int drm_vblank_worker_init(struct drm_vblank_crtc *vblank);
> +void drm_vblank_cancel_pending_works(struct drm_vblank_crtc *vblank);
> +void drm_handle_vblank_works(struct drm_vblank_crtc *vblank);
> +
> +static inline void drm_vblank_flush_worker(struct drm_vblank_crtc *vblank)
> +{
> +	if (vblank->worker)

Is this check really required? We should always have a worker I thought?

> +		kthread_flush_worker(vblank->worker);
> +}
> +
> +static inline void drm_vblank_destroy_worker(struct drm_vblank_crtc *vblank)
> +{
> +	if (vblank->worker)

Same here.

> +		kthread_destroy_worker(vblank->worker);
> +}
> +
> +#endif /* !_DRM_VBLANK_WORK_INTERNAL_H_ */
> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> index dd9f5b9e56e4e..dd125f8c766cf 100644
> --- a/include/drm/drm_vblank.h
> +++ b/include/drm/drm_vblank.h
> @@ -27,12 +27,14 @@
>  #include <linux/seqlock.h>
>  #include <linux/idr.h>
>  #include <linux/poll.h>
> +#include <linux/kthread.h>
>  
>  #include <drm/drm_file.h>
>  #include <drm/drm_modes.h>
>  
>  struct drm_device;
>  struct drm_crtc;
> +struct drm_vblank_work;
>  
>  /**
>   * struct drm_pending_vblank_event - pending vblank event tracking
> @@ -203,6 +205,24 @@ struct drm_vblank_crtc {
>  	 * disabling functions multiple times.
>  	 */
>  	bool enabled;
> +
> +	/**
> +	 * @worker: The &kthread_worker used for executing vblank works.
> +	 */
> +	struct kthread_worker *worker;
> +
> +	/**
> +	 * @pending_work: A list of scheduled &drm_vblank_work items that are
> +	 * waiting for a future vblank.
> +	 */
> +	struct list_head pending_work;
> +
> +	/**
> +	 * @work_wait_queue: The wait queue used for signaling that a
> +	 * &drm_vblank_work item has either finished executing, or was
> +	 * cancelled.
> +	 */
> +	wait_queue_head_t work_wait_queue;
>  };
>  
>  int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> diff --git a/include/drm/drm_vblank_work.h b/include/drm/drm_vblank_work.h
> new file mode 100644
> index 0000000000000..f0439c039f7ce
> --- /dev/null
> +++ b/include/drm/drm_vblank_work.h
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: MIT
> +
> +#ifndef _DRM_VBLANK_WORK_H_
> +#define _DRM_VBLANK_WORK_H_
> +
> +#include <linux/kthread.h>
> +
> +struct drm_crtc;
> +
> +/**
> + * struct drm_vblank_work - A delayed work item which delays until a target
> + * vblank passes, and then executes at realtime priority outside of IRQ
> + * context.
> + *
> + * See also:
> + * drm_vblank_work_schedule()
> + * drm_vblank_work_init()
> + * drm_vblank_work_cancel_sync()
> + * drm_vblank_work_flush()
> + */
> +struct drm_vblank_work {
> +	/**
> +	 * @base: The base &kthread_work item which will be executed by
> +	 * &drm_vblank_crtc.worker. Drivers should not interact with this
> +	 * directly, and instead rely on drm_vblank_work_init() to initialize
> +	 * this.
> +	 */
> +	struct kthread_work base;
> +
> +	/**
> +	 * @vblank: A pointer to &drm_vblank_crtc this work item belongs to.
> +	 */
> +	struct drm_vblank_crtc *vblank;
> +
> +	/**
> +	 * @count: The target vblank this work will execute on. Drivers should
> +	 * not modify this value directly, and instead use
> +	 * drm_vblank_work_schedule()
> +	 */
> +	u64 count;
> +
> +	/**
> +	 * @cancelling: The number of drm_vblank_work_cancel_sync() calls that
> +	 * are currently running. A work item cannot be rescheduled until all
> +	 * calls have finished.
> +	 */
> +	int cancelling;
> +
> +	/**
> +	 * @node: The position of this work item in
> +	 * &drm_vblank_crtc.pending_work.
> +	 */
> +	struct list_head node;
> +};
> +
> +/**
> + * to_drm_vblank_work - Retrieve the respective &drm_vblank_work item from a
> + * &kthread_work
> + * @_work: The &kthread_work embedded inside a &drm_vblank_work
> + */
> +#define to_drm_vblank_work(_work) \
> +	container_of((_work), struct drm_vblank_work, base)
> +
> +int drm_vblank_work_schedule(struct drm_vblank_work *work,
> +			     u64 count, bool nextonmiss);
> +void drm_vblank_work_init(struct drm_vblank_work *work, struct drm_crtc *crtc,
> +			  void (*func)(struct kthread_work *work));
> +bool drm_vblank_work_cancel_sync(struct drm_vblank_work *work);
> +void drm_vblank_work_flush(struct drm_vblank_work *work);
> +
> +#endif /* !_DRM_VBLANK_WORK_H_ */
> -- 
> 2.26.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch