[Intel-gfx] [PATCH 1/2] drm/i915: Schedule the HPD poll init work on an unbound workqueue

Imre Deak imre.deak at intel.com
Mon Oct 28 14:59:20 UTC 2024


On Mon, Oct 28, 2024 at 04:49:21PM +0200, Jani Nikula wrote:
> On Mon, 28 Oct 2024, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
> > +Jani
> >
> > On Fri, Sep 01, 2023 at 05:04:02PM +0300, Imre Deak wrote:
> >>Disabling HPD polling from i915_hpd_poll_init_work() involves probing
> >>all display connectors explicitly to account for lost hotplug
> >>interrupts. On some platforms (mostly pre-ICL) with HDMI connectors the
> >>I2C EDID bit-banging using udelay() triggers in turn the
> >>
> >> workqueue: i915_hpd_poll_init_work [i915] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
> >>
> >>warning.
> >>
> >>Fix the above by scheduling i915_hpd_poll_init_work() on a WQ_UNBOUND
> >>workqueue. It's ok to use a system WQ, since i915_hpd_poll_init_work()
> >>is properly flushed in intel_hpd_cancel_work().
> >>
> >>The connector probing from drm_mode_config::output_poll_work resulting
> >>in the same warning is fixed by the next patch.
> >>
> >>Cc: Tejun Heo <tj at kernel.org>
> >>Cc: Heiner Kallweit <hkallweit1 at gmail.com>
> >>CC: stable at vger.kernel.org # 6.5
> >>Suggested-by: Tejun Heo <tj at kernel.org>
> >>Suggested-by: Heiner Kallweit <hkallweit1 at gmail.com>
> >>Reported-by: Heiner Kallweit <hkallweit1 at gmail.com>
> >>Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9245
> >>Link: https://lore.kernel.org/all/f7e21caa-e98d-e5b5-932a-fe12d27fde9b@gmail.com
> >>Signed-off-by: Imre Deak <imre.deak at intel.com>
> >>---
> >> drivers/gpu/drm/i915/display/intel_hotplug.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/display/intel_hotplug.c b/drivers/gpu/drm/i915/display/intel_hotplug.c
> >>index e8562f6f8bb44..accc2fec562a0 100644
> >>--- a/drivers/gpu/drm/i915/display/intel_hotplug.c
> >>+++ b/drivers/gpu/drm/i915/display/intel_hotplug.c
> >>@@ -774,7 +774,7 @@ void intel_hpd_poll_enable(struct drm_i915_private *dev_priv)
> >> 	 * As well, there's no issue if we race here since we always reschedule
> >> 	 * this worker anyway
> >> 	 */
> >>-	queue_work(dev_priv->unordered_wq,
> >>+	queue_work(system_unbound_wq,
> >> 		   &dev_priv->display.hotplug.poll_init_work);
> >> }
> >>
> >>@@ -803,7 +803,7 @@ void intel_hpd_poll_disable(struct drm_i915_private *dev_priv)
> >> 		return;
> >>
> >> 	WRITE_ONCE(dev_priv->display.hotplug.poll_enabled, false);
> >>-	queue_work(dev_priv->unordered_wq,
> >>+	queue_work(system_unbound_wq,
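
For context, the warning quoted in the commit message comes from the
workqueue code's CPU-intensive detection for per-CPU (bound) workers:
the bit-banged EDID reads keep the worker busy in udelay() for
milliseconds at a time, while unbound workers are not subject to that
concurrency management. Roughly the shape of the problem and of the
fix, as a standalone sketch (hpd_poll_fn etc. are made-up names, not
the i915 code):

  #include <linux/delay.h>
  #include <linux/workqueue.h>

  static void hpd_poll_fn(struct work_struct *work)
  {
          int i;

          /*
           * Stand-in for the real connector probing: bit-banged EDID
           * reads end up busy-waiting in udelay() like this, keeping
           * the worker on the CPU for >10ms.  On a bound (per-CPU)
           * workqueue that trips the "hogged CPU ... consider
           * switching to WQ_UNBOUND" report.
           */
          for (i = 0; i < 10000; i++)
                  udelay(2);
  }

  static DECLARE_WORK(hpd_work, hpd_poll_fn);

  static void hpd_poll_enable_example(void)
  {
          /* The fix: queue the work on the shared unbound workqueue. */
          queue_work(system_unbound_wq, &hpd_work);
  }
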
> >
> > This 1y+ patch doesn't apply anymore and now we also have xe to account
> > for.  I'm reviving this since we are unifying the kernel config in CI
> > and now several machines testing i915 start to hit this warning.
> >
> > Looking at the current code for xe we have:
> >
> > 	drivers/gpu/drm/xe/xe_device_types.h:
> >
> >          /** @unordered_wq: used to serialize unordered work, mostly display */
> >          struct workqueue_struct *unordered_wq;
> >
> >
> > ... which is, actually, just display.
> >
> > Jani, should we move this wq to display where it belongs, with the right
> > flags, rather than queueing it on system_unbound_wq?
> 
> I think the general answer is:
> 
> 1. Never use i915 or xe core workqueues in display code.
> 
> 2. Use system workqueues where appropriate for display, and cancel the
>    individual work where needed. While there are legitimate uses for the
>    dedicated workqueues, I'm guessing a large portion of their use may
>    be cargo-culting and/or the result of system wq flush going away, and
>    not based on any real analysis.
> 
> 3. For display stuff, add the minimal necessary workqueues in display
>    code, and use them where appropriate. Also cancel the individual work
>    where needed, and flush the entire workqueue only when really
>    necessary. The entire workqueue flushing is also cargo-culting and/or
>    the result of system wq flush going away.
> 
> 4. Finally move the display wq init/cleanup to display code.
> 
> Now, I don't know what the answer is in this particular case, or in
> many of the other cases. And that's probably one of the reasons we
> have the
> status quo. :(
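
For reference, point 3 above would amount to roughly the following on
the display side (a sketch with made-up names, not actual i915/xe
code):

  #include <linux/workqueue.h>

  /* made-up container, standing in for the display struct */
  struct example_display {
          struct workqueue_struct *unordered_wq;
          struct work_struct poll_init_work;
  };

  static int example_display_wq_init(struct example_display *display)
  {
          /* point 3: one unbound wq owned by the display code */
          display->unordered_wq =
                  alloc_workqueue("intel_display_unordered", WQ_UNBOUND, 0);
          return display->unordered_wq ? 0 : -ENOMEM;
  }

  static void example_display_wq_fini(struct example_display *display)
  {
          /* individual users cancel their own work where needed ... */
          cancel_work_sync(&display->poll_init_work);
          /* ... and only teardown drains/destroys the whole wq */
          destroy_workqueue(display->unordered_wq);
  }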

The work items scheduled for hotplug events (like the above
poll_init_work) are flushed explicitly, so I think the system
workqueues should be used for those (as opposed to driver-specific
workqueues).
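
In other words the pairing is just these two calls (sketching the
idea, not the exact i915 functions):

  /* hotplug path: queue on the shared unbound system workqueue */
  queue_work(system_unbound_wq,
             &dev_priv->display.hotplug.poll_init_work);

  /*
   * teardown path, i.e. intel_hpd_cancel_work(): cancel/flush the
   * individual work explicitly, so nothing of ours can still be
   * pending on the system wq after the driver is unbound.
   */
  cancel_work_sync(&dev_priv->display.hotplug.poll_init_work);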

> BR,
> Jani.
> 
> 
> -- 
> Jani Nikula, Intel

