[PATCH] drm/probe-helper: fix output polling not resuming after HPD IRQ storm
Dmitry Baryshkov
dmitry.baryshkov at oss.qualcomm.com
Wed Aug 6 12:33:51 UTC 2025
On Wed, Aug 06, 2025 at 12:02:02PM +0000, nicusor.huhulea at siemens.com wrote:
>
>
> >>-----Original Message-----
> >>From: Imre Deak <imre.deak at intel.com>
> >>Sent: Tuesday, August 5, 2025 6:20 PM
> >>To: Dmitry Baryshkov <dmitry.baryshkov at oss.qualcomm.com>
> >>Cc: Huhulea, Nicusor Liviu (FT FDS CES LX PBU 1)
> >><nicusor.huhulea at siemens.com>; stable at vger.kernel.org; dri-
> >>devel at lists.freedesktop.org; intel-gfx at lists.freedesktop.org; cip-dev at lists.cip-
> >>project.org; shradhagupta at linux.microsoft.com; jouni.hogander at intel.com;
> >>neil.armstrong at linaro.org; jani.nikula at linux.intel.com;
> >>maarten.lankhorst at linux.intel.com; mripard at kernel.org; tzimmermann at suse.de;
> >>airlied at gmail.com; daniel at ffwll.ch; joonas.lahtinen at linux.intel.com;
> >>rodrigo.vivi at intel.com; laurentiu.palcu at oss.nxp.com; Hombourger, Cedric (FT
> >>FDS CES LX) <cedric.hombourger at siemens.com>; Bobade, Shrikant Krishnarao
> >>(FT FDS CES LX PBU 1) <shrikant.bobade at siemens.com>
> >>Subject: Re: [PATCH] drm/probe-helper: fix output polling not resuming after HPD
> >>IRQ storm
> >>
> >>On Tue, Aug 05, 2025 at 01:46:51PM +0300, Dmitry Baryshkov wrote:
> >>> On Mon, Aug 04, 2025 at 11:13:59PM +0300, Nicusor Huhulea wrote:
> >>> > A regression in output polling was introduced by commit
> >>> > 4ad8d57d902fbc7c82507cfc1b031f3a07c3de6e
> >>> > ("drm: Check output polling initialized before disabling") in the 6.1.y stable
> >>tree.
> >>> > As a result, when the i915 driver detects an HPD IRQ storm and
> >>> > attempts to switch from IRQ-based hotplug detection to polling, output polling
> >>fails to resume.
> >>> >
> >>> > The root cause is the use of dev->mode_config.poll_running. Once
> >>> > poll_running is set (during the first connector detection) the calls
> >>> > to drm_kms_helper_poll_enable(), such as
> >>> > intel_hpd_irq_storm_switch_to_polling() fails to schedule output_poll_work as
> >>expected.
> >>> > Therefore, after an IRQ storm disables HPD IRQs, polling does not start,
> >>breaking hotplug detection.
> >>>
> >>> Why doesn't disable path use drm_kms_helper_poll_disable() ?
> >>
> >>In general i915 doesn't disable polling as a whole after an IRQ storm and a 2
> >>minute delay (or runtime resuming), since on some other connectors the polling
> >>may be still required.
> >>
> >>Also, in the 6.1.y stable tree drm_kms_helper_poll_disable() is:
> >>
> >> if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled))
> >> return;
> >>
> >> cancel_delayed_work_sync(&dev->mode_config.output_poll_work);
> >>
> >>so calling that wouldn't really fix the problem, which is clearly just an incorrect
> >>backport of the upstream commit 5abffb66d12bcac8 ("drm:
> >>Check output polling initialized before disabling") to the v6.1.y stable tree by
> >>commit 4ad8d57d902f ("drm: Check output polling initialized before disabling") in
> >>v6.1.y.
> >>
> >>The upstream commit did not add the check for
> >>dev->mode_config.poll_running in drm_kms_helper_poll_enable(), the
> >>condition was only part of the diff context in the commit. Hence adding the
> >>condition in the 4ad8d57d902f backport commit was incorrect.
> >>
> >>> > The fix is to remove the dev->mode_config.poll_running in the check
> >>> > condition, ensuring polling is always scheduled as requested.
> I'm not a full-time kernel developer, but I spent some time trying to really understand both the rationale and the effects of commit "Check output polling initialized before disabling."
> Here's how I see the issue:
> That commit was introduced mainly as a defensive measure, to protect
> drivers such as hyperv-drm that do not initialize connector polling.
> In those drivers, calling drm_kms_helper_poll_disable() without proper
> polling setup could trigger warnings or even stack traces, since there
> are no outputs to poll and the polling helpers don't apply. From what
> I understand, the poll_running variable was added to prevent cases
> where polling was accidentally disabled for drivers that were never
> set up for it. However, this fix ended up creating a new class of
> breakage, which I have observed and describe here.
It was added to implement the very simple logic: If something isn't
initialized, enabling or disabling it is an error. If something isn't
enabled, disabling it is an error. If something isn't disabled, enabling
it is an error.
>
> To me, the core logic should be simple: if polling is needed, then we should allow it to proceed (regardless of whether it's been scheduled previously).
>
> Looking at the drm_kms_helper_poll_disable()
> if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled))
> return;
>
> If the driver never enabled polling (that is, drm_kms_helper_poll_enable() was never called), then the disable operation is effectively a no-op-totally safe for drivers like hyperv-drm.
>
> And in the enable function drm_kms_helper_poll_enable(..):
> if (drm_WARN_ON_ONCE(dev, !dev->mode_config.poll_enabled) ||
> - !drm_kms_helper_poll || dev->mode_config.poll_running)
> + !drm_kms_helper_poll)
> return;
Why?
> The main thing being guarded here is whether polling has actually been initialized:
> 1.For polling drivers like i915, removing poll_running from the enable path is both safe and necessary: it fixes the regression with HPD polling after IRQ storms
I believe in paired calls. If you want to use
drm_kms_helper_poll_enable(), it should be previously
drm_kms_helper_poll_disable()d. If you have manually disabled the IRQ,
it should also be manually enabled.
Pairing the calls makes life much much easier.
> 2.For non-polling drivers like hyperv-drm, the existing checks on poll_enabled in both enable and disable functions are sufficient. Removing poll_running doesn't affect these drivers-they don't go through the polling code paths, so no polling gets scheduled or canceled by mistake
>
> Therefore based on my understanding and testing removing poll_running guard not only fixes a real bug but also does not introduce new regressions for drivers that don't use output polling.
--
With best wishes
Dmitry
More information about the Intel-gfx
mailing list