vc4: WARNING during suspend to idle of Raspberry Pi 3 Plus

Thu Jun 27 10:53:52 UTC 2024

Am 27.06.24 um 11:58 schrieb Maxime Ripard:
> On Tue, Jun 25, 2024 at 04:21:11PM GMT, Stefan Wahren wrote:
>> Am 24.06.24 um 14:57 schrieb Maxime Ripard:
>>> On Sun, Jun 16, 2024 at 11:27:08AM GMT, Stefan Wahren wrote:
>>>> Hi,
>>>> i'm currently experiment with suspend to idle on the Raspberry Pi [1].
>>>>
>>>> During my tests, i noticed a WARNING of the vc4 during suspend incl.
>>>> Runtime PM usage count underflow. It would be nice if someone can look
>>>> at it. In case you want to reproduce it, i can prepare a branch with
>>>> some improvements/hacks. For example i disabled dwc2/USB because it
>>>> cause a lot of trouble during resume.
>>>>
>>>> Here are the steps to trigger this issue:
>>>> - make sure necessary kernel options are enabled ( CONFIG_SUSPEND,
>>>> CONFIG_PM_DEBUG, CONFIG_PM_ADVANCED_DEBUG )
>>>> - make sure no HDMI cable is connected to Raspberry Pi 3 Plus
>> sorry, i forgot the specific type of the Raspberry Pi 3 B Plus, but the
>> issue on Pi 3 A Plus is the same.
>>>> - Add "no_console_suspend" to cmdline.txt and reboot
>>>> - Connect via Debug UART:
>>>>
>>>> echo 1 > /sys/power/pm_debug_messages
>>>> echo platform > /sys/power/pm_test
>>>> echo freeze > /sys/power/state
>>>>
>>>> Here is the output:
>>>>
>>>> [   75.538497] PM: suspend entry (s2idle)
>>>> [   76.915786] Filesystems sync: 1.377 seconds
>>>> [   79.678262] rpi_firmware_set_power: Set HDMI to 1
>>>> [   79.678634] rpi_firmware_set_power: Set HDMI to 0
>>>> [   79.850949] Freezing user space processes
>>>> [   79.852460] Freezing user space processes completed (elapsed 0.001
>>>> seconds)
>>>> [   79.852484] OOM killer disabled.
>>>> [   79.852493] Freezing remaining freezable tasks
>>>> [   79.853684] Freezing remaining freezable tasks completed (elapsed
>>>> 0.001 seconds)
>>>> [   80.892819] ieee80211 phy0: brcmf_fil_cmd_data: bus is down. we have
>>>> nothing to do.
>>>> [   80.892843] ieee80211 phy0: brcmf_cfg80211_get_tx_power: error (-5)
>>>> [   81.514053] PM: suspend of devices complete after 1659.336 msecs
>>>> [   81.514089] PM: start suspend of devices complete after 1660.386 msecs
>>>> [   81.515616] PM: late suspend of devices complete after 1.509 msecs
>>>> [   81.515938] rpi_firmware_set_power: Set VEC to 0
>>>> [   81.516010] rpi_firmware_set_power: Set V3D to 0
>>>> [   81.516998] PM: noirq suspend of devices complete after 1.239 msecs
>>>> [   81.517016] PM: suspend debug: Waiting for 5 second(s).
>>>> [   89.598310] rpi_firmware_set_power: Set V3D to 1
>>>> [   90.078228] ------------[ cut here ]------------
>>>> [   90.078240] WARNING: CPU: 1 PID: 216 at
>>>> drivers/gpu/drm/vc4/vc4_hdmi.c:477
>>>> vc4_hdmi_connector_detect_ctx+0x2e4/0x34c [vc4]
>>>> [   90.078344] Modules linked in: aes_arm aes_generic cbc crypto_simd
>>>> cryptd algif_skcipher af_alg brcmfmac_wcc brcmfmac vc4 brcmutil
>>>> snd_soc_hdmi_codec snd_soc_core ac97_bus sha256_generic
>>>> snd_pcm_dmaengine libsha256 snd_pcm sha256_arm snd_timer hci_uart
>>>> cfg80211 btbcm snd bluetooth soundcore drm_dma_helper crc32_arm_ce
>>>> ecdh_generic ecc raspberrypi_hwmon libaes bcm2835_thermal
>>>> [   90.078551] CPU: 1 PID: 216 Comm: kworker/1:2 Not tainted 6.9.3-dirty #30
>>>> [   90.078568] Hardware name: BCM2835
>>>> [   90.078580] Workqueue: events output_poll_execute
>>>> [   90.078610] Call trace:
>>>> [   90.078624]  unwind_backtrace from show_stack+0x10/0x14
>>>> [   90.078660]  show_stack from dump_stack_lvl+0x50/0x64
>>>> [   90.078688]  dump_stack_lvl from __warn+0x7c/0x124
>>>> [   90.078723]  __warn from warn_slowpath_fmt+0x170/0x178
>>>> [   90.078760]  warn_slowpath_fmt from
>>>> vc4_hdmi_connector_detect_ctx+0x2e4/0x34c [vc4]
>>>> [   90.078862]  vc4_hdmi_connector_detect_ctx [vc4] from
>>>> drm_helper_probe_detect_ctx+0x40/0x120
>>>> [   90.078951]  drm_helper_probe_detect_ctx from
>>>> output_poll_execute+0x160/0x24c
>>>> [   90.078982]  output_poll_execute from process_one_work+0x16c/0x3b4
>>>> [   90.079012]  process_one_work from worker_thread+0x270/0x488
>>>> [   90.079036]  worker_thread from kthread+0xe0/0xfc
>>>> [   90.079060]  kthread from ret_from_fork+0x14/0x28
>>>> [   90.079080] Exception stack(0xf0af9fb0 to 0xf0af9ff8)
>>>> [   90.079096] 9fa0:                                     00000000
>>>> 00000000 00000000 00000000
>>>> [   90.079113] 9fc0: 00000000 00000000 00000000 00000000 00000000
>>>> 00000000 00000000 00000000
>>>> [   90.079129] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>>> [   90.079141] ---[ end trace 0000000000000000 ]---
>>>> [   90.079155] vc4_hdmi 3f902000.hdmi: vc4_hdmi_connector_detect_ctx:
>>>> pm_runtime_resume_and_get failed: -13
>>>> [   92.638262] rpi_firmware_set_power: Set HDMI to 1
>>>> [   95.678251] rpi_firmware_set_power: Set VEC to 1
>>>> [   95.678380] PM: noirq resume of devices complete after 9160.930 msecs
>>>> [   95.679604] PM: early resume of devices complete after 1.069 msecs
>>>> [   95.812230] brcmfmac: brcmf_fw_alloc_request: using
>>>> brcm/brcmfmac43455-sdio for chip BCM4345/6
>>>> [   95.812282] PM: resume of devices complete after 132.657 msecs
>>>> [   95.813246] vc4_hdmi 3f902000.hdmi: Runtime PM usage count underflow!
>>>> [   95.814456] OOM killer enabled.
>>>> [   95.814466] Restarting tasks ... done.
>>>> [   95.817193] random: crng reseeded on system resumption
>>>> [   95.819813] rpi_firmware_set_power: Set HDMI to 0
>>>> [   95.827808] PM: suspend exit
>>>>
>>>> [1] - https://github.com/raspberrypi/firmware/issues/1894
>>> The code itself looks fine to me still, but It's not clear to me why it
>>> getting called during suspend in the first place.
>> This is a good question. I don't have idea why this should be necessary.
>>
>> But according to the kernel docs the worker output_poll_execute can be
>> disabled with drm_kms_helper_poll_disable().
> Yeah, we might need to disable polling during suspend.

Agree. Yesterday i managed to get s2idle working on a old Raspberry Pi 1
B, so i will prepare a patch for this. It's not the only driver for
BCM2835, which need suspend patches.

> I must admit I
> don't have much experience with suspend, so I won't be of much help.
>
>> Btw do we need to use SET_SYSTEM_SLEEP_PM_OPS here?
>>>    IIRC, it's in the HPD
>>> interrupt handler path, could it be that the interrupt fires during
>>> suspend?
>> Based on the trace and my further investigations the function is called
>> vc4_hdmi_connector_detect_ctx every 10s while no HDMI connector is
>> plugged by output_poll_execute. This has the advantage that also the
>> GPIOs from the expander could be used as HPD. Since the suspend test is
>> 5 sec long, there is a high chance for this warning.
> That's slightly unrelated, but we can actually use that GPIO for HPD
> since it supports interrupts. I did a patch quite some time ago that was
> never merged for a good reason, but I can't remember what it was
> exactly:
>
> https://github.com/raspberrypi/linux/pull/4327
Yes, the commit "pinctrl: bcm2835: Disable interrupts for the banks > 0"
is not really acceptable. From my suspend investigations so far, i can
tell there some issues with the irqchip driver(s). So this commit looks
like a workaround.
>
>> Here is my current workaround:
>>
>> drm/vc4: hdmi: Handle error case of pm_runtime_resume_and_get
>>
>> The commit 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is
>> powered in detect") introduced the necessary power management handling
>> to avoid register access while controller is powered down.
>> Unfortunately it just print a warning if pm_runtime_resume_and_get()
>> fails and proceed anyway.
>>
>> This could happen during suspend to idle. So we must assume it is unsafe
>> to access the HDMI register. So bail out properly.
>>
>> Fixes: 0f5251339eda ("drm/vc4: hdmi: Make sure the controller is powered
>> in detect")
>> Signed-off-by: Stefan Wahren <wahrenst at gmx.net>
>> ---
>>   drivers/gpu/drm/vc4/vc4_hdmi.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
>> index d30f8e8e896717..2c1d59a181d833 100644
>> --- a/drivers/gpu/drm/vc4/vc4_hdmi.c
>> +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
>> @@ -462,6 +462,7 @@ static int vc4_hdmi_connector_detect_ctx(struct
>> drm_connector *connector,
>>   {
>>       struct vc4_hdmi *vc4_hdmi = connector_to_vc4_hdmi(connector);
>>       enum drm_connector_status status = connector_status_disconnected;
>> +    int ret;
>>
>>       /*
>>        * NOTE: This function should really take vc4_hdmi->mutex, but
>> @@ -474,7 +475,11 @@ static int vc4_hdmi_connector_detect_ctx(struct
>> drm_connector *connector,
>>        * the lock for now.
>>        */
>>
>> - WARN_ON(pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev));
>> +    ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to retain HDMI power domain: %d\n", ret);
>> +        return status;
>> +    }
>>
>>       if (vc4_hdmi->hpd_gpio) {
>>           if (gpiod_get_value_cansleep(vc4_hdmi->hpd_gpio))
>>
> Yeah... I think we should really just fix the "why is detect even called
> in the first place" issue rather than merge something like that.
Yes, but i think the WARN_ON [1] and the missing error handling of
pm_runtime_resume_and_get is still an issue.

Best regards

[1] - https://lwn.net/Articles/969923/
>
> Maxime