[Nouveau] [PATCH v2 0/7] Modernize vga_switcheroo by using device link for HDA

Peter Wu peter at lekensteyn.nl
Mon Mar 5 20:58:31 UTC 2018


Hi Lukas,

Sorry for the delay, I finally found some time to reviewd and test the
patches and found some issues (some of them might already be present in
v4.15 without your patches though, I did not try).

Test environment:
- Branch switcheroo_devlink_v2 (commit v4.15-20-gb33d50c5c6ad)
- Laptop: Clevo P651RA (nouveau uses PR3), lspci:
  00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
  01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 965M] [10de:13d9] (rev a1)
  01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
- Distribution: Arch Linux x86_64
- Tested from (1) console and (2) Xorg 1.19.3-2 (Openbox)
- Booted with HDMI cable connected and nouveau/snd-hda-intel unloaded.

To check the runtime PM status, I used this "rpm-status" script:

    grep --color . /sys/bus/pci/devices/0000:00:01.0/{0000:01:00.0/,0000:01:00.1/,}power/control
    grep --color . /sys/bus/pci/devices/0000:00:01.0/{0000:01:00.0/,0000:01:00.1/,}power/runtime_{enabled,usage,active_kids,status}


To test audio output (via HDMI; DP port does not seem to support audio):

    speaker-test -Dhdmi:CARD=NVidia,DEV=0 -c2

Below I first list some issues, then some good news.

Issue 1 - GPU does not suspend on text console.
When present at the text console and an external monitor is connected
through HDMI or DP, the RPM counter is 1. Only when the cable is removed
(or when "echo off > /sys/class/drm/card1-HDMI-A-1/status"), the RPM
count drops to 0 and the GPU device suspends. When Xorg is started
(startx), the RPM counter also drops to 0 though.

Issue 2 - RPM counter for audio function drops below 0 on system sleep.
When both nouveau and snd-hda-intel are loaded and a HDMI (or DP?) cable
is connected, the RPM counter becomes one after suspend/resume. This
happens both with text console and Xorg.

Issue 3 - invalid PCI config reads to audio device if disconnected.
When no HDMI/DP cable is connected, the HDMI audio function will be
inaccessible after runtime/system resume. Assume nouveau loaded before
s/r, then loading snd-hda-intel will fail with:

    0]: pam_unix(sudo:session): session closed for user root
    hdaudio hdaudioC1D0: no AFG or MFG node found
    hdaudio hdaudioC1D1: no AFG or MFG node found
    hdaudio hdaudioC1D2: no AFG or MFG node found
    hdaudio hdaudioC1D3: no AFG or MFG node found
    hdaudio hdaudioC1D4: no AFG or MFG node found
    hdaudio hdaudioC1D5: no AFG or MFG node found
    hdaudio hdaudioC1D6: no AFG or MFG node found
    hdaudio hdaudioC1D7: no AFG or MFG node found
    snd_hda_intel 0000:01:00.1: no codecs initialized

After rmmod snd-hda-intel and system suspend/resume:

    pci 0000:01:00.1: restoring config space at offset 0x3c (was 0xffffffff, writing 0x200)
    pci 0000:01:00.1: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
    pci 0000:01:00.1: restoring config space at offset 0x34 (was 0xffffffff, writing 0x60)

This persists until removing both PCI devices and rescanning the root
port. (When no HDMI/DP cable is connected, the audio function will not
appear; remove+rescan is required to recover.)

Issue 4 - runtime_active_kids leak with audio function.
After the above issue, the audio device never entered the suspended
state even though the runtime_usage counter reached 0. It turned out
that runtime_active_kids was 4. Every time snd-hda-intel is loaded (and
fails to initialize due to the above issue), this counter is increased.

Issue 5 - audio breaks after system sleep or stopping Xorg.
When Xorg is stopped or the system sleep/resumes while speaker-test is
active (e.g. in GNU screen), audio stops playing and speaker-test exits.

Issue 6 - wrong pin status reported / no audio
(Possibly "working as expected" since audio is tied to GPU function.)
Scenario: HDMI cable is connected but GPU is unused
("echo off > /sys/class/drm/card1-HDMI-1-1/status" from console or
with "xrandr --output HDMI-A-1 --off"). hdajacksensetest reports no
HDMI pin presence even if connected, dmesg reports:

    nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for HDMI-A-1

Using "speaker-test", the program does not fail but no sound can be
heard either.

Issue 7 - nouveau: warning after unloading after stopping Xorg
(Issue in nouveau, likely not related to this patch set.)
After "xrandr --output HDMI-1-1 --mode 2560x1440" in Xorg, stopping Xorg
(and possibly "echo off > /sys/class/drm/card1-HDMI-A-1/status"),
removing nouveau triggered:

    WARNING: CPU: 7 PID: 5475 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x1fa/0x260
    CPU: 7 PID: 5475 Comm: rmmod Not tainted 4.15.0testing-00020-gb33d50c5c6ad #55
    RIP: 0010:drm_mode_config_cleanup+0x1fa/0x260
    [..]
    Call Trace:
     nouveau_display_destroy+0x41/0x80 [nouveau]
     nouveau_drm_unload+0x6b/0xd0 [nouveau]
     drm_dev_unregister+0x3c/0xe0
     drm_put_dev+0x2e/0x60
     nouveau_drm_device_remove+0x37/0x50 [nouveau]
     pci_device_remove+0x36/0xb0
     device_release_driver_internal+0x160/0x230
     driver_detach+0x3a/0x70
     bus_remove_driver+0x58/0xd0
     pci_unregister_driver+0x3b/0x90
     nouveau_drm_exit+0x15/0x432 [nouveau]
     SyS_delete_module+0x16c/0x230

Issue 8 - acpi: sleeping function in atomic context.
(Issue is likely not related to this patch set.)
At some point I also got a BUG, nouveau was already unloaded and I ran:
"echo 1 | tee /sys/bus/pci/devices/0000:01:00.{0,1}/remove"

    BUG: sleeping function called from invalid context at /home/peter/linux/mm/slab.h:419
    in_atomic(): 1, irqs_disabled(): 0, pid: 4844, name: kworker/3:4
    INFO: lockdep is turned off.
    CPU: 3 PID: 4844 Comm: kworker/3:4 Tainted: G        W        4.15.0testing-00020-gb33d50c5c6ad #55
    Hardware name: Notebook                         P65_P67RGRERA/P65_P67RGRERA, BIOS 1.05.16 05/16/2016
    Workqueue: events_power_efficient srcu_invoke_callbacks
    Call Trace:
     dump_stack+0x5f/0x86
     ___might_sleep+0x20c/0x240
     kmem_cache_alloc_trace+0x4d/0x230
     acpi_ut_evaluate_object+0x68/0x23c
     ? srcu_invoke_callbacks+0xa2/0x150
     acpi_rs_get_prt_method_data+0x42/0xa2
     acpi_get_irq_routing_table+0x70/0x9f
     ? __slab_free+0x11c/0x380
     acpi_pci_irq_find_prt_entry+0x83/0x330
     ? srcu_invoke_callbacks+0xa2/0x150
     acpi_pci_irq_lookup+0x27/0x2e0
     acpi_pci_irq_disable+0x45/0xb0
     pci_release_dev+0x29/0x60
     device_release+0x2d/0x80
     kobject_put+0xb7/0x190
     __device_link_free_srcu+0x32/0x40
     srcu_invoke_callbacks+0xba/0x150
     process_one_work+0x273/0x670
     worker_thread+0x4a/0x400
     kthread+0x100/0x140
     ? process_one_work+0x670/0x670
     ? kthread_create_worker_on_cpu+0x50/0x50
     ? do_syscall_64+0x56/0x1a0
     ? SyS_exit_group+0x10/0x10

Issue 9 - potential memory corruption.
At some point (possibly after issue 7, but I am not fully sure), I saw
an artifact in the text console which would persist even when switching
between consoles. It was gone after system sleep/resume. If I remember
correctly, it looked like something from a Xorg session which I killed
before.

That was the bad news, the good news:
- Loading nouveau and snd-hda-intel (in any order) while RPM is enabled
  and the port was in D3cold works.
- RPM interaction between audio and GPU seems good, when audio resumes,
  the GPU RPM counter increments, when audio suspends it decrements.
- As the GPU enters D3cold, I can observe significant power savings
  through /sys/class/power_supply/BAT0/ (no regressions here).
- In a default configuration I have no audio function (see also nouveau
  bug https://bugs.freedesktop.org/show_bug.cgi?id=75985) so most of the
  above issues should not occur.

Hope it helps, and if desired you can add:
Tested-by: Peter Wu <peter at lekensteyn.nl>

For the following patches, you can also add my Reviewed-by:

    vga_switcheroo: Update PCI current_state on power change
    vga_switcheroo: Deduplicate power state tracking
    vga_switcheroo: Use device link for HDA controller
    vga_switcheroo: Let HDA autosuspend on mux change
    drm/nouveau: Runtime suspend despite HDA being unbound

The two other PCI patches look fine as well.

Kind regards,
Peter

On Sat, Mar 03, 2018 at 10:53:24AM +0100, Lukas Wunner wrote:
> Modernize vga_switcheroo by using a device link to enforce a runtime PM
> dependency from an HDA controller to the GPU it's integrated into, v2.
> 
> Changes since v1:
> 
> - Replace patch [1/7] to use pci_save_state() / pci_restore_state()
>   for consistency between runtime PM code path of bound and unbound
>   devices. (Rafael, Bjorn)
> 
> - Patch [5/7]: Drop an unnecessary initialization. (Bjorn)
>   Rephrase error message on failed link addition for clarity.
> 
> Link to v1:
> https://www.spinics.net/lists/dri-devel/msg165889.html
> 
> Testing on more machines would be greatly appreciated, particularly
> Nvidia Optimus or AMD PowerXpress.
> 
> The series is based on 4.16-rc3.  To test it on 4.15, you need to
> cherry-pick 7506dc798993 and 2fa6d6cdaf28.  For your convenience
> I've pushed a 4.15-based branch to:
> https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> 
> Minimal test procedure:
> 
> - Note well: Recent Optimus require that a Mini-DP or HDMI cable is
>   plugged in on boot for the HDA device to be present.
> 
> - Check that HDA, GPU and root port autosuspend when not in use:
>   cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status  # HDA
>   cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status  # GPU
>   cat /sys/bus/pci/devices/0000:00:01.0/power/runtime_status  # Root Port
> 
> - Check that all three autoresume when accessing the HDA:
>   hdajacksensetest -c 1
> 
> - Unbind the HDA controller:
>   echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/unbind
>   Wait for GPU to power off, then rebind the HDA controller:
>   echo 0000:01:00.1 > /sys/bus/pci/drivers/snd_hda_intel/bind
>   Check dmesg for errors, try accessing HDA with hdajacksensetest.
> 
> - If your laptop uses the root port's _PR3 to cut power to the GPU:
>   Unbind the GPU:
>   echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/unbind
>   Allow runtime PM on the GPU:
>   echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
>   Wait for GPU to power off, then rebind it:
>   echo 0000:01:00.0 > /sys/bus/pci/drivers/{nouveau,amdgpu,radeon}/bind
>   Check dmesg for errors.  If you see any then we may need to perform
>   further actions in pci_pm_runtime_resume(), see patch [1/7].
> 
> Thanks,
> 
> Lukas
> 
> 
> Lukas Wunner (6):
>   PCI: Make pci_wakeup_bus() & pci_bus_set_current_state() public
>   vga_switcheroo: Update PCI current_state on power change
>   vga_switcheroo: Deduplicate power state tracking
>   vga_switcheroo: Use device link for HDA controller
>   vga_switcheroo: Let HDA autosuspend on mux change
>   drm/nouveau: Runtime suspend despite HDA being unbound
> 
> Rafael J. Wysocki (1):
>   PCI: Restore config space on runtime resume despite being unbound
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |   2 -
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  46 ----------
>  drivers/gpu/drm/nouveau/nouveau_drv.h   |   1 -
>  drivers/gpu/drm/radeon/radeon_drv.c     |   2 -
>  drivers/gpu/vga/vga_switcheroo.c        | 152 ++++++++------------------------
>  drivers/pci/pci-driver.c                |  17 ++--
>  drivers/pci/pci.c                       |   8 +-
>  drivers/pci/quirks.c                    |  39 ++++++++
>  include/linux/pci.h                     |   2 +
>  include/linux/pci_ids.h                 |   1 +
>  include/linux/vga_switcheroo.h          |   6 --
>  include/sound/hdaudio.h                 |   3 -
>  sound/pci/hda/hda_intel.c               |  36 +++++---
>  sound/pci/hda/hda_intel.h               |   3 -
>  14 files changed, 117 insertions(+), 201 deletions(-)
> 
> -- 
> 2.15.1
> 


More information about the Nouveau mailing list