[REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM

Lazar, Lijo Lijo.Lazar at amd.com
Mon Jan 24 14:21:11 UTC 2022


[Public]

Not able to relate to how it affects gfx/mem DPM alone. Unless Alex has other ideas, would you be able to enable drm debug messages and share the log?

	Enabling verbose debug messages is done through the drm.debug parameter, each category being enabled by a bit:

	drm.debug=0x1 will enable CORE messages
	drm.debug=0x2 will enable DRIVER messages
	drm.debug=0x3 will enable CORE and DRIVER messages
	...
	drm.debug=0x1ff will enable all messages
	An interesting feature is that it's possible to enable verbose logging at run-time by echoing the debug value in its sysfs node:

	# echo 0xf > /sys/module/drm/parameters/debug

Thanks,
Lijo

-----Original Message-----
From: James Turner <linuxkernel.foss at dmarc-none.turner.link> 
Sent: Sunday, January 23, 2022 2:41 AM
To: Lazar, Lijo <Lijo.Lazar at amd.com>
Cc: Alex Deucher <alexdeucher at gmail.com>; Thorsten Leemhuis <regressions at leemhuis.info>; Deucher, Alexander <Alexander.Deucher at amd.com>; regressions at lists.linux.dev; kvm at vger.kernel.org; Greg KH <gregkh at linuxfoundation.org>; Pan, Xinhui <Xinhui.Pan at amd.com>; LKML <linux-kernel at vger.kernel.org>; amd-gfx at lists.freedesktop.org; Alex Williamson <alex.williamson at redhat.com>; Koenig, Christian <Christian.Koenig at amd.com>
Subject: Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM

Hi Lijo,

> Could you provide the pp_dpm_* values in sysfs with and without the 
> patch? Also, could you try forcing PCIE to gen3 (through pp_dpm_pcie) 
> if it's not in gen3 when the issue happens?

AFAICT, I can't access those values while the AMD GPU PCI devices are bound to `vfio-pci`. However, I can at least access the link speed and width elsewhere in sysfs. So, I gathered what information I could for two different cases:

- With the PCI devices bound to `vfio-pci`. With this configuration, I
  can start the VM, but the `pp_dpm_*` values are not available since
  the devices are bound to `vfio-pci` instead of `amdgpu`.

- Without the PCI devices bound to `vfio-pci` (i.e. after removing the
  `vfio-pci.ids=...` kernel command line argument). With this
  configuration, I can access the `pp_dpm_*` values, since the PCI
  devices are bound to `amdgpu`. However, I cannot use the VM. If I try
  to start the VM, the display (both the external monitors attached to
  the AMD GPU and the built-in laptop display attached to the Intel
  iGPU) completely freezes.

The output shown below was identical for both the good commit:
f1688bd69ec4 ("drm/amd/amdgpu:save psp ring wptr to avoid attack") and the commit which introduced the issue:
f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)")

Note that the PCI link speed increased to 8.0 GT/s when the GPU was under heavy load for both versions, but the clock speeds of the GPU were different under load. (For the good commit, it was 1295 MHz; for the bad commit, it was 501 MHz.)


# With the PCI devices bound to `vfio-pci`

## Before starting the VM

% ls /sys/module/amdgpu/drivers/pci:amdgpu
module  bind  new_id  remove_id  uevent  unbind

% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \; /sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
8.0 GT/s PCIe

## While running the VM, before placing the AMD GPU under heavy load

% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \; /sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
2.5 GT/s PCIe

## While running the VM, with the AMD GPU under heavy load

% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \; /sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
8.0 GT/s PCIe

## While running the VM, after stopping the heavy load on the AMD GPU

% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \; /sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
2.5 GT/s PCIe

## After stopping the VM

% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \; /sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
2.5 GT/s PCIe


# Without the PCI devices bound to `vfio-pci`

% ls /sys/module/amdgpu/drivers/pci:amdgpu
0000:01:00.0  module  bind  new_id  remove_id  uevent  unbind

% for f in /sys/module/amdgpu/drivers/pci:amdgpu/*/pp_dpm_*; do echo "$f"; cat "$f"; echo; done /sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_mclk
0: 300Mhz
1: 625Mhz
2: 1500Mhz *

/sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_pcie
0: 2.5GT/s, x8
1: 8.0GT/s, x16 *

/sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_sclk
0: 214Mhz
1: 501Mhz
2: 850Mhz
3: 1034Mhz
4: 1144Mhz
5: 1228Mhz
6: 1275Mhz
7: 1295Mhz *

% find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \; /sys/bus/pci/devices/0000:01:00.0/current_link_width
8
/sys/bus/pci/devices/0000:01:00.0/current_link_speed
8.0 GT/s PCIe


James


More information about the amd-gfx mailing list