[PATCH V4 17/17] drm/amd/pm: unified lock protections in amdgpu_dpm.c

Quan, Evan Evan.Quan at amd.com
Fri Apr 1 09:19:20 UTC 2022


Yes, as Christian mentioned, enabling CONFIG_LOCKDEP_SUPPORT will help with debugging deadlock issues like this one.
Meanwhile, can you give the following change (dropping the lock protection in amdgpu_dpm_compute_clocks) a try?

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index c73fb73e9628..50e89f5659fa 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -423,9 +423,7 @@ void amdgpu_dpm_compute_clocks(struct amdgpu_device *adev)
        if (!pp_funcs->pm_compute_clocks)
                return;

-       mutex_lock(&adev->pm.mutex);
        pp_funcs->pm_compute_clocks(adev->powerplay.pp_handle);
-       mutex_unlock(&adev->pm.mutex);
 }

 void amdgpu_dpm_enable_uvd(struct amdgpu_device *adev, bool enable)
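
For illustration only, here is a minimal user-space sketch of the kind of self-deadlock the change above is meant to rule out. It assumes (this is not confirmed anywhere in the thread) that some caller already holds adev->pm.mutex when it reaches amdgpu_dpm_compute_clocks. All names below are hypothetical stand-ins, not driver code. A kernel mutex is not recursive, so a second lock by the owner would simply hang; the sketch uses an error-checking pthread mutex so that the relock returns EDEADLK instead.

```c
#define _XOPEN_SOURCE 700	/* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for adev->pm.mutex. */
static pthread_mutex_t pm_mutex;

/*
 * Initialise pm_mutex as an error-checking mutex, so that a relock by
 * the owning thread returns EDEADLK rather than blocking forever.
 */
static void pm_mutex_init(void)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&pm_mutex, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);
	(void)rc;	/* silence unused warning when built with NDEBUG */
}

/* Mimics the function before the change: it takes the lock itself. */
static int compute_clocks_locking(void)
{
	int rc = pthread_mutex_lock(&pm_mutex);

	if (rc)
		return rc;	/* EDEADLK if this thread already owns it */
	/* ... the pm_compute_clocks callback would run here ... */
	pthread_mutex_unlock(&pm_mutex);
	return 0;
}

/* Mimics the function after the change: it relies on the caller's lock. */
static int compute_clocks_lockless(void)
{
	/* ... the pm_compute_clocks callback would run here ... */
	return 0;
}

/* Hypothetical caller that already holds the lock when it calls in. */
static int call_with_pm_mutex_held(int (*compute)(void))
{
	int rc;

	pthread_mutex_lock(&pm_mutex);
	rc = compute();
	pthread_mutex_unlock(&pm_mutex);
	return rc;
}
```

After pm_mutex_init(), call_with_pm_mutex_held(compute_clocks_locking) reports EDEADLK, while call_with_pm_mutex_held(compute_clocks_lockless) returns 0. Whether the real hang follows this pattern is exactly what the test above, or a lockdep report, should tell us.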

BR
Evan
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Friday, April 1, 2022 4:56 PM
> To: Arthur Marsh <arthur.marsh at internode.on.net>; Quan, Evan <Evan.Quan at amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Feng, Kenneth <Kenneth.Feng at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH V4 17/17] drm/amd/pm: unified lock protections in amdgpu_dpm.c
> 
> Hi Arthur,
> 
> Apart from blacklisting amdgpu, I generally advise SSHing into the
> affected system from another computer if you have a problem like this.
> 
> In addition to what Evan said, I suggest that you enable
> CONFIG_LOCKDEP_SUPPORT in your kernel configuration. This will produce
> warnings in your system log in case of deadlocks or if a lock is
> accidentally left held.
> 
> Regards,
> Christian.
> 
> Am 01.04.22 um 10:49 schrieb Arthur Marsh:
> > Hi Evan, this is what was logged (filtering for drm and amdgpu) when I
> > blacklisted amdgpu then manually did:
> >
> > modprobe amdgpu si_support=1 gpu_recovery=1
> >
> > Apr  1 18:31:14 am64 kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.17.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1 udev.log-priority=info rd.udev.log-priority=info
> > Apr  1 18:31:14 am64 kernel: [    0.059624] Kernel command line: BOOT_IMAGE=/vmlinuz-5.17.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1 udev.log-priority=info rd.udev.log-priority=info
> >
> > Apr  1 18:33:43 am64 kernel: [  245.724485] ACPI: bus type drm_connector registered
> > Apr  1 18:33:44 am64 kernel: [  245.945020] [drm] amdgpu kernel modesetting enabled.
> > Apr  1 18:33:44 am64 kernel: [  245.945140] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
> > Apr  1 18:33:44 am64 kernel: [  245.946413] [drm] initializing kernel modesetting (VERDE 0x1002:0x682B 0x1458:0x22CA 0x87).
> > Apr  1 18:33:44 am64 kernel: [  245.946423] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
> > Apr  1 18:33:44 am64 kernel: [  245.946448] [drm] register mmio base: 0xFE8C0000
> > Apr  1 18:33:44 am64 kernel: [  245.946451] [drm] register mmio size: 262144
> > Apr  1 18:33:44 am64 kernel: [  245.946642] [drm] add ip block number 0 <si_common>
> > Apr  1 18:33:44 am64 kernel: [  245.946657] [drm] add ip block number 1 <gmc_v6_0>
> > Apr  1 18:33:44 am64 kernel: [  245.946660] [drm] add ip block number 2 <si_ih>
> > Apr  1 18:33:44 am64 kernel: [  245.946663] [drm] add ip block number 3 <gfx_v6_0>
> > Apr  1 18:33:44 am64 kernel: [  245.946666] [drm] add ip block number 4 <si_dma>
> > Apr  1 18:33:44 am64 kernel: [  245.946668] [drm] add ip block number 5 <si_dpm>
> > Apr  1 18:33:44 am64 kernel: [  245.946671] [drm] add ip block number 6 <dce_v6_0>
> > Apr  1 18:33:44 am64 kernel: [  245.946674] [drm] add ip block number 7 <uvd_v3_1>
> > Apr  1 18:33:44 am64 kernel: [  245.990113] [drm] BIOS signature incorrect 20 7
> > Apr  1 18:33:44 am64 kernel: [  245.990146] amdgpu 0000:01:00.0: No more image in the PCI ROM
> > Apr  1 18:33:44 am64 kernel: [  245.991510] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
> > Apr  1 18:33:44 am64 kernel: [  245.991516] amdgpu: ATOM BIOS: xxx-xxx-xxx
> > Apr  1 18:33:44 am64 kernel: [  245.991539] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
> > Apr  1 18:33:44 am64 kernel: [  245.991841] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
> > Apr  1 18:33:44 am64 kernel: [  246.045705] amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
> > Apr  1 18:33:44 am64 kernel: [  246.045719] amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
> > Apr  1 18:33:44 am64 kernel: [  246.045736] [drm] Detected VRAM RAM=2048M, BAR=256M
> > Apr  1 18:33:44 am64 kernel: [  246.045739] [drm] RAM width 128bits DDR3
> > Apr  1 18:33:44 am64 kernel: [  246.045825] [drm] amdgpu: 2048M of VRAM memory ready
> > Apr  1 18:33:44 am64 kernel: [  246.045829] [drm] amdgpu: 3072M of GTT memory ready.
> > Apr  1 18:33:44 am64 kernel: [  246.045854] [drm] GART: num cpu pages 262144, num gpu pages 262144
> > Apr  1 18:33:44 am64 kernel: [  246.046180] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400900000).
> > Apr  1 18:33:44 am64 kernel: [  246.084159] [drm] Internal thermal controller with fan control
> > Apr  1 18:33:44 am64 kernel: [  246.084180] [drm] amdgpu: dpm initialized
> > Apr  1 18:33:44 am64 kernel: [  246.084264] [drm] AMDGPU Display Connectors
> > Apr  1 18:33:44 am64 kernel: [  246.084268] [drm] Connector 0:
> > Apr  1 18:33:44 am64 kernel: [  246.084270] [drm]   HDMI-A-1
> > Apr  1 18:33:44 am64 kernel: [  246.084272] [drm]   HPD1
> > Apr  1 18:33:44 am64 kernel: [  246.084274] [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
> > Apr  1 18:33:44 am64 kernel: [  246.084279] [drm]   Encoders:
> > Apr  1 18:33:44 am64 kernel: [  246.084281] [drm]     DFP1: INTERNAL_UNIPHY
> > Apr  1 18:33:44 am64 kernel: [  246.084283] [drm] Connector 1:
> > Apr  1 18:33:44 am64 kernel: [  246.084285] [drm]   DVI-D-1
> > Apr  1 18:33:44 am64 kernel: [  246.084287] [drm]   HPD2
> > Apr  1 18:33:44 am64 kernel: [  246.084289] [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
> > Apr  1 18:33:44 am64 kernel: [  246.084293] [drm]   Encoders:
> > Apr  1 18:33:44 am64 kernel: [  246.084295] [drm]     DFP2: INTERNAL_UNIPHY
> > Apr  1 18:33:44 am64 kernel: [  246.084297] [drm] Connector 2:
> > Apr  1 18:33:44 am64 kernel: [  246.084299] [drm]   VGA-1
> > Apr  1 18:33:44 am64 kernel: [  246.084301] [drm]   DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
> > Apr  1 18:33:44 am64 kernel: [  246.084305] [drm]   Encoders:
> > Apr  1 18:33:44 am64 kernel: [  246.084307] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
> > Apr  1 18:33:44 am64 kernel: [  246.135615] [drm] Found UVD firmware Version: 64.0 Family ID: 13
> > Apr  1 18:33:44 am64 kernel: [  246.137371] [drm] PCIE gen 2 link speeds already enabled
> > Apr  1 18:33:44 am64 kernel: [  246.674277] [drm] UVD initialized successfully.
> > Apr  1 18:33:44 am64 kernel: [  246.674849] amdgpu 0000:01:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 5, active_cu_number 8
> > Apr  1 18:33:45 am64 kernel: [  247.008964] [drm] Initialized amdgpu 3.46.0 20150101 for 0000:01:00.0 on minor 0
> > Apr  1 18:33:45 am64 kernel: [  247.068412] fbcon: amdgpudrmfb (fb0) is primary device
> >
> > The monitor still went blank, but the magic SysRq sync and reboot
> > worked, allowing capture of the log above but nothing after its last line.
> >
> > Regards,
> >
> > Arthur Marsh.

