[PATCH] amdgpu uses raw rlc_hdr values, causing kernel OOPS on big endian architectures

Sun Sep 16 22:52:38 UTC 2018

gfx_v8_0_init_microcode stores the rlc_hdr fields in adev->gfx.rlc after
converting them with le32_to_cpu, but its copy loops still take their
bounds from the raw rlc_hdr fields.  On big-endian machines the raw
values are byte-swapped (0x24000000 instead of 0x24), so the loops run
well past the end of the arrays and cause a kernel Oops.  gfx_v9_0 has
the same issue and is fixed in the same manner (but was not tested
locally; I do not have a v9 card).
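
To illustrate the byte-order problem, here is a minimal userspace sketch,
not kernel code.  The helpers load_le32, load_be32 and dword_count and the
size_field array are hypothetical names made up for this example;
load_le32 stands in for what le32_to_cpu yields, and load_be32 models a
big-endian CPU loading the field directly.  It assumes the size field
holds 0x24, as in the report above.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian field byte by byte.  On any host this
 * gives the value le32_to_cpu() would give for the same stored field. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the same bytes most-significant-first: what a big-endian CPU
 * sees when it loads the field without any conversion. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Loop bound used by the copy loops: size in bytes divided by 4. */
static uint32_t dword_count(uint32_t size_bytes)
{
	return size_bytes >> 2;
}

/* A size of 0x24 bytes as laid out in a little-endian firmware header
 * (assumed value, taken from the report above). */
static const uint8_t size_field[4] = { 0x24, 0x00, 0x00, 0x00 };
```

With these helpers, dword_count(load_le32(size_field)) is 9, while
dword_count(load_be32(size_field)) is 0x09000000, which is why the
unconverted bound sends the loop far beyond both buffers.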

[    8.143396] Unable to handle kernel paging request for data at address 0xc00800000615b000
[    8.143396] Faulting instruction address: 0xc008000005f063c8
[    8.143399] Oops: Kernel access of bad area, sig: 11 [#1]
[    8.143429] BE SMP NR_CPUS=256 NUMA PowerNV
[    8.143461] Modules linked in: binfmt_misc amdgpu(+) ast ttm drm_kms_helper sysimgblt syscopyarea sysfillrect fb_sys_fops drm joydev mac_hid tg3 ipmi_powernv ipmi_msghandler agpgart i2c_algo_bit shpchp
[    8.143615] CPU: 0 PID: 2402 Comm: kworker/0:3 Not tainted 4.14.48-mc8-easy #1
[    8.143679] Workqueue: events .work_for_cpu_fn
[    8.143728] task: c0000003e7fd8000 task.stack: c0000003e7fe0000
[    8.143783] NIP:  c008000005f063c8 LR: c008000005f06388 CTR: c00000000027efd0
[    8.143869] REGS: c0000003e7fe3430 TRAP: 0300   Not tainted (4.14.48-mc8-easy)
[    8.143950] MSR:  9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28002444  XER: 20040000
[    8.144040] CFAR: c008000005f063d0 DAR: c00800000615b000 DSISR: 40000000 SOFTE: 1
[    8.144787] NIP [c008000005f063c8] .gfx_v8_0_sw_init+0x5a8/0x15b0 [amdgpu]
[    8.144911] LR [c008000005f06388] .gfx_v8_0_sw_init+0x568/0x15b0 [amdgpu]
[    8.144976] Call Trace:
[    8.145049] [c0000003e7fe36b0] [c008000005f06388] .gfx_v8_0_sw_init+0x568/0x15b0 [amdgpu] (unreliable)
[    8.145194] [c0000003e7fe37c0] [c008000005e484c4] .amdgpu_device_init+0xf34/0x1750 [amdgpu]
[    8.145302] [c0000003e7fe38f0] [c008000005e4a94c] .amdgpu_driver_load_kms+0x9c/0x2a0 [amdgpu]
[    8.145420] [c0000003e7fe3980] [c008000004f4d200] .drm_dev_register+0x1c0/0x250 [drm]
[    8.145548] [c0000003e7fe3a30] [c008000005e416c4] .amdgpu_pci_probe+0x164/0x1a0 [amdgpu]
[    8.145601] [c0000003e7fe3ac0] [c000000000618ed0] .local_pci_probe+0x60/0x130
[    8.145683] [c0000003e7fe3b60] [c0000000000e8780] .work_for_cpu_fn+0x30/0x50
[    8.145774] [c0000003e7fe3be0] [c0000000000ecea8] .process_one_work+0x2a8/0x550
[    8.145854] [c0000003e7fe3c80] [c0000000000ed430] .worker_thread+0x2e0/0x600
[    8.145924] [c0000003e7fe3d70] [c0000000000f5338] .kthread+0x158/0x1a0
[    8.145973] [c0000003e7fe3e30] [c00000000000bd4c] .ret_from_kernel_thread+0x58/0x8c
[    8.146054] Instruction dump:
[    8.146089] 38db004c 39200000 7d00342c 7d08da14 815b0048 554af0be 7f8a4840 409d004c 792a1764 e8ff46f0 39290001 79290020 <7cc8542c> 7cc7512e 4bffffd8 60000000
[    8.146252] ---[ end trace 8b0ede048bbb20ae ]---

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org
-------------- next part --------------
From b92303fbefdb34d3c55ae223ca9b02e60625aae5 Mon Sep 17 00:00:00 2001
From: "A. Wilcox" <AWilcox at Wilcox-Tech.com>
Date: Sun, 1 Jul 2018 22:44:52 -0500
Subject: [PATCH] drm/amdgpu: use processed values for counting

adev->gfx.rlc has the values from rlc_hdr already processed by
le32_to_cpu.  Using the rlc_hdr values on big-endian machines causes
a kernel Oops due to writing well outside of the array (0x24000000
instead of 0x24).

Signed-off-by: A. Wilcox <AWilcox at Wilcox-Tech.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 818874b13c99..112dfbf7a1fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1094,14 +1094,14 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device *adev)
 
 	tmp = (unsigned int *)((uintptr_t)rlc_hdr +
 			le32_to_cpu(rlc_hdr->reg_list_format_array_offset_bytes));
-	for (i = 0 ; i < (rlc_hdr->reg_list_format_size_bytes >> 2); i++)
+	for (i = 0 ; i < (adev->gfx.rlc.reg_list_format_size_bytes >> 2); i++)
 		adev->gfx.rlc.register_list_format[i] =	le32_to_cpu(tmp[i]);
 
 	adev->gfx.rlc.register_restore = adev->gfx.rlc.register_list_format + i;
 
 	tmp = (unsigned int *)((uintptr_t)rlc_hdr +
 			le32_to_cpu(rlc_hdr->reg_list_array_offset_bytes));
-	for (i = 0 ; i < (rlc_hdr->reg_list_size_bytes >> 2); i++)
+	for (i = 0 ; i < (adev->gfx.rlc.reg_list_size_bytes >> 2); i++)
 		adev->gfx.rlc.register_restore[i] = le32_to_cpu(tmp[i]);
 
 	if (adev->asic_type >= CHIP_POLARIS10 && adev->asic_type <= CHIP_POLARIS12) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index a69153435ea7..d5e1f80adbc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -577,14 +577,14 @@ static int gfx_v9_0_init_microcode(struct amdgpu_device *adev)
 
 	tmp = (unsigned int *)((uintptr_t)rlc_hdr +
 			le32_to_cpu(rlc_hdr->reg_list_format_array_offset_bytes));
-	for (i = 0 ; i < (rlc_hdr->reg_list_format_size_bytes >> 2); i++)
+	for (i = 0 ; i < (adev->gfx.rlc.reg_list_format_size_bytes >> 2); i++)
 		adev->gfx.rlc.register_list_format[i] =	le32_to_cpu(tmp[i]);
 
 	adev->gfx.rlc.register_restore = adev->gfx.rlc.register_list_format + i;
 
 	tmp = (unsigned int *)((uintptr_t)rlc_hdr +
 			le32_to_cpu(rlc_hdr->reg_list_array_offset_bytes));
-	for (i = 0 ; i < (rlc_hdr->reg_list_size_bytes >> 2); i++)
+	for (i = 0 ; i < (adev->gfx.rlc.reg_list_size_bytes >> 2); i++)
 		adev->gfx.rlc.register_restore[i] = le32_to_cpu(tmp[i]);
 
 	if (adev->gfx.rlc.is_rlc_v2_1)
-- 
2.17.1
