[Nouveau] [PATCH v2] Revert "drm/nouveau/device/pci: set as non-CPU-coherent on ARM64"

Mon Jun 6 07:11:52 UTC 2016

From: Robin Murphy <robin.murphy at arm.com>

This reverts commit 1733a2ad36741b1812cf8b3f3037c28d0af53f50.

There is apparently something amiss with the way the TTM code handles
DMA buffers, which the above commit was attempting to work around for
arm64 systems with non-coherent PCI. Unfortunately, this completely
breaks systems *with* coherent PCI (which appear to be the majority).

Booting a plain arm64 defconfig + CONFIG_DRM + CONFIG_DRM_NOUVEAU on
a machine with a PCI GPU having coherent dma_map_ops (in this case a
7600GT card plugged into an ARM Juno board) results in a fatal crash:

[    2.803438] nouveau 0000:06:00.0: DRM: allocated 1024x768 fb: 0x9000, bo ffffffc976141c00
[    2.897662] Unable to handle kernel NULL pointer dereference at virtual address 000001ac
[    2.897666] pgd = ffffff8008e00000
[    2.897675] [000001ac] *pgd=00000009ffffe003, *pud=00000009ffffe003, *pmd=0000000000000000
[    2.897680] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[    2.897685] Modules linked in:
[    2.897692] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc5+ #543
[    2.897694] Hardware name: ARM Juno development board (r1) (DT)
[    2.897699] task: ffffffc9768a0000 ti: ffffffc9768a8000 task.ti: ffffffc9768a8000
[    2.897711] PC is at __memcpy+0x7c/0x180
[    2.897719] LR is at OUT_RINGp+0x34/0x70
[    2.897724] pc : [<ffffff80083465fc>] lr : [<ffffff800854248c>] pstate: 80000045
[    2.897726] sp : ffffffc9768ab360
[    2.897732] x29: ffffffc9768ab360 x28: 0000000000000001
[    2.897738] x27: ffffffc97624c000 x26: 0000000000000000
[    2.897744] x25: 0000000000000080 x24: 0000000000006c00
[    2.897749] x23: 0000000000000005 x22: ffffffc97624c010
[    2.897755] x21: 0000000000000004 x20: 0000000000000004
[    2.897761] x19: ffffffc9763da000 x18: ffffffc976b2491c
[    2.897766] x17: 0000000000000007 x16: 0000000000000006
[    2.897771] x15: 0000000000000001 x14: 0000000000000001
[    2.897777] x13: 0000000000e31b70 x12: ffffffc9768a0080
[    2.897783] x11: 0000000000000000 x10: fffffffffffffb00
[    2.897788] x9 : 0000000000000000 x8 : 0000000000000000
[    2.897793] x7 : 0000000000000000 x6 : 00000000000001ac
[    2.897799] x5 : 00000000ffffffff x4 : 0000000000000000
[    2.897804] x3 : 0000000000000010 x2 : 0000000000000010
[    2.897810] x1 : ffffffc97624c010 x0 : 00000000000001ac
...
[    2.898494] Call trace:
[    2.898499] Exception stack(0xffffffc9768ab1a0 to 0xffffffc9768ab2c0)
[    2.898506] b1a0: ffffffc9763da000 0000000000000004 ffffffc9768ab360 ffffff80083465fc
[    2.898513] b1c0: ffffffc976801e00 ffffffc9762b8000 ffffffc9768ab1f0 ffffff80080ec158
[    2.898520] b1e0: ffffffc9768ab230 ffffff8008496d04 ffffffc975ce6d80 ffffffc9768ab36e
[    2.898527] b200: ffffffc9768ab36f ffffffc9768ab29d ffffffc9768ab29e ffffffc9768a0000
[    2.898533] b220: ffffffc9768ab250 ffffff80080e70c0 ffffffc9768ab270 ffffff8008496e44
[    2.898540] b240: 00000000000001ac ffffffc97624c010 0000000000000010 0000000000000010
[    2.898546] b260: 0000000000000000 00000000ffffffff 00000000000001ac 0000000000000000
[    2.898552] b280: 0000000000000000 0000000000000000 fffffffffffffb00 0000000000000000
[    2.898558] b2a0: ffffffc9768a0080 0000000000e31b70 0000000000000001 0000000000000001
[    2.898566] [<ffffff80083465fc>] __memcpy+0x7c/0x180
[    2.898574] [<ffffff800853e164>] nv04_fbcon_imageblit+0x1d4/0x2e8
[    2.898582] [<ffffff800853d6d0>] nouveau_fbcon_imageblit+0xd8/0xe0
[    2.898591] [<ffffff80083c4db4>] soft_cursor+0x154/0x1d8
[    2.898598] [<ffffff80083c47b4>] bit_cursor+0x4fc/0x538
[    2.898605] [<ffffff80083c0cfc>] fbcon_cursor+0x134/0x1a8
[    2.898613] [<ffffff800841c280>] hide_cursor+0x38/0xa0
[    2.898620] [<ffffff800841d420>] redraw_screen+0x120/0x228
[    2.898628] [<ffffff80083bf268>] fbcon_prepare_logo+0x370/0x3f8
[    2.898635] [<ffffff80083bf640>] fbcon_init+0x350/0x560
[    2.898641] [<ffffff800841c634>] visual_init+0xac/0x108
[    2.898648] [<ffffff800841df14>] do_bind_con_driver+0x1c4/0x3a8
[    2.898655] [<ffffff800841e4f4>] do_take_over_console+0x174/0x1e8
[    2.898662] [<ffffff80083bf8c4>] do_fbcon_takeover+0x74/0x100
[    2.898669] [<ffffff80083c3e44>] fbcon_event_notify+0x8cc/0x920
[    2.898680] [<ffffff80080d7e38>] notifier_call_chain+0x50/0x90
[    2.898685] [<ffffff80080d8214>] __blocking_notifier_call_chain+0x4c/0x90
[    2.898691] [<ffffff80080d826c>] blocking_notifier_call_chain+0x14/0x20
[    2.898696] [<ffffff80083c5e1c>] fb_notifier_call_chain+0x1c/0x28
[    2.898703] [<ffffff80083c81ac>] register_framebuffer+0x1cc/0x2e0
[    2.898712] [<ffffff800845da80>] drm_fb_helper_initial_config+0x288/0x3e8
[    2.898719] [<ffffff800853da20>] nouveau_fbcon_init+0xe0/0x118
[    2.898727] [<ffffff800852d2f8>] nouveau_drm_load+0x268/0x890
[    2.898734] [<ffffff8008466e24>] drm_dev_register+0xbc/0xc8
[    2.898740] [<ffffff8008468a88>] drm_get_pci_dev+0xa0/0x180
[    2.898747] [<ffffff800852cb28>] nouveau_drm_probe+0x1a0/0x1e0
[    2.898755] [<ffffff80083a32e0>] pci_device_probe+0x98/0x110
[    2.898763] [<ffffff800858e434>] driver_probe_device+0x204/0x2b0
[    2.898770] [<ffffff800858e58c>] __driver_attach+0xac/0xb0
[    2.898777] [<ffffff800858c3e0>] bus_for_each_dev+0x60/0xa0
[    2.898783] [<ffffff800858dbc0>] driver_attach+0x20/0x28
[    2.898789] [<ffffff800858d7b0>] bus_add_driver+0x1d0/0x238
[    2.898796] [<ffffff800858ed50>] driver_register+0x60/0xf8
[    2.898802] [<ffffff80083a20dc>] __pci_register_driver+0x3c/0x48
[    2.898809] [<ffffff8008468eb4>] drm_pci_init+0xf4/0x120
[    2.898818] [<ffffff8008c56fc0>] nouveau_drm_init+0x21c/0x230
[    2.898825] [<ffffff80080829d4>] do_one_initcall+0x8c/0x190
[    2.898832] [<ffffff8008c31af4>] kernel_init_freeable+0x14c/0x1f0
[    2.898839] [<ffffff80088a0c20>] kernel_init+0x10/0x100
[    2.898845] [<ffffff8008085e10>] ret_from_fork+0x10/0x40
[    2.898853] Code: a88120c7 a8c12027 a88120c7 a8c12027 (a88120c7)
[    2.898871] ---[ end trace d5713dcad023ee04 ]---
[    2.898888] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

In a toss-up between the GPU seeing stale data artefacts on some systems
vs. catastrophic kernel crashes on other systems, the latter would seem
to take precedence, so revert this change until the real underlying
problem can be fixed.

Signed-off-by: Robin Murphy <robin.murphy at arm.com>
Acked-by: Alexandre Courbot <acourbot at nvidia.com>
[acourbot at nvidia.com: port to Nouveau tree, remove bits in lib/]
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
Hi Ben,

I have ported this patch to your tree - could you take it for 4.7? We definitely want
to avoid these crashes. I am working on a final solution for this that will allow us
to remove that cpu_coherent flag altogether.

 drm/nouveau/nvkm/engine/device/pci.c | 2 +-
 lib/include/nvif/os.h                | 6 ------
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/drm/nouveau/nvkm/engine/device/pci.c b/drm/nouveau/nvkm/engine/device/pci.c
index 18fab3973ce5..62ad0300cfa5 100644
--- a/drm/nouveau/nvkm/engine/device/pci.c
+++ b/drm/nouveau/nvkm/engine/device/pci.c
@@ -1614,7 +1614,7 @@ nvkm_device_pci_func = {
 	.fini = nvkm_device_pci_fini,
 	.resource_addr = nvkm_device_pci_resource_addr,
 	.resource_size = nvkm_device_pci_resource_size,
-	.cpu_coherent = !IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_ARM64),
+	.cpu_coherent = !IS_ENABLED(CONFIG_ARM),
 };
 
 int
diff --git a/lib/include/nvif/os.h b/lib/include/nvif/os.h
index 831110904fee..1eda53aa8f45 100644
--- a/lib/include/nvif/os.h
+++ b/lib/include/nvif/os.h
@@ -130,12 +130,6 @@ typedef dma_addr_t resource_size_t;
 #define IS_ENABLED_CONFIG_ARM 0
 #endif
 
-#if defined(CONFIG_ARM64)
-#define IS_ENABLED_CONFIG_ARM64 1
-#else
-#define IS_ENABLED_CONFIG_ARM64 0
-#endif
-
 #if defined(CONFIG_IOMMU_API)
 #define IS_ENABLED_CONFIG_IOMMU_API 1
 #else
-- 
2.8.3