[Nouveau] [PATCH v2] Revert "drm/nouveau/device/pci: set as non-CPU-coherent on ARM64"

Alexandre Courbot gnurou at gmail.com
Tue Jun 7 03:40:03 UTC 2016


On Mon, Jun 6, 2016 at 6:25 PM, Robin Murphy <robin.murphy at arm.com> wrote:
> On 06/06/16 08:11, Alexandre Courbot wrote:
>>
>> From: Robin Murphy <robin.murphy at arm.com>
>>
>> This reverts commit 1733a2ad36741b1812cf8b3f3037c28d0af53f50.
>>
>> There is apparently something amiss with the way the TTM code handles
>> DMA buffers, which the above commit was attempting to work around for
>> arm64 systems with non-coherent PCI. Unfortunately, this completely
>> breaks systems *with* coherent PCI (which appear to be the majority).
>>
>> Booting a plain arm64 defconfig + CONFIG_DRM + CONFIG_DRM_NOUVEAU on
>> a machine with a PCI GPU having coherent dma_map_ops (in this case a
>> 7600GT card plugged into an ARM Juno board) results in a fatal crash:
>>
>> [    2.803438] nouveau 0000:06:00.0: DRM: allocated 1024x768 fb: 0x9000, bo ffffffc976141c00
>> [    2.897662] Unable to handle kernel NULL pointer dereference at virtual address 000001ac
>> [    2.897666] pgd = ffffff8008e00000
>> [    2.897675] [000001ac] *pgd=00000009ffffe003, *pud=00000009ffffe003, *pmd=0000000000000000
>> [    2.897680] Internal error: Oops: 96000045 [#1] PREEMPT SMP
>> [    2.897685] Modules linked in:
>> [    2.897692] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc5+ #543
>> [    2.897694] Hardware name: ARM Juno development board (r1) (DT)
>> [    2.897699] task: ffffffc9768a0000 ti: ffffffc9768a8000 task.ti: ffffffc9768a8000
>> [    2.897711] PC is at __memcpy+0x7c/0x180
>> [    2.897719] LR is at OUT_RINGp+0x34/0x70
>> [    2.897724] pc : [<ffffff80083465fc>] lr : [<ffffff800854248c>] pstate: 80000045
>> [    2.897726] sp : ffffffc9768ab360
>> [    2.897732] x29: ffffffc9768ab360 x28: 0000000000000001
>> [    2.897738] x27: ffffffc97624c000 x26: 0000000000000000
>> [    2.897744] x25: 0000000000000080 x24: 0000000000006c00
>> [    2.897749] x23: 0000000000000005 x22: ffffffc97624c010
>> [    2.897755] x21: 0000000000000004 x20: 0000000000000004
>> [    2.897761] x19: ffffffc9763da000 x18: ffffffc976b2491c
>> [    2.897766] x17: 0000000000000007 x16: 0000000000000006
>> [    2.897771] x15: 0000000000000001 x14: 0000000000000001
>> [    2.897777] x13: 0000000000e31b70 x12: ffffffc9768a0080
>> [    2.897783] x11: 0000000000000000 x10: fffffffffffffb00
>> [    2.897788] x9 : 0000000000000000 x8 : 0000000000000000
>> [    2.897793] x7 : 0000000000000000 x6 : 00000000000001ac
>> [    2.897799] x5 : 00000000ffffffff x4 : 0000000000000000
>> [    2.897804] x3 : 0000000000000010 x2 : 0000000000000010
>> [    2.897810] x1 : ffffffc97624c010 x0 : 00000000000001ac
>> ...
>> [    2.898494] Call trace:
>> [    2.898499] Exception stack(0xffffffc9768ab1a0 to 0xffffffc9768ab2c0)
>> [    2.898506] b1a0: ffffffc9763da000 0000000000000004 ffffffc9768ab360 ffffff80083465fc
>> [    2.898513] b1c0: ffffffc976801e00 ffffffc9762b8000 ffffffc9768ab1f0 ffffff80080ec158
>> [    2.898520] b1e0: ffffffc9768ab230 ffffff8008496d04 ffffffc975ce6d80 ffffffc9768ab36e
>> [    2.898527] b200: ffffffc9768ab36f ffffffc9768ab29d ffffffc9768ab29e ffffffc9768a0000
>> [    2.898533] b220: ffffffc9768ab250 ffffff80080e70c0 ffffffc9768ab270 ffffff8008496e44
>> [    2.898540] b240: 00000000000001ac ffffffc97624c010 0000000000000010 0000000000000010
>> [    2.898546] b260: 0000000000000000 00000000ffffffff 00000000000001ac 0000000000000000
>> [    2.898552] b280: 0000000000000000 0000000000000000 fffffffffffffb00 0000000000000000
>> [    2.898558] b2a0: ffffffc9768a0080 0000000000e31b70 0000000000000001 0000000000000001
>> [    2.898566] [<ffffff80083465fc>] __memcpy+0x7c/0x180
>> [    2.898574] [<ffffff800853e164>] nv04_fbcon_imageblit+0x1d4/0x2e8
>> [    2.898582] [<ffffff800853d6d0>] nouveau_fbcon_imageblit+0xd8/0xe0
>> [    2.898591] [<ffffff80083c4db4>] soft_cursor+0x154/0x1d8
>> [    2.898598] [<ffffff80083c47b4>] bit_cursor+0x4fc/0x538
>> [    2.898605] [<ffffff80083c0cfc>] fbcon_cursor+0x134/0x1a8
>> [    2.898613] [<ffffff800841c280>] hide_cursor+0x38/0xa0
>> [    2.898620] [<ffffff800841d420>] redraw_screen+0x120/0x228
>> [    2.898628] [<ffffff80083bf268>] fbcon_prepare_logo+0x370/0x3f8
>> [    2.898635] [<ffffff80083bf640>] fbcon_init+0x350/0x560
>> [    2.898641] [<ffffff800841c634>] visual_init+0xac/0x108
>> [    2.898648] [<ffffff800841df14>] do_bind_con_driver+0x1c4/0x3a8
>> [    2.898655] [<ffffff800841e4f4>] do_take_over_console+0x174/0x1e8
>> [    2.898662] [<ffffff80083bf8c4>] do_fbcon_takeover+0x74/0x100
>> [    2.898669] [<ffffff80083c3e44>] fbcon_event_notify+0x8cc/0x920
>> [    2.898680] [<ffffff80080d7e38>] notifier_call_chain+0x50/0x90
>> [    2.898685] [<ffffff80080d8214>] __blocking_notifier_call_chain+0x4c/0x90
>> [    2.898691] [<ffffff80080d826c>] blocking_notifier_call_chain+0x14/0x20
>> [    2.898696] [<ffffff80083c5e1c>] fb_notifier_call_chain+0x1c/0x28
>> [    2.898703] [<ffffff80083c81ac>] register_framebuffer+0x1cc/0x2e0
>> [    2.898712] [<ffffff800845da80>] drm_fb_helper_initial_config+0x288/0x3e8
>> [    2.898719] [<ffffff800853da20>] nouveau_fbcon_init+0xe0/0x118
>> [    2.898727] [<ffffff800852d2f8>] nouveau_drm_load+0x268/0x890
>> [    2.898734] [<ffffff8008466e24>] drm_dev_register+0xbc/0xc8
>> [    2.898740] [<ffffff8008468a88>] drm_get_pci_dev+0xa0/0x180
>> [    2.898747] [<ffffff800852cb28>] nouveau_drm_probe+0x1a0/0x1e0
>> [    2.898755] [<ffffff80083a32e0>] pci_device_probe+0x98/0x110
>> [    2.898763] [<ffffff800858e434>] driver_probe_device+0x204/0x2b0
>> [    2.898770] [<ffffff800858e58c>] __driver_attach+0xac/0xb0
>> [    2.898777] [<ffffff800858c3e0>] bus_for_each_dev+0x60/0xa0
>> [    2.898783] [<ffffff800858dbc0>] driver_attach+0x20/0x28
>> [    2.898789] [<ffffff800858d7b0>] bus_add_driver+0x1d0/0x238
>> [    2.898796] [<ffffff800858ed50>] driver_register+0x60/0xf8
>> [    2.898802] [<ffffff80083a20dc>] __pci_register_driver+0x3c/0x48
>> [    2.898809] [<ffffff8008468eb4>] drm_pci_init+0xf4/0x120
>> [    2.898818] [<ffffff8008c56fc0>] nouveau_drm_init+0x21c/0x230
>> [    2.898825] [<ffffff80080829d4>] do_one_initcall+0x8c/0x190
>> [    2.898832] [<ffffff8008c31af4>] kernel_init_freeable+0x14c/0x1f0
>> [    2.898839] [<ffffff80088a0c20>] kernel_init+0x10/0x100
>> [    2.898845] [<ffffff8008085e10>] ret_from_fork+0x10/0x40
>> [    2.898853] Code: a88120c7 a8c12027 a88120c7 a8c12027 (a88120c7)
>> [    2.898871] ---[ end trace d5713dcad023ee04 ]---
>> [    2.898888] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>
>> In a toss-up between the GPU seeing stale data artefacts on some systems
>> vs. catastrophic kernel crashes on other systems, the latter would seem
>> to take precedence, so revert this change until the real underlying
>> problem can be fixed.
>>
>> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
>> Acked-by: Alexandre Courbot <acourbot at nvidia.com>
>> [acourbot at nvidia.com: port to Nouveau tree, remove bits in lib/]
>> Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
>> ---
>> Hi Ben,
>>
>> I have ported this patch to your tree - could you take it for 4.7? We
>> definitely want to avoid these crashes. I am working on a final solution
>> for this that will allow us to remove that cpu_coherent flag altogether.
>
>
> Cheers Alex! Should this also go to stable for 4.6?

That would be good, yes.

