[Bug 204181] NULL pointer dereference regression in amdgpu

Fri Sep 27 03:50:30 UTC 2019

https://bugzilla.kernel.org/show_bug.cgi?id=204181

--- Comment #53 from Sergey Kondakov (virtuousfox at gmail.com) ---
Created attachment 285209
  --> https://bugzilla.kernel.org/attachment.cgi?id=285209&action=edit
dmesg_2019-09-26-amdgpu-old_dereference_on_patched_5.3.1

After about a day of uptime, my patched 5.3.1 hung during an hours-long YouTube
video with a dereference that is almost identical to the original one:
BUG: unable to handle page fault for address: 00000008000001b4
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 396 Comm: kworker/u16:2 Tainted: G        W IO      5.3.1-1482.g27a0123-HSF #1 openSUSE Tumbleweed
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
Workqueue: events_unbound commit_work
RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2ee/0xfd0 [amdgpu]
…
Call Trace:
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? _raw_spin_unlock_irq+0x29/0x50
 ? trace_hardirqs_on+0x2c/0xf0
 ? _raw_spin_unlock_irq+0x3a/0x50
 ? finish_task_switch+0xa3/0x2e0
 ? finish_task_switch+0x75/0x2e0
 ? __switch_to+0x152/0x4e0
 ? __switch_to_asm+0x34/0x70
 ? __schedule+0x353/0x900
 ? wait_for_completion_timeout+0x31/0x110
 ? _raw_spin_unlock_irq+0x29/0x50
 ? preempt_count_sub+0x9b/0xd0
 ? _raw_spin_unlock_irq+0x3a/0x50
 ? wait_for_completion_timeout+0xe9/0x110
 ? commit_tail+0x3c/0x70
 commit_tail+0x3c/0x70
 process_one_work+0x271/0x5b0
 worker_thread+0x4a/0x3d0
 ? process_one_work+0x5b0/0x5b0
 kthread+0x118/0x140
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x27/0x50
…
[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
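
For readers unfamiliar with the "#PF: error_code(0x0000)" line in the trace
above, here is a minimal sketch of how that value breaks down into the
"supervisor read access" and "not-present page" descriptions the kernel prints.
The bit meanings are the standard x86 page-fault error code bits; the function
and table names are made up for illustration and are not kernel APIs.

```python
# Decode the low bits of an x86 page-fault error code.
# Each entry: (bit mask, meaning when set, meaning when clear or None).
# PF_BITS and decode_pf_error_code are hypothetical names for this sketch.
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user access", "supervisor access"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable descriptions for the given code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# The value from the oops in this comment.
print(decode_pf_error_code(0x0000))
```

For 0x0000 all bits are clear, which corresponds to a supervisor read of a page
that is not mapped, consistent with the two "#PF:" lines in the log.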

Could this be due to these additional patches?
https://patchwork.freedesktop.org/series/64614/
https://patchwork.freedesktop.org/series/65192/

Or the fact that I patched kwin-5.16.5 with https://phabricator.kde.org/T11071
and added KWIN_USE_INTEL_SWAP_EVENT=1 & KWIN_USE_BUFFER_AGE=3, so it works with
tighter timings now?

Or any of these amdgpu module options?
options amdgpu cik_support=1 si_support=1 msi=1 disp_priority=2 dpm=1 runpm=1
sched_policy=1 compute_multipipe=1 vm_fragment_size=9 gartsize=1024
max_num_of_queues_per_device=65536 sched_hw_submission=32 sched_jobs=1024
job_hang_limit=8000 halt_if_hws_hang=1 vm_fault_stop=0 vm_update_mode=0
deep_color=1 gpu_recovery=1 lockup_timeout=2500,5000,8000,1000 ras_enable=1
mcbp=1 queue_preemption_timeout_ms=48 mes=1 hws_gws_support=1 discovery=1
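
With this many non-default options, one way to narrow things down is to split
the list in half and retest with each half. The following is only a sketch of
that bookkeeping, assuming the wrapped "options" line has first been joined
into a single string; the helper names are hypothetical, and the sample line
uses just a few of the options listed above.

```python
# Parse a modprobe.d-style "options <module> name=value ..." line and split
# the parameters into two halves for manual bisection.
# parse_modprobe_options and bisect_halves are hypothetical helper names.

def parse_modprobe_options(line):
    """Return (module_name, {param: value}) for a single 'options' line."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("not a modprobe 'options' line")
    module = tokens[1]
    params = {}
    for tok in tokens[2:]:
        # Values may contain commas (e.g. lockup_timeout), so split on the
        # first '=' only.
        name, _, value = tok.partition("=")
        params[name] = value
    return module, params


def bisect_halves(params):
    """Split the parameters (sorted by name) into two dicts of similar size."""
    names = sorted(params)
    mid = len(names) // 2
    first = {n: params[n] for n in names[:mid]}
    second = {n: params[n] for n in names[mid:]}
    return first, second


line = "options amdgpu dpm=1 runpm=1 lockup_timeout=2500,5000,8000,1000 mcbp=1"
module, params = parse_modprobe_options(line)
first, second = bisect_halves(params)
print(module, first, second)
```

The values actually in effect on a running system can be compared against
these by reading the files under /sys/module/amdgpu/parameters/.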
