[Intel-xe] [PATCH 0/2] drm/xe: Revert coalescing of GGTT invalidations

Matt Roper matthew.d.roper at intel.com
Thu Apr 6 21:36:25 UTC 2023


On Thu, Apr 06, 2023 at 01:48:39PM -0700, Niranjana Vishwanathapura wrote:
> Causing a bunch of hangs during driver load and in user space.
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/232
> 

Tested-by: Matt Roper <matthew.d.roper at intel.com>  # ADL-P

Previously ADL-P was failing driver load about 50% of the time with the
signature in the gitlab issue; with this series, I can load 30-40 times
in a row without issue.
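
A repeated load test like this can be scripted. The following is only a minimal sketch, not the procedure actually used for the result above: the helper name `cycle_module`, the default iteration count, and the use of plain `modprobe` / `modprobe -r` are all assumptions for illustration.

```python
import subprocess


def cycle_module(module="xe", iterations=50, run=subprocess.run):
    """Load and unload `module` repeatedly (hypothetical helper).

    Returns None if every modprobe call succeeded, otherwise a tuple
    (iteration, command, stderr) describing the first failing call.
    `run` is injectable so the loop can be exercised without root.
    """
    for i in range(1, iterations + 1):
        # One cycle: load the module, then remove it again.
        for cmd in (["modprobe", module], ["modprobe", "-r", module]):
            result = run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                return (i, cmd, result.stderr)
    return None
```

Calling `cycle_module("xe", 50)` as root runs 50 load/unload cycles. A zero exit status from `modprobe` does not rule out an oops in a worker thread, so `dmesg` still has to be checked separately after the run.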

If I load/unload in a loop enough times (usually 40-50 times), I do
encounter a different problem, but I suspect this is a distinct issue
that's just being uncovered now; it probably isn't caused by these
patches:

[Thu Apr  6 21:32:02 2023] xe 0000:00:02.0: [drm:xe_guc_ct_enable [xe]] GuC CT communication channel enabled
[Thu Apr  6 21:32:02 2023] BUG: kernel NULL pointer dereference, address: 0000000000000000
[Thu Apr  6 21:32:02 2023] #PF: supervisor read access in kernel mode                                                                              
[Thu Apr  6 21:32:02 2023] #PF: error_code(0x0000) - not-present page                                                                              
[Thu Apr  6 21:32:02 2023] PGD 0 P4D 0                                                                                                             
[Thu Apr  6 21:32:02 2023] Oops: 0000 [#1] PREEMPT SMP NOPTI                                                                                       
[Thu Apr  6 21:32:02 2023] CPU: 9 PID: 10370 Comm: kworker/u32:2 Tainted: G        W          6.3.0-rc4-CI_DRM_12746-g6ce36b596fa7+ #499           
[Thu Apr  6 21:32:02 2023] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P LP4x RVP, BIOS ADLPFWI1.R00.3323.A00.2208030835 08/03/2022
[Thu Apr  6 21:32:02 2023] Workqueue: events_unbound g2h_worker_func [xe] 
[Thu Apr  6 21:32:02 2023] RIP: 0010:__wake_up_common+0x5b/0x1b0
[Thu Apr  6 21:32:02 2023] Code: 85 0a 01 00 00 4d 85 e4 74 0b 41 f6 04 24 04 0f 85 a3 00 00 00 48 8b 43 40 4c 8d 40 e8 48 83 c3 40 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[Thu Apr  6 21:32:02 2023] RSP: 0018:ffffc900018dfd20 EFLAGS: 00010086
[Thu Apr  6 21:32:02 2023] RAX: 0000000000000000 RBX: ffffc9000182fad8 RCX: 0000000000000000
[Thu Apr  6 21:32:02 2023] RDX: 00000000ffffffff RSI: ffffffff823cc0f8 RDI: ffffffff823ec02c
[Thu Apr  6 21:32:02 2023] RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc900018dfd78
[Thu Apr  6 21:32:02 2023] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc900018dfd78
[Thu Apr  6 21:32:02 2023] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Thu Apr  6 21:32:02 2023] FS:  0000000000000000(0000) GS:ffff88849f880000(0000) knlGS:0000000000000000
[Thu Apr  6 21:32:02 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Apr  6 21:32:02 2023] CR2: 0000000000000000 CR3: 0000000006640002 CR4: 0000000000f70ee0
[Thu Apr  6 21:32:02 2023] PKRU: 55555554
[Thu Apr  6 21:32:02 2023] Call Trace:
[Thu Apr  6 21:32:02 2023]  <TASK>
[Thu Apr  6 21:32:02 2023]  __wake_up_common_lock+0x81/0xd0
[Thu Apr  6 21:32:02 2023]  dequeue_one_g2h+0x15d/0x460 [xe]
[Thu Apr  6 21:32:02 2023]  g2h_worker_func+0x5e/0xe0 [xe]
[Thu Apr  6 21:32:02 2023]  process_one_work+0x287/0x520
[Thu Apr  6 21:32:02 2023]  worker_thread+0x53/0x3a0
[Thu Apr  6 21:32:02 2023]  ? __pfx_worker_thread+0x10/0x10
[Thu Apr  6 21:32:02 2023]  kthread+0xf6/0x120
[Thu Apr  6 21:32:02 2023]  ? __pfx_kthread+0x10/0x10
[Thu Apr  6 21:32:02 2023]  ret_from_fork+0x29/0x50
[Thu Apr  6 21:32:02 2023]  </TASK>
[Thu Apr  6 21:32:02 2023] Modules linked in: xe(+) drm_ttm_helper gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy video ttm drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fuse x86_pkg_temp_thermal coretemp kvm_intel mei_pxp mei_hdcp kvm irqbypass wmi_bmof mei_me mei e1000e crct10dif_pclmul crc32_pclmul ptp ghash_clmulni_intel i2c_i801 i2c_smbus pps_core intel_lpss_pci wmi [last unloaded: ttm]
[Thu Apr  6 21:32:02 2023] CR2: 0000000000000000
[Thu Apr  6 21:32:02 2023] ---[ end trace 0000000000000000 ]---
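
As a sanity check on the dump above, the fault address can be recomputed from the registers. This is a sketch under one assumption: that the marked bytes `<49> 8b 40 18` decode as `mov 0x18(%r8),%rax`. The register values are copied from the log; the constant and function names are made up for illustration.

```python
MASK64 = (1 << 64) - 1  # x86-64 address arithmetic wraps at 64 bits

# Values copied from the register dump in the oops above.
R08 = 0xFFFFFFFFFFFFFFE8
CR2 = 0x0000000000000000

# Displacement of the faulting load, assuming the marked bytes
# "49 8b 40 18" decode as: mov 0x18(%r8),%rax
DISP = 0x18


def effective_address(base, disp):
    """Return base + disp with 64-bit wraparound."""
    return (base + disp) & MASK64


fault_addr = effective_address(R08, DISP)
print(hex(fault_addr), fault_addr == CR2)
```

Under that decode, R08 is -0x18 as a signed value, so the load targets address 0, which matches CR2. That would be consistent with __wake_up_common() following a NULL list pointer in the wait queue head, though the dump alone does not show why the pointer was NULL.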


Matt

> Benefit is rather small too, so revert it to stabilize the stack.
> 
> Reverts below changes
> drm/xe: Pad GGTT mapping with an extra page pointing to scratch
> drm/xe: Coalesce GGTT invalidations
> 
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
> 
> Niranjana Vishwanathapura (2):
>   Revert "drm/xe: Pad GGTT mapping with an extra page pointing to
>     scratch"
>   Revert "drm/xe: Coalesce GGTT invalidations"
> 
>  drivers/gpu/drm/xe/xe_bo.c         |  1 -
>  drivers/gpu/drm/xe/xe_bo.h         |  1 +
>  drivers/gpu/drm/xe/xe_bo_types.h   |  4 +---
>  drivers/gpu/drm/xe/xe_ggtt.c       | 35 +++++++-----------------------
>  drivers/gpu/drm/xe/xe_ggtt_types.h |  2 --
>  5 files changed, 10 insertions(+), 33 deletions(-)
> 
> -- 
> 2.21.0.rc0.32.g243a4c7e27
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

