[Intel-xe] [PATCH 0/2] drm/xe: Revert coalescing of GGTT invalidations

Niranjana Vishwanathapura niranjana.vishwanathapura at intel.com
Fri Apr 7 03:32:23 UTC 2023


On Thu, Apr 06, 2023 at 02:36:25PM -0700, Matt Roper wrote:
>On Thu, Apr 06, 2023 at 01:48:39PM -0700, Niranjana Vishwanathapura wrote:
>> Causing a bunch of hangs during driver load and in user space.
>> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/232
>>
>
>Tested-by: Matt Roper <matthew.d.roper at intel.com>  # ADL-P
>
>Previously ADL-P was failing driver load about 50% of the time with the
>signature in the gitlab issue; with this series, I can load 30-40 times
>in a row without issue.
>
>If I load/unload in a loop enough times (usually 40-50 times), I do
>encounter a different problem, but I suspect this is a distinct issue
>that's just being uncovered now; it probably isn't caused by these
>patches:
>

Thanks, applied.
Yes, this seems to be the same as the other KASAN-reported issue we are
looking at.

Niranjana

>[Thu Apr  6 21:32:02 2023] xe 0000:00:02.0: [drm:xe_guc_ct_enable [xe]] GuC CT communication channel enabled
>[Thu Apr  6 21:32:02 2023] BUG: kernel NULL pointer dereference, address: 0000000000000000
>[Thu Apr  6 21:32:02 2023] #PF: supervisor read access in kernel mode
>[Thu Apr  6 21:32:02 2023] #PF: error_code(0x0000) - not-present page
>[Thu Apr  6 21:32:02 2023] PGD 0 P4D 0
>[Thu Apr  6 21:32:02 2023] Oops: 0000 [#1] PREEMPT SMP NOPTI
>[Thu Apr  6 21:32:02 2023] CPU: 9 PID: 10370 Comm: kworker/u32:2 Tainted: G        W          6.3.0-rc4-CI_DRM_12746-g6ce36b596fa7+ #499
>[Thu Apr  6 21:32:02 2023] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P LP4x RVP, BIOS ADLPFWI1.R00.3323.A00.2208030835 08/03/2022
>[Thu Apr  6 21:32:02 2023] Workqueue: events_unbound g2h_worker_func [xe]
>[Thu Apr  6 21:32:02 2023] RIP: 0010:__wake_up_common+0x5b/0x1b0
>[Thu Apr  6 21:32:02 2023] Code: 85 0a 01 00 00 4d 85 e4 74 0b 41 f6 04 24 04 0f 85 a3 00 00 00 48 8b 43 40 4c 8d 40 e8 48 83 c3 40 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
>[Thu Apr  6 21:32:02 2023] RSP: 0018:ffffc900018dfd20 EFLAGS: 00010086
>[Thu Apr  6 21:32:02 2023] RAX: 0000000000000000 RBX: ffffc9000182fad8 RCX: 0000000000000000
>[Thu Apr  6 21:32:02 2023] RDX: 00000000ffffffff RSI: ffffffff823cc0f8 RDI: ffffffff823ec02c
>[Thu Apr  6 21:32:02 2023] RBP: 0000000000000246 R08: ffffffffffffffe8 R09: ffffc900018dfd78
>[Thu Apr  6 21:32:02 2023] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc900018dfd78
>[Thu Apr  6 21:32:02 2023] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>[Thu Apr  6 21:32:02 2023] FS:  0000000000000000(0000) GS:ffff88849f880000(0000) knlGS:0000000000000000
>[Thu Apr  6 21:32:02 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[Thu Apr  6 21:32:02 2023] CR2: 0000000000000000 CR3: 0000000006640002 CR4: 0000000000f70ee0
>[Thu Apr  6 21:32:02 2023] PKRU: 55555554
>[Thu Apr  6 21:32:02 2023] Call Trace:
>[Thu Apr  6 21:32:02 2023]  <TASK>
>[Thu Apr  6 21:32:02 2023]  __wake_up_common_lock+0x81/0xd0
>[Thu Apr  6 21:32:02 2023]  dequeue_one_g2h+0x15d/0x460 [xe]
>[Thu Apr  6 21:32:02 2023]  g2h_worker_func+0x5e/0xe0 [xe]
>[Thu Apr  6 21:32:02 2023]  process_one_work+0x287/0x520
>[Thu Apr  6 21:32:02 2023]  worker_thread+0x53/0x3a0
>[Thu Apr  6 21:32:02 2023]  ? __pfx_worker_thread+0x10/0x10
>[Thu Apr  6 21:32:02 2023]  kthread+0xf6/0x120
>[Thu Apr  6 21:32:02 2023]  ? __pfx_kthread+0x10/0x10
>[Thu Apr  6 21:32:02 2023]  ret_from_fork+0x29/0x50
>[Thu Apr  6 21:32:02 2023]  </TASK>
>[Thu Apr  6 21:32:02 2023] Modules linked in: xe(+) drm_ttm_helper gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy video ttm drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fuse x86_pkg_temp_thermal coretemp kvm_intel mei_pxp mei_hdcp kvm irqbypass wmi_bmof mei_me mei e1000e crct10dif_pclmul crc32_pclmul ptp ghash_clmulni_intel i2c_i801 i2c_smbus pps_core intel_lpss_pci wmi [last unloaded: ttm]
>[Thu Apr  6 21:32:02 2023] CR2: 0000000000000000
>[Thu Apr  6 21:32:02 2023] ---[ end trace 0000000000000000 ]---
>
>
>Matt
>
>> The benefit is rather small too, so revert it to stabilize the stack.
>>
>> Reverts the following changes:
>> drm/xe: Pad GGTT mapping with an extra page pointing to scratch
>> drm/xe: Coalesce GGTT invalidations
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
>>
>> Niranjana Vishwanathapura (2):
>>   Revert "drm/xe: Pad GGTT mapping with an extra page pointing to
>>     scratch"
>>   Revert "drm/xe: Coalesce GGTT invalidations"
>>
>>  drivers/gpu/drm/xe/xe_bo.c         |  1 -
>>  drivers/gpu/drm/xe/xe_bo.h         |  1 +
>>  drivers/gpu/drm/xe/xe_bo_types.h   |  4 +---
>>  drivers/gpu/drm/xe/xe_ggtt.c       | 35 +++++++-----------------------
>>  drivers/gpu/drm/xe/xe_ggtt_types.h |  2 --
>>  5 files changed, 10 insertions(+), 33 deletions(-)
>>
>> --
>> 2.21.0.rc0.32.g243a4c7e27
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

