[RFC 00/34] Kill mem_access v2
Rodrigo Vivi
rodrigo.vivi at intel.com
Fri Jan 26 20:30:09 UTC 2024
Hi all,
First of all, thank you so much for the good feedback and ideas on the
v1 of this RFC series:
v1: lore.kernel.org/all/20231228021232.2366249-1-rodrigo.vivi at intel.com
This v2 has better organized and more finely split patches, so I'd like
to ask you to start reviewing the simple ones already, so that I can
try to split the series and merge it little by little.
I have 2 pending issues in this series that I haven't been able to solve yet.
Matt Brost is already helping me with these, but any help is welcome.
As I said, though, I'd like to start the review of the simplest patches
first anyway, please!
Details of the current issues:
1. Usage count underflow on a GPU hang test when coming from D3cold. The
culprit is the pm_runtime_{get,put} around g2h_outstanding.
[ 476.450482] [IGT] xe_exec_threads: starting subtest threads-hang-basic
[ 476.507039] xe 0000:03:00.0: [drm] Engine reset: guc_id=78
[snip]
[ 476.518814] xe 0000:03:00.0: [drm] Timedout job: seqno=4294967169, guc_id=78, flags=0x8
[ 476.524595] xe 0000:03:00.0: [drm] Engine reset: guc_id=78
[ 476.532615] xe 0000:03:00.0: [drm] Xe device coredump has been created
[ 476.801805] [IGT] xe_exec_threads: finished subtest threads-hang-basic, SUCCESS
[ 476.808391] xe 0000:03:00.0: Runtime PM usage count underflow!
[ 476.813455] [IGT] xe_exec_threads: exiting, ret=0
[ 476.819146] xe 0000:03:00.0: Runtime PM usage count underflow!
[ 476.829816] xe 0000:03:00.0: Runtime PM usage count underflow!
[and on, and on]
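To illustrate the kind of imbalance that can produce the warning above, here
is a simplified user-space model. It is only a sketch under assumptions, not
the xe code and not a diagnosis of this bug: every name in it (model_dev,
model_pm_get, model_pm_put, model_g2h_reserve, model_g2h_release,
model_reset) is hypothetical. It assumes one reference is taken per
outstanding G2H and that a reset path drops the references for everything
still outstanding, so a response that is processed after the reset ends up
dropping a reference that is no longer held.

```c
#include <stdio.h>

/* Hypothetical model of a device with a runtime PM usage counter. */
struct model_dev {
	int usage_count;     /* models the runtime PM usage counter */
	int g2h_outstanding; /* models responses we are still waiting for */
	int underflows;      /* number of unbalanced puts detected */
};

void model_pm_get(struct model_dev *d)
{
	d->usage_count++;
}

/* A put with no matching get is reported and the counter stays at zero. */
void model_pm_put(struct model_dev *d)
{
	if (d->usage_count == 0) {
		d->underflows++;
		fprintf(stderr, "model: usage count underflow!\n");
		return;
	}
	d->usage_count--;
}

/* Assumption: one reference is held per outstanding response. */
void model_g2h_reserve(struct model_dev *d)
{
	d->g2h_outstanding++;
	model_pm_get(d);
}

/* Called when a response arrives: drop the reference taken at reserve. */
void model_g2h_release(struct model_dev *d)
{
	d->g2h_outstanding--;
	model_pm_put(d);
}

/* Assumption: reset gives up on all outstanding responses at once. */
void model_reset(struct model_dev *d)
{
	while (d->g2h_outstanding > 0) {
		d->g2h_outstanding--;
		model_pm_put(d);
	}
}
```

In this model, calling model_g2h_reserve(), then model_reset(), then
model_g2h_release() for the late response leaves usage_count at 0 and
records one underflow, because the reset and the late response both drop
the same reference.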
2. Failing rmmod due to an invalidation that happens at xe_pci_removal
(a case that fails only when coming from D3cold with display_enabled, but
on an idle/blank screen)
[ 326.857464] xe 0000:03:00.0: [drm] GT0: resumed
[ 327.135455] show_signal_msg: 126 callbacks suppressed
[ 327.135467] gnome-shell[2488]: segfault at 0 ip 00007fad50e315cc sp 00007ffd6f04a360 error 4 in libmutter-clutter-11.so.0.0.0[7fad50dbd000+97000] likely on CPU 15 (core 28, socket 0)
[ 327.157020] Code: e9 6f ff ff ff 66 0f 1f 84 00 00 00 00 00 48 8b 05 49 12 07 00 48 85 c0 74 4c 48 8b 38 e8 fc 49 f9 ff 48 89 c5 e8 14 75 f9 ff <48> 8b 55 00 48 8b 92 d0 00 00 00 48 85 d2 74 07 89 c6 48 89 ef ff
[ 328.160905] xe 0000:03:00.0: [drm] Xe device coredump has been deleted.
[ 329.099696] pci 0000:03:00.0: [drm] *ERROR* GT0: TLB invalidation time'd out, seqno=1668, recv=1667
[snip]
[ 329.312763] ------------[ cut here ]------------
[ 329.317417] pci 0000:03:00.0: [drm] Assertion `ct->g2h_outstanding == 0 || state == XE_GUC_CT_STATE_STOPPED` failed!
platform: 7 subplatform: 4
graphics: Xe_HPG 12.55 step C0
media: Xe_HPM 12.55 step C0
tile: 0 VRAM 8.00 GiB
GT: 0 type 1
Thanks in advance,
Rodrigo.
Rodrigo Vivi (34):
Revert "drm/xe/uc: Store firmware binary in system-memory backed BO"
drm/xe: Document Xe PM component
drm/xe: Fix display runtime_pm handling
drm/xe: Create a xe_pm_runtime_resume_and_get variant for display
drm/xe: Convert xe_pm_runtime_{get,put} to void and protect from recursion
drm/xe: Prepare display for D3Cold
drm/xe: Convert mem_access assertion towards the runtime_pm state
drm/xe: Runtime PM wake on every IOCTL
drm/xe: Convert kunit tests from mem_access to xe_pm_runtime
drm/xe: Convert scheduler towards direct pm_runtime
drm/xe: Runtime PM wake on every sysfs call
drm/xe: Ensure device is awake before removing it
drm/xe: Remove mem_access from guc_pc calls
drm/xe: Runtime PM wake on every debugfs call
drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls
drm/xe: Removing extra mem_access protection from runtime pm
drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls
drm/xe: Move lockdep protection from mem_access to xe_pm_runtime
drm/xe: Remove pm_runtime lockdep
drm/xe: Stop checking for power_lost on D3Cold
drm/xe: Convert GuC CT paths from mem_access to xe_pm_runtime
drm/xe: Keep D0 for the entire duration of a LR VM
drm/xe: Ensure D0 on TLB invalidation
drm/xe: Remove useless mem_access protection for query ioctls
drm/xe: Convert gsc_work from mem_access to xe_pm_runtime
drm/xe: VMs don't need the mem_access protection anymore
drm/xe: Remove useless mem_access during probe
drm/xe: Remove mem_access from suspend and resume functions
drm/xe: Convert gt_reset from mem_access to xe_pm_runtime
drm/xe: Remove useless mem_access on PAT dumps
drm/xe: Remove inner mem_access protections
drm/xe: Kill xe_device_mem_access_{get*,put}
drm/xe: Remove unused runtime pm helper
drm/xe: Enable D3Cold on 'low' VRAM utilization
.../gpu/drm/xe/compat-i915-headers/i915_drv.h | 8 +-
drivers/gpu/drm/xe/display/xe_fb_pin.c | 7 +-
drivers/gpu/drm/xe/tests/xe_bo.c | 8 +-
drivers/gpu/drm/xe/tests/xe_migrate.c | 7 +-
drivers/gpu/drm/xe/tests/xe_mocs.c | 14 +-
drivers/gpu/drm/xe/xe_bo.c | 10 +-
drivers/gpu/drm/xe/xe_debugfs.c | 13 +-
drivers/gpu/drm/xe/xe_device.c | 129 ++++------
drivers/gpu/drm/xe/xe_device.h | 9 -
drivers/gpu/drm/xe/xe_device_sysfs.c | 4 +
drivers/gpu/drm/xe/xe_device_types.h | 6 -
drivers/gpu/drm/xe/xe_display.c | 22 ++
drivers/gpu/drm/xe/xe_display.h | 2 +
drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
drivers/gpu/drm/xe/xe_exec_queue.c | 19 --
drivers/gpu/drm/xe/xe_ggtt.c | 6 -
drivers/gpu/drm/xe/xe_gpu_scheduler.c | 8 +-
drivers/gpu/drm/xe/xe_gpu_scheduler.h | 3 +-
drivers/gpu/drm/xe/xe_gpu_scheduler_types.h | 2 +
drivers/gpu/drm/xe/xe_gsc.c | 5 +-
drivers/gpu/drm/xe/xe_gt.c | 21 +-
drivers/gpu/drm/xe/xe_gt_debugfs.c | 53 ++++-
drivers/gpu/drm/xe/xe_gt_freq.c | 38 ++-
drivers/gpu/drm/xe/xe_gt_idle.c | 23 +-
drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c | 3 +
drivers/gpu/drm/xe/xe_guc_ct.c | 79 +++----
drivers/gpu/drm/xe/xe_guc_ct_types.h | 2 +
drivers/gpu/drm/xe/xe_guc_debugfs.c | 9 +-
drivers/gpu/drm/xe/xe_guc_pc.c | 62 +----
drivers/gpu/drm/xe/xe_guc_submit.c | 2 +-
drivers/gpu/drm/xe/xe_huc_debugfs.c | 5 +-
drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c | 58 ++++-
drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h | 7 +
drivers/gpu/drm/xe/xe_hwmon.c | 25 +-
drivers/gpu/drm/xe/xe_pat.c | 10 -
drivers/gpu/drm/xe/xe_pci.c | 2 +-
drivers/gpu/drm/xe/xe_pm.c | 223 +++++++++++++-----
drivers/gpu/drm/xe/xe_pm.h | 16 +-
drivers/gpu/drm/xe/xe_pt.c | 3 +
drivers/gpu/drm/xe/xe_query.c | 4 -
drivers/gpu/drm/xe/xe_sched_job.c | 12 +-
drivers/gpu/drm/xe/xe_tile.c | 10 +-
drivers/gpu/drm/xe/xe_tile_sysfs.c | 1 +
drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 5 +-
drivers/gpu/drm/xe/xe_uc_fw.c | 4 +-
drivers/gpu/drm/xe/xe_vm.c | 10 +-
46 files changed, 565 insertions(+), 409 deletions(-)
--
2.43.0