[RFC 00/34] Kill mem_access v2

Rodrigo Vivi rodrigo.vivi at intel.com
Fri Jan 26 20:30:09 UTC 2024


Hi all,

First of all, thank you so much for the good feedback and ideas on the
v1 of this RFC series:

v1: lore.kernel.org/all/20231228021232.2366249-1-rodrigo.vivi at intel.com

This v2 has better organized, more finely split patches. So I'd like
to ask you to start reviewing the simple ones already, so I can
try to split the series and merge it little by little.

I have 2 pending issues in this series that I couldn't solve yet.
Matt Brost is already helping me with these, but any help is welcome.

But as I said, I'd like to start the review of the simplest ones first
anyway, please!

Details of the current issues:

1. Runtime PM usage count underflow on a GPU-hang test when coming from
D3cold. The culprit is the pm_runtime_{get,put} around g2h_outstanding.

[  476.450482] [IGT] xe_exec_threads: starting subtest threads-hang-basic
[  476.507039] xe 0000:03:00.0: [drm] Engine reset: guc_id=78
[snip]
[  476.518814] xe 0000:03:00.0: [drm] Timedout job: seqno=4294967169, guc_id=78, flags=0x8
[  476.524595] xe 0000:03:00.0: [drm] Engine reset: guc_id=78
[  476.532615] xe 0000:03:00.0: [drm] Xe device coredump has been created
[  476.801805] [IGT] xe_exec_threads: finished subtest threads-hang-basic, SUCCESS
[  476.808391] xe 0000:03:00.0: Runtime PM usage count underflow!
[  476.813455] [IGT] xe_exec_threads: exiting, ret=0
[  476.819146] xe 0000:03:00.0: Runtime PM usage count underflow!
[  476.829816] xe 0000:03:00.0: Runtime PM usage count underflow!
[and on, and on]
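
To illustrate the class of bug behind the messages above, here is a
minimal userspace sketch of how a usage-count underflow can arise when a
get/put pair is tied to an outstanding-reply counter. Everything in it
(struct model_dev, the model_* helpers, and the reset-then-late-reply
ordering) is a hypothetical model made up for illustration. It is not the
xe driver code and not a diagnosis of the actual root cause. It only
assumes that one reference is taken per outstanding G2H reply and that a
reset path drops the references for replies it no longer expects.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model, not kernel or xe driver code. */
struct model_dev {
	int usage_count;     /* stands in for the runtime PM usage counter */
	int g2h_outstanding; /* stands in for ct->g2h_outstanding */
	int underflows;      /* puts that had no matching get */
};

static void model_pm_get(struct model_dev *d)
{
	d->usage_count++;
}

static void model_pm_put(struct model_dev *d)
{
	if (d->usage_count == 0) {
		/* A put without a get: report it and leave the count at 0. */
		d->underflows++;
		fprintf(stderr, "model: usage count underflow\n");
		return;
	}
	d->usage_count--;
}

/* One reference is taken for every reply we start waiting for. */
static void model_g2h_reserve(struct model_dev *d)
{
	d->g2h_outstanding++;
	model_pm_get(d);
}

/* The reference is dropped when the reply is processed. */
static void model_g2h_release(struct model_dev *d)
{
	d->g2h_outstanding--;
	model_pm_put(d);
}

/* Assumed reset path: forget all pending replies and drop their refs. */
static void model_reset(struct model_dev *d)
{
	while (d->g2h_outstanding > 0) {
		d->g2h_outstanding--;
		model_pm_put(d);
	}
}

/* Returns the number of underflows seen in the unbalanced scenario. */
static int model_scenario(void)
{
	struct model_dev d = { 0, 0, 0 };

	/* Balanced case: reserve, then release. */
	model_g2h_reserve(&d);
	model_g2h_release(&d);
	assert(d.usage_count == 0 && d.underflows == 0);

	/* Unbalanced case: reset drops the ref, then a late reply drops it again. */
	model_g2h_reserve(&d);
	model_reset(&d);
	model_g2h_release(&d);

	return d.underflows;
}
```

In this model the second put for the same reply is what trips the
underflow check. Whether the real issue follows this ordering is exactly
what still needs to be figured out.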

2. rmmod fails due to a TLB invalidation that happens at xe_pci_removal
(this fails only when coming from D3cold with display_enabled, but with
an idle/blank screen).

[  326.857464] xe 0000:03:00.0: [drm] GT0: resumed
[  327.135455] show_signal_msg: 126 callbacks suppressed
[  327.135467] gnome-shell[2488]: segfault at 0 ip 00007fad50e315cc sp 00007ffd6f04a360 error 4 in libmutter-clutter-11.so.0.0.0[7fad50dbd000+97000] likely on CPU 15 (core 28, socket 0)
[  327.157020] Code: e9 6f ff ff ff 66 0f 1f 84 00 00 00 00 00 48 8b 05 49 12 07 00 48 85 c0 74 4c 48 8b 38 e8 fc 49 f9 ff 48 89 c5 e8 14 75 f9 ff <48> 8b 55 00 48 8b 92 d0 00 00 00 48 85 d2 74 07 89 c6 48 89 ef ff
[  328.160905] xe 0000:03:00.0: [drm] Xe device coredump has been deleted.
[  329.099696] pci 0000:03:00.0: [drm] *ERROR* GT0: TLB invalidation time'd out, seqno=1668, recv=1667
[snip]
[  329.312763] ------------[ cut here ]------------
[  329.317417] pci 0000:03:00.0: [drm] Assertion `ct->g2h_outstanding == 0 || state == XE_GUC_CT_STATE_STOPPED` failed!
               platform: 7 subplatform: 4
               graphics: Xe_HPG 12.55 step C0
               media: Xe_HPM 12.55 step C0
               tile: 0 VRAM 8.00 GiB
               GT: 0 type 1

Thanks in advance,
Rodrigo.

Rodrigo Vivi (34):
  Revert "drm/xe/uc: Store firmware binary in system-memory backed BO"
  drm/xe: Document Xe PM component
  drm/xe: Fix display runtime_pm handling
  drm/xe: Create a xe_pm_runtime_resume_and_get variant for display
  drm/xe: Convert xe_pm_runtime_{get,put} to void and protect from
    recursion
  drm/xe: Prepare display for D3Cold
  drm/xe: Convert mem_access assertion towards the runtime_pm state
  drm/xe: Runtime PM wake on every IOCTL
  drm/xe: Convert kunit tests from mem_access to xe_pm_runtime
  drm/xe: Convert scheduler towards direct pm_runtime
  drm/xe: Runtime PM wake on every sysfs call
  drm/xe: Ensure device is awake before removing it
  drm/xe: Remove mem_access from guc_pc calls
  drm/xe: Runtime PM wake on every debugfs call
  drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls
  drm/xe: Removing extra mem_access protection from runtime pm
  drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls
  drm/xe: Move lockdep protection from mem_access to xe_pm_runtime
  drm/xe: Remove pm_runtime lockdep
  drm/xe: Stop checking for power_lost on D3Cold
  drm/xe: Convert GuC CT paths from mem_access to xe_pm_runtime
  drm/xe: Keep D0 for the entire duration of a LR VM
  drm/xe: Ensure D0 on TLB invalidation
  drm/xe: Remove useless mem_access protection for query ioctls
  drm/xe: Convert gsc_work from mem_access to xe_pm_runtime
  drm/xe: VMs don't need the mem_access protection anymore
  drm/xe: Remove useless mem_access during probe
  drm/xe: Remove mem_access from suspend and resume functions
  drm/xe: Convert gt_reset from mem_access to xe_pm_runtime
  drm/xe: Remove useless mem_access on PAT dumps
  drm/xe: Remove inner mem_access protections
  drm/xe: Kill xe_device_mem_access_{get*,put}
  drm/xe: Remove unused runtime pm helper
  drm/xe: Enable D3Cold on 'low' VRAM utilization

 .../gpu/drm/xe/compat-i915-headers/i915_drv.h |   8 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |   7 +-
 drivers/gpu/drm/xe/tests/xe_bo.c              |   8 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |   7 +-
 drivers/gpu/drm/xe/tests/xe_mocs.c            |  14 +-
 drivers/gpu/drm/xe/xe_bo.c                    |  10 +-
 drivers/gpu/drm/xe/xe_debugfs.c               |  13 +-
 drivers/gpu/drm/xe/xe_device.c                | 129 ++++------
 drivers/gpu/drm/xe/xe_device.h                |   9 -
 drivers/gpu/drm/xe/xe_device_sysfs.c          |   4 +
 drivers/gpu/drm/xe/xe_device_types.h          |   6 -
 drivers/gpu/drm/xe/xe_display.c               |  22 ++
 drivers/gpu/drm/xe/xe_display.h               |   2 +
 drivers/gpu/drm/xe/xe_dma_buf.c               |   5 +-
 drivers/gpu/drm/xe/xe_exec_queue.c            |  19 --
 drivers/gpu/drm/xe/xe_ggtt.c                  |   6 -
 drivers/gpu/drm/xe/xe_gpu_scheduler.c         |   8 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler.h         |   3 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler_types.h   |   2 +
 drivers/gpu/drm/xe/xe_gsc.c                   |   5 +-
 drivers/gpu/drm/xe/xe_gt.c                    |  21 +-
 drivers/gpu/drm/xe/xe_gt_debugfs.c            |  53 ++++-
 drivers/gpu/drm/xe/xe_gt_freq.c               |  38 ++-
 drivers/gpu/drm/xe/xe_gt_idle.c               |  23 +-
 drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c     |   3 +
 drivers/gpu/drm/xe/xe_guc_ct.c                |  79 +++----
 drivers/gpu/drm/xe/xe_guc_ct_types.h          |   2 +
 drivers/gpu/drm/xe/xe_guc_debugfs.c           |   9 +-
 drivers/gpu/drm/xe/xe_guc_pc.c                |  62 +----
 drivers/gpu/drm/xe/xe_guc_submit.c            |   2 +-
 drivers/gpu/drm/xe/xe_huc_debugfs.c           |   5 +-
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c |  58 ++++-
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h |   7 +
 drivers/gpu/drm/xe/xe_hwmon.c                 |  25 +-
 drivers/gpu/drm/xe/xe_pat.c                   |  10 -
 drivers/gpu/drm/xe/xe_pci.c                   |   2 +-
 drivers/gpu/drm/xe/xe_pm.c                    | 223 +++++++++++++-----
 drivers/gpu/drm/xe/xe_pm.h                    |  16 +-
 drivers/gpu/drm/xe/xe_pt.c                    |   3 +
 drivers/gpu/drm/xe/xe_query.c                 |   4 -
 drivers/gpu/drm/xe/xe_sched_job.c             |  12 +-
 drivers/gpu/drm/xe/xe_tile.c                  |  10 +-
 drivers/gpu/drm/xe/xe_tile_sysfs.c            |   1 +
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |   5 +-
 drivers/gpu/drm/xe/xe_uc_fw.c                 |   4 +-
 drivers/gpu/drm/xe/xe_vm.c                    |  10 +-
 46 files changed, 565 insertions(+), 409 deletions(-)

-- 
2.43.0


