[Intel-xe] [PATCH v3 0/7] PAT and cache coherency support

Souza, Jose jose.souza at intel.com
Tue Sep 26 18:03:17 UTC 2023


On Tue, 2023-09-26 at 09:23 +0100, Matthew Auld wrote:
> On 25/09/2023 20:47, Souza, Jose wrote:
> > On Mon, 2023-09-25 at 14:21 +0100, Matthew Auld wrote:
> > > Branch available here (lightly tested):
> > > https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
> > > 
> > > Series still needs some more testing. Also note that the series directly depends
> > > on the WIP patch here: https://patchwork.freedesktop.org/series/122708/
> > > 
> > > Goal here is to allow userspace to directly control the pat_index when mapping
> > > memory via the ppGTT, in addition to the CPU caching mode for system memory. This
> > > is very much needed on newer igpu platforms which allow incoherent GT access,
> > > where the choice over the cache level and expected coherency is best left to
> > > userspace depending on their use case.  In the future there may also be other
> > > stuff encoded in the pat_index, so giving userspace direct control will also be
> > > needed there.
> > > 
> > > To support this we added new gem_create uAPI for selecting the CPU cache
> > > mode to use for system memory, including the expected GPU coherency mode. There
> > > are various restrictions here for the selected coherency mode and compatible CPU
> > > cache modes.  With that in place the actual pat_index can now be provided as
> > > part of vm_bind. The only restriction is that the coherency mode of the
> > > pat_index must be at least as coherent as the gem_create coherency mode. There
> > > are also some special cases like with userptr and dma-buf.
> > > 
> > > v2:
> > >    - Loads of improvements/tweaks. Main changes are to now allow
> > >      gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
> > >      exactly. This simplifies the dma-buf policy from userspace pov. Also we now
> > >      only consider COH_NONE and COH_AT_LEAST_1WAY.
> > > v3:
> > >    - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
> > >      fixes.
> > > 
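The coherency rule described in the quoted cover letter (the coherency mode of the pat_index must be at least as coherent as the gem_create coherency mode, with only COH_NONE and COH_AT_LEAST_1WAY considered) can be sketched as below. This is only an illustration and not the code from the series: the enum, its numeric values and the helper name are hypothetical, chosen so that a larger value means "more coherent".

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the two coherency modes named in the
 * cover letter. The numeric ordering is an assumption made for this
 * sketch: a larger value means a more coherent mode.
 */
enum sketch_coh_mode {
	SKETCH_COH_NONE = 0,
	SKETCH_COH_AT_LEAST_1WAY = 1,
};

/*
 * Hypothetical helper: returns true when a bind is acceptable under
 * the v2 rule gem_create.coh_mode <= coh_mode(pat_index).
 *
 * bo_coh  - coherency mode requested at gem_create time
 * pat_coh - coherency mode implied by the pat_index given at vm_bind
 */
static bool sketch_bind_coh_ok(enum sketch_coh_mode bo_coh,
			       enum sketch_coh_mode pat_coh)
{
	return bo_coh <= pat_coh;
}
```

Under these assumptions, an object created with COH_NONE could be bound with a pat_index of either coherency mode, while an object created with COH_AT_LEAST_1WAY would be rejected when bound with a COH_NONE pat_index. The special cases mentioned for userptr and dma-buf are not modelled here.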
> > 
> > Thanks for the fixes. Display is now working on TGL and DG2, but I am getting a new crash on MTL:
> 
> Is the MTL bug present on the same base branch, i.e. if you drop all the 
> patches in this series?

It also happens without your patches.
I found a CI bug with the same signature: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/606


> 
> > 
> > 
> > [  259.478814] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A]  blocks   16,  97,  97, 129, 129, 161,   0,   0,  30,  33,   47 ->   62,
> > 93,  93, 123, 123, 154,   0,   0, 137,  62,  137
> > [  259.478936] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] min_ddb   19, 108, 108, 143, 143, 179,   0,   0,  31,  38,   48 ->  123,
> > 184, 184, 184, 184, 245,   0,   0, 138, 123,  138
> > [  259.479089] ------------[ cut here ]------------
> > [  259.479093] WARNING: CPU: 2 PID: 2057 at drivers/gpu/drm/xe/display/xe_fb_pin.c:199 __xe_pin_fb_vma+0x3dc/0x840 [xe]
> > [  259.479239] Modules linked in: xe drm_ttm_helper drm_exec gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy ttm drm_display_helper
> > x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul wmi_bmof pmt_telemetry pmt_class ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg
> > snd_hda_codec kvm_intel snd_hwdep snd_hda_core e1000e mei_me ptp snd_pcm i2c_i801 mei i2c_smbus pps_core intel_vsec video wmi pinctrl_meteorlake fuse
> > [  259.479327] CPU: 2 PID: 2057 Comm: gnome-shell Tainted: G        W          6.5.0-rc7+zeh-xe+ #1109
> > [  259.479333] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-M LP5x CONF1 RVP, BIOS MTLMFWI1.R00.3323.D84.2308220916 08/22/2023
> > [  259.479337] RIP: 0010:__xe_pin_fb_vma+0x3dc/0x840 [xe]
> > [  259.479498] Code: 4d 89 f4 48 8b 44 24 08 49 8d b4 24 28 03 00 00 b9 16 00 00 00 4c 89 60 08 48 8d 78 10 f3 48 a5 4c 8b 6c 24 08 e9 2c fd ff ff
> > <0f> 0b 49 c7 c5 ed ff ff ff e9 14 fd ff ff 48 8b 7c 24 28 89 14 24
> > [  259.479503] RSP: 0018:ffffc9000604bb88 EFLAGS: 00010246
> > [  259.479509] RAX: ffff888196c9f190 RBX: ffff8881a222dc00 RCX: 0000000000000001
> > [  259.479513] RDX: 0000000000000000 RSI: ffffffff826a896e RDI: ffffffff826ac710
> > [  259.479517] RBP: ffff888183823800 R08: 0000000000000128 R09: ffff8881b2eff4d8
> > [  259.479521] R10: ffffc9000604bac8 R11: 0000000000000002 R12: ffff8881a222dc00
> > [  259.479526] R13: ffff888102ab0000 R14: 0000000000000000 R15: 0000563e9575fa00
> > [  259.479530] FS:  00007f4dbdf5f5c0(0000) GS:ffff88846e100000(0000) knlGS:0000000000000000
> > [  259.479535] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  259.479540] CR2: 00000e17254bc000 CR3: 0000000117bb8005 CR4: 0000000000770ee0
> > [  259.479545] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  259.479549] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> > [  259.479552] PKRU: 55555554
> > [  259.479556] Call Trace:
> > [  259.479560]  <TASK>
> > [  259.479564]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
> > [  259.479708]  ? __warn+0x7c/0x170
> > [  259.479716]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
> > [  259.479855]  ? report_bug+0x18d/0x1c0
> > [  259.479865]  ? handle_bug+0x3a/0x70
> > [  259.479873]  ? exc_invalid_op+0x13/0x60
> > [  259.479880]  ? asm_exc_invalid_op+0x16/0x20
> > [  259.479894]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
> > [  259.480030]  ? __xe_pin_fb_vma+0x34/0x840 [xe]
> > [  259.480160]  ? lock_acquire+0xd3/0x2d0
> > [  259.480170]  ? find_held_lock+0x2b/0x80
> > [  259.480179]  intel_plane_pin_fb+0x34/0x90 [xe]
> > [  259.480314]  intel_prepare_plane_fb+0x2c/0x70 [xe]
> > [  259.480469]  drm_atomic_helper_prepare_planes+0x6b/0x210
> > [  259.480481]  intel_atomic_commit+0x4d/0x360 [xe]
> > [  259.480666]  drm_mode_atomic_ioctl+0x7c7/0xbd0
> > [  259.480688]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
> > [  259.480696]  drm_ioctl_kernel+0xc0/0x170
> > [  259.480705]  drm_ioctl+0x212/0x470
> > [  259.480711]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
> > [  259.480729]  __x64_sys_ioctl+0x8d/0xb0
> > [  259.480739]  do_syscall_64+0x38/0x90
> > [  259.480746]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > [  259.480752] RIP: 0033:0x7f4dc211aaff
> > [  259.480756] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05
> > <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> > [  259.480761] RSP: 002b:00007ffd6d2b7940 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [  259.480767] RAX: ffffffffffffffda RBX: 00007ffd6d2b79e0 RCX: 00007f4dc211aaff
> > [  259.480771] RDX: 00007ffd6d2b79e0 RSI: 00000000c03864bc RDI: 0000000000000009
> > [  259.480774] RBP: 00000000c03864bc R08: 0000000000000000 R09: 0000000000000000
> > [  259.480777] R10: 00007f4dc221a2f0 R11: 0000000000000246 R12: 0000563e988b4590
> > [  259.480781] R13: 0000000000000009 R14: 0000563e988b4650 R15: 0000563e9814f060
> > [  259.480794]  </TASK>
> > [  259.480797] irq event stamp: 2049057
> > [  259.480800] hardirqs last  enabled at (2049063): [<ffffffff811e2369>] __up_console_sem+0x59/0x80
> > [  259.480808] hardirqs last disabled at (2049068): [<ffffffff811e234e>] __up_console_sem+0x3e/0x80
> > [  259.480815] softirqs last  enabled at (2048430): [<ffffffff8114f3aa>] irq_exit_rcu+0x8a/0xe0
> > [  259.480821] softirqs last disabled at (2048423): [<ffffffff8114f3aa>] irq_exit_rcu+0x8a/0xe0
> > [  259.480826] ---[ end trace 0000000000000000 ]---
> > [  259.494838] xe 0000:00:02.0: [drm:drm_mode_addfb2] [FB:219]
> > [  259.494943] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLAN
> > 
> > The warning comes from this check in __xe_pin_fb_vma():
> > if (XE_WARN_ON(view->type == I915_GTT_VIEW_REMAPPED)) {
> > 
> > 
> > 


