[PATCH v4 0/2] Fixes for MI_REPORT_PERF_COUNT
Dixit, Ashutosh
ashutosh.dixit at intel.com
Fri Dec 20 17:19:24 UTC 2024
On Fri, 20 Dec 2024 08:16:45 -0800, Souza, Jose wrote:
>
Hi Jose,
> On Thu, 2024-12-19 at 16:22 -0800, Umesh Nerlige Ramappa wrote:
> > OA programming sequence for query mode or MI_REPORT_PERF_COUNT requires
> > modifying some HW registers in the same hw context as the user exec
> > queue. User passes the exec_queue to the OA interface and OA
> > implementation submits an MI_LOAD_REGISTER_IMM to this queue to modify
> > the registers.
> >
> > The OA implementation submits a batch mapped in GGTT to the user exec
> > queue and hence, some plumbing is added into relevant code to enable
> > that (as per suggestions from Matthew Brost).
> >
> > v2: review rework
> > v3:
> > - review rework
> > - original patches squashed for porting to stable
> > - code cleanup
> >
> > v4:
> > - review rework/fixes
>
> Got this oops with this version:
>
> [ 176.066578] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.068577] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.072629] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.078117] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.081285] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.093564] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 176.102886] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [ 194.119229] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ba3: 0000 [#1] PREEMPT SMP
> [ 194.130187] CPU: 3 UID: 1000 PID: 2240 Comm: ReplayManager Not tainted 6.13.0-rc3-zeh-xe+ #1454
> [ 194.138931] Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3152.D83.2404190622 04/19/2024
> [ 194.151258] RIP: 0010:xe_sync_entry_add_deps+0x1c/0x60 [xe]
> [ 194.157013] Code: c7 43 18 f4 ff ff ff e9 9b fe ff ff 66 90 55 53 48 8b 5f 08 48 85 db 75 05 31 c0 5b 5d c3 48 89 f5 48 8d 7b 38 b8 01 00 00 00 <f0> 0f c1 43 38 85 c0 74 20 8d 50 01 09 c2 78 0d 48 89 de 48 89 ef
> [ 194.175863] RSP: 0018:ffffc90001f93de8 EFLAGS: 00010202
> [ 194.181136] RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
> [ 194.188331] RDX: ffff88815ee8edc0 RSI: ffff88814ebb0840 RDI: 6b6b6b6b6b6b6ba3
> [ 194.195520] RBP: ffff88814ebb0840 R08: 0000000000000001 R09: 0000000000000000
> [ 194.202707] R10: 0000000000000001 R11: 0000000000000003 R12: ffff88814ebb0840
> [ 194.209889] R13: ffff8881457f9900 R14: ffff888173075800 R15: 0000000000000000
> [ 194.217071] FS: 00007f6c80db9640(0000) GS:ffff88885e580000(0000) knlGS:0000000000000000
> [ 194.225216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 194.231014] CR2: 00007f6bdb33a000 CR3: 0000000144f44001 CR4: 0000000000772ef0
> [ 194.238201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 194.245386] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [ 194.252575] PKRU: 55555554
> [ 194.255315] Call Trace:
> [ 194.257794] <TASK>
> [ 194.259932] ? __die_body.cold+0x19/0x21
> [ 194.263899] ? die_addr+0x33/0x50
> [ 194.267256] ? exc_general_protection+0x19e/0x450
> [ 194.272002] ? asm_exc_general_protection+0x22/0x30
> [ 194.276930] ? xe_sync_entry_add_deps+0x1c/0x60 [xe]
Looks related to this inadvertent change I noticed yesterday and pointed
out in the thread:
>> static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri)
>> {
>> ...
>> - fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb);
>> + fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_ADD_DEPS, bb);
>
> This looks like a copy-paste error, could you please change this back to
> XE_OA_SUBMIT_NO_DEPS as it used to be.
Sorry you ran into this. We'll fix this and ask for your help to test it again.
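As a side note on reading the oops: the faulting address 0x6b6b6b6b6b6b6ba3
is RBX (0x6b6b6b6b6b6b6b6b) plus a small offset, and 0x6b is the byte the
slab allocator writes into freed objects (POISON_FREE in
include/linux/poison.h). That is consistent with a pointer being loaded from
already-freed memory, assuming slab poisoning was enabled in this kernel
build, which the log does not state. A minimal userspace sketch of the
arithmetic follows; poison_word() and poison_offset() are hypothetical helper
names used only for illustration and are not part of the xe driver or the
kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Byte written into freed slab objects; value from include/linux/poison.h. */
#define POISON_FREE 0x6b

/* Build a 64-bit word with every byte set to the poison byte. */
static uint64_t poison_word(void)
{
	return 0x0101010101010101ULL * POISON_FREE;
}

/*
 * Hypothetical helper: if addr looks like "poisoned pointer + small offset",
 * store the offset in *off and return 1; otherwise return 0.
 * max_off is an arbitrary cutoff chosen by the caller.
 */
static int poison_offset(uint64_t addr, uint64_t max_off, uint64_t *off)
{
	uint64_t base = poison_word();

	assert(off);
	if (addr < base || addr - base > max_off)
		return 0;
	*off = addr - base;
	return 1;
}
```

Calling poison_offset(0x6b6b6b6b6b6b6ba3ULL, 0x1000, &off) returns 1 with off
set to 0x38. That matches the 0x38 displacement in the "48 8d 7b 38" bytes of
the Code: line (lea 0x38(%rbx),%rdi), so the access that faulted was to a
field at offset 0x38 of the object RBX was supposed to point to.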
> [ 194.282052] xe_oa_submit_bb.constprop.0+0x9d/0x1c0 [xe]
> [ 194.287517] xe_oa_load_with_lri.constprop.0+0xc4/0x130 [xe]
> [ 194.293313] xe_oa_configure_oa_context+0x1fd/0x210 [xe]
> [ 194.298770] xe_oa_disable_metric_set+0x4b/0xc0 [xe]
> [ 194.303857] xe_oa_stream_destroy+0x3a/0x140 [xe]
> [ 194.308698] xe_oa_release+0x3a/0xe0 [xe]
> [ 194.312833] __fput+0xee/0x2a0
> [ 194.315934] __x64_sys_close+0x49/0xb0
> [ 194.319722] do_syscall_64+0x64/0x130
> [ 194.323417] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [ 194.328511] RIP: 0033:0x7f6ca8b14f8b
Thanks.
--
Ashutosh