[PATCH v4 0/2] Fixes for MI_REPORT_PERF_COUNT

Dixit, Ashutosh ashutosh.dixit at intel.com
Fri Dec 20 17:19:24 UTC 2024


On Fri, 20 Dec 2024 08:16:45 -0800, Souza, Jose wrote:
>

Hi Jose,

> On Thu, 2024-12-19 at 16:22 -0800, Umesh Nerlige Ramappa wrote:
> > OA programming sequence for query mode or MI_REPORT_PERF_COUNT requires
> > modifying some HW registers in the same hw context as the user exec
> > queue. User passes the exec_queue to the OA interface and OA
> > implementation submits an MI_LOAD_REGISTER_IMM to this queue to modify
> > the registers.
> >
> > The OA implementation submits a batch mapped in GGTT to the user exec
> > queue and hence, some plumbing is added into relevant code to enable
> > that (as per suggestions from Matthew Brost).
> >
> > v2: review rework
> > v3:
> > - review rework
> > - original patches squashed for porting to stable
> > - code cleanup
> >
> > v4:
> > - review rework/fixes
>
> Got this oops with this version:
>
> [  176.066578] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [  176.068577] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [  176.072629] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [  176.078117] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [  176.081285] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [  176.093564] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [  176.102886] xe 0000:00:02.0: [drm:xe_oa_config_locked [xe]] changed to oa config uuid=4ccd6535-fb9a-440f-b0f5-882879dc4cb0
> [  194.119229] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ba3: 0000 [#1] PREEMPT SMP
> [  194.130187] CPU: 3 UID: 1000 PID: 2240 Comm: ReplayManager Not tainted 6.13.0-rc3-zeh-xe+ #1454
> [  194.138931] Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3152.D83.2404190622 04/19/2024
> [  194.151258] RIP: 0010:xe_sync_entry_add_deps+0x1c/0x60 [xe]
> [  194.157013] Code: c7 43 18 f4 ff ff ff e9 9b fe ff ff 66 90 55 53 48 8b 5f 08 48 85 db 75 05 31 c0 5b 5d c3 48 89 f5 48 8d 7b 38 b8 01 00 00 00 <f0> 0f c1 43 38 85 c0 74 20 8d 50 01 09 c2 78 0d 48 89 de 48 89 ef
> [  194.175863] RSP: 0018:ffffc90001f93de8 EFLAGS: 00010202
> [  194.181136] RAX: 0000000000000001 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
> [  194.188331] RDX: ffff88815ee8edc0 RSI: ffff88814ebb0840 RDI: 6b6b6b6b6b6b6ba3
> [  194.195520] RBP: ffff88814ebb0840 R08: 0000000000000001 R09: 0000000000000000
> [  194.202707] R10: 0000000000000001 R11: 0000000000000003 R12: ffff88814ebb0840
> [  194.209889] R13: ffff8881457f9900 R14: ffff888173075800 R15: 0000000000000000
> [  194.217071] FS:  00007f6c80db9640(0000) GS:ffff88885e580000(0000) knlGS:0000000000000000
> [  194.225216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  194.231014] CR2: 00007f6bdb33a000 CR3: 0000000144f44001 CR4: 0000000000772ef0
> [  194.238201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  194.245386] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [  194.252575] PKRU: 55555554
> [  194.255315] Call Trace:
> [  194.257794]  <TASK>
> [  194.259932]  ? __die_body.cold+0x19/0x21
> [  194.263899]  ? die_addr+0x33/0x50
> [  194.267256]  ? exc_general_protection+0x19e/0x450
> [  194.272002]  ? asm_exc_general_protection+0x22/0x30
> [  194.276930]  ? xe_sync_entry_add_deps+0x1c/0x60 [xe]

This looks related to an inadvertent change that I noticed yesterday and
pointed out in the thread:

>> static int xe_oa_load_with_lri(struct xe_oa_stream *stream, struct xe_oa_reg *reg_lri)
>> {
>>      ...
>> -	fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_NO_DEPS, bb);
>> +	fence = xe_oa_submit_bb(stream, XE_OA_SUBMIT_ADD_DEPS, bb);
>
> This looks like a copy-paste error, could you please change this back to
> XE_OA_SUBMIT_NO_DEPS as it used to be.

Sorry you ran into this. We'll fix it and ask for your help testing again.
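
As a side note on reading the oops: the faulting address
0x6b6b6b6b6b6b6ba3 is consistent with a pointer read from freed slab memory
(POISON_FREE is 0x6b in include/linux/poison.h) plus a small field offset,
assuming slab poisoning is enabled in this debug kernel. RBX holds the bare
poison pattern and RDI holds RBX + 0x38, which matches the
`48 8d 7b 38` (lea 0x38(%rbx),%rdi) in the Code: line. Below is a minimal
userspace sketch of that arithmetic; the helper names are hypothetical and
are not part of the xe driver or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Free-poison byte written by the slab allocator (POISON_FREE in
 * include/linux/poison.h). Assumes slab poisoning is enabled. */
#define POISON_FREE_BYTE 0x6bu

/* Hypothetical helper: a 64-bit word with every byte set to the
 * free-poison value, i.e. what a pointer read from freed memory looks like. */
static uint64_t poison_free_word(void)
{
	uint64_t w = 0;

	for (int i = 0; i < 8; i++)
		w = (w << 8) | POISON_FREE_BYTE;
	return w;
}

/* Hypothetical helper: returns true if addr looks like "poisoned pointer
 * plus a small field offset" and stores that offset in *offset. */
static bool decode_poisoned_addr(uint64_t addr, uint64_t *offset)
{
	uint64_t base = poison_free_word();

	/* Only accept offsets below one page; anything larger is unlikely
	 * to be a struct field access through a poisoned pointer. */
	if (addr < base || addr - base >= 4096)
		return false;

	*offset = addr - base;
	return true;
}
```

Calling decode_poisoned_addr() on the address from the oops gives an offset
of 0x38, i.e. the access was to a field 0x38 bytes into an object whose
pointer was itself read from freed memory.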

> [  194.282052]  xe_oa_submit_bb.constprop.0+0x9d/0x1c0 [xe]
> [  194.287517]  xe_oa_load_with_lri.constprop.0+0xc4/0x130 [xe]
> [  194.293313]  xe_oa_configure_oa_context+0x1fd/0x210 [xe]
> [  194.298770]  xe_oa_disable_metric_set+0x4b/0xc0 [xe]
> [  194.303857]  xe_oa_stream_destroy+0x3a/0x140 [xe]
> [  194.308698]  xe_oa_release+0x3a/0xe0 [xe]
> [  194.312833]  __fput+0xee/0x2a0
> [  194.315934]  __x64_sys_close+0x49/0xb0
> [  194.319722]  do_syscall_64+0x64/0x130
> [  194.323417]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [  194.328511] RIP: 0033:0x7f6ca8b14f8b

Thanks.
--
Ashutosh

