[PATCH 1/1] drm/xe: uninitialized fence causing null ptr dereference

Matthew Brost matthew.brost at intel.com
Wed Jun 5 18:19:03 UTC 2024


On Wed, Jun 05, 2024 at 11:15:28AM -0700, fei.yang at intel.com wrote:
> From: Fei Yang <fei.yang at intel.com>
> 
> [  141.256160] BUG: kernel NULL pointer dereference, address: 0000000000000028
> [  141.257162] #PF: supervisor read access in kernel mode
> [  141.257943] #PF: error_code(0x0000) - not-present page
> [  141.258722] PGD 800000018c95c067 P4D 800000018c95c067 PUD 18c95d067 PMD 0
> [  141.259751] Oops: 0000 [#1] PREEMPT SMP PTI
> [  141.260409] CPU: 0 PID: 7277 Comm: gemm_bf16 Kdump: loaded Tainted: G     U             6.9.0-xe-474+ #1
> [  141.261812] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [  141.262669] RIP: 0010:trace_event_raw_event_xe_sched_job+0x50/0x100 [xe]
> [  141.263644] Code: 02 00 00 0f 85 ad 00 00 00 ba 30 00 00 00 4c 89 e6 48 8d 7d b8 e8 a0 c4 78 e0 48 85 c0 74 7b 48 8b 93 18 01 00 00 48 8d 7d b8 <48> 8b 52 28 89 50 08 8b 93 38 01 00 00 89 50 0c 48 8b 93 08 01 00
> [  141.266281] RSP: 0000:ffffc900017ff1c0 EFLAGS: 00010282
> [  141.267075] RAX: ffff8881001c4208 RBX: ffff888188499380 RCX: 00000000000007a3
> [  141.268100] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc900017ff1c0
> [  141.269123] RBP: ffffc900017ff208 R08: 0000000000000002 R09: 0000000000000001
> [  141.270145] R10: 0000000000000034 R11: c0673accd9eb118e R12: ffff888157969908
> [  141.271166] R13: ffff888188499380 R14: ffff888188499380 R15: 0000000000000001
> [  141.272187] FS:  00007f38147d4780(0000) GS:ffff888237e00000(0000) knlGS:0000000000000000
> [  141.273402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  141.274250] CR2: 0000000000000028 CR3: 0000000188490005 CR4: 0000000000570ef0
> [  141.275268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  141.276284] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [  141.277297] PKRU: 55555554
> [  141.277758] Call Trace:
> [  141.278186]  <TASK>
> [  141.278571]  ? show_regs+0x67/0x70
> [  141.279114]  ? __die_body+0x20/0x70
> [  141.279666]  ? __die+0x2b/0x40
> [  141.280164]  ? page_fault_oops+0x153/0x4b0
> [  141.280782]  ? search_bpf_extables+0x96/0x160
> [  141.281439]  ? trace_event_raw_event_xe_sched_job+0x50/0x100 [xe]
> [  141.282317]  ? search_exception_tables+0x5f/0x70
> [  141.283004]  ? kernelmode_fixup_or_oops.isra.0+0x61/0x80
> [  141.283771]  ? __bad_area_nosemaphore+0x18e/0x290
> [  141.284466]  ? __lock_acquire+0xa22/0x30a0
> [  141.285080]  ? bad_area_nosemaphore+0x16/0x20
> [  141.285733]  ? do_user_addr_fault+0x338/0xa80
> [  141.286384]  ? trace_clock_local+0x10/0x30
> [  141.286993]  ? __rb_reserve_next+0x62/0x4c0
> [  141.287611]  ? exc_page_fault+0x87/0x2a0
> [  141.288197]  ? asm_exc_page_fault+0x27/0x30
> [  141.288813]  ? trace_event_raw_event_xe_sched_job+0x50/0x100 [xe]
> [  141.289678]  xe_sched_job_create+0x29d/0x2e0 [xe]
> [  141.290373]  __xe_bb_create_job+0x93/0x220 [xe]
> 
> Fixes: 0ac7a2c745e8 ("drm/xe: Don't initialize fences at xe_sched_job_create()")
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>

Thanks for the patch; I noticed this too [1].

Since I'm here and our patches are the same:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>

Will merge once this CI passes.

[1] https://patchwork.freedesktop.org/series/134484/

> Signed-off-by: Fei Yang <fei.yang at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_sched_job.h | 2 +-
>  drivers/gpu/drm/xe/xe_trace.h     | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
> index 002c3b5c0a5c..0c3ddbb7e25f 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job.h
> @@ -70,7 +70,7 @@ to_xe_sched_job(struct drm_sched_job *drm)
>  
>  static inline u32 xe_sched_job_seqno(struct xe_sched_job *job)
>  {
> -	return job->fence->seqno;
> +	return (job->fence) ? job->fence->seqno : 0;
>  }
>  
>  static inline u32 xe_sched_job_lrc_seqno(struct xe_sched_job *job)
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 450f407c66e8..ea61387e0f5e 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -270,7 +270,7 @@ DECLARE_EVENT_CLASS(xe_sched_job,
>  			   __entry->guc_state =
>  			   atomic_read(&job->q->guc->state);
>  			   __entry->flags = job->q->flags;
> -			   __entry->error = job->fence->error;
> +			   __entry->error = (job->fence) ? job->fence->error : 0;
>  			   __entry->fence = job->fence;
>  			   __entry->batch_addr = (u64)job->ptrs[0].batch_addr;
>  			   ),
> -- 
> 2.25.1
> 
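
For readers following along outside the kernel tree, here is a minimal
user-space sketch of the guard the quoted diff adds. The struct and function
names (fence_sketch, job_sketch, job_seqno, job_error, self_check) are
hypothetical stand-ins, not the real xe or dma_fence definitions; only the
NULL-check pattern is taken from the patch. In the oops above, RDX is 0 and
CR2 is 0x28, which is consistent with a field read through a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fence: only the two fields the patch reads. */
struct fence_sketch {
	int error;
	uint32_t seqno;
};

/* Hypothetical stand-in for the job: fence may still be NULL right after creation. */
struct job_sketch {
	struct fence_sketch *fence;
};

/* Mirrors the patched accessor: report 0 while the fence is not yet set. */
static inline uint32_t job_seqno(const struct job_sketch *job)
{
	return job->fence ? job->fence->seqno : 0;
}

/* Mirrors the patched tracepoint assignment: report 0 while the fence is not yet set. */
static inline int job_error(const struct job_sketch *job)
{
	return job->fence ? job->fence->error : 0;
}

/* Exercise both the "fence not yet initialized" and "fence set" cases. */
void self_check(void)
{
	struct fence_sketch f = { .error = -5, .seqno = 42 };
	struct job_sketch job = { .fence = NULL };

	/* Without the guard, these two reads would dereference NULL. */
	assert(job_seqno(&job) == 0);
	assert(job_error(&job) == 0);

	job.fence = &f;
	assert(job_seqno(&job) == 42);
	assert(job_error(&job) == -5);
}
```

One consequence of this pattern, at least in the sketch, is that a job whose
fence is not yet set is indistinguishable from one with seqno 0 and no error.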