[PATCH 1/1] drm/xe: uninitialized fence causing null ptr dereference

Wed Jun 5 18:28:43 UTC 2024

> On Wed, Jun 05, 2024 at 11:15:28AM -0700, fei.yang at intel.com wrote:
>> From: Fei Yang <fei.yang at intel.com>
>>
>> [  141.256160] BUG: kernel NULL pointer dereference, address: 0000000000000028
>> [  141.257162] #PF: supervisor read access in kernel mode
>> [  141.257943] #PF: error_code(0x0000) - not-present page
>> [  141.258722] PGD 800000018c95c067 P4D 800000018c95c067 PUD 18c95d067 PMD 0
>> [  141.259751] Oops: 0000 [#1] PREEMPT SMP PTI
>> [  141.260409] CPU: 0 PID: 7277 Comm: gemm_bf16 Kdump: loaded Tainted: G     U             6.9.0-xe-474+ #1
>> [  141.261812] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
>> [  141.262669] RIP: 0010:trace_event_raw_event_xe_sched_job+0x50/0x100 [xe]
>> [  141.263644] Code: 02 00 00 0f 85 ad 00 00 00 ba 30 00 00 00 4c 89 e6 48 8d 7d b8 e8 a0 c4 78 e0 48 85 c0 74 7b 48 8b 93 18 01 00 00 48 8d 7d b8 <48> 8b 52 28 89 50 08 8b 93 38 01 00 00 89 50 0c 48 8b 93 08 01 00
>> [  141.266281] RSP: 0000:ffffc900017ff1c0 EFLAGS: 00010282
>> [  141.267075] RAX: ffff8881001c4208 RBX: ffff888188499380 RCX: 00000000000007a3
>> [  141.268100] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc900017ff1c0
>> [  141.269123] RBP: ffffc900017ff208 R08: 0000000000000002 R09: 0000000000000001
>> [  141.270145] R10: 0000000000000034 R11: c0673accd9eb118e R12: ffff888157969908
>> [  141.271166] R13: ffff888188499380 R14: ffff888188499380 R15: 0000000000000001
>> [  141.272187] FS:  00007f38147d4780(0000) GS:ffff888237e00000(0000) knlGS:0000000000000000
>> [  141.273402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  141.274250] CR2: 0000000000000028 CR3: 0000000188490005 CR4: 0000000000570ef0
>> [  141.275268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  141.276284] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
>> [  141.277297] PKRU: 55555554
>> [  141.277758] Call Trace:
>> [  141.278186]  <TASK>
>> [  141.278571]  ? show_regs+0x67/0x70
>> [  141.279114]  ? __die_body+0x20/0x70
>> [  141.279666]  ? __die+0x2b/0x40
>> [  141.280164]  ? page_fault_oops+0x153/0x4b0
>> [  141.280782]  ? search_bpf_extables+0x96/0x160
>> [  141.281439]  ? trace_event_raw_event_xe_sched_job+0x50/0x100 [xe]
>> [  141.282317]  ? search_exception_tables+0x5f/0x70
>> [  141.283004]  ? kernelmode_fixup_or_oops.isra.0+0x61/0x80
>> [  141.283771]  ? __bad_area_nosemaphore+0x18e/0x290
>> [  141.284466]  ? __lock_acquire+0xa22/0x30a0
>> [  141.285080]  ? bad_area_nosemaphore+0x16/0x20
>> [  141.285733]  ? do_user_addr_fault+0x338/0xa80
>> [  141.286384]  ? trace_clock_local+0x10/0x30
>> [  141.286993]  ? __rb_reserve_next+0x62/0x4c0
>> [  141.287611]  ? exc_page_fault+0x87/0x2a0
>> [  141.288197]  ? asm_exc_page_fault+0x27/0x30
>> [  141.288813]  ? trace_event_raw_event_xe_sched_job+0x50/0x100 [xe]
>> [  141.289678]  xe_sched_job_create+0x29d/0x2e0 [xe]
>> [  141.290373]  __xe_bb_create_job+0x93/0x220 [xe]
>>
>> Fixes: 0ac7a2c745e8 ("drm/xe: Don't initialize fences at xe_sched_job_create()")
>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>
> Thanks for the patch, noticed this too [1].
>
> Since I'm here and our patches are the same:
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
>
> Will merge once this CI passes.
>
> [1] https://patchwork.freedesktop.org/series/134484/

Oh, thanks Matt! I didn't notice you already had a patch; I should have been
checking the mailing list more often.

>> Signed-off-by: Fei Yang <fei.yang at intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_sched_job.h | 2 +-
>>  drivers/gpu/drm/xe/xe_trace.h     | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h
>> index 002c3b5c0a5c..0c3ddbb7e25f 100644
>> --- a/drivers/gpu/drm/xe/xe_sched_job.h
>> +++ b/drivers/gpu/drm/xe/xe_sched_job.h
>> @@ -70,7 +70,7 @@ to_xe_sched_job(struct drm_sched_job *drm)
>>
>>  static inline u32 xe_sched_job_seqno(struct xe_sched_job *job)
>>  {
>> -    return job->fence->seqno;
>> +    return (job->fence) ? job->fence->seqno : 0;
>>  }
>>
>>  static inline u32 xe_sched_job_lrc_seqno(struct xe_sched_job *job)
>> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
>> index 450f407c66e8..ea61387e0f5e 100644
>> --- a/drivers/gpu/drm/xe/xe_trace.h
>> +++ b/drivers/gpu/drm/xe/xe_trace.h
>> @@ -270,7 +270,7 @@ DECLARE_EVENT_CLASS(xe_sched_job,
>>                         __entry->guc_state =
>>                         atomic_read(&job->q->guc->state);
>>                         __entry->flags = job->q->flags;
>> -                       __entry->error = job->fence->error;
>> +                       __entry->error = (job->fence) ? job->fence->error : 0;
>>                         __entry->fence = job->fence;
>>                         __entry->batch_addr = (u64)job->ptrs[0].batch_addr;
>>                         ),
>> --
>> 2.25.1
>>
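
For readers following the thread, the guard used in both hunks above can be
tried outside the kernel. The sketch below is only an illustration under
stated assumptions: struct mock_fence, struct mock_job, mock_job_seqno() and
mock_job_error() are hypothetical stand-ins, not the real xe or dma_fence
definitions, and they carry only the fields the patch touches. It assumes, as
the oops suggests, that the fence pointer can still be NULL when the
tracepoint reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fence; only the fields the patch reads. */
struct mock_fence {
	uint64_t seqno;
	int error;
};

/* Hypothetical stand-in for the job; fence may be NULL (assumption). */
struct mock_job {
	struct mock_fence *fence;
};

/* Same shape as the patched xe_sched_job_seqno(): 0 when there is no fence. */
static inline uint32_t mock_job_seqno(const struct mock_job *job)
{
	return job->fence ? (uint32_t)job->fence->seqno : 0;
}

/* Same shape as the patched trace assignment of __entry->error. */
static inline int mock_job_error(const struct mock_job *job)
{
	return job->fence ? job->fence->error : 0;
}

/* Small self-check: a job without a fence must not be dereferenced. */
static inline void mock_job_selfcheck(void)
{
	struct mock_job job = { .fence = NULL };

	assert(mock_job_seqno(&job) == 0);
	assert(mock_job_error(&job) == 0);
}
```

One consequence of this approach is that a job without a fence reports seqno
0 and error 0 in the trace, so the trace output alone cannot distinguish it
from a job whose fence really holds those values.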