[PATCH v11 02/10] drm/sched: Store the drm client_id in drm_sched_fence
Lucas De Marchi
lucas.demarchi at intel.com
Wed May 28 19:07:34 UTC 2025
On Mon, May 26, 2025 at 02:54:44PM +0200, Pierre-Eric Pelloux-Prayer wrote:
> drivers/gpu/drm/xe/xe_sched_job.c | 3 ++-
>diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
>index f0a6ce610948..5921293b25db 100644
>--- a/drivers/gpu/drm/xe/xe_sched_job.c
>+++ b/drivers/gpu/drm/xe/xe_sched_job.c
>@@ -113,7 +113,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> kref_init(&job->refcount);
> xe_exec_queue_get(job->q);
>
>- err = drm_sched_job_init(&job->drm, q->entity, 1, NULL);
>+ err = drm_sched_job_init(&job->drm, q->entity, 1, NULL,
>+ q->xef->drm->client_id);
You can't do this here: xef is only non-NULL for jobs submitted from
userspace. For in-kernel jobs, xef is NULL and this dereference oopses.
Right now this completely breaks xe, since one of the very first things
we do is submit a job to save the default context. Example:
https://intel-gfx-ci.01.org/tree/intel-xe/xe-3151-56d2b14961751a677ff1f7ff8b93a6c814ce2be3/bat-bmg-1/igt@xe_module_load@load.html
<4> [] RIP: 0010:xe_sched_job_create+0xbd/0x390 [xe]
<4> [] Code: c1 43 18 85 c0 0f 84 6f 02 00 00 8d 50 01 09 c2 0f 88 3e 02 00 00 48 8b 03 48 8b b3 d8 00 00 00 31 c9 4c 89 ef ba 01 00 00 00 <48> 8b 40 08 4c 8b 40 60 e8 86 64 7c ff 41 89 c4 85 c0 0f 85 9b 01
<4> [] RSP: 0018:ffffc900031972d8 EFLAGS: 00010246
<4> [] RAX: 0000000000000000 RBX: ffff88815fc40d00 RCX: 0000000000000000
<4> [] RDX: 0000000000000001 RSI: ffff88812e6552a8 RDI: ffff88815f939c40
<4> [] RBP: ffffc90003197318 R08: 0000000000000000 R09: 0000000000000000
<4> [] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003197428
<4> [] R13: ffff88815f939c40 R14: ffff88811f054000 R15: ffff88815fc40d00
<4> [] FS: 00007681f2948940(0000) GS:ffff8888daf14000(0000) knlGS:0000000000000000
<4> [] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [] CR2: 0000000000000008 CR3: 0000000118315004 CR4: 0000000000f72ef0
<4> [] PKRU: 55555554
<4> [] Call Trace:
<4> [] <TASK>
<4> [] __xe_bb_create_job+0xa2/0x240 [xe]
<4> [] ? find_held_lock+0x31/0x90
<4> [] ? xa_find_after+0x12c/0x250
<4> [] xe_bb_create_job+0x6e/0x380 [xe]
<4> [] ? xa_find_after+0x136/0x250
<4> [] ? __drm_dev_dbg+0x7d/0xb0
<4> [] xe_gt_record_default_lrcs+0x542/0xb00 [xe]
Can we use 0 for in-kernel client since drm_file starts them from 1?
Like this:
| diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
| index 5921293b25db3..d21bf8f269640 100644
| --- a/drivers/gpu/drm/xe/xe_sched_job.c
| +++ b/drivers/gpu/drm/xe/xe_sched_job.c
| @@ -114,7 +114,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
| xe_exec_queue_get(job->q);
|
| err = drm_sched_job_init(&job->drm, q->entity, 1, NULL,
| - q->xef->drm->client_id);
| + q->xef ? q->xef->drm->client_id : 0);
| if (err)
| goto err_free;
I tested with the above diff and it at least loads...
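For illustration only, the same fallback can be pulled out into a small helper. This is a minimal userspace sketch, not the driver code: the `*_sketch` structs, `KERNEL_CLIENT_ID` and `queue_client_id()` are hypothetical names that model only the fields used above, and reserving 0 for in-kernel jobs relies on the assumption stated earlier that drm_file hands out client ids starting from 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structs; only the fields used here. */
struct drm_file_sketch {
	uint64_t client_id;		/* assumed to start at 1 for real clients */
};

struct xe_file_sketch {
	struct drm_file_sketch *drm;
};

struct xe_exec_queue_sketch {
	struct xe_file_sketch *xef;	/* NULL for queues created by the kernel */
};

/* Assumption: 0 is never handed out to a userspace client. */
#define KERNEL_CLIENT_ID 0

/* Return the owning client's id, or KERNEL_CLIENT_ID for in-kernel queues. */
static uint64_t queue_client_id(const struct xe_exec_queue_sketch *q)
{
	assert(q != NULL);

	/* Check xef before dereferencing it; in-kernel jobs have no drm_file. */
	return q->xef ? q->xef->drm->client_id : KERNEL_CLIENT_ID;
}
```

With a helper of this shape, the call site would pass the helper's result as the last argument instead of open-coding the ternary; whether that is worth it depends on how many callers end up needing the id.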
Also, I see this series on the intel-xe mailing list, but I'm not sure
why we didn't get any CI results... I will check that.
Lucas De Marchi