[PATCH v11 02/10] drm/sched: Store the drm client_id in drm_sched_fence

Pierre-Eric Pelloux-Prayer pierre-eric at damsy.net
Wed May 28 20:46:19 UTC 2025


Hi,

Le 28/05/2025 à 21:07, Lucas De Marchi a écrit :
> On Mon, May 26, 2025 at 02:54:44PM +0200, Pierre-Eric Pelloux-Prayer wrote:
>> drivers/gpu/drm/xe/xe_sched_job.c                |  3 ++-
>> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
>> index f0a6ce610948..5921293b25db 100644
>> --- a/drivers/gpu/drm/xe/xe_sched_job.c
>> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
>> @@ -113,7 +113,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
>>     kref_init(&job->refcount);
>>     xe_exec_queue_get(job->q);
>>
>> -    err = drm_sched_job_init(&job->drm, q->entity, 1, NULL);
>> +    err = drm_sched_job_init(&job->drm, q->entity, 1, NULL,
>> +                 q->xef->drm->client_id);
> 
> you can't do this here. xef is only !NULL if it's a job from userspace.

sorry about that.

> For in-kernel jobs, xef is NULL and this explodes. Right now this
> completely breaks xe since one of the very first things we do is
> to submit a job to save the default context. Example:
> https://intel-gfx-ci.01.org/tree/intel-xe/xe-3151-56d2b14961751a677ff1f7ff8b93a6c814ce2be3/bat- 
> bmg-1/igt at xe_module_load@load.html
> 
>      <4> [] RIP: 0010:xe_sched_job_create+0xbd/0x390 [xe]
>      <4> [] Code: c1 43 18 85 c0 0f 84 6f 02 00 00 8d 50 01 09 c2 0f 88 3e 02 00 00 48 8b 03 48 8b 
> b3 d8 00 00 00 31 c9 4c 89 ef ba 01 00 00 00 <48> 8b 40 08 4c 8b 40 60 e8 86 64 7c ff 41 89 c4 85 c0 
> 0f 85 9b 01
>      <4> [] RSP: 0018:ffffc900031972d8 EFLAGS: 00010246
>      <4> [] RAX: 0000000000000000 RBX: ffff88815fc40d00 RCX: 0000000000000000
>      <4> [] RDX: 0000000000000001 RSI: ffff88812e6552a8 RDI: ffff88815f939c40
>      <4> [] RBP: ffffc90003197318 R08: 0000000000000000 R09: 0000000000000000
>      <4> [] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003197428
>      <4> [] R13: ffff88815f939c40 R14: ffff88811f054000 R15: ffff88815fc40d00
>      <4> [] FS:  00007681f2948940(0000) GS:ffff8888daf14000(0000) knlGS:0000000000000000
>      <4> [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      <4> [] CR2: 0000000000000008 CR3: 0000000118315004 CR4: 0000000000f72ef0
>      <4> [] PKRU: 55555554
>      <4> [] Call Trace:
>      <4> []  <TASK>
>      <4> []  __xe_bb_create_job+0xa2/0x240 [xe]
>      <4> []  ? find_held_lock+0x31/0x90
>      <4> []  ? xa_find_after+0x12c/0x250
>      <4> []  xe_bb_create_job+0x6e/0x380 [xe]
>      <4> []  ? xa_find_after+0x136/0x250
>      <4> []  ? __drm_dev_dbg+0x7d/0xb0
>      <4> []  xe_gt_record_default_lrcs+0x542/0xb00 [xe]
> 
> Can we use 0 for in-kernel client since drm_file starts them from 1?

Yes, this is what amdgpu does.

> Like this:
> 
> | diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
> | index 5921293b25db3..d21bf8f269640 100644
> | --- a/drivers/gpu/drm/xe/xe_sched_job.c
> | +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> | @@ -114,7 +114,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
> |         xe_exec_queue_get(job->q);
> | |         err = drm_sched_job_init(&job->drm, q->entity, 1, NULL,
> | -                                q->xef->drm->client_id);
> | +                                q->xef ? q->xef->drm->client_id : 0);
> |         if (err)
> |                 goto err_free;
> 
> I tested with the above diff and it at least loads...

Thanks for looking into this, the change looks fine to me.

Pierre-Eric

> 
> Also, I see this in intel-xe mailing list, but I'm not sure why we
> didn't have any CI results... I will check that.
> 
> Lucas De Marchi


More information about the dri-devel mailing list