[Intel-xe] XE sched review

Matthew Brost matthew.brost at intel.com
Tue Nov 7 08:24:08 UTC 2023


On Thu, Nov 02, 2023 at 10:40:38AM +0000, Dani Liberman wrote:
> On 01/11/2023 17:33, Dani Liberman wrote:
> Hi,
> 
> As part of the upstream review of the Xe driver, I reviewed the XE code which relates to the DRM scheduler. Here are the issues/questions:
> 
> guc exec queue states:
> 1. Please elaborate on the states as they are very fundamental part of the guc submission (banned, destroyed, etc..).

A lot of states are to track the state of a exec queue (resolves to
guc_id) in the GuC firmware to send the proper H2G to interact with the
GuC firmware. I can take an AR to post a patch explaining these states.
Also will post a kernel doc patch for GuC submission in general.

> 2. guc state is an atomic variable but there is no locking in the sched related code and the state can be changed right after getting it. Why not change it to non atomic variable?

The only reason this is atomic as these bits are in a single field (e.g.
if each state was a bool this would work too). Another option we
introduce a lock too. I can scrub this as part of writing the kernel doc
and write up an explaination on this.

> 
> guc_exec_queue_run_job():
> 1. if (!exec_queue_killed_or_banned(q) && !xe_sched_job_is_error(job)): in case this condition is false, and the job is not submitted. Why still returning fence instead of NULL?

The TDR is used to flush out all jobs + signal fences when the exec
queue is in a killed_or_banned state. We return the fence here which the
TDR will signal.

> 2. I think that register_engine() shouldn't be part of run_job:
>    - It is part of the init procedure so I think it is more appropriate to put it in guc_exec_queue_init().
>    - Not sure if it's possible but the guc status updates asynchronously and if exec_queue got unregistered (after g2h XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE) while in the middle of run_job, we might register it again.

After a GT reset we'd have to potentially register anyways so we'd
always need this code path regardless if we registered this in
guc_exec_queue_init(). I'd prefer just to leave it as is.

BTW, s/register_engine/register_exec_queue. A few other instances of
inconsistent naming in this xe_guc_submit.c too. Will take an AR to post
a patch to fix this.

> 3. LR: AFAIU LR exec_queue will use the drm sched only once and we don't care about the result (from drm POV). Maybe in LR mode, it's better to move the submit_exec_queue() right after emit_job() in xe_exec_ioctl() and
>        not using drm sched at all.
>

This is true but for simplicity I'd still have everything routed through
DRM scheduler even for LR jobs.

> guc_exec_queue_start(): using deprecated function drm_sched_resubmit_jobs().
>

I am aware of this. I forwarded you an email chain about
drm_sched_resubmit_jobs(). A few other drivers want to use this as well
so I believe this in fact is not being deprecated. Will keep on eye on
this though.

Matt

> Thanks,
> Dani
> 


More information about the Intel-xe mailing list