[Intel-xe] [RFC PATCH 0/1] Long running jobs ideas
Matthew Brost
matthew.brost at intel.com
Wed Mar 29 00:09:57 UTC 2023
We discussed 2 options in a meeting today for long running jobs:
1. Return NULL in run_job, find another way to flow control ring and
clean up after engine errors.
2. Don't export dma-fences from the DRM scheduler or xe_sched_job, use
lockdep to enforce these rules (we don't export currently but do not
have lockdep asserts).
This series is an attempt to code option #1, specifically the flow
control of the ring. This series is not tested at all.
I've reached the conclusion that option #1 doesn't work as all it does
is move a dma-fence from run_job to prepare_job and in either case the
dma-fences from the DRM scheduler can be endless. Based on the
discussion in [1], the only proper way to throttle submission is through
dma-fences too.
Based on this, IMO the only sensible solution for long running jobs is
option #2. We likely should update the DRM scheduler with a flag so when
it creates fences it uses lockdep to enforce that these fences cannot be
exported. Likewise we should also apply these rules to job->fence in Xe.
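To make the option #2 idea concrete, here is a rough userspace sketch of the rule, not kernel code. Everything in it (toy_sched, toy_fence, toy_sched_init_fence, toy_fence_export) is a made-up name for illustration and none of it is an existing DRM scheduler or dma-fence API. In the kernel the check would presumably be a lockdep-style assert rather than an error return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence; this is NOT struct dma_fence. */
struct toy_fence {
	bool signaled;
	bool no_export;	/* inherited from the scheduler that created it */
};

/* Hypothetical scheduler carrying the proposed "long running" flag. */
struct toy_sched {
	bool long_running;
};

/* A fence created by a long running scheduler is marked as not exportable. */
static inline void toy_sched_init_fence(const struct toy_sched *sched,
					struct toy_fence *fence)
{
	assert(sched != NULL && fence != NULL);
	fence->signaled = false;
	fence->no_export = sched->long_running;
}

/*
 * Stand-in for any path that would hand the fence outside the driver.
 * The toy model rejects the export with an error code; the real rule
 * would be enforced with an assert at this point instead.
 */
static inline int toy_fence_export(const struct toy_fence *fence)
{
	if (fence->no_export)
		return -EPERM;
	return 0;
}
```

In this model a fence initialised from a scheduler with long_running set is refused by toy_fence_export() with -EPERM, while a fence from a normal scheduler exports fine. The same per-fence marking is what would apply to job->fence in Xe.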
Matt
[1] https://patchwork.freedesktop.org/patch/525461/?series=114772&rev=2
Matthew Brost (1):
drm/xe: Return NULL in run_job for long running jobs
drivers/gpu/drm/xe/xe_engine.c | 54 ++++++++++++++++++++++++-
drivers/gpu/drm/xe/xe_engine.h | 6 +++
drivers/gpu/drm/xe/xe_engine_types.h | 3 ++
drivers/gpu/drm/xe/xe_guc_submit.c | 31 ++++++++++++--
drivers/gpu/drm/xe/xe_sched_job.c | 5 +++
drivers/gpu/drm/xe/xe_sched_job_types.h | 2 +
6 files changed, 96 insertions(+), 5 deletions(-)
--
2.34.1