[PATCH] drm/xe: Make sure scheduler is ready before submission

Upadhyay, Tejas tejas.upadhyay at intel.com
Mon Feb 24 10:22:03 UTC 2025



> -----Original Message-----
> From: Brost, Matthew <matthew.brost at intel.com>
> Sent: Monday, February 24, 2025 9:31 AM
> To: Upadhyay, Tejas <tejas.upadhyay at intel.com>
> Cc: intel-xe at lists.freedesktop.org; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray at intel.com>; Nilawar, Badal
> <badal.nilawar at intel.com>
> Subject: Re: [PATCH] drm/xe: Make sure scheduler is ready before submission
> 
> On Sun, Feb 23, 2025 at 07:50:20PM -0800, Matthew Brost wrote:
> > On Sat, Feb 22, 2025 at 12:17:51AM -0700, Upadhyay, Tejas wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Brost, Matthew <matthew.brost at intel.com>
> > > > Sent: Friday, February 21, 2025 10:54 PM
> > > > To: Upadhyay, Tejas <tejas.upadhyay at intel.com>
> > > > Cc: intel-xe at lists.freedesktop.org; Ghimiray, Himal Prasad
> > > > <himal.prasad.ghimiray at intel.com>; Nilawar, Badal
> > > > <badal.nilawar at intel.com>
> > > > Subject: Re: [PATCH] drm/xe: Make sure scheduler is ready before
> > > > submission
> > > >
> > > > On Thu, Feb 20, 2025 at 06:08:59PM +0530, Tejas Upadhyay wrote:
> > > > > An async call to __guc_exec_queue_fini_async frees the scheduler
> > > > > while some scheduler submission may have already started. To handle
> > > > > this small race window, submission should only start when the
> > > > > scheduler is ready.
> > > > >
> > > > > It will help to solve the below issue, which is not easily reproducible:
> > > > > https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4223
> > > > >
> > > >
> > > > I don't think this helps. We can only enter
> > > > xe_sched_submission_start if the exec queue is in
> > > > &guc->submission_state.exec_queue_lookup, and we remove the exec queue
> > > > under a mutex in release_guc_id before we call xe_sched_fini, which
> > > > flips the ready state to false. We then immediately free the scheduler -
> > > > we'd be getting a UAF in xe_sched_submission_start anyway.
> >
> > This assessment isn't exactly correct. Rather, we should not be calling
> > start after we call stop in xe_exec_queue_fini.
> >
> > > >
> > > > The gitlab issue looks like memory corruption. Basically, it is
> > > > saying that a variable which is a bool reads 107, which shouldn't be
> > > > possible.
> > >
> > > It hits the issue while accessing drm_sched_rq *rq, see below:
> > >
> > > static struct drm_sched_entity *
> > > drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
> > >                                 struct drm_sched_rq *rq)
> > > {
> > >         struct rb_node *rb;
> > >
> > >         spin_lock(&rq->lock); --> here it crashes
> > >
> > > Async free of sched_rq happens at:
> > > void xe_sched_fini(struct xe_gpu_scheduler *sched)
> > > {
> > >         xe_sched_submission_stop(sched);
> > >         drm_sched_fini(&sched->base); --> here it frees sched_rq and marks sched_ready = false
> > > }
> > >
> > > It looks like we are hitting this issue in between drm_sched_fini and
> > > xe_exec_queue_fini, which might be avoided with the check mentioned in
> > > this patch. I saw a similar check being done by other drivers, see
> > > below from drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:
> >
> > We shouldn't be calling start after xe_exec_queue_fini here. If we
> > are, that is the bug.
> >
> > >                         if (amdgpu_ring_sched_ready(ring))
> > >                                 drm_sched_start(&ring->sched, 0);
> > >
> > >
> > > Dump stack details:
> > > [ 1481.378550] ==================================================================
> > > [ 1481.385791] BUG: KASAN: slab-use-after-free in __lock_acquire (kernel/locking/lockdep.c:5089)
> > > [ 1481.392598] Read of size 8 at addr ffff88812352a7a0 by task kworker/u128:11/2827
> > >
> > > [ 1481.401506] CPU: 17 UID: 0 PID: 2827 Comm: kworker/u128:11 Not tainted 6.13.0xe-ptl+ #1
> > > [ 1481.409475] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.5045.A00.2401260733 01/26/2024
> > > [ 1481.422824] Workqueue:  drm_sched_run_job_work [gpu_sched]
> > > [ 1481.428348] Call Trace:
> > > [ 1481.430803]  <TASK>
> > > [ 1481.432913] dump_stack_lvl (lib/dump_stack.c:123)
> > > [ 1481.436589] print_report (mm/kasan/report.c:379 mm/kasan/report.c:489)
> > > [ 1481.440206] ? __virt_addr_valid (./arch/x86/include/asm/preempt.h:94 ./include/linux/rcupdate.h:964 ./include/linux/mmzone.h:2058 arch/x86/mm/physaddr.c:65)
> > > [ 1481.444511] ? __lock_acquire (kernel/locking/lockdep.c:5089)
> > > [ 1481.448527] kasan_report (mm/kasan/report.c:604)
> > > [ 1481.452141] ? __lock_acquire (kernel/locking/lockdep.c:5089)
> > > [ 1481.456181] __lock_acquire (kernel/locking/lockdep.c:5089)
> > > [ 1481.460030] ? check_chain_key (kernel/locking/lockdep.c:3967)
> > > [ 1481.464137] ? __lock_acquire (kernel/locking/lockdep.c:5240)
> > > [ 1481.468240] ? __pfx___lock_acquire (kernel/locking/lockdep.c:5077)
> > > [ 1481.472604] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851 kernel/locking/lockdep.c:5814)
> > > [ 1481.476331] ? drm_sched_run_job_work (drivers/gpu/drm/scheduler/sched_main.c:344 drivers/gpu/drm/scheduler/sched_main.c:1068 drivers/gpu/drm/scheduler/sched_main.c:1199) gpu_sched
> > > [ 1481.482099] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5817)
> > > [ 1481.486292] ? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851 kernel/locking/lockdep.c:5814)
> > > [ 1481.490143] ? process_one_work (kernel/workqueue.c:3212)
> > > [ 1481.494360] _raw_spin_lock (./include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
> > > [ 1481.498061] ? drm_sched_run_job_work (drivers/gpu/drm/scheduler/sched_main.c:344 drivers/gpu/drm/scheduler/sched_main.c:1068 drivers/gpu/drm/scheduler/sched_main.c:1199) gpu_sched
> > > [ 1481.503825] drm_sched_run_job_work (drivers/gpu/drm/scheduler/sched_main.c:344 drivers/gpu/drm/scheduler/sched_main.c:1068 drivers/gpu/drm/scheduler/sched_main.c:1199) gpu_sched
> >
> > This worker is flushed via xe_sched_submission_stop in xe_sched_fini.
> > Again, we shouldn't be calling xe_sched_submission_start after this. If
> > we are, that is a bug. Even if we did, we'd still get a UAF, as the
> > scheduler memory is freed immediately after calling xe_sched_fini.
> >
> 
> Maybe add a cancel_work_sync(&ge->sched.base.work_tdr) in
> __guc_exec_queue_fini_async before calling xe_sched_fini.

Hmm, this looks to be a good solution, as I suspect the job timeout is the only thing triggering submission start again in that narrow race window before exec_queue_fini and after sched_fini. Looks like this should also help. Let me test this.
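Just to make sure we mean the same change, here is a rough, untested sketch of how I read the suggestion. The fini_async field name and the surrounding teardown are assumptions on my side, and since work_tdr is a delayed_work inside drm_gpu_scheduler, presumably the delayed cancel variant is the one to use:

static void __guc_exec_queue_fini_async(struct work_struct *w)
{
	struct xe_guc_exec_queue *ge =
		container_of(w, struct xe_guc_exec_queue, fini_async);

	/* Assumed addition: make sure the TDR cannot run (and restart
	 * submission) once the scheduler is about to be freed.
	 */
	cancel_delayed_work_sync(&ge->sched.base.work_tdr);

	xe_sched_fini(&ge->sched);

	/* existing teardown (release_guc_id, freeing ge, ...) continues */
}

The ordering matters here: the cancel has to happen before xe_sched_fini so the TDR cannot queue xe_sched_submission_start against scheduler memory that is about to be freed.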

Tejas
> 
> Matt
> 
> > Matt
> >
> > > [ 1481.509428] process_one_work (kernel/workqueue.c:3236)
> > > [ 1481.513477] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5817)
> > > [ 1481.517693] ? __pfx_process_one_work (kernel/workqueue.c:3138)
> > > [ 1481.522275] ? __list_add_valid_or_report (lib/list_debug.c:32)
> > > [ 1481.527155] worker_thread (kernel/workqueue.c:3311 kernel/workqueue.c:3398)
> > > [ 1481.530966] ? __pfx_worker_thread (kernel/workqueue.c:3344)
> > > [ 1481.535243] kthread (kernel/kthread.c:389)
> > > [ 1481.538510] ? kthread (kernel/kthread.c:374)
> > > [ 1481.541824] ? __pfx_kthread (kernel/kthread.c:342)
> > > [ 1481.545569] ret_from_fork (arch/x86/kernel/process.c:147)
> > > [ 1481.549203] ? __pfx_kthread (kernel/kthread.c:342)
> > > [ 1481.552986] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
> > > [ 1481.556918]  </TASK>
> > >
> > > [ 1481.560615] Allocated by task 8713:
> > > [ 1481.564099] kasan_save_stack (mm/kasan/common.c:48)
> > > [ 1481.564104] kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
> > > [ 1481.564108] __kasan_kmalloc (mm/kasan/common.c:398)
> > > [ 1481.564112] drm_sched_init (drivers/gpu/drm/scheduler/sched_main.c:1314) gpu_sched
> > > [ 1481.564121] guc_exec_queue_init (drivers/gpu/drm/xe/xe_guc_submit.c:1445 (discriminator 6)) xe
> > > [ 1481.564349] xe_exec_queue_create (drivers/gpu/drm/xe/xe_exec_queue.c:152 drivers/gpu/drm/xe/xe_exec_queue.c:183) xe
> > > [ 1481.564563] xe_exec_queue_create_ioctl (drivers/gpu/drm/xe/xe_exec_queue.c:687) xe
> > > [ 1481.564777] drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:796) drm
> > > [ 1481.564859] drm_ioctl (drivers/gpu/drm/drm_ioctl.c:894) drm
> > > [ 1481.564937] xe_drm_ioctl (drivers/gpu/drm/xe/xe_device.c:210) xe
> > > [ 1481.565149] __x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:906 fs/ioctl.c:892 fs/ioctl.c:892)
> > > [ 1481.565154] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
> > > [ 1481.565159] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> > >
> > > Tejas
> > > >
> > > > Matt
> > > >
> > > > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_gpu_scheduler.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > > > > index 50361b4638f9..42ca3c4a299a 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > > > > @@ -80,6 +80,14 @@ void xe_sched_fini(struct xe_gpu_scheduler *sched)
> > > > >
> > > > >  void xe_sched_submission_start(struct xe_gpu_scheduler *sched)
> > > > >  {
> > > > > +	/* Async call to __guc_exec_queue_fini_async frees scheduler
> > > > > +	 * while some scheduler submission would have already started.
> > > > > +	 * To handle such small window race case, submission should only
> > > > > +	 * start when scheduler is ready.
> > > > > +	 */
> > > > > +	if (!drm_sched_wqueue_ready(&sched->base))
> > > > > +		return;
> > > > > +
> > > > >  	drm_sched_wqueue_start(&sched->base);
> > > > >  	queue_work(sched->base.submit_wq, &sched->work_process_msg);
> > > > >  }
> > > > > --
> > > > > 2.34.1
> > > > >

