BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]
Christian König
ckoenig.leichtzumerken at gmail.com
Wed Apr 19 08:12:24 UTC 2023
On 19.04.23 at 09:00, Mikhail Gavrilov wrote:
> Christian?
I'm already looking into this, but I can't figure out why we run into
problems here.
What happens is that a command submission (CS) is aborted without the
job being sent to the scheduler, and in this case the cleanup function
doesn't seem to work.
Christian.
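
For readers following the thread, the two cleanup paths mentioned above (job armed versus job aborted before it reaches the scheduler) can be modelled in user space. This is only a sketch under stated assumptions: every name in it (model_job, model_fence, model_job_cleanup and so on) is hypothetical, and using a zero reference count to mean "never armed" is an assumption of the model rather than a statement about the kernel code. It mirrors the shape of the quoted drm_sched_job_cleanup() excerpt and does not reproduce or diagnose the KASAN report.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's types. */
#define MODEL_MAX_DEPS 4

struct model_fence {
    int refcount;                 /* assumption: 0 means "never armed" */
};

struct model_job {
    struct model_fence *s_fence;
    struct model_fence *deps[MODEL_MAX_DEPS];
    size_t ndeps;
};

static int live_fences;           /* number of fences currently allocated */

static struct model_fence *fence_alloc(int refcount)
{
    struct model_fence *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    f->refcount = refcount;
    live_fences++;
    return f;
}

/* Drop one reference; free the fence when the last one goes away. */
static void fence_put(struct model_fence *f)
{
    if (--f->refcount == 0) {
        free(f);
        live_fences--;
    }
}

/* Free a fence that was never armed, so it holds no references. */
static void fence_free_unarmed(struct model_fence *f)
{
    free(f);
    live_fences--;
}

static int model_job_init(struct model_job *job)
{
    job->ndeps = 0;
    job->s_fence = fence_alloc(0);
    return job->s_fence ? 0 : -1;
}

/* Commit to running the job: the fence now carries one reference. */
static void model_job_arm(struct model_job *job)
{
    job->s_fence->refcount = 1;
}

/* The job takes over the caller's reference to the dependency fence. */
static int model_job_add_dep(struct model_job *job, struct model_fence *f)
{
    if (job->ndeps == MODEL_MAX_DEPS)
        return -1;
    job->deps[job->ndeps++] = f;
    return 0;
}

static void model_job_cleanup(struct model_job *job)
{
    size_t i;

    if (job->s_fence->refcount > 0)
        fence_put(job->s_fence);          /* job had been armed */
    else
        fence_free_unarmed(job->s_fence); /* aborted before arming */
    job->s_fence = NULL;

    /* Release every dependency, whichever path was taken above. */
    for (i = 0; i < job->ndeps; i++)
        fence_put(job->deps[i]);
    job->ndeps = 0;
}
```

In this model an aborted submission is model_job_init(), optionally model_job_add_dep(), then model_job_cleanup() with no model_job_arm() in between. The cleanup is expected to release both the unarmed fence and every dependency, which is the path reported as misbehaving in the real driver.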
>
> ❯ /usr/src/kernels/6.3.0-0.rc7.56.fc39.x86_64/scripts/faddr2line /lib/debug/lib/modules/6.3.0-0.rc7.56.fc39.x86_64/kernel/drivers/gpu/drm/scheduler/gpu-sched.ko.debug drm_sched_job_cleanup+0x9a
> drm_sched_job_cleanup+0x9a/0x130:
> drm_sched_job_cleanup at /usr/src/debug/kernel-6.3-rc7/linux-6.3.0-0.rc7.56.fc39.x86_64/drivers/gpu/drm/scheduler/sched_main.c:808 (discriminator 3)
>
> ❯ cat -s -n /usr/src/debug/kernel-6.3-rc7/linux-6.3.0-0.rc7.56.fc39.x86_64/drivers/gpu/drm/scheduler/sched_main.c | head -818 | tail -20
> 799         /* drm_sched_job_arm() has been called */
> 800         dma_fence_put(&job->s_fence->finished);
> 801     } else {
> 802         /* aborted job before committing to run it */
> 803         drm_sched_fence_free(job->s_fence);
> 804     }
> 805
> 806     job->s_fence = NULL;
> 807
> 808     xa_for_each(&job->dependencies, index, fence) {
> 809         dma_fence_put(fence);
> 810     }
> 811     xa_destroy(&job->dependencies);
> 812
> 813 }
> 814 EXPORT_SYMBOL(drm_sched_job_cleanup);
> 815
> 816 /**
> 817  * drm_sched_ready - is the scheduler ready
> 818  *
>
>> git blame drivers/gpu/drm/scheduler/sched_main.c -L 800,819
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-17 10:49:16 +0200 800) dma_fence_put(&job->s_fence->finished);
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-17 10:49:16 +0200 801) } else {
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-17 10:49:16 +0200 802) /* aborted job before committing to run it */
> d4c16733e7960 drivers/gpu/drm/scheduler/sched_main.c (Boris Brezillon 2021-09-03 14:05:54 +0200 803) drm_sched_fence_free(job->s_fence);
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-17 10:49:16 +0200 804) }
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-17 10:49:16 +0200 805)
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c (Sharat Masetty 2018-10-29 15:02:28 +0530 806) job->s_fence = NULL;
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-05 12:46:49 +0200 807)
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-05 12:46:49 +0200 808) xa_for_each(&job->dependencies, index, fence) {
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-05 12:46:49 +0200 809) dma_fence_put(fence);
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-05 12:46:49 +0200 810) }
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-05 12:46:49 +0200 811) xa_destroy(&job->dependencies);
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c (Daniel Vetter 2021-08-05 12:46:49 +0200 812)
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c (Sharat Masetty 2018-10-29 15:02:28 +0530 813) }
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c (Sharat Masetty 2018-10-29 15:02:28 +0530 814) EXPORT_SYMBOL(drm_sched_job_cleanup);
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c (Sharat Masetty 2018-10-29 15:02:28 +0530 815)
> e688b728228b9 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c (Christian König 2015-08-20 17:01:01 +0200 816) /**
> 2d33948e4e00b drivers/gpu/drm/scheduler/gpu_scheduler.c (Nayan Deshmukh 2018-05-29 11:23:07 +0530 817) * drm_sched_ready - is the scheduler ready
> 2d33948e4e00b drivers/gpu/drm/scheduler/gpu_scheduler.c (Nayan Deshmukh 2018-05-29 11:23:07 +0530 818) *
> 2d33948e4e00b drivers/gpu/drm/scheduler/gpu_scheduler.c (Nayan Deshmukh 2018-05-29 11:23:07 +0530 819) * @sched: scheduler instance
>
> Daniel, since Christian looks a little busy, can you help? git blame
> says that you are the author of the code that KASAN mentions in its
> report.
> The issue is reproducible on all of the AMD hardware available to me:
> 6800M, 6900XT, 7900XTX.
>