BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

Christian König ckoenig.leichtzumerken at gmail.com
Wed Apr 19 08:12:24 UTC 2023


On 19.04.23 at 09:00, Mikhail Gavrilov wrote:
> Christian?

I'm already looking into this, but can't figure out why we run into 
problems here.

What happens is that a CS (command submission) is aborted before the job
is sent to the scheduler, and in that case the cleanup function doesn't
seem to work.
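To make the two cleanup paths easier to reason about, here is a minimal userspace sketch of the shape of the quoted cleanup code. It is not the kernel implementation: struct model_fence, struct model_job, model_fence_put(), model_job_cleanup() and model_run() are hypothetical stand-ins, the fixed-size array stands in for job->dependencies, and treating a zero refcount as "never armed" is an assumption, since the condition that selects the branch is not part of the quoted excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel types. */
struct model_fence {
	int refcount;			/* assumption: 0 means "never armed" */
};

struct model_job {
	struct model_fence *s_fence;
	struct model_fence *deps[4];	/* stands in for job->dependencies */
	size_t num_deps;
};

static int model_freed;			/* number of fences released so far */

/* Drop one reference and free the fence when the last one goes away. */
static void model_fence_put(struct model_fence *f)
{
	if (--f->refcount == 0) {
		free(f);
		model_freed++;
	}
}

/* Same shape as the quoted cleanup: two fence paths, then the deps. */
static void model_job_cleanup(struct model_job *job)
{
	if (job->s_fence->refcount > 0) {
		/* job was armed: drop the reference like any other fence */
		model_fence_put(job->s_fence);
	} else {
		/* job was aborted before being armed: free it directly */
		free(job->s_fence);
		model_freed++;
	}
	job->s_fence = NULL;

	for (size_t i = 0; i < job->num_deps; i++)
		model_fence_put(job->deps[i]);
	job->num_deps = 0;
}

/* Build a job with two dependencies, clean it up, return fences released. */
static int model_run(int armed)
{
	struct model_job job = { 0 };

	model_freed = 0;
	job.s_fence = calloc(1, sizeof(*job.s_fence));
	assert(job.s_fence);
	job.s_fence->refcount = armed ? 1 : 0;

	for (size_t i = 0; i < 2; i++) {
		job.deps[i] = calloc(1, sizeof(*job.deps[i]));
		assert(job.deps[i]);
		job.deps[i]->refcount = 1;
		job.num_deps++;
	}

	model_job_cleanup(&job);
	return model_freed;
}
```

In this model both model_run(1) (armed) and model_run(0) (aborted) release three fences, so either path works as long as s_fence and the dependency container were set up before cleanup runs. The sketch says nothing about which precondition is actually violated in the reported crash.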

Christian.

>
> ❯ /usr/src/kernels/6.3.0-0.rc7.56.fc39.x86_64/scripts/faddr2line /lib/debug/lib/modules/6.3.0-0.rc7.56.fc39.x86_64/kernel/drivers/gpu/drm/scheduler/gpu-sched.ko.debug drm_sched_job_cleanup+0x9a
> drm_sched_job_cleanup+0x9a/0x130:
> drm_sched_job_cleanup at /usr/src/debug/kernel-6.3-rc7/linux-6.3.0-0.rc7.56.fc39.x86_64/drivers/gpu/drm/scheduler/sched_main.c:808 (discriminator 3)
>
> ❯ cat -s -n /usr/src/debug/kernel-6.3-rc7/linux-6.3.0-0.rc7.56.fc39.x86_64/drivers/gpu/drm/scheduler/sched_main.c | head -818 | tail -20
>     799                 /* drm_sched_job_arm() has been called */
>     800                 dma_fence_put(&job->s_fence->finished);
>     801         } else {
>     802                 /* aborted job before committing to run it */
>     803                 drm_sched_fence_free(job->s_fence);
>     804         }
>     805
>     806         job->s_fence = NULL;
>     807
>     808         xa_for_each(&job->dependencies, index, fence) {
>     809                 dma_fence_put(fence);
>     810         }
>     811         xa_destroy(&job->dependencies);
>     812
>     813 }
>     814 EXPORT_SYMBOL(drm_sched_job_cleanup);
>     815
>     816 /**
>     817  * drm_sched_ready - is the scheduler ready
>     818  *
>
> ❯ git blame drivers/gpu/drm/scheduler/sched_main.c -L 800,819
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-17 10:49:16 +0200 800)             dma_fence_put(&job->s_fence->finished);
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-17 10:49:16 +0200 801)     } else {
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-17 10:49:16 +0200 802)             /* aborted job before committing to run it */
> d4c16733e7960 drivers/gpu/drm/scheduler/sched_main.c        (Boris Brezillon 2021-09-03 14:05:54 +0200 803)             drm_sched_fence_free(job->s_fence);
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-17 10:49:16 +0200 804)     }
> dbe48d030b285 drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-17 10:49:16 +0200 805)
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c        (Sharat Masetty  2018-10-29 15:02:28 +0530 806)     job->s_fence = NULL;
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-05 12:46:49 +0200 807)
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-05 12:46:49 +0200 808)     xa_for_each(&job->dependencies, index, fence) {
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-05 12:46:49 +0200 809)             dma_fence_put(fence);
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-05 12:46:49 +0200 810)     }
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-05 12:46:49 +0200 811)     xa_destroy(&job->dependencies);
> ebd5f74255b9f drivers/gpu/drm/scheduler/sched_main.c        (Daniel Vetter   2021-08-05 12:46:49 +0200 812)
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c        (Sharat Masetty  2018-10-29 15:02:28 +0530 813) }
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c        (Sharat Masetty  2018-10-29 15:02:28 +0530 814) EXPORT_SYMBOL(drm_sched_job_cleanup);
> 26efecf955889 drivers/gpu/drm/scheduler/sched_main.c        (Sharat Masetty  2018-10-29 15:02:28 +0530 815)
> e688b728228b9 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c (Christian König 2015-08-20 17:01:01 +0200 816) /**
> 2d33948e4e00b drivers/gpu/drm/scheduler/gpu_scheduler.c     (Nayan Deshmukh  2018-05-29 11:23:07 +0530 817)  * drm_sched_ready - is the scheduler ready
> 2d33948e4e00b drivers/gpu/drm/scheduler/gpu_scheduler.c     (Nayan Deshmukh  2018-05-29 11:23:07 +0530 818)  *
> 2d33948e4e00b drivers/gpu/drm/scheduler/gpu_scheduler.c     (Nayan Deshmukh  2018-05-29 11:23:07 +0530 819)  * @sched: scheduler instance
>
> Daniel, since Christian looks a little busy, can you help? git blame
> says that you are the author of the code that KASAN mentions in its
> report.
> The issue is reproducible on all available AMD hardware: 6800M, 6900XT, 7900XTX.
>