[RFC PATCH] drm/sched: Fix a UAF on drm_sched_fence::sched

Boris Brezillon boris.brezillon at collabora.com
Fri Aug 30 10:44:29 UTC 2024


On Fri, 30 Aug 2024 11:37:21 +0200
Boris Brezillon <boris.brezillon at collabora.com> wrote:

> > > With the introduction of a new model where each entity has its own
> > > drm_gpu_scheduler instance, this situation is likely to happen every time
> > > a GPU context is destroyed and some of its fences remain attached to
> > > dma_buf objects still owned by other drivers/processes.
> > >
> > > In order to make drm_sched_fence_get_timeline_name() safe, we need to
> > > copy the scheduler name into our own refcounted object that's only
> > > destroyed when both the scheduler and all its fences are gone.
> > >
> > > The fact drm_sched_fence might have a reference to the drm_gpu_scheduler
> > > even after it's been released is worrisome though, but I'd rather
> > > discuss that with everyone than come up with a solution that's likely
> > > to end up being rejected.
> > >
> > > Note that the bug was found while repeatedly reading dma_buf's debugfs
> > > file, which, at some point, calls dma_resv_describe() on a resv that
> > > contains signalled fences coming from a destroyed GPU context.
> > > AFAIK, there's nothing invalid there.    
> > 
> > Yeah but reading debugfs is not guaranteed to crash the kernel.
> > 
> > On the other hand the approach with a kref'ed string looks rather sane 
> > to me. One comment on this below.  
> 
> There's still the problem I mentioned above (unloading drm_sched can
> make things crash). Are there any plans to fix that? The simple option
> would be to prevent compiling drm_sched as a module, but that's not an
> option because it depends on DRM which is a tristate too. Maybe we
> could have drm_sched_fence.o linked statically, just like dma-fence.c
> is linked statically to prevent the stub ops from disappearing.
> Not sure if drm_sched_fence.c depends on symbols defined in
> sched_{main,entity}.c or other parts of the DRM subsystem though.

For the record, I gave it a try, and linking drm_sched_fence.o
statically while leaving the rest of drm_sched as a module seems to
work. I just sent an RFC for that [1].

[1]https://lore.kernel.org/dri-devel/20240830104057.1479321-1-boris.brezillon@collabora.com/T/#u


More information about the dri-devel mailing list