[PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

Christian König christian.koenig at amd.com
Thu Nov 2 10:48:25 UTC 2023


Am 01.11.23 um 09:13 schrieb Daniel Vetter:
> On Wed, 1 Nov 2023 at 07:59, Dave Airlie <airlied at gmail.com> wrote:
>>> Well, to make it clear once more: Signaling a dma_fence from the
>>> destructor of a reference counted object is very problematic! This will
>>> be rejected no matter if you do that in C or in Rust.
>>>
>>> What we can do is to make it safe in the sense that you don't access
>>> freed up memory by using the scheduler fences even more as wrapper
>>> around the hardware fence as we do now. But this quite a change and
>>> requires a bit more than just hacking around
>>> drm_sched_fence_get_timeline_name().
>> I really think this needs to be documented if nothing else out of this thread.
>>
>> Clearly nobody is going to get it right and hidden here in this
>> thread, this info isn't useful.
>>
>> Can we have some sort of design document for the dma-fence/scheduler
>> interactions written and we can try and refine it with solutions on
>> the list, because I'm tired of people proposing things and NAK's
>> getting thrown around without anything to point people at.
>>
>> The next NAK I see on the list will mean I block all patches from the
>> sender until they write a documentation patch, because seriously this
>> stuff is too hard for someone to just keep it in their head and expect
>> everyone else to understand from reading the code.
> I very much like the idea that NAK replies are counted as "you've just
> volunteered yourself for some documentation patches so that next time
> around you can reply with a link to the docs instead of just a NAK".

Yeah, that sounds like a great idea to me as well :)

Especially when I can use it to convince managers that we need to have 
more work force on writing documentation.

> I don't think we'll get out of these discussions otherwise, since
> currently we have undocumented, but very tricky semantics of the
> drm/sched codebase for ringbuffer scheduling which is extended to fw
> scheduling in also very tricky ways, with not entirely clear impacts
> on semantics of all the drm/sched things. And as a result we just pile
> up enormous amounts of threads where I think the only thing assured is
> that people talk past each another.

The scheduler is certainly the ugliest part, but it's unfortunately 
still only the tip of the iceberg.

I have seen at least halve a dozen approach in the last two years where 
people tried to signal a dma_fence from userspace or similar.

Fortunately it was mostly prototyping and I could jump in early enough 
to stop that, but basically this is a fight against windmills.

I was considering to change the dma_fence semantics so that 
dma_fence_signal() could only be called from the interrupt contexts of 
devices and then put a big fat WARN_ON(!in_interrupt()) in there.

It's a sledgehammer, but as far as I can see the only thing which might 
help. Opinions?

Thanks,
Christian.

>
> Converting NAKs into doc patches should at least eventually get rid of
> the worst confusions we're dealing with here.
>
> Cheers, Sima



More information about the dri-devel mailing list