[PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

Asahi Lina lina at asahilina.net
Tue Jul 18 02:35:51 UTC 2023


On 18/07/2023 00.55, Christian König wrote:
> Am 15.07.23 um 16:14 schrieb alyssa at rosenzweig.io:
>> 15 July 2023 at 00:03, "Luben Tuikov" <luben.tuikov at amd.com> wrote:
>>> On 2023-07-14 05:57, Christian König wrote:
>>>
>>>> Am 14.07.23 um 11:49 schrieb Asahi Lina:
>>>>
>>>>> On 14/07/2023 17.43, Christian König wrote:
>>>>>
>>>>    Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>>    A signaled scheduler fence can outlive its scheduler, since fences are
>>>>    independencly reference counted. Therefore, we can't reference the
>>>>    scheduler in the get_timeline_name() implementation.
>>>>
>>>>    Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
>>>>    dma-bufs reference fences from GPU schedulers that no longer exist.
>>>>
>>>>    Signed-off-by: Asahi Lina <lina at asahilina.net>
>>>>    ---
>>>>       drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>>>>       drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>>>>       include/drm/gpu_scheduler.h              | 5 +++++
>>>>       3 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>>    diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>    b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>    index b2bbc8a68b30..17f35b0b005a 100644
>>>>    --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>    +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>    @@ -389,7 +389,12 @@ static bool
>>>>    drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>>>                  /*
>>>>                * Fence is from the same scheduler, only need to wait for
>>>>    -         * it to be scheduled
>>>>    +         * it to be scheduled.
>>>>    +         *
>>>>    +         * Note: s_fence->sched could have been freed and reallocated
>>>>    +         * as another scheduler. This false positive case is okay,
>>>>    as if
>>>>    +         * the old scheduler was freed all of its jobs must have
>>>>    +         * signaled their completion fences.
>>>>
>>>>    This is outright nonsense. As long as an entity for a scheduler exists
>>>>    it is not allowed to free up this scheduler.
>>>>
>>>>    So this function can't be called like this.
>>>>
>>>>> As I already explained, the fences can outlive their scheduler. That
>>>>>    means *this* entity certainly exists for *this* scheduler, but the
>>>>>    *dependency* fence might have come from a past scheduler that was
>>>>>    already destroyed along with all of its entities, and its address reused.
>>>>>
>>>>    
>>>>    Well this is function is not about fences, this function is a callback
>>>>    for the entity.
>>>>    
>>>>
>>>>> Christian, I'm really getting tired of your tone. I don't appreciate
>>>>>    being told my comments are "outright nonsense" when you don't even
>>>>>    take the time to understand what the issue is and what I'm trying to
>>>>>    do/document. If you aren't interested in working with me, I'm just
>>>>>    going to give up on drm_sched, wait until Rust gets workqueue support,
>>>>>    and reimplement it in Rust. You can keep your broken fence lifetime
>>>>>    semantics and I'll do my own thing.
>>>>>
>>>>    
>>>>    I'm certainly trying to help here, but you seem to have unrealistic
>>>>    expectations.
>>>>    
>>>>    I perfectly understand what you are trying to do, but you don't seem to
>>>>    understand that this functionality here isn't made for your use case.
>>>>    
>>>>    We can adjust the functionality to better match your requirements, but
>>>>    you can't say it is broken because it doesn't work when you use it not
>>>>    in the way it is intended to be used.
>>>>
>>> I believe "adjusting" functionality to fit some external requirements,
>>> may have unintended consequences, requiring yet more and more "adjustments".
>>> (Or may allow (new) drivers to do wild things which may lead to wild results. :-) )
>>>
>>> We need to be extra careful and wary of this.
>> Either drm/scheduler is common code that we should use for our driver, in which case we need to "adjust" it to fit the requirements of a safe Rust abstraction usable for AGX.
> 
> Well this is the fundamental disagreement we have. As far as I can see
> you don't need to adjust anything in the common drm/scheduler code.
> 
> That code works with quite a bunch of different drivers, including the
> Intel XE which has similar requirements to your work here.
> 
> We can talk about gradually improving the common code, but as Luben
> already wrote as well this needs to be done very carefully.
> 
>>    Or, drm/scheduler is not common code intended for drivers with our requirements, and then we need to be able to write our own scheduler.
>>
>> AMD has NAK'd both options, effectively NAK'ing the driver.
>>
>> I will ask a simple yes/no question: Should we use drm/sched?
> 
> Well, yes.
> 
>>
>> If yes, it will need patches like these,
> 
> No, you don't.
> 
> First of all you need to try to adjust your driver to match the
> requirements of drm/scheduler and *not* the other way around.
> 
>>    and AMD needs to be ok with that and stop NAK'ing them on sight becuase they don't match the existing requirements.
>>
>> If no, we will write our own scheduler in Rust, and AMD needs to be ok with that and not NAK it on sight because it's not drm/sched.
>>
>> Which is it?
>>
>> Note if we write a Rust scheduler, drm/sched and amdgpu will be unaffected. If we do that and AMD comes back and NAKs it -- as said in this thread would "probably" happen -- then it is impossible for us to upstream a driver regardless of whether we use drm/sched.
>>
>> Lina has been polite and accommodating while AMD calls her code "outright nonsense" and gets "outright NAK"s, and puts her into an impossible catch-22 where no matter what she does it's NAK'd.
> 
> Well as far as I can see I'm totally polite as well.
> 
> Pointing out that approaches doesn't seem to make sense and NAKing
> patches is a perfectly normal part of the review process.
> 
> What you need to to is to take a step back and ask yourself why this
> here is facing so much rejection from our side. I have to admit that I
> don't seem to be good at explaining that, cause we are obviously talking
> past each other, but you don't seem to try hard to understand what I'm
> pointing out either.
> 
>> That's not ok.
> 
> As far as I can see it is.
> 
> As maintainer of a commonly used component my first duty is to preserve
> the status quo and prevent modifications which are not well thought
> through. And to be honest those changes here strongly looks like Lina is
> just adjusting the code to match her requirements without looking left
> and right first.
> 
> Regards,
> Christian.
> 
> 

I give up. You are ignoring everything we say, and rejecting everything 
we suggest. We've already explained why drm_sched doesn't work for us. 
I'm tired of repeating the same explanation over and over again only to 
be ignored and told I'm wrong.

I'll start working on a new, much simpler Rust-native scheduler based on 
the workqueue Rust abstractions which are in review.

~~ Lina



More information about the dri-devel mailing list