[PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

Christian König christian.koenig at amd.com
Tue Sep 7 09:25:30 UTC 2021


On 07.09.21 at 10:47, Daniel Vetter wrote:
> [SNIP]
>>>>> If we embed it, then I think it should exist at the latest from
>>>>> drm_sched_job_arm(). Maybe not yet initialized, but at least allocated. So
>>>>> the right thing to do here is to have the hw fence as a pointer in
>>>>> struct drm_sched_job, and check in drm_sched_job_arm() that it's at least
>>>>> allocated.
>>>> Why do we need to allocate the HW fence if it's embedded within a job struct?
>>> the hw fence is a refcounted struct, and the drm_sched_job is a different
>>> struct. And we didn't have a dri-devel discussion about whether it's
>>> correct to conflate these two lifetimes; the amdgpu folks simply hacked
>>> something together.
>>
>> Obviously scheduler-level changes must be discussed at the dri-devel
>> forum level.
>> What happened here, as Monk already mentioned, is that we had an
>> internal discussion about how to fix the problem in the subject of this
>> thread - accessing a freed job from the TDR handler - without the
>> current hack of removing the bad job from the pending list and
>> re-inserting it. That's where we came up (I think Christian first
>> mentioned this) with the idea of embedding the HW fence into the amdgpu
>> job - both to avoid memory allocations and because it allows us to use
>> the HW fence's refcounting as a solution to the above problem, by simply
>> grabbing a reference to the next fence in the pending list as a first
>> step in the TDR handler. What we didn't take into account at the time is
>> that this change indeed cannot be limited to the amdgpu level - we only
>> noticed that much later, during the final code reviews.
> Not sure where this fell through the cracks, but imo at least changing where
> the hw fence is allocated is a very fundamental change, and at the latest by
> then you should have discussed this on dri-devel.

I'm the one who kicked this off in April, and I made a nice internal 
presentation to explain what the problem is etc... So the idea of 
embedding the hardware fence into the job came from me.
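
Roughly, the idea looks like the following. This is just a sketch with 
made-up names (my_job, my_hw_fence_release etc.), not the actual amdgpu 
patches:

struct my_job {
	struct drm_sched_job base;
	struct dma_fence hw_fence;	/* embedded instead of separately allocated */
};

/* dma_fence_ops->release: the job is only freed when the last fence
 * reference goes away, so whoever holds a reference on the HW fence also
 * keeps the job memory alive.
 */
static void my_hw_fence_release(struct dma_fence *f)
{
	struct my_job *job = container_of(f, struct my_job, hw_fence);

	kfree_rcu(job, hw_fence.rcu);
}

/* First step of timeout handling: grab a reference to the HW fence of the
 * job we are going to look at, so it cannot be freed under us even if it
 * completes concurrently. That is what replaces the remove/re-insert hack
 * on the pending list.
 */
static enum drm_gpu_sched_stat my_job_timedout(struct drm_sched_job *s_job)
{
	struct my_job *job = container_of(s_job, struct my_job, base);
	struct dma_fence *fence = dma_fence_get(&job->hw_fence);

	/* handle the timeout, the job stays valid the whole time */

	dma_fence_put(fence);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}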

But during the presentation I also noted that we needed to sync up with a 
guy named Daniel Vetter, because it was his patch set which surfaced this 
issue by annotating fence completion prerequisites in lockdep.

> But even the tdr races would probably have been good to start discussing on
> dri-devel. Now it looks like Monk & team have lost 6 months for nothing.

Well, to make it clear: I noted during the presentation in April that 
this needs to be discussed with you, I also told the first guy working 
on this that it needs to be discussed on dri-devel instead of 
internally, and I'm pretty sure I noted this a couple more times after 
it moved to somebody else. And IIRC Andrey also pointed out pretty early 
on that we should not discuss this only internally.

So if people are not listening, it is not a surprise that they spend time 
on stuff like this which isn't upstreamable.

Christian.

> -Daniel
>
>
>> Andrey
>>
>>
>>>>> Otherwise we're just diverging across drivers and tempting them to do the
>>>>> wrong thing with the current ->run_job callback interface.
>>>> Maybe we should switch from embedding it in the driver-level job struct, as
>>>> it is now, to embedding it in drm_sched_job, and just leave the fence
>>>> initialization to driver-specific code?
>>> Maybe? I've not been involved in these discussions on the amd side at
>>> all; I'm just noticing that we now have a rather inconsistently used
>>> interface across drivers. Which is no good.
>>> -Daniel
>>>
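
Something like the following is what that would mean (a sketch only; the 
hw_fence member name is made up and is not an existing drm_sched_job 
field):

struct drm_sched_job {
	/* ... existing members ... */
	struct dma_fence *hw_fence;	/* owned here, allocated by the driver */
};

void drm_sched_job_arm(struct drm_sched_job *job)
{
	/* the HW fence must at least be allocated by now, even if the
	 * driver only initializes and signals it later from run_job()
	 */
	WARN_ON(!job->hw_fence);

	/* ... existing arm logic ... */
}
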
>>>> Andrey
>>>>
>>>>
>>>>> Can you guys look into this?
>>>>> -Daniel
>>>>>
>>>>>>>> Another thing I recently pondered for tdr races while looking at the i915
>>>>>>>> code is whether the tdr should first block the completion fence for that
>>>>>>>> job. My motivation is to have a race-free error capture (if the completion
>>>>>>>> races then we might start evicting memory and everything goes boom), but
>>>>>>>> maybe that helps here too. Some kind of atomic "block this fence from
>>>>>>>> completing" thing.
>>>>>>>>
>>>>>>>> Or am I completely guessing in the wrong direction?
>>>>>>> I think we already do it here - https://elixir.bootlin.com/linux/v5.14-rc1/source/drivers/gpu/drm/scheduler/sched_main.c#L410
>>>>>> Ah yes, this works because drm/sched has a separate hw fence from the
>>>>>> logical job fence.
>>>>>> -Daniel
>>>>>>
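
The link points into drm_sched_stop(); roughly (simplified, not a 
verbatim copy of the upstream function), the relevant part is:

/* If we can still remove the scheduler's completion callback from the HW
 * fence, a late HW completion can no longer signal the job's finished
 * fence behind the timeout handler's back.
 */
list_for_each_entry_safe_reverse(s_job, tmp, &sched->pending_list, list) {
	if (s_job->s_fence->parent &&
	    dma_fence_remove_callback(s_job->s_fence->parent, &s_job->cb)) {
		/* callback removed - this job is now "blocked" */
		atomic_dec(&sched->hw_rq_count);
	} else {
		/* fence already signalled, nothing to detach */
	}
}
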
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                        spin_unlock(&sched->job_list_lock);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                        status = job->sched->ops->timedout_job(job);
>>>>>>>>>>>>>>> @@ -342,6 +342,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>>>>>>>>>>>                                job->sched->ops->free_job(job);
>>>>>>>>>>>>>>>                                sched->free_guilty = false;
>>>>>>>>>>>>>>>                        }
>>>>>>>>>>>>>>> +               dma_fence_put(fence);
>>>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>>>                        spin_unlock(&sched->job_list_lock);
>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>> @@ -392,20 +393,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                kthread_park(sched->thread);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -       /*
>>>>>>>>>>>>>>> -        * Reinsert back the bad job here - now it's safe as
>>>>>>>>>>>>>>> -        * drm_sched_get_cleanup_job cannot race against us and release the
>>>>>>>>>>>>>>> -        * bad job at this point - we parked (waited for) any in progress
>>>>>>>>>>>>>>> -        * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>>>>>>>>>>>>>>> -        * now until the scheduler thread is unparked.
>>>>>>>>>>>>>>> -        */
>>>>>>>>>>>>>>> -       if (bad && bad->sched == sched)
>>>>>>>>>>>>>>> -               /*
>>>>>>>>>>>>>>> -                * Add at the head of the queue to reflect it was the earliest
>>>>>>>>>>>>>>> -                * job extracted.
>>>>>>>>>>>>>>> -                */
>>>>>>>>>>>>>>> -               list_add(&bad->list, &sched->pending_list);
>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>                /*
>>>>>>>>>>>>>>>                 * Iterate the job list from later to  earlier one and either deactive
>>>>>>>>>>>>>>>                 * their HW callbacks or remove them from pending list if they already
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> 2.25.1
>>>>>>>>>>>>>>>
>>>>>> -- 
>>>>>> Daniel Vetter
>>>>>> Software Engineer, Intel Corporation
>>>>>> http://blog.ffwll.ch/


