[PATCH v4] drm/scheduler: Avoid accessing freed bad job.

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Thu Feb 6 15:49:45 UTC 2020


On 2/6/20 9:51 AM, Christian König wrote:
> Am 06.02.20 um 15:49 schrieb Alex Deucher:
>> On Thu, Feb 6, 2020 at 6:50 AM Christian König
>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>> Am 06.02.20 um 12:10 schrieb Lucas Stach:
>>>> Hi all,
>>>>
>>>> On Mi, 2020-02-05 at 19:24 +0100, Lucas Stach wrote:
>>>>> Hi Andrey,
>>>>>
>>>>> This commit breaks all drivers that may bail out of the timeout
>>>>> processing because they wish to extend the timeout (etnaviv, v3d).
>>>>>
>>>>> Those drivers currently just return from the timeout handler before
>>>>> calling drm_sched_stop(), which means that with this commit applied we
>>>>> remove the first job from the ring_mirror_list but never put it back.
>>>>> This leads to jobs getting lost from the ring mirror, which then
>>>>> causes quite a bit of fallout like unsignaled fences.
>>>>>
>>>>> Not sure yet what to do about it; we could either add a function to put
>>>>> the job back on the ring_mirror if the driver wants to extend the
>>>>> timeout, or look for another way to stop drm_sched_cleanup_jobs from
>>>>> freeing jobs that are currently in timeout processing.
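
For reference, the first of those options could look roughly like the sketch
below. This is only a sketch: the helper name drm_sched_readd_bad_job is made
up, and the locking and list handling simply mirror what the patch further
down already does when reinserting the bad job in drm_sched_stop().

#include <drm/gpu_scheduler.h>

/*
 * Hypothetical helper (name invented for illustration): a driver that bails
 * out of timeout handling without calling drm_sched_stop() could use this to
 * put the job the scheduler removed back onto the mirror list.
 */
static void drm_sched_readd_bad_job(struct drm_gpu_scheduler *sched,
				    struct drm_sched_job *bad)
{
	unsigned long flags;

	spin_lock_irqsave(&sched->job_list_lock, flags);
	/* Re-add at the head of the list, as it was the earliest job extracted. */
	list_add(&bad->node, &sched->ring_mirror_list);
	spin_unlock_irqrestore(&sched->job_list_lock, flags);
}
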
>>>> So after thinking about this a bit more my opinion is that we need to
>>>> revert this change for now and go back to the drawing board for the
>>>> scheduler timeout handling.
>>>>
>>>> Right now this starts to feel like a big midlayer mistake with all the
>>>> very intricate intertwining between the drivers and the scheduler. The
>>>> rules on when it's safe to manipulate the ring mirror and when
>>>> completed jobs are signaled and freed are not really well specified.
>>>> The fact that we need to mutate state in order to get rid of races,
>>>> instead of having a single big "timeout processing is the owner of the
>>>> scheduler state for now" rule, is a big fat warning sign IMHO.
>>> Yes, that strongly feels like a hack to me as well. But I didn't have
>>> the time, and still haven't, to take a closer look and suggest
>>> something better.
>>>
>> In that case, can someone send me a revert?
>
> Well, a revert would break our driver.
>
> The real solution is that somebody needs to sit down, gather ALL the 
> requirements and then come up with a solution which is clean and works 
> for everyone.
>
> Christian.


I can take this on, as our general design here indeed becomes more and more
entangled as GPU reset scenarios grow in complexity (at least in the AMD
driver). Currently I am on a high priority internal task which should take me
around a week or two to finish, and after that I can get to it.

Regarding a temporary solution - I looked into the v3d and etnaviv use cases,
and in AMD we actually face the same scenario, where we decide to skip the HW
reset if the guilty job finished by the time we process the timeout (see
amdgpu_device_gpu_recover and the skip_hw_reset goto). The difference is that
we always call drm_sched_stop/start irrespective of whether we actually HW
reset or not (same as when extending the timeout). I wonder if something like
this could be done for v3d and etnaviv as well? A rough sketch of the idea is
below.
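
Something along these lines, as a minimal sketch only: it assumes the
drm_sched_stop()/drm_sched_resubmit_jobs()/drm_sched_start() entry points as
they are in drm-misc today, and my_hw_reset() plus the exact placement of the
signaled check are placeholders rather than real driver code.

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

static void my_hw_reset(struct drm_gpu_scheduler *sched); /* placeholder */

/* Sketch of a v3d/etnaviv style timeout handler that always goes through
 * drm_sched_stop()/drm_sched_start() and only skips the actual HW reset. */
static void my_timedout_job(struct drm_sched_job *sched_job)
{
	struct drm_gpu_scheduler *sched = sched_job->sched;

	/* Always park the scheduler; this is where the bad job gets reinserted. */
	drm_sched_stop(sched, sched_job);

	/* Skip the HW reset if the guilty job finished in the meantime. */
	if (sched_job->s_fence->parent &&
	    !dma_fence_is_signaled(sched_job->s_fence->parent))
		my_hw_reset(sched);	/* placeholder for the driver reset path */

	drm_sched_resubmit_jobs(sched);

	/* Unpark the scheduler and restart the timeout. */
	drm_sched_start(sched, true);
}

The key point is that drm_sched_stop() always runs, so the bad job is always
put back on the mirror list no matter whether the HW reset itself is skipped.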

Andrey


>
>>
>> Alex
>>
>>
>>> Christian.
>>>
>>>> It took me far longer than I'd like to admit to understand the failure
>>>> mode with fences not getting signaled after a GPU hang. The back and
>>>> forth between scheduler and driver code makes things really hard to
>>>> follow.
>>>>
>>>> Regards,
>>>> Lucas
>>>>
>>>>> Regards,
>>>>> Lucas
>>>>>
>>>>> On Mo, 2019-11-25 at 15:51 -0500, Andrey Grodzovsky wrote:
>>>>>> Problem:
>>>>>> Due to a race between drm_sched_cleanup_jobs in the sched thread and
>>>>>> drm_sched_job_timedout in the timeout work there is a possibility that
>>>>>> the bad job was already freed while still being accessed from the
>>>>>> timeout thread.
>>>>>>
>>>>>> Fix:
>>>>>> Instead of just peeking at the bad job in the mirror list,
>>>>>> remove it from the list under lock and then put it back later, when
>>>>>> we are guaranteed that no race with the main sched thread is possible,
>>>>>> which is after the thread is parked.
>>>>>>
>>>>>> v2: Lock around processing ring_mirror_list in 
>>>>>> drm_sched_cleanup_jobs.
>>>>>>
>>>>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>>>>>> drm_sched_get_cleanup_job already has a lock there.
>>>>>>
>>>>>> v4: Fix comments to reflect the latest code in drm-misc.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>> Reviewed-by: Christian König <christian.koenig at amd.com>
>>>>>> Tested-by: Emily Deng <Emily.Deng at amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
>>>>>>    1 file changed, 27 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index 6774955..1bf9c40 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>>  	unsigned long flags;
>>>>>>
>>>>>>  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>>>>> +
>>>>>> +	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>>>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>  	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>  				       struct drm_sched_job, node);
>>>>>>
>>>>>>  	if (job) {
>>>>>> +		/*
>>>>>> +		 * Remove the bad job so it cannot be freed by concurrent
>>>>>> +		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>>>>>> +		 * is parked at which point it's safe.
>>>>>> +		 */
>>>>>> +		list_del_init(&job->node);
>>>>>> +		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> +
>>>>>>  		job->sched->ops->timedout_job(job);
>>>>>>
>>>>>>  		/*
>>>>>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>>  			job->sched->ops->free_job(job);
>>>>>>  			sched->free_guilty = false;
>>>>>>  		}
>>>>>> +	} else {
>>>>>> +		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>  	}
>>>>>>
>>>>>>  	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>>>>  	kthread_park(sched->thread);
>>>>>>
>>>>>>  	/*
>>>>>> +	 * Reinsert back the bad job here - now it's safe as
>>>>>> +	 * drm_sched_get_cleanup_job cannot race against us and release the
>>>>>> +	 * bad job at this point - we parked (waited for) any in progress
>>>>>> +	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>>>>>> +	 * now until the scheduler thread is unparked.
>>>>>> +	 */
>>>>>> +	if (bad && bad->sched == sched)
>>>>>> +		/*
>>>>>> +		 * Add at the head of the queue to reflect it was the earliest
>>>>>> +		 * job extracted.
>>>>>> +		 */
>>>>>> +		list_add(&bad->node, &sched->ring_mirror_list);
>>>>>> +
>>>>>> +	/*
>>>>>>  	 * Iterate the job list from later to  earlier one and either deactive
>>>>>>  	 * their HW callbacks or remove them from mirror list if they already
>>>>>>  	 * signaled.
>

