[PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend

Fri Mar 18 16:27:34 UTC 2022

On 2022-03-18 12:20, Rob Clark wrote:
> On Fri, Mar 18, 2022 at 9:04 AM Andrey Grodzovsky
> <andrey.grodzovsky at amd.com> wrote:
>>
>> On 2022-03-17 16:35, Rob Clark wrote:
>>> On Thu, Mar 17, 2022 at 12:50 PM Andrey Grodzovsky
>>> <andrey.grodzovsky at amd.com> wrote:
>>>> On 2022-03-17 14:25, Rob Clark wrote:
>>>>> On Thu, Mar 17, 2022 at 11:10 AM Andrey Grodzovsky
>>>>> <andrey.grodzovsky at amd.com> wrote:
>>>>>> On 2022-03-17 13:35, Rob Clark wrote:
>>>>>>> On Thu, Mar 17, 2022 at 9:45 AM Christian König
>>>>>>> <christian.koenig at amd.com> wrote:
>>>>>>>> Am 17.03.22 um 17:18 schrieb Rob Clark:
>>>>>>>>> On Thu, Mar 17, 2022 at 9:04 AM Christian König
>>>>>>>>> <christian.koenig at amd.com> wrote:
>>>>>>>>>> Am 17.03.22 um 16:10 schrieb Rob Clark:
>>>>>>>>>>> [SNIP]
>>>>>>>>>>> userspace frozen != kthread frozen .. that is what this patch is
>>>>>>>>>>> trying to address, so we aren't racing between shutting down the hw
>>>>>>>>>>> and the scheduler shoveling more jobs at us.
>>>>>>>>>> Well exactly that's the problem. The scheduler is supposed to shoveling
>>>>>>>>>> more jobs at us until it is empty.
>>>>>>>>>>
>>>>>>>>>> Thinking more about it we will then keep some dma_fence instance
>>>>>>>>>> unsignaled and that is and extremely bad idea since it can lead to
>>>>>>>>>> deadlocks during suspend.
>>>>>>>>> Hmm, perhaps that is true if you need to migrate things out of vram?
>>>>>>>>> It is at least not a problem when vram is not involved.
>>>>>>>> No, it's much wider than that.
>>>>>>>>
>>>>>>>> See what can happen is that the memory management shrinkers want to wait
>>>>>>>> for a dma_fence during suspend.
>>>>>>> we don't wait on fences in shrinker, only purging or evicting things
>>>>>>> that are already ready.  Actually, waiting on fences in shrinker path
>>>>>>> sounds like a pretty bad idea.
>>>>>>>
>>>>>>>> And if you stop the scheduler they will just wait forever.
>>>>>>>>
>>>>>>>> What you need to do instead is to drain the scheduler, e.g. call
>>>>>>>> drm_sched_entity_flush() with a proper timeout for each entity you have
>>>>>>>> created.
>>>>>>> yeah, it would work to drain the scheduler.. I guess that might be the
>>>>>>> more portable approach as far as generic solution for suspend.
>>>>>>>
>>>>>>> BR,
>>>>>>> -R
>>>>>> I am not sure how this drains the scheduler ? Suppose we done the
>>>>>> waiting in drm_sched_entity_flush,
>>>>>> what prevents someone to push right away another job into the same
>>>>>> entity's queue  right after that ?
>>>>>> Shouldn't we first disable further pushing of jobs into entity before we
>>>>>> wait for  sched->job_scheduled ?
>>>>>>
>>>>> In the system suspend path, userspace processes will have already been
>>>>> frozen, so there should be no way to push more jobs to the scheduler,
>>>>> unless they are pushed from the kernel itself.
>>>>> amdgpu_device_suspend
>>>> It was my suspicion but I wasn't sure about it.
>>>>
>>>>
>>>>> We don't do that in
>>>>> drm/msm, but maybe you need to to move things btwn vram and system
>>>>> memory?
>>>> Exactly, that was my main concern - if we use this method we have to use
>>>> it in a point in
>>>> suspend sequence when all the in kernel job submissions activity already
>>>> suspended
>>>>
>>>>> But even in that case, if the # of jobs you push is bounded I
>>>>> guess that is ok?
>>>> Submissions to scheduler entities are using unbounded queue, the bounded
>>>> part is when
>>>> you extract next job from entity to submit to HW ring and it rejects if
>>>> submission limit reached (drm_sched_ready)
>>>>
>>>> In general - It looks to me at least that what we what we want her is
>>>> more of a drain operation then flush (i.e.
>>>> we first want to disable any further job submission to entity's queue
>>>> and then flush all in flight ones). As example
>>>> for this i was looking at  flush_workqueue vs. drain_workqueue
>>> Would it be possible for amdgpu to, in the system suspend task,
>>>
>>> 1) first queue up all the jobs needed to migrate bos out of vram, and
>>> whatever other housekeeping jobs are needed
>>> 2) then drain gpu scheduler's queues
>>> 3) and then finally wait for jobs executing on GPU to complete
>>
>> We already do most of it in amdgpu_device_suspend,
>> amdgpu_device_ip_suspend_phase1
>> followed by amdgpu_device_evict_resources followed by
>> amdgpu_fence_driver_hw_fini is
>> exactly steps 1 + 3. What we are missing is step 2). For this step I
>> suggest adding a function
>> called drm_sched_entity_drain which basically sets entity->stopped =
>> true and then calls drm_sched_entity_flush.
>> This will both reject any new insertions into entity's job queue and
>> will flush all pending job submissions to HW from that entity.
>> One point is we need to make make drm_sched_entity_push_job return value
>> so the caller knows about job enqueue
>> rejection.
> Hmm, seems like job enqueue that is rejected because we are in the
> process of suspending should be more of a WARN_ON() sort of thing?
> Not sure if there is something sensible to do for the caller at that
> point?

What about the job's fence the caller is waiting on ? If we rejected
job submission the caller must know about it to not get stuck waiting
on that fence.

>
>> What about runtime suspend ? I guess same issue with scheduler racing
>> against HW susppend is relevant there ?
> Runtime suspend should be ok, as long as the driver holds a runpm
> reference whenever the hw needs to be awake.  The problem with system
> suspend (at least if you are using pm_runtime_force_suspend() or doing
> something equivalent) is that it bypasses the runpm reference.
> (Which, IMO, seems like a bad design..)

I am not totally clear  yet - can you expand a bit why one case is ok 
but the other
problematic ?

Andrey

>
>> Also, could you point to a particular buggy scenario where the race
>> between SW shceduler and suspend is causing a problem ?
> I wrote a piglit test[1] to try to trigger this scenario.. it isn't
> really that easy to hit
>
> BR,
> -R
>
> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fpiglit%2F-%2Fmerge_requests%2F643&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C502ac8db4fb94b3b0e9d08da08fb270e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637832172051790527%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=u2Fqq%2BZpmjFHQFK77xwxEA5092O3Nc%2FdCMllfejgnvU%3D&reserved=0
>
>> Andrey
>>
>>
>>> BR,
>>> -R
>>>
>>>> Andrey
>>>>
>>>>
>>>>> BR,
>>>>> -R