[PATCH] drm: Don't free jobs in wait_event_interruptible()

Thu Sep 26 14:31:20 UTC 2019

On 9/26/19 5:41 AM, Steven Price wrote:
> On 25/09/2019 21:09, Grodzovsky, Andrey wrote:
>> On 9/25/19 12:07 PM, Andrey Grodzovsky wrote:
>>> On 9/25/19 12:00 PM, Steven Price wrote:
>>>
>>>> On 25/09/2019 16:56, Grodzovsky, Andrey wrote:
>>>>> On 9/25/19 11:14 AM, Steven Price wrote:
>>>>>
>>>>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however
>>>>>> because
>>>>>> it is called as the condition of wait_event_interruptible() it must
>>>>>> not
>>>>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do
>>>>>> sleep.
>>>>>>
>>>>>> Instead let's rename drm_sched_cleanup_jobs() to
>>>>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>>>>> there is one. The caller can then call the free_job() callback outside
>>>>>> the wait_event_interruptible() where sleeping is possible before
>>>>>> re-checking and returning to sleep if necessary.
>>>>>>
>>>>>> Signed-off-by: Steven Price <steven.price at arm.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/scheduler/sched_main.c | 44
>>>>>> ++++++++++++++------------
>>>>>>     1 file changed, 24 insertions(+), 20 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct
>>>>>> dma_fence *f, struct dma_fence_cb *cb)
>>>>>>     }
>>>>>>        /**
>>>>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>>>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be
>>>>>> destroyed
>>>>>>      *
>>>>>>      * @sched: scheduler instance
>>>>>>      *
>>>>>> - * Remove all finished jobs from the mirror list and destroy them.
>>>>>> + * Returns the next finished job from the mirror list (if there is
>>>>>> one)
>>>>>> + * ready for it to be destroyed.
>>>>>>      */
>>>>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>>>> +static struct drm_sched_job *
>>>>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>>>     {
>>>>>> +    struct drm_sched_job *job = NULL;
>>>>>>         unsigned long flags;
>>>>>>            /* Don't destroy jobs while the timeout worker is running */
>>>>>>         if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>>>             !cancel_delayed_work(&sched->work_tdr))
>>>>>> -        return;
>>>>>> -
>>>>>> -
>>>>>> -    while (!list_empty(&sched->ring_mirror_list)) {
>>>>>> -        struct drm_sched_job *job;
>>>>>> +        return NULL;
>>>>>>     -        job = list_first_entry(&sched->ring_mirror_list,
>>>>>> +    job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>                            struct drm_sched_job, node);
>>>>>> -        if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>>>> -            break;
>>>>>>     -        spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> +    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> +
>>>>>> +    if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>>>             /* remove job from ring_mirror_list */
>>>>>>             list_del_init(&job->node);
>>>>>> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> -
>>>>>> -        sched->ops->free_job(job);
>>>>>> +    } else {
>>>>>> +        job = NULL;
>>>>>> +        /* queue timeout for next job */
>>>>>> +        drm_sched_start_timeout(sched);
>>>>>>         }
>>>>>>     -    /* queue timeout for next job */
>>>>>> -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> -    drm_sched_start_timeout(sched);
>>>>>>         spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>     +    return job;
>>>>>>     }
>>>>>>        /**
>>>>>> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param)
>>>>>>             struct drm_sched_fence *s_fence;
>>>>>>             struct drm_sched_job *sched_job;
>>>>>>             struct dma_fence *fence;
>>>>>> +        struct drm_sched_job *cleanup_job = NULL;
>>>>>> wait_event_interruptible(sched->wake_up_worker,
>>>>>> -                     (drm_sched_cleanup_jobs(sched),
>>>>>> +                     (cleanup_job =
>>>>>> drm_sched_get_cleanup_job(sched)) ||
>>>>>>                          (!drm_sched_blocked(sched) &&
>>>>>>                           (entity = drm_sched_select_entity(sched))) ||
>>>>>> -                     kthread_should_stop()));
>>>>>> +                     kthread_should_stop());
>>>>> Can't we just call drm_sched_cleanup_jobs right here, remove all the
>>>>> conditions in wait_event_interruptible (make it always true) and after
>>>>> drm_sched_cleanup_jobs is called test for all those conditions and
>>>>> return to sleep if they evaluate to false ? drm_sched_cleanup_jobs is
>>>>> called unconditionally inside wait_event_interruptible anyway...
>>>>> This is
>>>>> more of a question to Christian.
>>>> Christian may know better than me, but I think those conditions need to
>>>> be in wait_event_interruptible() to avoid race conditions. If we simply
>>>> replace all the conditions with a literal "true" then
>>>> wait_event_interruptible() will never actually sleep.
>>>>
>>>> Steve
>>> Yes you right, it won't work as I missed that condition is evaluated
>>> as first step in wait_event_interruptible before it sleeps.
>>>
>>> Andrey
>> Another idea  - what about still just relocating drm_sched_cleanup_jobs
>> to after wait_event_interruptible and also call it in drm_sched_fini so
>> for the case when it will not be called from drm_sched_main due to
>> conditions not evaluating to true  it eventually be called last time
>> from drm_sched_fini. I mean - the refactor looks ok to me but the code
>> becomes  somewhat confusing this way to grasp.
>>
>> Andrey
> That sounds similar to my first stab at this[1]. However Christian
> pointed out that it is necessary to also free jobs even if there isn't a
> new one to be scheduled. Otherwise it ends up with the jobs lying around
> until something kicks it.

But if there is no new job to be scheduled then no one will wake up the 
sched->wake_up_worker and the condition in wait_event_interruptible is 
reevaluated only when the wq head is waked up.

>
> There is also the aspect of queueing the timeout for the next job - this
> is the part that I don't actually understand, but removing it from the
> wait_event_interruptible() invariable seems to cause problems. Hence
> this approach which avoids changing this behaviour. But I welcome input
> from anyone who understands this timeout mechanism!

Not familiar with this problem as before it was done from 
wait_event_interruptible it was done from scheduled work fired from 
job's completion interrupt  and I don't remember any issues with it.

In either case I think both solutions are legit so Reviewed-by: Andrey 
Grodzovsky <andrey.grodzovsky at amd.com>

Andrey

>
> Steve
>
> [1]
> https://lists.freedesktop.org/archives/dri-devel/2019-September/235346.html