[PATCH] drm/scheduler: Add drm_sched_suspend/resume timeout functions

Wed Nov 14 19:03:41 UTC 2018

Am 14.11.18 um 18:29 schrieb Sharat Masetty:
>
>
> On 11/8/2018 8:11 PM, Koenig, Christian wrote:
>> Am 08.11.18 um 14:42 schrieb Sharat Masetty:
>>> Hi Christian,
>>>
>>> Can you please review this patch? It is a continuation of the 
>>> discussion at [1].
>>> At first I was thinking of using a cancel for suspend instead of a 
>>> mod(to an
>>> arbitrarily large value), but I couldn't get it to fit in as I have 
>>> an additional
>>> constraint of being able to call these functions from an IRQ context.
>>>
>>> These new functions race with other contexts, primarily finish_job(),
>>> timedout_job() and recovery(), but I did go through the possible 
>>> races between
>>> these(I think). Please let me know what you think of this? I have 
>>> not tested
>>> this yet and if this is something in the right direction, I will put 
>>> this
>>> through my testing drill and polish it.
>>>
>>> IMO I think I prefer the callback approach as it appears to be 
>>> simple, less
>>> error prone for both the scheduler and the drivers.
>>
>> Well I agree that this is way to complicated and looks racy to me as
>> well. But this is because you moved away from my initial suggestion.
>>
>> So here is once more how to do it without any additional locks or races:
>>
>> /**
>>    * drm_sched_suspend_timeout - suspend timeout for reset worker
>>    *
>>    * @sched: scheduler instance for which to suspend the timeout
>>    *
>>    * Suspend the delayed work timeout for the scheduler. Note that
>>    * this function can be called from an IRQ context. It returns the
>> timeout remaining.
>>    */
>> unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched)
>> {
>>
>>     unsigned long timeout, current = jiffies
>>
>>     timeout = sched->work_tdr.timer.expires;
>>
>>     /*
>>      * Set timeout to an arbitrarily large value, this also prevents 
>> the timer to be
>>      * started when new submissions arrive.
>>      */
>>     if (mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout 
>> * 10) &&
>>         time_after(timeout, current))
>>         return timeout - current;
>>     else
>>         return sched->timeout;
>> }
>>
>> /**
>>    * drm_sched_resume_timeout - resume timeout for reset worker
>>    *
>>    * @sched: scheduler instance for which to resume the timeout
>>    * @remaining: remaining timeout
>>    *
>>    * Resume the delayed work timeout for the scheduler. Note that
>>    * this function can be called from an IRQ context.
>>    */
>> void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, 
>> unsigned long remaining)
>> {
>>     if (list_empty(&sched->ring_mirror_list))
>>         cancel_delayed_work(&sched->work_tdr);
>>     else
>>         mod_delayed_work(system_wq, &sched->work_tdr, remaining);
>> }
> Hi Christian,
>
> Thank you for the suggestions - I was able to try this out this week. 
> It works for the most part, but there are a couple of races which need 
> further considerations.
>
> 1) The drm_sched_resume_timeout() can race with both the 
> drm_sched_job_finish() and also new job submissions. In the driver the 
> job submission which triggered the preemption can be complete as soon 
> as the switch happens and it is quite possible that I get the 
> preemption complete and the job done interrupt at the same time. So 
> this means that drm_sched_resume_timeout() in IRQ context can race 
> with drm_sched_job_finish() in worker thread context on another CPU. 
> Also in parallel new jobs can be submitted, which will also update the 
> ring mirror list . These races can be addressed however with locking 
> the job_list_lock inside the drm_sched_resume_timeout().

Yeah I know, but I considered this race harmless. Ok, thinking more 
about that I realized that this probably means that we could timeout a 
job way to early.

How about canceling the timer first and then using mod_delayed_work to 
set it to the remaining jiffies if there is a job running?

Otherwise as you noted as well the alternative is really to make the 
job_list_lock irq safe.

>
> 2) This one is a little more tricky - In the driver I start off with 
> all the timeouts suspended(except the one for the default ring), then 
> on every preemption interrupt I suspend the outgoing ring and resume 
> the incoming ring. I can only resume if it was previously suspended. 
> This is how it is set up. The problem that becomes apparent with this 
> approach is that for the suspended rings this arbitrarily large 
> timeout can fire at some point(because of no work) and just before 
> drm_sched_timedout_job() runs a new job can be inserted into the 
> mirror list. So in this case we get an incorrect timeout.
>
> What are your thoughts on using cancel_delayed_work() instead of mod 
> in suspend_timeout. Yes we will lose the benefits of mod, but we 
> should be able to synchronize drm_sched_suspend_timeout() and 
> drm_sched_start_timeout() with some basic state. I have not thought 
> this completely through so I may be missing something here.

Sounds simply like your arbitrary large timeout is not large enough or 
do I miss the problem here?

I just suggested regular timeout*10 because at least on our hardware we 
still want to have some timeout even if the ring is preempted, but you 
could also use MAX_SCHEDULE_TIMEOUT as well.

Christian.

>
> Please share your thoughts on this
>
> Thank you
>
> Sharat
>>
>>
>> Regards,
>> Christian.
>>
>>>
>>> [1]  https://patchwork.freedesktop.org/patch/259914/
>>>
>>> Signed-off-by: Sharat Masetty <smasetty at codeaurora.org>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 81 
>>> +++++++++++++++++++++++++++++++++-
>>>    include/drm/gpu_scheduler.h            |  5 +++
>>>    2 files changed, 85 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index c993d10..9645789 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -191,11 +191,84 @@ bool drm_sched_dependency_optimized(struct 
>>> dma_fence* fence,
>>>     */
>>>    static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>>>    {
>>> +    unsigned long flags;
>>> +
>>> +    spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
>>> +
>>> +    sched->timeout_remaining = sched->timeout;
>>> +
>>>        if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>> -        !list_empty(&sched->ring_mirror_list))
>>> +        !list_empty(&sched->ring_mirror_list) && 
>>> !sched->work_tdr_suspended)
>>>            schedule_delayed_work(&sched->work_tdr, sched->timeout);
>>> +
>>> +    spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
>>>    }
>>>
>>> +/**
>>> + * drm_sched_suspend_timeout - suspend timeout for reset worker
>>> + *
>>> + * @sched: scheduler instance for which to suspend the timeout
>>> + *
>>> + * Suspend the delayed work timeout for the scheduler. Note that
>>> + * this function can be called from an IRQ context.
>>> + */
>>> +void drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched)
>>> +{
>>> +    unsigned long flags, timeout;
>>> +
>>> +    spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
>>> +
>>> +    if (sched->work_tdr_suspended ||
>>> +            sched->timeout == MAX_SCHEDULE_TIMEOUT ||
>>> +            list_empty(&sched->ring_mirror_list))
>>> +        goto done;
>>> +
>>> +    timeout = sched->work_tdr.timer.expires;
>>> +
>>> +    /*
>>> +     * Reset timeout to an arbitrarily large value
>>> +     */
>>> +    mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout * 
>>> 10);
>>> +
>>> +    timeout -= jiffies;
>>> +
>>> +    /* FIXME: Can jiffies be after timeout? */
>>> +    sched->timeout_remaining = time_after(jiffies, timeout)? 0: 
>>> timeout;
>>> +    sched->work_tdr_suspended = true;
>>> +
>>> +done:
>>> +    spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_suspend_timeout);
>>> +
>>> +/**
>>> + * drm_sched_resume_timeout - resume timeout for reset worker
>>> + *
>>> + * @sched: scheduler instance for which to resume the timeout
>>> + *
>>> + * Resume the delayed work timeout for the scheduler. Note that
>>> + * this function can be called from an IRQ context.
>>> + */
>>> +void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched)
>>> +{
>>> +    unsigned long flags;
>>> +
>>> +    spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
>>> +
>>> +    if (!sched->work_tdr_suspended ||
>>> +            sched->timeout == MAX_SCHEDULE_TIMEOUT) {
>>> + spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
>>> +        return;
>>> +    }
>>> +
>>> +    mod_delayed_work(system_wq, &sched->work_tdr, 
>>> sched->timeout_remaining);
>>> +
>>> +    sched->work_tdr_suspended = false;
>>> +
>>> +    spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_resume_timeout);
>>> +
>>>    /* job_finish is called after hw fence signaled
>>>     */
>>>    static void drm_sched_job_finish(struct work_struct *work)
>>> @@ -348,6 +421,11 @@ void drm_sched_job_recovery(struct 
>>> drm_gpu_scheduler *sched)
>>>        struct drm_sched_job *s_job, *tmp;
>>>        bool found_guilty = false;
>>>        int r;
>>> +    unsigned long flags;
>>> +
>>> +    spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
>>> +    sched->work_tdr_suspended = false;
>>> +    spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
>>>
>>>        spin_lock(&sched->job_list_lock);
>>>        list_for_each_entry_safe(s_job, tmp, 
>>> &sched->ring_mirror_list, node) {
>>> @@ -607,6 +685,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>        init_waitqueue_head(&sched->job_scheduled);
>>>        INIT_LIST_HEAD(&sched->ring_mirror_list);
>>>        spin_lock_init(&sched->job_list_lock);
>>> +    spin_lock_init(&sched->tdr_suspend_lock);
>>>        atomic_set(&sched->hw_rq_count, 0);
>>>        INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
>>>        atomic_set(&sched->num_jobs, 0);
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index d87b268..5d39572 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -278,6 +278,9 @@ struct drm_gpu_scheduler {
>>>        atomic_t            hw_rq_count;
>>>        atomic64_t            job_id_count;
>>>        struct delayed_work        work_tdr;
>>> +    unsigned long            timeout_remaining;
>>> +    bool                work_tdr_suspended;
>>> +    spinlock_t            tdr_suspend_lock;
>>>        struct task_struct        *thread;
>>>        struct list_head        ring_mirror_list;
>>>        spinlock_t            job_list_lock;
>>> @@ -300,6 +303,8 @@ void drm_sched_hw_job_reset(struct 
>>> drm_gpu_scheduler *sched,
>>>    bool drm_sched_dependency_optimized(struct dma_fence* fence,
>>>                        struct drm_sched_entity *entity);
>>>    void drm_sched_job_kickout(struct drm_sched_job *s_job);
>>> +void drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched);
>>> +void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched);
>>>
>>>    void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
>>>                     struct drm_sched_entity *entity);
>>> -- 
>>> 1.9.1
>>>
>>
>> _______________________________________________
>> Freedreno mailing list
>> Freedreno at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/freedreno
>>
>