[PATCH 12/13] drm/scheduler: rework entity flush, kill and fini

Thu Nov 17 12:55:43 UTC 2022

Am 17.11.22 um 13:47 schrieb Dmitry Osipenko:
> On 11/17/22 12:53, Christian König wrote:
>> Am 17.11.22 um 03:36 schrieb Dmitry Osipenko:
>>> Hi,
>>>
>>> On 10/14/22 11:46, Christian König wrote:
>>>> +/* Remove the entity from the scheduler and kill all pending jobs */
>>>> +static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>>> +{
>>>> +    struct drm_sched_job *job;
>>>> +    struct dma_fence *prev;
>>>> +
>>>> +    if (!entity->rq)
>>>> +        return;
>>>> +
>>>> +    spin_lock(&entity->rq_lock);
>>>> +    entity->stopped = true;
>>>> +    drm_sched_rq_remove_entity(entity->rq, entity);
>>>> +    spin_unlock(&entity->rq_lock);
>>>> +
>>>> +    /* Make sure this entity is not used by the scheduler at the
>>>> moment */
>>>> +    wait_for_completion(&entity->entity_idle);
>>> I'm always hitting lockup here using Panfrost driver on terminating
>>> Xorg. Revering this patch helps. Any ideas how to fix it?
>>>
>> Well is the entity idle or are there some unsubmitted jobs left?
> Do you mean unsubmitted to h/w? IIUC, there are unsubmitted jobs left.
>
> I see that there are 5-6 incomplete (in-flight) jobs when
> panfrost_job_close() is invoked.
>
> There are 1-2 jobs that are constantly scheduled and finished once in a
> few seconds after the lockup happens.

Well what drm_sched_entity_kill() is supposed to do is to prevent 
pushing queued up stuff to the hw when the process which queued it is 
killed. Is the process really killed or is that just some incorrect 
handling?

In other words I see two possibilities here, either we have a bug in the 
scheduler or panfrost isn't using it correctly.

Does panfrost calls drm_sched_entity_flush() before it calls 
drm_sched_entity_fini()? (I don't have the driver source at hand at the 
moment).

Regards,
Christian.