[PATCH 12/13] drm/scheduler: rework entity flush, kill and fini
Christian König
ckoenig.leichtzumerken at gmail.com
Thu Nov 17 12:55:43 UTC 2022
Am 17.11.22 um 13:47 schrieb Dmitry Osipenko:
> On 11/17/22 12:53, Christian König wrote:
>> Am 17.11.22 um 03:36 schrieb Dmitry Osipenko:
>>> Hi,
>>>
>>> On 10/14/22 11:46, Christian König wrote:
>>>> +/* Remove the entity from the scheduler and kill all pending jobs */
>>>> +static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>>> +{
>>>> + struct drm_sched_job *job;
>>>> + struct dma_fence *prev;
>>>> +
>>>> + if (!entity->rq)
>>>> + return;
>>>> +
>>>> + spin_lock(&entity->rq_lock);
>>>> + entity->stopped = true;
>>>> + drm_sched_rq_remove_entity(entity->rq, entity);
>>>> + spin_unlock(&entity->rq_lock);
>>>> +
>>>> + /* Make sure this entity is not used by the scheduler at the moment */
>>>> + wait_for_completion(&entity->entity_idle);
>>> I'm always hitting a lockup here using the Panfrost driver when
>>> terminating Xorg. Reverting this patch helps. Any ideas how to fix it?
>>>
>> Well is the entity idle or are there some unsubmitted jobs left?
> Do you mean unsubmitted to h/w? IIUC, there are unsubmitted jobs left.
>
> I see that there are 5-6 incomplete (in-flight) jobs when
> panfrost_job_close() is invoked.
>
> There are 1-2 jobs that are constantly scheduled and finished once every
> few seconds after the lockup happens.

Well, what drm_sched_entity_kill() is supposed to do is prevent queued-up
jobs from being pushed to the hw when the process which queued them is
killed. Is the process really killed, or is that just some incorrect
handling?

In other words, I see two possibilities here: either we have a bug in the
scheduler or panfrost isn't using it correctly.

Does panfrost call drm_sched_entity_flush() before it calls
drm_sched_entity_fini()? (I don't have the driver source at hand at the
moment.)
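
To make the ordering I'm asking about concrete, here is a rough
user-space model of it. This is only a sketch and not kernel code: the
model_* names and the struct are hypothetical stand-ins, and the real
drm_sched_entity_flush()/drm_sched_entity_fini() do considerably more
(waiting, fences, locking). It only shows the intent: flush first so a
normally exiting process can drain its queue, then fini to stop the
entity and drop whatever is still queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_sched_entity (assumption). */
struct model_entity {
	int queued_jobs;	/* queued, not yet pushed to the hw */
	bool flushed;		/* flush step has run */
	bool stopped;		/* entity removed from the scheduler */
};

/*
 * Models the flush step: on a normal exit the queue is allowed to
 * drain; for a killed process the queued jobs are left for fini.
 */
void model_entity_flush(struct model_entity *e, bool process_killed)
{
	if (!process_killed)
		e->queued_jobs = 0;	/* scheduler drained the queue */
	e->flushed = true;
}

/*
 * Models the fini step: stop the entity and drop what is still
 * queued. Returns the number of jobs that never reached the hw.
 */
int model_entity_fini(struct model_entity *e)
{
	int dropped;

	assert(e->queued_jobs >= 0);
	dropped = e->queued_jobs;
	e->stopped = true;
	e->queued_jobs = 0;
	return dropped;
}

/* Models a driver's file close path: flush first, then fini. */
int model_file_close(struct model_entity *e, bool process_killed)
{
	model_entity_flush(e, process_killed);
	return model_entity_fini(e);
}
```

With five queued jobs, model_file_close() drops none of them on a
normal exit and all five when the process was killed.
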
Regards,
Christian.