[PATCH 4/4] drm/scheduler: do not keep a copy of sched list
Christian König
christian.koenig at amd.com
Tue Dec 10 17:32:23 UTC 2019
Am 10.12.19 um 16:08 schrieb Nirmoy:
> I think amdgpu_ctx_init() should check for num_scheds and not call
> drm_sched_entity_init()
>
> if its zero.
Ah, that's where that came from. No that is intentionally this way, but
see below.
>
> On 12/10/19 3:47 PM, Nirmoy wrote:
>>
>> On 12/10/19 2:00 PM, Christian König wrote:
>>> Am 10.12.19 um 13:53 schrieb Nirmoy Das:
>>>> entity should not keep copy and maintain sched list for
>>>> itself.
>>>>
>>>> Signed-off-by: Nirmoy Das <nirmoy.das at amd.com>
>>>> ---
>>>> drivers/gpu/drm/scheduler/sched_entity.c | 10 +---------
>>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index f9b6ce29c58f..a5f729f421f8 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -67,17 +67,10 @@ int drm_sched_entity_init(struct
>>>> drm_sched_entity *entity,
>>>> entity->guilty = guilty;
>>>> entity->num_sched_list = num_sched_list;
>>>> entity->priority = priority;
>>>> - entity->sched_list = kcalloc(num_sched_list,
>>>> - sizeof(struct drm_gpu_scheduler *),
>>>> GFP_KERNEL);
>>>> + entity->sched_list = sched_list;
>>>
>>> Maybe make this "num_sched_list > 1 ? sched_list : NULL" to avoid
>>> accidentally dereferencing a stale pointer to the stack.
>> Do you mean "num_sched_list >= 1 ? sched_list : NULL"
No, the entity->sched_list field should be NULL when num_sched_list==1.
When num_sched_list==1 the entity->sched_list shouldn't be used and we
can use a dummy list on the stack as parameter. But we should set the
pointer to NULL in this case just to make sure that nobody is
dereferencing it.
>>>
>>>> - if(!entity->sched_list)
>>>> - return -ENOMEM;
>>>> init_completion(&entity->entity_idle);
>>>> -
>>>> - for (i = 0; i < num_sched_list; i++)
>>>> - entity->sched_list[i] = sched_list[i];
>>>> -
>>>> if (num_sched_list)
>>>
>>> That check can actually be dropped as well. We return -EINVAL when
>>> the num_sched_list is zero.
>>
>> This one took me some time to understand. So we don't really return
>> -EINVAL if num_sched_list == 0
>>
>> if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
>> return -EINVAL;
>>
>> This is coming from below patch. Are we suppose to tolerate IPs with
>> uninitialized sched so that ctx creation dosn't return error ?
Yes, exactly. That's intentionally this way.
GPU reset sometimes resulted in schedulers being disabled because we
couldn't re-init them. In this case we had entities with an empty
scheduler list.
That can't happen any more after you make the scheduler arrays constant,
but I would stick with that behavior.
Regards,
Christian.
>>
>> commit 1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa
>> Author: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>> Date: Wed Jan 30 02:53:19 2019 +0100
>>
>> drm/sched: Fix entities with 0 rqs.
>>
>> Some blocks in amdgpu can have 0 rqs.
>>
>> Job creation already fails with -ENOENT when entity->rq is NULL,
>> so jobs cannot be pushed. Without a rq there is no scheduler to
>> pop jobs, and rq selection already does the right thing with a
>> list of length 0.
>>
>> So the operations we need to fix are:
>> - Creation, do not set rq to rq_list[0] if the list can have
>> length 0.
>> - Do not flush any jobs when there is no rq.
>> - On entity destruction handle the rq = NULL case.
>> - on set_priority, do not try to change the rq if it is NULL.
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>> entity->rq =
>>>> &entity->sched_list[0]->sched_rq[entity->priority];
>>>> @@ -312,7 +305,6 @@ void drm_sched_entity_fini(struct
>>>> drm_sched_entity *entity)
>>>> dma_fence_put(entity->last_scheduled);
>>>> entity->last_scheduled = NULL;
>>>> - kfree(entity->sched_list);
>>>> }
>>>> EXPORT_SYMBOL(drm_sched_entity_fini);
>>>
More information about the amd-gfx
mailing list