[PATCH 4/4] drm/scheduler: do not keep a copy of sched list

Tue Dec 10 17:32:23 UTC 2019

Am 10.12.19 um 16:08 schrieb Nirmoy:
> I think amdgpu_ctx_init() should check for num_scheds and not call 
> drm_sched_entity_init()
>
> if its zero.

Ah, that's where that came from. No that is intentionally this way, but 
see below.

>
> On 12/10/19 3:47 PM, Nirmoy wrote:
>>
>> On 12/10/19 2:00 PM, Christian König wrote:
>>> Am 10.12.19 um 13:53 schrieb Nirmoy Das:
>>>> entity should not keep copy and maintain sched list for
>>>> itself.
>>>>
>>>> Signed-off-by: Nirmoy Das <nirmoy.das at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/scheduler/sched_entity.c | 10 +---------
>>>>   1 file changed, 1 insertion(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
>>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> index f9b6ce29c58f..a5f729f421f8 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>> @@ -67,17 +67,10 @@ int drm_sched_entity_init(struct 
>>>> drm_sched_entity *entity,
>>>>       entity->guilty = guilty;
>>>>       entity->num_sched_list = num_sched_list;
>>>>       entity->priority = priority;
>>>> -    entity->sched_list =  kcalloc(num_sched_list,
>>>> -                      sizeof(struct drm_gpu_scheduler *), 
>>>> GFP_KERNEL);
>>>> +    entity->sched_list =  sched_list;
>>>
>>> Maybe make this "num_sched_list > 1 ? sched_list : NULL" to avoid 
>>> accidentally dereferencing a stale pointer to the stack.
>> Do you mean "num_sched_list >= 1 ? sched_list : NULL"

No, the entity->sched_list field should be NULL when num_sched_list==1.

When num_sched_list==1 the entity->sched_list shouldn't be used and we 
can use a dummy list on the stack as parameter. But we should set the 
pointer to NULL in this case just to make sure that nobody is 
dereferencing it.

>>>
>>>>   -    if(!entity->sched_list)
>>>> -        return -ENOMEM;
>>>>         init_completion(&entity->entity_idle);
>>>> -
>>>> -    for (i = 0; i < num_sched_list; i++)
>>>> -        entity->sched_list[i] = sched_list[i];
>>>> -
>>>>       if (num_sched_list)
>>>
>>> That check can actually be dropped as well. We return -EINVAL when 
>>> the num_sched_list is zero.
>>
>> This  one took me some time to understand. So we don't really return 
>> -EINVAL if num_sched_list == 0
>>
>> if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
>>                 return -EINVAL;
>>
>> This is coming from below patch. Are we suppose to tolerate IPs with 
>> uninitialized sched so that ctx creation dosn't return error ?

Yes, exactly. That's intentionally this way.

GPU reset sometimes resulted in schedulers being disabled because we 
couldn't re-init them. In this case we had entities with an empty 
scheduler list.

That can't happen any more after you make the scheduler arrays constant, 
but I would stick with that behavior.

Regards,
Christian.

>>
>> commit 1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa
>> Author: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>> Date:   Wed Jan 30 02:53:19 2019 +0100
>>
>>     drm/sched: Fix entities with 0 rqs.
>>
>>     Some blocks in amdgpu can have 0 rqs.
>>
>>     Job creation already fails with -ENOENT when entity->rq is NULL,
>>     so jobs cannot be pushed. Without a rq there is no scheduler to
>>     pop jobs, and rq selection already does the right thing with a
>>     list of length 0.
>>
>>     So the operations we need to fix are:
>>       - Creation, do not set rq to rq_list[0] if the list can have 
>> length 0.
>>       - Do not flush any jobs when there is no rq.
>>       - On entity destruction handle the rq = NULL case.
>>       - on set_priority, do not try to change the rq if it is NULL.
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>           entity->rq = 
>>>> &entity->sched_list[0]->sched_rq[entity->priority];
>>>>   @@ -312,7 +305,6 @@ void drm_sched_entity_fini(struct 
>>>> drm_sched_entity *entity)
>>>>         dma_fence_put(entity->last_scheduled);
>>>>       entity->last_scheduled = NULL;
>>>> -    kfree(entity->sched_list);
>>>>   }
>>>>   EXPORT_SYMBOL(drm_sched_entity_fini);
>>>