[RFC PATCH 2/4] drm/scheduler: implement a function to modify sched list
Luben Tuikov
luben.tuikov at amd.com
Mon Mar 2 20:47:28 UTC 2020
On 2020-02-28 2:47 a.m., Christian König wrote:
> Am 28.02.20 um 06:08 schrieb Luben Tuikov:
>> On 2020-02-27 4:40 p.m., Nirmoy Das wrote:
>>> Implement drm_sched_entity_modify_sched(), which can replace an
>>> entity's existing sched_list with a different one. This is helpful
>>> when userspace changes the priority of a ctx/entity: the driver can
>>> then switch to the corresponding hw sched list for that priority.
>>>
>>> Signed-off-by: Nirmoy Das <nirmoy.das at amd.com>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_entity.c | 24 ++++++++++++++++++++++++
>>> include/drm/gpu_scheduler.h              |  4 ++++
>>> 2 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 63bccd201b97..711e9d504bcb 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -83,6 +83,30 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>> }
>>> EXPORT_SYMBOL(drm_sched_entity_init);
>>>
>>> +/**
>>> + * drm_sched_entity_modify_sched - Modify sched of an entity
>>> + *
>>> + * @entity: scheduler entity to init
>>> + * @sched_list: the list of new drm scheds which will replace
>>> + * existing entity->sched_list
>>> + * @num_sched_list: number of drm sched in sched_list
>>> + *
>>> + * Returns 0 on success or a negative error code on failure.
>>> + */
>>> +int drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>> +                                  struct drm_gpu_scheduler **sched_list,
>>> +                                  unsigned int num_sched_list)
>>> +{
>>> +        if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
>>> +                return -EINVAL;
>> This seems unmaintainable. I'd write it in its natural form:
>
> This is probably just copy & pasted from the init function and complete
> overkill here.
It should be an easy rejection then. Statements like this make
the code unmaintainable. Regardless of whether it was copy-and-pasted,
I wanted to emphasize the lack of simplification in what
was being done.
We need to put intention and sense into what we're doing, scrutinizing
every line we put into a patch. This is why I suggested
the simplification here:
>
>>
>> int drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>                                   struct drm_gpu_scheduler **sched_list,
>>                                   unsigned int num_sched_list)
>> {
>>         if (entity && sched_list && (num_sched_list == 0 || sched_list[0] != NULL)) {
>>                 entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>>                 entity->num_sched_list = num_sched_list;
>>                 return 0;
>>         } else {
>>                 return -EINVAL;
>>         }
>> }
>
> Actually that's a rather bad idea. Error handling should always be in
I don't think that it is a "rather bad idea". At all.
I think it makes this leaf function easier to
understand: the conditional reads like a sentence in prose.
> the form of:
>
> if (check_error || missing_prerequisite)
>         return_or_goto_cleanup;
I don't think we should generalize across the board. We should be
more flexible in order to create clear and maintainable code.
>
>> That's too heavy. Can we improve the architecture
>> so we don't have to check for this in leaf functions like this one?
>> We can just return a parameterization.
>>
>> Why would this be called with entity being NULL?
>> Or with sched_list being NULL? Or num_sched_list being not zero
>> yet sched_list[0] being NULL? Why not make sure that sched_list[0] is
>> never NULL and that num_sched_list is greater than 0 always?
>>
>> Does this make it to user space?
>> Why would the state of execution be one such that this is true/false
>> for the code to return -EINVAL?
>> From patch 3/4 it seems that an error is printed inside amdgpu_ctx_priority_override()
>> and the result is not propagated up the stack.
>>
>> I think we should improve the code where here this condition above
>> is never true. Then we can use parameterization for those two
>> statements below:
>>
>>> +
>>> +        entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>> So if we're here, we know from the top conditional that
>> either num_sched_list is 0 or that sched_list[0] not NULL
>> or both.
>>
>> So if num_sched_list is 0 or 1, we set entity->sched_list to NULL?
>> And if num_sched_list is 2 or greater, we set it to sched_list,
>> of which sched_list[0] could be NULL?
>
> This is also copy&pasted from the init code and completely wrong here.
So even more reasons to reject this patch.
>
> What we should do instead is just: WARN_ON(!num_sched_list || !sched_list);
Again, what does that *mean*? What does the check mean, and what
does num_sched_list == 0 or sched_list == NULL mean?
And how did we get into a situation where either or both
could be NULL?
Wouldn't it be better to simplify or re-architect this (we only recently
decided to hide physical rings from user-space), so that the code
is elegant (meaning no if-else) yet flexible and straightforward?
>
> And to the checking for not keeping around the scheduler list in the
> init function.
>
>> Why not fix the architecture so that this is simply copied?
>
> We had that and moved away from it because the scheduler list is
> actually const and shouldn't be allocated with each entity (which we can
> easily have thousands of).
I think that peppering the code with if-else conditionals
everywhere, as these patch series to the DRM scheduler have been doing,
would make the code unmaintainable in the long run.
Regards,
Luben
>
> Regards,
> Christian.
>
>> (in which case
>> we can simply index-parameterize it and copy it).
>> Why are there so many checks everywhere?
>>
>>> +        entity->num_sched_list = num_sched_list;
>>> +
>> I mean, all we're doing in this function is initializing
>> entity->sched_list and entity->num_sched_list. Why does this
>> function have to be so complex and do so many checks?
>> Can we improve/fix the architecture instead?
>>
>> Regards,
>> Luben
>>
>>
>>> +        return 0;
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>>> +
>>> /**
>>> * drm_sched_entity_is_idle - Check if entity is idle
>>> *
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index 589be851f8a1..0c164a96d51b 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -297,6 +297,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>> int drm_sched_job_init(struct drm_sched_job *job,
>>>                        struct drm_sched_entity *entity,
>>>                        void *owner);
>>> +int drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>> +                                  struct drm_gpu_scheduler **sched_list,
>>> +                                  unsigned int num_sched_list);
>>> +
>>> void drm_sched_job_cleanup(struct drm_sched_job *job);
>>> void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>> void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
>>>
>
More information about the amd-gfx mailing list