[PATCH v3 1/1] drm/amdgpu: rework sched_list generation

Wed Apr 1 15:02:38 UTC 2020

On 2020-03-31 08:46, Nirmoy wrote:
> 
> On 3/31/20 3:01 AM, Luben Tuikov wrote:
>> This patch seems to be using DOS line-endings.
> 
> 
> Strange, I don't see that in my local patch file.
> 

Not sure why "git am" complained about DOS endings
the first time I downloaded it. Second time was fine.

[snip]>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 29f0a410091b..27abbdc603dd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -721,6 +721,11 @@ struct amd_powerplay {
>>>   	const struct amd_pm_funcs *pp_funcs;
>>>   };
>>>
>>> +struct amdgpu_sched {
>>> +	uint32_t			num_scheds;
>>> +	struct drm_gpu_scheduler	*sched[HWIP_MAX_INSTANCE];
>>> +};
>>> +
>>>   #define AMDGPU_RESET_MAGIC_NUM 64
>>>   #define AMDGPU_MAX_DF_PERFMONS 4
>>>   struct amdgpu_device {
>>> @@ -858,6 +863,8 @@ struct amdgpu_device {
>>>   	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
>>>   	bool				ib_pool_ready;
>>>   	struct amdgpu_sa_manager	ring_tmp_bo[AMDGPU_IB_POOL_MAX];
>>> +	/* drm scheduler list */
>>> +	struct amdgpu_sched		gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
>> That's a 2-dimensional array of "struct amdgpu_sched".
>> I think that the comment should be removed, or at least
>> not say "drm scheduler list". (I can see the structure
>> definition above.)
> 
> 
> Yes I should remove it.
> 
> 
>> If this is the path you want to go, consider removing
>> "num_scheds" and creating a three dimensional array,
>> which would really essentialize the direction you want
>> to go:
>>
>> struct drm_gpu_scheduler *gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX][HWIP_MAX_INSTANCE];
>>
>> Now that this architecture is stripped down to its essentials,
>> perhaps we can see some optimizations...?
> 
> 
> If you mean whether we should see any performance improvement then imo 
> we may not see much
> 
> difference as we are using pretty much same number of memory access plus 
> now we have extra cost of array_index_nospec().
> 
> Also this is  not hot code path. We do only 1 
> amdgpu_ctx_init_entity()/HW_IP/Context.

No, this has nothing to do with "performance".
It's all about architecture and design.

You seem to have array-array-struct with array and int,
and it seems much cleaner to just have array-array-array.
I think you don't need to break the chain with
struct of int and array, just as I described
in my comment below which you snipped without addressing it.

If you're not going to address a comment, don't delete it, leave it
for others to see that it hasn't been addressed. Only
snip previously addressed and resolved comments and previously
seen patch info, like diffstat/etc.

>> Also consider that since you're creating an array of pointers,
>> you don't necessarily need to know their count. Your hot-path
>> algorithms should not need to know it. If you need to print
>> their count, say in sysfs, then you can count them up on
>> behalf of the user-space process cat-ing the sysfs entry.
>>

[snip]

>>> +
>>> +	/* set to default prio if sched_list is NULL */
>>> +	if (!adev->gpu_sched[hw_ip][hw_prio].num_scheds)
>>> +		hw_prio = AMDGPU_RING_PRIO_DEFAULT;
>> That comment is a bit confusing--it talks about a list
>> not being NULL, while the conditional implicitly checks
>> against 0.
> 
> 
> Yes, this is wrong, will remove it.
> 
> <snip>
> 

I wish you hadn't snipped my comment here, but address it
one way or the other. It is:

I'd much rather that integer comparison be performed against
integers, as opposed to using logical NOT operator (which is
implicit in comparing against 0), i.e.,

if (adev->gpu_sched[hw_ip][hw_prio].num_scheds == 0)

Also, architecturally, there seems to be informational
redundancy, in keeping an integer count and list of
objects at the same time, as the above if-conditional
exposes: the comment talks about a list and NULL but
the if-conditional implicitly checks for 0.

Perhaps, we don't need "num_scheds" and you can just
check if the index is NULL and assign PRIO_DEFAULT.

>> @@ -258,6 +272,12 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>>   	ring->priority = DRM_SCHED_PRIORITY_NORMAL;
>>   	mutex_init(&ring->priority_mutex);
>>
>> +	if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
>> +		hw_ip = amdgpu_ring_type_to_drm_hw_ip[ring->funcs->type];
>> +		num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
>> +		adev->gpu_sched[hw_ip][hw_prio].sched[(*num_sched)++] = &ring->sched;
>> +	}
>> This seems unnecessarily complicated. Perhaps we can remove
>> "num_scheds", as recommended above, and keep a running pointer
>> while initialization is being done...?
> 
> 
> What do you mean by running pointer ?

A "running pointer" is a local pointer you're using temporarily
to traverse memory. If you remove the "num_scheds", as noted in my
previous comment, then you can use a running pointer to contain
the next entry you'd initialize.

Regards,
Luben