[PATCH 1/1] drm/amdgpu: disable gpu_sched load balancer for vcn jobs
Nirmoy
nirmodas at amd.com
Thu Mar 12 10:56:42 UTC 2020
On 3/12/20 9:50 AM, Christian König wrote:
> Am 11.03.20 um 21:55 schrieb Nirmoy:
>>
>> On 3/11/20 9:35 PM, Andrey Grodzovsky wrote:
>>>
>>> On 3/11/20 4:32 PM, Nirmoy wrote:
>>>>
>>>> On 3/11/20 9:02 PM, Andrey Grodzovsky wrote:
>>>>>
>>>>> On 3/11/20 4:00 PM, Andrey Grodzovsky wrote:
>>>>>>
>>>>>> On 3/11/20 4:00 PM, Nirmoy Das wrote:
>>>>>>> [SNIP]
>>>>>>> @@ -1257,6 +1258,9 @@ static int amdgpu_cs_submit(struct
>>>>>>> amdgpu_cs_parser *p,
>>>>>>> priority = job->base.s_priority;
>>>>>>> drm_sched_entity_push_job(&job->base, entity);
>>>>>>> + if (ring->funcs->no_gpu_sched_loadbalance)
>>>>>>> + amdgpu_ctx_disable_gpu_sched_load_balance(entity);
>>>>>>> +
>>>>>>
>>>>>>
>>>>>> Why does this need to be done each time a job is submitted, rather
>>>>>> than once in drm_sched_entity_init (same for amdgpu_job_submit below)?
>>>>>>
>>>>>> Andrey
>>>>>
>>>>>
>>>>> My bad - not in drm_sched_entity_init but in relevant amdgpu code.
>>>>
>>>>
>>>> Hi Andrey,
>>>>
>>>> Do you mean drm_sched_job_init() or after creating VCN entities?
>>>>
>>>>
>>>> Nirmoy
>>>
>>>
>>> I guess after creating the VCN entities (it has to be amdgpu-specific
>>> code) - I just don't get why it needs to be done each time a job is
>>> submitted. I mean - since you set .no_gpu_sched_loadbalance = true
>>> anyway, this is always true, so shouldn't you just initialize the
>>> VCN entity with a scheduler list consisting of one scheduler and
>>> that's it?
>>
>>
>> Assumption: if I understand correctly, we shouldn't be doing load
>> balancing among VCN jobs in the same context. Christian, James and Leo
>> can clarify if I am wrong.
>>
>> But we can still load-balance VCN jobs among multiple contexts. That
>> load-balancing decision happens in drm_sched_entity_init(). If we
>> initialize the VCN entity with one scheduler, then all entities,
>> irrespective of context, get that one scheduler, which means we are
>> not utilizing the extra VCN instances.
>
> Andrey has a very good point here. So far we only looked at this from
> the hardware requirement side that we can't change the ring after the
> first submission any more.
>
> But it is certainly valuable to keep the extra overhead out of the hot
> path during command submission.
>
>> Ideally we should be calling
>> amdgpu_ctx_disable_gpu_sched_load_balance() only once, after the first
>> call of drm_sched_entity_init() for a VCN job. I am not sure how to do
>> that efficiently.
>>
>> Another option might be to copy the logic of
>> drm_sched_entity_get_free_sched() and choose a suitable VCN sched
>> at/after VCN entity creation.
>
> Yes, but we should not copy the logic but rather refactor it :)
>
> Basically we need a drm_sched_pick_best() function which gets an array
> of drm_gpu_scheduler structures and returns the one with the least
> load on it.
>
> This function can then be used by VCN to pick one instance before
> initializing the entity as well as a replacement for
> drm_sched_entity_get_free_sched() to change the scheduler for load
> balancing.
This sounds like the optimal solution here.

Thanks Andrey and Christian. I will resend with the suggested changes.
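Roughly something along these lines - a minimal, compile-tested user-space
sketch of the drm_sched_pick_best() helper Christian describes, not the
actual kernel code. The struct here is a stand-in: the real
drm_gpu_scheduler tracks its load in an atomic counter, which a plain
unsigned int `score` field approximates for illustration.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct drm_gpu_scheduler; only the load
 * score matters for this sketch. */
struct drm_gpu_scheduler {
	unsigned int score;	/* approximates the atomic load counter */
};

/*
 * Scan an array of schedulers and return the one with the least load.
 * Usable both at VCN entity creation (pick one instance up front) and
 * as the backend of drm_sched_entity_get_free_sched() for entities
 * that still load-balance.
 */
static struct drm_gpu_scheduler *
drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
		    unsigned int num_sched_list)
{
	struct drm_gpu_scheduler *best = NULL;
	unsigned int min_score = ~0u;
	unsigned int i;

	for (i = 0; i < num_sched_list; i++) {
		struct drm_gpu_scheduler *sched = sched_list[i];

		if (sched && sched->score < min_score) {
			min_score = sched->score;
			best = sched;
		}
	}
	return best;
}
```

With this, the VCN entity would be initialized with a one-element
scheduler list chosen once, keeping the per-submission hot path free of
the extra check.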
>
> Regards,
> Christian.
>
>>
>>
>> Regards,
>>
>> Nirmoy
>>
>
More information about the amd-gfx mailing list