[RFC PATCH] drm/amdgpu: allocate entities on demand
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Dec 2 14:59:09 UTC 2019
Am 02.12.19 um 15:43 schrieb Nirmoy:
>
> On 11/29/19 7:42 PM, Christian König wrote:
>> Am 29.11.19 um 15:29 schrieb Nirmoy:
>>> Hi Christian,
>>>
>>> On 11/26/19 10:45 AM, Christian König wrote:
>>>> It looks like a start, but there numerous things which needs to be
>>>> fixed.
>>>>
>>>> Question number one is: What's that good for? Entities are not the
>>>> problem here. The real issue is the fence ring and the rq_list.
>>>>
>>>> The rq_list could actually be made constant since it should never
>>>> be changed by the entity. It is only changed for backward
>>>> compatibility in drm_sched_entity_set_priority().
>>>>
>>>> So I would start there and cleanup the
>>>> drm_sched_entity_set_priority() to actually just set a new constant
>>>> rq list instead.
>>>
>>> I am missing some context here. Can you please explain bit more? I
>>> looked over and over again but I still don't understand what do you
>>> mean by new constant rq list :/
>>
>> Ok that needs a bit wider explanation.
>>
>> The GPU scheduler consists mainly of drm_gpu_scheduler instances.
>> Each of those instances contain multiple runqueues with different
>> priorities (5 IIRC).
>>
>> Now for each entity we give a list of runqueues where this entity can
>> be served on, e.g. where the jobs which are pushed to the entities
>> are executed.
>>
>> The entity itself keeps a copy of that runqueue list because we have
>> the drm_sched_entity_set_priority() which modifies this runqueue list.
>>
>> But essentially that is complete overkill, the runqueue lists are
>> constant for each amdgpu device, e.g. all contexts should use SDMA0
>> and SDMA1 in the same way.
>>
>> In other words building the list on runqueues should happen only once
>> and not for each contexts.
> Okay I understand now the real problem. Thanks for detail explanation.
>>
>> Multiple approach to fix this would be possible. One rather elegant
>> solution would be to change the rq list into a scheduler instances
>> list + priority.
>
> Do you mean something like
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 684692a8ed76..ac67f8f098fa 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -81,7 +81,7 @@ enum drm_sched_priority {
> Â struct drm_sched_entity {
>        struct list_head               list;
>        struct drm_sched_rq            *rq;
> -      struct drm_sched_rq            **rq_list;
> +     struct drm_gpu_scheduler       **sched;
>        unsigned int                   num_rq_list;
>        spinlock_t                     rq_lock;
Yes, exactly. Problem is that I'm not 100% sure if that really works
with all users of the rq_list.
Regards,
Christian.
>
>>
>>
>> This way we would also fix the age old bug that changing the priority
>> of a context could actually mess up already scheduled jobs.
>>
>> The alternative I noted before would be to drop
>> drm_sched_entity_set_priority() or change it into
>> drm_sched_entity_set_runqueues().
> I was working on it but then I got stuck by a "BUG: kernel NULL
> pointer dereference, address:" which I am trying to figure out why
>>
>> Regards,
>> Christian.
>>
>>>
>>>>
>>>> Then we could embed the fences in amdgpu_ctx_entity as dynamic
>>>> array at the end of the structure.
>>>>
>>>> And last we can start to dynamic allocate and initialize the
>>>> amdgpu_ctx_entity() structures.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the amd-gfx
mailing list