[PATCH 1/1] drm/amdgpu: fix ctx init failure for asics without gfx ring

Christian König ckoenig.leichtzumerken at gmail.com
Sat Jan 4 16:02:00 UTC 2020


Am 03.01.20 um 11:43 schrieb Nirmoy:
>
> On 1/2/20 7:13 PM, Christian König wrote:
>> Am 02.01.20 um 10:47 schrieb Nirmoy:
>>>
>>> On 1/1/20 1:52 PM, Christian König wrote:
>>>> Am 19.12.19 um 13:01 schrieb Nirmoy:
>>>>> Reviewed-by: Nirmoy Das <nirmoy.das at amd.com>
>>>>>
>>>>> On 12/19/19 12:42 PM, Le Ma wrote:
>>>>>> This workaround does not affect other asics because amdgpu only 
>>>>>> need expose
>>>>>> one gfx sched to user for now.
>>>>>>
>>>>>> Change-Id: Ica92b8565a89899aebe0eba7b2b5a25159b411d3
>>>>>> Signed-off-by: Le Ma <le.ma at amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 ++-
>>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>> index 63f6365..64e2bab 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>> @@ -127,7 +127,8 @@ static int amdgpu_ctx_init(struct 
>>>>>> amdgpu_device *adev,
>>>>>>             switch (i) {
>>>>>>           case AMDGPU_HW_IP_GFX:
>>>>>> -            scheds = adev->gfx.gfx_sched;
>>>>>> +            sched = &adev->gfx.gfx_ring[0].sched;
>>>>>> +            scheds = &sched;
>>>>>>               num_scheds = 1;
>>>>
>>>> Mhm, we should probably rather fix this here and don't expose a GFX 
>>>> ring when the hardware doesn't have one.
>>> Hi Christian,
>>>
>>> Do you mean by not initializing entity for gfx when not available?
>>
>> Well we still initialize it, but with num_scheds=0.
>
> Hi Christian,
>
> Currently drm_sched_entity_init requires a non-NULL sched_list/(old) 
> rq_list. This is forcing us to pass an uninitialized
>
> drm sched with 0 num_scheds to drm_sched_entity_init() for 
> non-existing hw ip. I haven't think about such(missing/failed hw ip) 
> corner case before but
>
> I think we can handle it by:
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 2e3a058fc239..563592299817 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -56,7 +56,7 @@ int drm_sched_entity_init(struct drm_sched_entity 
> *entity,
>                           unsigned int num_sched_list,
>                           atomic_t *guilty)
>  {
> -       if (!(entity && sched_list && (num_sched_list == 0 || 
> sched_list[0])))
> +       if (!(entity && (num_sched_list == 0 || sched_list[0])))

I think that change is unnecessary. Even when num_sched_list is 0 we can 
still pass in the pointer.

> return -EINVAL;
>
>         memset(entity, 0, sizeof(struct drm_sched_entity));
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 64e2babbc36e..4c8ad6cb11ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -127,9 +127,8 @@ static int amdgpu_ctx_init(struct amdgpu_device 
> *adev,
>
>                 switch (i) {
>                 case AMDGPU_HW_IP_GFX:
> -                       sched = &adev->gfx.gfx_ring[0].sched;
> -                       scheds = &sched;
> -                       num_scheds = 1;
> +                       scheds = adev->gfx.gfx_sched;
> +                       num_scheds = adev->gfx.num_compute_sched;

We need a num_gfx_sched here, because the number of compute scheduler 
instances is completely unrelated to the number of gfx scheduler instances.

>
>                         break;
>                 case AMDGPU_HW_IP_COMPUTE:
>                         scheds = adev->gfx.compute_sched;
> @@ -622,6 +621,10 @@ void amdgpu_ctx_init_sched(struct amdgpu_device 
> *adev)
>                 adev->gfx.num_gfx_sched++;
>         }
>
> +       /* Currently there is only one user usable gfx queue */
> +       if (adev->gfx.num_gfx_sched > 1)
> +               adev->gfx.num_gfx_sched = 1;
> +
>         for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>                 adev->gfx.compute_sched[i] = 
> &adev->gfx.compute_ring[i].sched;
>                 adev->gfx.num_compute_sched++;
>
> What do you think?
>
> I have two more questions to clear regarding this
>
> 1 Why do we need to create entity for non existing hw IP?

To cleanly reject command submissions from userspace when they try to 
access that hw.

E.g. the SW structure is still initialized even if the HW isn't present, 
we just set the number of schedulers to zero to indicated that this 
hardware is not present. Otherwise we would work with a uninitialized SW 
structure.

Alternatively when you are done with your changes we can reject creating 
the SW structure in the first place, but for this we need those changes 
first.

>
> 2 What happens when user try push a gfx related job to a GPU without 
> any gfx queue ?

IIRC we just return EINVAL if there is not hw scheduler available to 
handle a software queue.

Regards,
Christian.

>
>
> Regards,
>
> Nirmoy
>
>> Christian.
>>
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>>               break;
>>>>>>           case AMDGPU_HW_IP_COMPUTE:
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cnirmoy.das%40amd.com%7C7603e9d4410045c1b63d08d78faf6e41%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637135855948050926&sdata=JOtJwqsXMuXeyq8igxuQ8f4JHCS6MZi6PAJdGWI202g%3D&reserved=0 
>>>>>
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cnirmoy.das%40amd.com%7C7603e9d4410045c1b63d08d78faf6e41%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637135855948050926&sdata=JOtJwqsXMuXeyq8igxuQ8f4JHCS6MZi6PAJdGWI202g%3D&reserved=0 
>>>
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list