[PATCH 1/1] drm/amdgpu: fix ctx init failure for asics without gfx ring

Nirmoy nirmodas at amd.com
Fri Jan 3 10:43:07 UTC 2020


On 1/2/20 7:13 PM, Christian König wrote:
> Am 02.01.20 um 10:47 schrieb Nirmoy:
>>
>> On 1/1/20 1:52 PM, Christian König wrote:
>>> Am 19.12.19 um 13:01 schrieb Nirmoy:
>>>> Reviewed-by: Nirmoy Das <nirmoy.das at amd.com>
>>>>
>>>> On 12/19/19 12:42 PM, Le Ma wrote:
>>>>> This workaround does not affect other asics because amdgpu only 
>>>>> need expose
>>>>> one gfx sched to user for now.
>>>>>
>>>>> Change-Id: Ica92b8565a89899aebe0eba7b2b5a25159b411d3
>>>>> Signed-off-by: Le Ma <le.ma at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 ++-
>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>> index 63f6365..64e2bab 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>> @@ -127,7 +127,8 @@ static int amdgpu_ctx_init(struct 
>>>>> amdgpu_device *adev,
>>>>>             switch (i) {
>>>>>           case AMDGPU_HW_IP_GFX:
>>>>> -            scheds = adev->gfx.gfx_sched;
>>>>> +            sched = &adev->gfx.gfx_ring[0].sched;
>>>>> +            scheds = &sched;
>>>>>               num_scheds = 1;
>>>
>>> Mhm, we should probably rather fix this here and don't expose a GFX 
>>> ring when the hardware doesn't have one.
>> Hi Christian,
>>
>> Do you mean by not initializing entity for gfx when not available?
>
> Well we still initialize it, but with num_scheds=0.

Hi Christian,

Currently drm_sched_entity_init requires a non-NULL sched_list/(old) 
rq_list. This is forcing us to pass an uninitialized

drm sched with 0 num_scheds to drm_sched_entity_init() for non-existing 
hw ip. I haven't think about such(missing/failed hw ip) corner case 
before but

I think we can handle it by:

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 2e3a058fc239..563592299817 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -56,7 +56,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
                           unsigned int num_sched_list,
                           atomic_t *guilty)
  {
-       if (!(entity && sched_list && (num_sched_list == 0 || 
sched_list[0])))
+       if (!(entity && (num_sched_list == 0 || sched_list[0])))
                 return -EINVAL;

         memset(entity, 0, sizeof(struct drm_sched_entity));
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 64e2babbc36e..4c8ad6cb11ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -127,9 +127,8 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,

                 switch (i) {
                 case AMDGPU_HW_IP_GFX:
-                       sched = &adev->gfx.gfx_ring[0].sched;
-                       scheds = &sched;
-                       num_scheds = 1;
+                       scheds = adev->gfx.gfx_sched;
+                       num_scheds = adev->gfx.num_compute_sched;
                         break;
                 case AMDGPU_HW_IP_COMPUTE:
                         scheds = adev->gfx.compute_sched;
@@ -622,6 +621,10 @@ void amdgpu_ctx_init_sched(struct amdgpu_device *adev)
                 adev->gfx.num_gfx_sched++;
         }

+       /* Currently there is only one user usable gfx queue */
+       if (adev->gfx.num_gfx_sched > 1)
+               adev->gfx.num_gfx_sched = 1;
+
         for (i = 0; i < adev->gfx.num_compute_rings; i++) {
                 adev->gfx.compute_sched[i] = 
&adev->gfx.compute_ring[i].sched;
                 adev->gfx.num_compute_sched++;

What do you think?

I have two more questions to clear regarding this

1 Why do we need to create entity for non existing hw IP?

2 What happens when user try push a gfx related job to a GPU without any 
gfx queue ?


Regards,

Nirmoy

> Christian.
>
>>
>>
>>>
>>> Christian.
>>>
>>>>>               break;
>>>>>           case AMDGPU_HW_IP_COMPUTE:
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cnirmoy.das%40amd.com%7C7603e9d4410045c1b63d08d78faf6e41%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637135855948050926&sdata=JOtJwqsXMuXeyq8igxuQ8f4JHCS6MZi6PAJdGWI202g%3D&reserved=0 
>>>>
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cnirmoy.das%40amd.com%7C7603e9d4410045c1b63d08d78faf6e41%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637135855948050926&sdata=JOtJwqsXMuXeyq8igxuQ8f4JHCS6MZi6PAJdGWI202g%3D&reserved=0 
>>
>


More information about the amd-gfx mailing list