[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

Tue Sep 27 14:58:56 UTC 2022

Hello Felix,

Thank for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:
> Am 2022-09-27 um 02:12 schrieb Christian König:
>> Am 26.09.22 um 23:40 schrieb Shashank Sharma:
>>> This patch switches the GPU workload mode to/from
>>> compute mode, while submitting compute workload.
>>>
>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>
>> Feel free to add my acked-by, but Felix should probably take a look as 
>> well.
> 
> This look OK purely from a compute perspective. But I'm concerned about 
> the interaction of compute with graphics or multiple graphics contexts 
> submitting work concurrently. They would constantly override or disable 
> each other's workload hints.
> 
> For example, you have an amdgpu_ctx with 
> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD 
> process that also wants the compute profile. Those could be different 
> processes belonging to different users. Say, KFD enables the compute 
> profile first. Then the graphics context submits a job. At the start of 
> the job, the compute profile is enabled. That's a no-op because KFD 
> already enabled the compute profile. When the job finishes, it disables 
> the compute profile for everyone, including KFD. That's unexpected.
> 

In this case, it will not disable the compute profile, as the reference 
counter will not be zero. The reset_profile() will only act if the 
reference counter is 0.

But I would be happy to get any inputs about a policy which can be more 
sustainable and gets better outputs, for example:
- should we not allow a profile change, if a PP mode is already applied 
and keep it Early bird basis ?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as job A is 
not finished it yet.

Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also runs in 
PP 3D
- Job B finishes, but does not reset PP as reference count is not zero 
due to compute
- Job  A finishes, profile reset to NONE

Or anything else ?

REgards
Shashank

> Or you have multiple VCN contexts. When context1 finishes a job, it 
> disables the VIDEO profile. But context2 still has a job on the other 
> VCN engine and wants the VIDEO profile to still be enabled.
> 
> Regards,
>    Felix
> 
> 
>>
>> Christian.
>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
>>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> index 5e53a5293935..1caed319a448 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> @@ -34,6 +34,7 @@
>>>   #include "amdgpu_ras.h"
>>>   #include "amdgpu_umc.h"
>>>   #include "amdgpu_reset.h"
>>> +#include "amdgpu_ctx_workload.h"
>>>     /* Total memory size in system memory and all GPU VRAM. Used to
>>>    * estimate worst case amount of memory to reserve for page tables
>>> @@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device 
>>> *adev,
>>>     void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, 
>>> bool idle)
>>>   {
>>> -    amdgpu_dpm_switch_power_profile(adev,
>>> -                    PP_SMC_POWER_PROFILE_COMPUTE,
>>> -                    !idle);
>>> +    int ret;
>>> +
>>> +    if (idle)
>>> +        ret = amdgpu_clear_workload_profile(adev, 
>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>> +    else
>>> +        ret = amdgpu_set_workload_profile(adev, 
>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>> +
>>> +    if (ret)
>>> +        drm_warn(&adev->ddev, "Failed to %s power profile to compute 
>>> mode\n",
>>> +             idle ? "reset" : "set");
>>>   }
>>>     bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
>>