[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

Lazar, Lijo lijo.lazar at amd.com
Wed Sep 28 07:05:32 UTC 2022



On 9/28/2022 2:10 AM, Alex Deucher wrote:
> On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
> <shashank.sharma at amd.com> wrote:
>>
>>
>>
>> On 9/27/2022 5:23 PM, Felix Kuehling wrote:
>>> Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:
>>>> Hello Felix,
>>>>
>>>> Thanks for the review comments.
>>>>
>>>> On 9/27/2022 4:48 PM, Felix Kuehling wrote:
>>>>> Am 2022-09-27 um 02:12 schrieb Christian König:
>>>>>> Am 26.09.22 um 23:40 schrieb Shashank Sharma:
>>>>>>> This patch switches the GPU workload mode to/from
>>>>>>> compute mode, while submitting compute workload.
>>>>>>>
>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>>>>
>>>>>> Feel free to add my acked-by, but Felix should probably take a look
>>>>>> as well.
>>>>>
>>>>> This looks OK purely from a compute perspective. But I'm concerned
>>>>> about the interaction of compute with graphics or multiple graphics
>>>>> contexts submitting work concurrently. They would constantly override
>>>>> or disable each other's workload hints.
>>>>>
>>>>> For example, you have an amdgpu_ctx with
>>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
>>>>> process that also wants the compute profile. Those could be different
>>>>> processes belonging to different users. Say, KFD enables the compute
>>>>> profile first. Then the graphics context submits a job. At the start
>>>>> of the job, the compute profile is enabled. That's a no-op because
>>>>> KFD already enabled the compute profile. When the job finishes, it
>>>>> disables the compute profile for everyone, including KFD. That's
>>>>> unexpected.
>>>>>
>>>>
>>>> In this case, it will not disable the compute profile, as the
>>>> reference counter will not be zero. The reset_profile() will only act
>>>> if the reference counter is 0.
>>>
>>> OK, I missed the reference counter.
>>>
>>>
>>>>
>>>> But I would be happy to get any inputs about a policy which can be
>>>> more sustainable and gets better outputs, for example:
>>>> - should we not allow a profile change if a PP mode is already
>>>> applied, and keep it on a first-come, first-served basis?
>>>>
>>>> For example: Policy A
>>>> - Job A sets the profile to compute
>>>> - Job B tries to set profile to 3D, but we do not allow it as job A
>>>> has not finished yet.
>>>>
>>>> Or Policy B: Current one
>>>> - Job A sets the profile to compute
>>>> - Job B tries to set profile to 3D, and we allow it. Job A also runs
>>>> in PP 3D
>>>> - Job B finishes, but does not reset PP as reference count is not zero
>>>> due to compute
>>>> - Job A finishes, profile reset to NONE
>>>
>>> I think this won't work. As I understand it, the
>>> amdgpu_dpm_switch_power_profile enables and disables individual
>>> profiles. Disabling the 3D profile doesn't disable the compute profile
>>> at the same time. I think you'll need one refcount per profile.
>>>
>>> Regards,
>>>     Felix
>>
>> Thanks, this is exactly what I was looking for. I think Alex's initial
>> idea was around it, but I was under the assumption that there is only
>> one HW profile in SMU which keeps on getting overwritten. This can solve
>> our problems, as I can create an array of reference counters, and will
>> disable only the profile whose reference counter goes to 0.
> 
> It's been a while since I paged any of this code into my head, but I
> believe the actual workload message in the SMU is a mask where you can
> specify multiple workload types at the same time and the SMU will
> arbitrate between them internally.  E.g., the most aggressive one will
> be selected out of the ones specified.  I think in the driver we just
> set one bit at a time using the current interface.

Yes, this is how it works today. Only one profile is set at a time and 
so setting another one will overwrite the current driver preference.

I think the current expectation of usage is from a system settings 
perspective like Gaming Mode (Full screen 3D) or Cinematic mode (Video) 
etc. This is also set through sysfs and there is also a Custom mode. 
It's not used in the sense of a per-job setting.

> It might be better
> to change the interface and just ref count the hint types and then
> when we call the set function look at the ref counts for each hint
> type and set the mask as appropriate.
> 
This means a pm subsystem level change, and the ref counts need to be kept
in the pm layer to account for changes through sysfs or APIs.
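
As a rough, userspace-only sketch of that idea: keep one ref count per hint
type and rebuild the workload mask from the counts on every change. The enum,
struct and function names below are made up for illustration and are not
existing amdgpu or SMU interfaces; locking and the actual SMU message are
left out, and the bit positions are placeholders.

```c
#include <stdint.h>

/* Hypothetical hint types; stand-ins for the real power profile values. */
enum workload_type {
	WORKLOAD_3D,
	WORKLOAD_VIDEO,
	WORKLOAD_COMPUTE,
	WORKLOAD_COUNT
};

/* Hypothetical pm-layer state: one reference count per hint type. */
struct workload_state {
	unsigned int refcount[WORKLOAD_COUNT];
};

/* Build the mask from every hint type that still has at least one user. */
static uint32_t workload_mask(const struct workload_state *ws)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < WORKLOAD_COUNT; i++)
		if (ws->refcount[i])
			mask |= UINT32_C(1) << i;
	return mask;
}

/* Take a reference on one hint type; returns the mask that would be sent. */
static uint32_t workload_get(struct workload_state *ws, enum workload_type t)
{
	ws->refcount[t]++;
	return workload_mask(ws);
}

/* Drop a reference; an unbalanced put leaves the counts untouched. */
static uint32_t workload_put(struct workload_state *ws, enum workload_type t)
{
	if (ws->refcount[t])
		ws->refcount[t]--;
	return workload_mask(ws);
}
```

With something along these lines, two compute users (say KFD and a graphics
context) each hold their own reference, so the compute bit stays set until
both have dropped theirs. A profile selected through sysfs would presumably
be one more reference holder on its hint type.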

Thanks,
Lijo

> Alex
> 
> 
>>
>> - Shashank
>>
>>>
>>>
>>>>
>>>>
>>>> Or anything else?
>>>>
>>>> Regards
>>>> Shashank
>>>>
>>>>
>>>>> Or you have multiple VCN contexts. When context1 finishes a job, it
>>>>> disables the VIDEO profile. But context2 still has a job on the other
>>>>> VCN engine and wants the VIDEO profile to still be enabled.
>>>>>
>>>>> Regards,
>>>>>     Felix
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
>>>>>>>    1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>> index 5e53a5293935..1caed319a448 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>> @@ -34,6 +34,7 @@
>>>>>>>    #include "amdgpu_ras.h"
>>>>>>>    #include "amdgpu_umc.h"
>>>>>>>    #include "amdgpu_reset.h"
>>>>>>> +#include "amdgpu_ctx_workload.h"
>>>>>>>      /* Total memory size in system memory and all GPU VRAM. Used to
>>>>>>>     * estimate worst case amount of memory to reserve for page tables
>>>>>>> @@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct
>>>>>>> amdgpu_device *adev,
>>>>>>>      void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev,
>>>>>>> bool idle)
>>>>>>>    {
>>>>>>> -    amdgpu_dpm_switch_power_profile(adev,
>>>>>>> -                    PP_SMC_POWER_PROFILE_COMPUTE,
>>>>>>> -                    !idle);
>>>>>>> +    int ret;
>>>>>>> +
>>>>>>> +    if (idle)
>>>>>>> +        ret = amdgpu_clear_workload_profile(adev,
>>>>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>> +    else
>>>>>>> +        ret = amdgpu_set_workload_profile(adev,
>>>>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>> +
>>>>>>> +    if (ret)
>>>>>>> +        drm_warn(&adev->ddev, "Failed to %s power profile to
>>>>>>> compute mode\n",
>>>>>>> +             idle ? "reset" : "set");
>>>>>>>    }
>>>>>>>      bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32
>>>>>>> vmid)
>>>>>>

