[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

Felix Kuehling felix.kuehling at amd.com
Thu Sep 29 18:07:54 UTC 2022


On 2022-09-29 07:10, Lazar, Lijo wrote:
>
>
> On 9/29/2022 2:18 PM, Sharma, Shashank wrote:
>>
>>
>> On 9/28/2022 11:51 PM, Alex Deucher wrote:
>>> On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
>>> <shashank.sharma at amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/27/2022 10:40 PM, Alex Deucher wrote:
>>>>> On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
>>>>> <shashank.sharma at amd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/27/2022 5:23 PM, Felix Kuehling wrote:
>>>>>>> Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:
>>>>>>>> Hello Felix,
>>>>>>>>
>>>>>>>> Thanks for the review comments.
>>>>>>>>
>>>>>>>> On 9/27/2022 4:48 PM, Felix Kuehling wrote:
>>>>>>>>> Am 2022-09-27 um 02:12 schrieb Christian König:
>>>>>>>>>> Am 26.09.22 um 23:40 schrieb Shashank Sharma:
>>>>>>>>>>> This patch switches the GPU workload mode to/from
>>>>>>>>>>> compute mode while submitting a compute workload.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>>>>>>>>
>>>>>>>>>> Feel free to add my acked-by, but Felix should probably take 
>>>>>>>>>> a look
>>>>>>>>>> as well.
>>>>>>>>>
>>>>>>>>> This looks OK purely from a compute perspective. But I'm concerned
>>>>>>>>> about the interaction of compute with graphics or multiple 
>>>>>>>>> graphics
>>>>>>>>> contexts submitting work concurrently. They would constantly 
>>>>>>>>> override
>>>>>>>>> or disable each other's workload hints.
>>>>>>>>>
>>>>>>>>> For example, you have an amdgpu_ctx with
>>>>>>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
>>>>>>>>> process that also wants the compute profile. Those could be 
>>>>>>>>> different
>>>>>>>>> processes belonging to different users. Say, KFD enables the 
>>>>>>>>> compute
>>>>>>>>> profile first. Then the graphics context submits a job. At the 
>>>>>>>>> start
>>>>>>>>> of the job, the compute profile is enabled. That's a no-op 
>>>>>>>>> because
>>>>>>>>> KFD already enabled the compute profile. When the job 
>>>>>>>>> finishes, it
>>>>>>>>> disables the compute profile for everyone, including KFD. That's
>>>>>>>>> unexpected.
>>>>>>>>>
>>>>>>>>
>>>>>>>> In this case, it will not disable the compute profile, as the
>>>>>>>> reference counter will not be zero. The reset_profile() will 
>>>>>>>> only act
>>>>>>>> if the reference counter is 0.
>>>>>>>
>>>>>>> OK, I missed the reference counter.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> But I would be happy to get input on a policy which is more
>>>>>>>> sustainable and gives better results, for example:
>>>>>>>> - should we disallow a profile change if a PP mode is already
>>>>>>>> applied, and keep it on a first-come-first-served basis?
>>>>>>>>
>>>>>>>> For example: Policy A
>>>>>>>> - Job A sets the profile to compute
>>>>>>>> - Job B tries to set the profile to 3D, but we do not allow it as
>>>>>>>> job A has not finished yet.
>>>>>>>>
>>>>>>>> Or Policy B: Current one
>>>>>>>> - Job A sets the profile to compute
>>>>>>>> - Job B tries to set the profile to 3D, and we allow it. Job A also
>>>>>>>> runs in PP 3D
>>>>>>>> - Job B finishes, but does not reset PP as the reference count is
>>>>>>>> not zero due to compute
>>>>>>>> - Job A finishes, profile reset to NONE
>>>>>>>
>>>>>>> I think this won't work. As I understand it, the
>>>>>>> amdgpu_dpm_switch_power_profile enables and disables individual
>>>>>>> profiles. Disabling the 3D profile doesn't disable the compute 
>>>>>>> profile
>>>>>>> at the same time. I think you'll need one refcount per profile.
>>>>>>>
>>>>>>> Regards,
>>>>>>>      Felix
>>>>>>
>>>>>> Thanks, this is exactly what I was looking for. I think Alex's
>>>>>> initial idea was along these lines, but I was under the assumption
>>>>>> that there is only one HW profile in the SMU which keeps getting
>>>>>> overwritten. This can solve our problem, as I can create an array
>>>>>> of reference counters and disable only the profile whose reference
>>>>>> counter goes to 0.
>>>>>
>>>>> It's been a while since I paged any of this code into my head, but I
>>>>> believe the actual workload message in the SMU is a mask where you 
>>>>> can
>>>>> specify multiple workload types at the same time and the SMU will
>>>>> arbitrate between them internally.  E.g., the most aggressive one 
>>>>> will
>>>>> be selected out of the ones specified.  I think in the driver we just
>>>>> set one bit at a time using the current interface.  It might be 
>>>>> better
>>>>> to change the interface and just ref count the hint types and then
>>>>> when we call the set function look at the ref counts for each hint
>>>>> type and set the mask as appropriate.
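>>>>>
>>>>> A minimal sketch of the mask side (untested; assumes the enum ends
>>>>> with a PP_SMC_POWER_PROFILE_COUNT terminator, and the mapping of
>>>>> enum values to mask bits is simplified here):
>>>>>
>>>>>     static int hint_refcount[PP_SMC_POWER_PROFILE_COUNT];
>>>>>
>>>>>     /* Rebuild the workload mask from all hint types still in use;
>>>>>      * the SMU then arbitrates between the set bits internally. */
>>>>>     static u32 amdgpu_workload_hint_mask(void)
>>>>>     {
>>>>>         u32 mask = 0;
>>>>>         int i;
>>>>>
>>>>>         for (i = 0; i < PP_SMC_POWER_PROFILE_COUNT; i++)
>>>>>             if (hint_refcount[i] > 0)
>>>>>                 mask |= 1 << i;
>>>>>         return mask;
>>>>>     }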
>>>>>
>>>>> Alex
>>>>>
>>>>
>>>> Hey Alex,
>>>> Thanks for your comment. If that is the case, the current patch
>>>> series works as-is, and no changes would be required. Please let me
>>>> know if my understanding is correct:
>>>>
>>>> Assumption: Order of aggression: 3D > Media > Compute
>>>>
>>>> - Job 1: Requests mode compute: PP changed to compute, ref count 1
>>>> - Job 2: Requests mode media: PP changed to media, ref count 2
>>>> - Job 3: requests mode 3D: PP changed to 3D, ref count 3
>>>> - Job 1 finishes, drops ref count to 2, doesn't reset the PP as
>>>> ref > 0, PP still 3D
>>>> - Job 3 finishes, drops ref count to 1, doesn't reset the PP as
>>>> ref > 0, PP still 3D
>>>> - Job 2 finishes, drops ref count to 0, PP changed to NONE
>>>>
>>>> In this way, every job will be operating in the power profile of the
>>>> desired aggression or higher, and this API guarantees execution in
>>>> at least the desired power profile.
>>>
>>> I'm not entirely sure about the relative levels of aggression, but I
>>> believe the SMU prioritizes them by index.  E.g.
>>> #define WORKLOAD_PPLIB_DEFAULT_BIT        0
>>> #define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
>>> #define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
>>> #define WORKLOAD_PPLIB_VIDEO_BIT          3
>>> #define WORKLOAD_PPLIB_VR_BIT             4
>>> #define WORKLOAD_PPLIB_COMPUTE_BIT        5
>>> #define WORKLOAD_PPLIB_CUSTOM_BIT         6
>>>
>>> 3D < video < VR < compute < custom
>>>
>>> VR and compute are the most aggressive.  Custom takes preference
>>> because it's user customizable.
>>>
>>> Alex
>>>
>>
>> Thanks, so this UAPI will guarantee execution of the job in at least
>> the requested power profile, or a more aggressive one.
>>
>
> Hi Shashank,
>
> This is not how the API works in the driver PM subsystem. In the final
> interface with the PMFW, the driver sets only one profile bit and
> doesn't set a mask. So it doesn't work the way Felix explained.

I was not looking at the implementation but at the API:

int amdgpu_dpm_switch_power_profile(struct amdgpu_device *adev,
                                     enum PP_SMC_POWER_PROFILE type,
                                     bool en)

This API suggests that we can enable and disable individual profiles. 
E.g. disabling PP_SMC_POWER_PROFILE_VIDEO should not change whether 
PP_SMC_POWER_PROFILE_COMPUTE is enabled. What we actually send to the HW 
when multiple profiles are enabled through this API is a different 
question. We have to choose one profile or the other. This can happen in 
the driver or the firmware. I don't care.

But if disabling PP_SMC_POWER_PROFILE_VIDEO makes us forget that we ever 
enabled PP_SMC_POWER_PROFILE_COMPUTE, then this API is broken and useless 
as an abstraction.
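
A per-profile refcount under the hood would keep that abstraction
intact. Rough, untested sketch of the semantics I mean (locking
omitted; the profile_refcount array in adev and the helper
amdgpu_dpm_apply_workload_mask, which would recompute and send the set
of active profiles, are both made up):

    int amdgpu_dpm_switch_power_profile(struct amdgpu_device *adev,
                                        enum PP_SMC_POWER_PROFILE type,
                                        bool en)
    {
        /* Each profile keeps its own count, so disabling VIDEO cannot
         * make us forget that COMPUTE is still enabled. */
        if (en)
            adev->profile_refcount[type]++;
        else if (adev->profile_refcount[type] > 0)
            adev->profile_refcount[type]--;

        /* Made-up helper: derive the effective profile (or mask) from
         * all non-zero refcounts and hand it to the PM layer. */
        return amdgpu_dpm_apply_workload_mask(adev);
    }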

Regards,
   Felix


> If there is more than one profile bit set, the PMFW looks at the mask
> and picks the one with the highest priority. Note that for each update
> of the workload mask, the PMFW should get a message.
>
> The driver currently sets only one bit, as Alex explained earlier. For
> our current driver implementation, you can check this as an example -
>
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1753 
>
>
> Also, the PM layer already stores the current workload profile for a
> *get* API (which also means a new pm workload variable is not needed).
> But that API works only as long as the driver sets a single profile
> bit; that way the driver is sure of the current profile mode -
>
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1628 
>
>
> When there is more than one, the driver is not sure of the PMFW's
> internal priority, though we can follow the bit order which Alex
> suggested (but sometimes the FW carries workarounds inside, which means
> it doesn't necessarily follow the same order).
>
> There is an existing sysfs interface which allows changing the profile
> mode and adding custom settings. In summary, any handling of the change
> from a single bit to a mask needs to be done at the lower layer, as in
> the sketch below.
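>
> For illustration, on sienna_cichlid the final message could carry the
> combined mask instead of a single bit (rough sketch against the code
> linked above; smu->active_workload_mask is a hypothetical field holding
> the OR of all currently active workload bits):
>
>     /* Today the driver sends 1 << workload_type (a single bit);
>      * instead, send the whole mask and let PMFW arbitrate. */
>     ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
>                                           smu->active_workload_mask,
>                                           NULL);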
>
> The problem is that this behavior has been there throughout all legacy
> ASICs. I'm not sure how much effort it takes and what all needs to be
> modified.
>
> Thanks,
> Lijo
>
>> I will do the one change required and send the updated one.
>>
>> - Shashank
>>
>>>
>>>
>>>
>>>>
>>>> - Shashank
>>>>
>>>>>
>>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Or anything else ?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Shashank
>>>>>>>>
>>>>>>>>
>>>>>>>>> Or you have multiple VCN contexts. When context1 finishes a 
>>>>>>>>> job, it
>>>>>>>>> disables the VIDEO profile. But context2 still has a job on 
>>>>>>>>> the other
>>>>>>>>> VCN engine and wants the VIDEO profile to still be enabled.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>      Felix
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
>>>>>>>>>>>     1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>> index 5e53a5293935..1caed319a448 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>> @@ -34,6 +34,7 @@
>>>>>>>>>>>     #include "amdgpu_ras.h"
>>>>>>>>>>>     #include "amdgpu_umc.h"
>>>>>>>>>>>     #include "amdgpu_reset.h"
>>>>>>>>>>> +#include "amdgpu_ctx_workload.h"
>>>>>>>>>>>       /* Total memory size in system memory and all GPU VRAM. Used to
>>>>>>>>>>>      * estimate worst case amount of memory to reserve for page tables
>>>>>>>>>>> @@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
>>>>>>>>>>>       void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
>>>>>>>>>>>     {
>>>>>>>>>>> -    amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>>>> -                    PP_SMC_POWER_PROFILE_COMPUTE,
>>>>>>>>>>> -                    !idle);
>>>>>>>>>>> +    int ret;
>>>>>>>>>>> +
>>>>>>>>>>> +    if (idle)
>>>>>>>>>>> +        ret = amdgpu_clear_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>>>>>> +    else
>>>>>>>>>>> +        ret = amdgpu_set_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>>>>>> +
>>>>>>>>>>> +    if (ret)
>>>>>>>>>>> +        drm_warn(&adev->ddev, "Failed to %s power profile to compute mode\n",
>>>>>>>>>>> +             idle ? "reset" : "set");
>>>>>>>>>>>     }
>>>>>>>>>>>       bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
>>>>>>>>>>

