[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

Lazar, Lijo lijo.lazar at amd.com
Fri Sep 30 04:46:37 UTC 2022



On 9/29/2022 11:37 PM, Felix Kuehling wrote:
> On 2022-09-29 07:10, Lazar, Lijo wrote:
>>
>>
>> On 9/29/2022 2:18 PM, Sharma, Shashank wrote:
>>>
>>>
>>> On 9/28/2022 11:51 PM, Alex Deucher wrote:
>>>> On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
>>>> <shashank.sharma at amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 9/27/2022 10:40 PM, Alex Deucher wrote:
>>>>>> On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
>>>>>> <shashank.sharma at amd.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 9/27/2022 5:23 PM, Felix Kuehling wrote:
>>>>>>>> On 2022-09-27 10:58, Sharma, Shashank wrote:
>>>>>>>>> Hello Felix,
>>>>>>>>>
>>>>>>>>> Thanks for the review comments.
>>>>>>>>>
>>>>>>>>> On 9/27/2022 4:48 PM, Felix Kuehling wrote:
>>>>>>>>>> On 2022-09-27 02:12, Christian König wrote:
>>>>>>>>>>> On 26.09.22 23:40, Shashank Sharma wrote:
>>>>>>>>>>>> This patch switches the GPU workload mode to/from
>>>>>>>>>>>> compute mode, while submitting compute workload.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>>>>>>>>>
>>>>>>>>>>> Feel free to add my acked-by, but Felix should probably take 
>>>>>>>>>>> a look
>>>>>>>>>>> as well.
>>>>>>>>>>
>>>>>>>>>> This looks OK purely from a compute perspective. But I'm concerned
>>>>>>>>>> about the interaction of compute with graphics or multiple 
>>>>>>>>>> graphics
>>>>>>>>>> contexts submitting work concurrently. They would constantly 
>>>>>>>>>> override
>>>>>>>>>> or disable each other's workload hints.
>>>>>>>>>>
>>>>>>>>>> For example, you have an amdgpu_ctx with
>>>>>>>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
>>>>>>>>>> process that also wants the compute profile. Those could be 
>>>>>>>>>> different
>>>>>>>>>> processes belonging to different users. Say, KFD enables the 
>>>>>>>>>> compute
>>>>>>>>>> profile first. Then the graphics context submits a job. At the 
>>>>>>>>>> start
>>>>>>>>>> of the job, the compute profile is enabled. That's a no-op 
>>>>>>>>>> because
>>>>>>>>>> KFD already enabled the compute profile. When the job 
>>>>>>>>>> finishes, it
>>>>>>>>>> disables the compute profile for everyone, including KFD. That's
>>>>>>>>>> unexpected.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In this case, it will not disable the compute profile, as the
>>>>>>>>> reference counter will not be zero. The reset_profile() will 
>>>>>>>>> only act
>>>>>>>>> if the reference counter is 0.
>>>>>>>>
>>>>>>>> OK, I missed the reference counter.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> But I would be happy to get any input on a policy which is more
>>>>>>>>> sustainable and gives better results, for example:
>>>>>>>>> - should we disallow a profile change if a PP mode is already
>>>>>>>>> applied, i.e. keep it on an early-bird basis?
>>>>>>>>>
>>>>>>>>> For example: Policy A
>>>>>>>>> - Job A sets the profile to compute
>>>>>>>>> - Job B tries to set profile to 3D, but we do not allow it as
>>>>>>>>> job A has not finished yet.
>>>>>>>>>
>>>>>>>>> Or Policy B: Current one
>>>>>>>>> - Job A sets the profile to compute
>>>>>>>>> - Job B tries to set profile to 3D, and we allow it. Job A also 
>>>>>>>>> runs
>>>>>>>>> in PP 3D
>>>>>>>>> - Job B finishes, but does not reset PP as reference count is 
>>>>>>>>> not zero
>>>>>>>>> due to compute
>>>>>>>>> - Job A finishes, profile reset to NONE
>>>>>>>>
>>>>>>>> I think this won't work. As I understand it, the
>>>>>>>> amdgpu_dpm_switch_power_profile enables and disables individual
>>>>>>>> profiles. Disabling the 3D profile doesn't disable the compute 
>>>>>>>> profile
>>>>>>>> at the same time. I think you'll need one refcount per profile.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>      Felix
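
A minimal sketch of the "one refcount per profile" idea, assuming the amdgpu
kernel context; the helper names are hypothetical (not part of this series),
the counters would really live per-device, and PP_SMC_POWER_PROFILE_COUNT is
assumed to terminate the profile enum:

static atomic_t workload_refcount[PP_SMC_POWER_PROFILE_COUNT];

static void amdgpu_workload_get(struct amdgpu_device *adev,
                                enum PP_SMC_POWER_PROFILE profile)
{
        /* First user of this profile enables it in the PM layer. */
        if (atomic_inc_return(&workload_refcount[profile]) == 1)
                amdgpu_dpm_switch_power_profile(adev, profile, true);
}

static void amdgpu_workload_put(struct amdgpu_device *adev,
                                enum PP_SMC_POWER_PROFILE profile)
{
        /* Last user of this profile disables it again. */
        if (atomic_dec_return(&workload_refcount[profile]) == 0)
                amdgpu_dpm_switch_power_profile(adev, profile, false);
}
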
>>>>>>>
>>>>>>> Thanks, this is exactly what I was looking for. I think Alex's
>>>>>>> initial idea was along these lines, but I was under the assumption
>>>>>>> that there is only one HW profile in the SMU which keeps getting
>>>>>>> overwritten. This can solve our problems, as I can create an array
>>>>>>> of reference counters and disable only the profile whose reference
>>>>>>> counter goes to 0.
>>>>>>
>>>>>> It's been a while since I paged any of this code into my head, but I
>>>>>> believe the actual workload message in the SMU is a mask where you 
>>>>>> can
>>>>>> specify multiple workload types at the same time and the SMU will
>>>>>> arbitrate between them internally.  E.g., the most aggressive one 
>>>>>> will
>>>>>> be selected out of the ones specified.  I think in the driver we just
>>>>>> set one bit at a time using the current interface.  It might be 
>>>>>> better
>>>>>> to change the interface and just ref count the hint types and then
>>>>>> when we call the set function look at the ref counts for each hint
>>>>>> type and set the mask as appropriate.
>>>>>>
>>>>>> Alex
>>>>>>
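
A rough sketch of this mask-based idea: keep a refcount per hint type and
rebuild the workload mask from the non-zero counters whenever one of them
changes. The mask-setting helper named here is hypothetical; the current
interface only toggles one profile at a time, and the profile-index-to-bit
mapping is assumed to be 1:1 for simplicity:

static void amdgpu_update_workload_mask(struct amdgpu_device *adev,
                                        atomic_t *refcount)
{
        u32 mask = 0;
        int i;

        /* Set a bit for every profile that still has at least one user. */
        for (i = 0; i < PP_SMC_POWER_PROFILE_COUNT; i++)
                if (atomic_read(&refcount[i]) > 0)
                        mask |= BIT(i);

        /* One message per mask change; the SMU (or the driver) then
         * arbitrates which profile in the mask actually wins. */
        amdgpu_dpm_set_workload_mask(adev, mask);   /* hypothetical helper */
}
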
>>>>>
>>>>> Hey Alex,
>>>>> Thanks for your comment. If that is the case, the current patch
>>>>> series works straightforwardly and no changes would be required.
>>>>> Please let me know if my understanding is correct:
>>>>>
>>>>> Assumption: Order of aggression: 3D > Media > Compute
>>>>>
>>>>> - Job 1: Requests mode compute: PP changed to compute, ref count 1
>>>>> - Job 2: Requests mode media: PP changed to media, ref count 2
>>>>> - Job 3: requests mode 3D: PP changed to 3D, ref count 3
>>>>> - Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref 
>>>>> > 0,
>>>>> PP still 3D
>>>>> - Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref 
>>>>> > 0,
>>>>> PP still 3D
>>>>> - Job 2 finishes, downs ref count to 0, PP changed to NONE,
>>>>>
>>>>> In this way, every job will be operating in the power profile of the
>>>>> desired aggression or higher, and this API guarantees execution at
>>>>> least in the desired power profile.
>>>>
>>>> I'm not entirely sure of the relative levels of aggression, but I
>>>> believe the SMU prioritizes them by index.  E.g.
>>>> #define WORKLOAD_PPLIB_DEFAULT_BIT        0
>>>> #define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
>>>> #define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
>>>> #define WORKLOAD_PPLIB_VIDEO_BIT          3
>>>> #define WORKLOAD_PPLIB_VR_BIT             4
>>>> #define WORKLOAD_PPLIB_COMPUTE_BIT        5
>>>> #define WORKLOAD_PPLIB_CUSTOM_BIT         6
>>>>
>>>> 3D < video < VR < compute < custom
>>>>
>>>> VR and compute are the most aggressive.  Custom takes preference
>>>> because it's user customizable.
>>>>
>>>> Alex
>>>>
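
For illustration only: if a higher WORKLOAD_PPLIB_* bit index means higher
priority, the effective profile for a given mask could be picked with fls().
The real arbitration lives in the PMFW and may not follow this order exactly:

static int pick_workload_profile(u32 workload_mask)
{
        if (!workload_mask)
                return WORKLOAD_PPLIB_DEFAULT_BIT;

        /* Highest set bit wins: custom > compute > VR > video > 3D. */
        return fls(workload_mask) - 1;
}
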
>>>
>>> Thanks, so this UAPI will guarantee execution of the job in at least
>>> the requested power profile, or a more aggressive one.
>>>
>>
>> Hi Shashank,
>>
>> This is not how the API works in the driver PM subsystem. In the final
>> interface with PMFW, the driver sets only one profile bit and doesn't
>> set any mask. So it doesn't work the way Felix explained.
> 
> I was not looking at the implementation but at the API:
> 
> int amdgpu_dpm_switch_power_profile(struct amdgpu_device *adev,
>                                      enum PP_SMC_POWER_PROFILE type,
>                                      bool en)
> 
> This API suggests that we can enable and disable individual profiles.
> E.g. disabling PP_SMC_POWER_PROFILE_VIDEO should not change whether 
> PP_SMC_POWER_PROFILE_COMPUTE is enabled. What we actually send to the HW 
> when multiple profiles are enabled through this API is a different 
> question. We have to choose one profile or the other. This can happen in 
> the driver or the firmware. I don't care.
> 
> But if disabling PP_SMC_POWER_PROFILE_VIDEO makes us forget that we ever 
> enabled PP_SMC_POWER_PROFILE_COMPUTE then this API is broken and useless 
> as an abstraction.
> 
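
As a usage sketch of the scenario above, reusing the hypothetical per-profile
refcount helpers from the earlier sketch (not the actual patch): enabling
VIDEO and COMPUTE independently and then dropping VIDEO must leave COMPUTE
enabled.

        amdgpu_workload_get(adev, PP_SMC_POWER_PROFILE_COMPUTE); /* KFD process  */
        amdgpu_workload_get(adev, PP_SMC_POWER_PROFILE_VIDEO);   /* VCN context  */
        amdgpu_workload_put(adev, PP_SMC_POWER_PROFILE_VIDEO);   /* VCN job done */
        /* COMPUTE refcount is still 1, so the compute profile stays enabled. */
        amdgpu_workload_put(adev, PP_SMC_POWER_PROFILE_COMPUTE); /* KFD done     */
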

Checked again. Here the driver decides the priority instead of the FW, so
the API works as you mentioned (except that no refcounting is done).
Sorry for the confusion.

Thanks,
Lijo

> Regards,
>    Felix
> 
> 
>> If there is more than one profile bit set, PMFW looks at the mask and
>> picks the one with the highest priority. Note that for each update of
>> the workload mask, PMFW should get a message.
>>
>> The driver currently sets only one bit, as Alex explained earlier. For
>> our current driver implementation, you can check this as an example -
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1753 
>>
>>
>> Also, the PM layer already stores the current workload profile for a
>> *get* API (which also means a new pm workload variable is not needed).
>> But that API works only as long as the driver sets a single profile bit,
>> since that is the only way the driver can be sure of the current profile
>> mode -
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1628 
>>
>>
>> When more than one bit is set, the driver is not sure of PMFW's internal
>> priority. We could follow the bit order Alex suggested, but sometimes the
>> FW carries workarounds internally, which means it doesn't necessarily
>> follow the same order.
>>
>> There is an existing sysfs interface that allows changing the profile
>> mode and adding custom settings. In summary, any handling of the change
>> from a single bit to a mask needs to be done at the lower layer.
>>
>> The problem is that this behavior has been there throughout all legacy
>> ASICs. I'm not sure how much effort it takes and what all needs to be
>> modified.
>>
>> Thanks,
>> Lijo
>>
>>> I will do the one change required and send the updated one.
>>>
>>> - Shashank
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> - Shashank
>>>>>
>>>>>>
>>>>>>>
>>>>>>> - Shashank
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Or anything else ?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Shashank
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Or you have multiple VCN contexts. When context1 finishes a 
>>>>>>>>>> job, it
>>>>>>>>>> disables the VIDEO profile. But context2 still has a job on 
>>>>>>>>>> the other
>>>>>>>>>> VCN engine and wants the VIDEO profile to still be enabled.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>      Felix
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
>>>>>>>>>>>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>>> index 5e53a5293935..1caed319a448 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>>> @@ -34,6 +34,7 @@
>>>>>>>>>>>>  #include "amdgpu_ras.h"
>>>>>>>>>>>>  #include "amdgpu_umc.h"
>>>>>>>>>>>>  #include "amdgpu_reset.h"
>>>>>>>>>>>> +#include "amdgpu_ctx_workload.h"
>>>>>>>>>>>>
>>>>>>>>>>>>  /* Total memory size in system memory and all GPU VRAM. Used to
>>>>>>>>>>>>   * estimate worst case amount of memory to reserve for page tables
>>>>>>>>>>>> @@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
>>>>>>>>>>>>
>>>>>>>>>>>>  void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -    amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>>>>> -                    PP_SMC_POWER_PROFILE_COMPUTE,
>>>>>>>>>>>> -                    !idle);
>>>>>>>>>>>> +    int ret;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (idle)
>>>>>>>>>>>> +        ret = amdgpu_clear_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>>>>>>> +    else
>>>>>>>>>>>> +        ret = amdgpu_set_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (ret)
>>>>>>>>>>>> +        drm_warn(&adev->ddev, "Failed to %s power profile to compute mode\n",
>>>>>>>>>>>> +             idle ? "reset" : "set");
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)

