[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
Felix Kuehling
felix.kuehling at amd.com
Thu Sep 29 18:07:54 UTC 2022
On 2022-09-29 07:10, Lazar, Lijo wrote:
>
>
> On 9/29/2022 2:18 PM, Sharma, Shashank wrote:
>>
>>
>> On 9/28/2022 11:51 PM, Alex Deucher wrote:
>>> On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
>>> <shashank.sharma at amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/27/2022 10:40 PM, Alex Deucher wrote:
>>>>> On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
>>>>> <shashank.sharma at amd.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/27/2022 5:23 PM, Felix Kuehling wrote:
>>>>>>> Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:
>>>>>>>> Hello Felix,
>>>>>>>>
>>>>>>>> Thanks for the review comments.
>>>>>>>>
>>>>>>>> On 9/27/2022 4:48 PM, Felix Kuehling wrote:
>>>>>>>>> Am 2022-09-27 um 02:12 schrieb Christian König:
>>>>>>>>>> Am 26.09.22 um 23:40 schrieb Shashank Sharma:
>>>>>>>>>>> This patch switches the GPU workload mode to/from
>>>>>>>>>>> compute mode while submitting a compute workload.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>>>>>>>>
>>>>>>>>>> Feel free to add my acked-by, but Felix should probably take
>>>>>>>>>> a look
>>>>>>>>>> as well.
>>>>>>>>>
>>>>>>>>> This looks OK purely from a compute perspective. But I'm concerned
>>>>>>>>> about the interaction of compute with graphics or multiple
>>>>>>>>> graphics
>>>>>>>>> contexts submitting work concurrently. They would constantly
>>>>>>>>> override
>>>>>>>>> or disable each other's workload hints.
>>>>>>>>>
>>>>>>>>> For example, you have an amdgpu_ctx with
>>>>>>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
>>>>>>>>> process that also wants the compute profile. Those could be
>>>>>>>>> different
>>>>>>>>> processes belonging to different users. Say, KFD enables the
>>>>>>>>> compute
>>>>>>>>> profile first. Then the graphics context submits a job. At the
>>>>>>>>> start
>>>>>>>>> of the job, the compute profile is enabled. That's a no-op
>>>>>>>>> because
>>>>>>>>> KFD already enabled the compute profile. When the job
>>>>>>>>> finishes, it
>>>>>>>>> disables the compute profile for everyone, including KFD. That's
>>>>>>>>> unexpected.
>>>>>>>>>
>>>>>>>>
>>>>>>>> In this case, it will not disable the compute profile, as the
>>>>>>>> reference counter will not be zero. The reset_profile() will
>>>>>>>> only act
>>>>>>>> if the reference counter is 0.
>>>>>>>
>>>>>>> OK, I missed the reference counter.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> But I would be happy to get any input on a policy that is more
>>>>>>>> sustainable and gives better results, for example:
>>>>>>>> - should we disallow a profile change if a PP mode is already
>>>>>>>> applied, i.e. keep it on a first-come, first-served basis?
>>>>>>>>
>>>>>>>> For example: Policy A
>>>>>>>> - Job A sets the profile to compute
>>>>>>>> - Job B tries to set the profile to 3D, but we do not allow it as
>>>>>>>> job A has not finished yet.
>>>>>>>>
>>>>>>>> Or Policy B: the current one
>>>>>>>> - Job A sets the profile to compute
>>>>>>>> - Job B tries to set the profile to 3D, and we allow it; job A also
>>>>>>>> runs in PP 3D
>>>>>>>> - Job B finishes, but does not reset the PP as the reference count
>>>>>>>> is not zero due to compute
>>>>>>>> - Job A finishes, profile reset to NONE
>>>>>>>
>>>>>>> I think this won't work. As I understand it, the
>>>>>>> amdgpu_dpm_switch_power_profile enables and disables individual
>>>>>>> profiles. Disabling the 3D profile doesn't disable the compute
>>>>>>> profile
>>>>>>> at the same time. I think you'll need one refcount per profile.
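>>>>>>>
>>>>>>> Purely as an illustration, one refcount per profile could look
>>>>>>> roughly like this (untested sketch; the wrapper names are made up,
>>>>>>> locking is omitted, and I'm assuming the PP_SMC_POWER_PROFILE enum
>>>>>>> has a trailing COUNT entry):
>>>>>>>
>>>>>>> static atomic_t profile_ref[PP_SMC_POWER_PROFILE_COUNT];
>>>>>>>
>>>>>>> int amdgpu_workload_profile_get(struct amdgpu_device *adev,
>>>>>>>                                 enum PP_SMC_POWER_PROFILE profile)
>>>>>>> {
>>>>>>>         /* Enable the profile only on the 0 -> 1 transition */
>>>>>>>         if (atomic_inc_return(&profile_ref[profile]) == 1)
>>>>>>>                 return amdgpu_dpm_switch_power_profile(adev,
>>>>>>>                                                        profile, true);
>>>>>>>         return 0;
>>>>>>> }
>>>>>>>
>>>>>>> int amdgpu_workload_profile_put(struct amdgpu_device *adev,
>>>>>>>                                 enum PP_SMC_POWER_PROFILE profile)
>>>>>>> {
>>>>>>>         /* Disable only when the last user of this profile is gone */
>>>>>>>         if (atomic_dec_return(&profile_ref[profile]) == 0)
>>>>>>>                 return amdgpu_dpm_switch_power_profile(adev,
>>>>>>>                                                        profile, false);
>>>>>>>         return 0;
>>>>>>> }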
>>>>>>>
>>>>>>> Regards,
>>>>>>> Felix
>>>>>>
>>>>>> Thanks, this is exactly what I was looking for. I think Alex's
>>>>>> initial idea was along these lines, but I was under the assumption
>>>>>> that there is only one HW profile in the SMU which keeps getting
>>>>>> overwritten. This can solve our problem, as I can create an array
>>>>>> of reference counters and disable only the profile whose reference
>>>>>> counter goes to 0.
>>>>>
>>>>> It's been a while since I paged any of this code into my head, but
>>>>> I believe the actual workload message in the SMU is a mask where you
>>>>> can specify multiple workload types at the same time, and the SMU
>>>>> will arbitrate between them internally; e.g., the most aggressive
>>>>> one will be selected out of the ones specified. I think in the
>>>>> driver we just set one bit at a time using the current interface. It
>>>>> might be better to change the interface to refcount the hint types,
>>>>> and then, when we call the set function, look at the refcounts for
>>>>> each hint type and set the mask as appropriate.
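>>>>>
>>>>> Roughly something like this, purely as a sketch (the
>>>>> AMDGPU_CTX_WORKLOAD_HINT_MAX bound and the smu_set_workload_mask()
>>>>> helper are hypothetical names, and locking is omitted):
>>>>>
>>>>> static int hint_refcount[AMDGPU_CTX_WORKLOAD_HINT_MAX];
>>>>>
>>>>> static int amdgpu_update_workload_mask(struct amdgpu_device *adev)
>>>>> {
>>>>>         u32 mask = 0;
>>>>>         int i;
>>>>>
>>>>>         /* Rebuild the mask from every hint that still has users */
>>>>>         for (i = 0; i < AMDGPU_CTX_WORKLOAD_HINT_MAX; i++)
>>>>>                 if (hint_refcount[i] > 0)
>>>>>                         mask |= BIT(i);
>>>>>
>>>>>         /* Send the whole mask; the SMU arbitrates internally */
>>>>>         return smu_set_workload_mask(adev, mask);
>>>>> }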
>>>>>
>>>>> Alex
>>>>>
>>>>
>>>> Hey Alex,
>>>> Thanks for your comment. If that is the case, the current patch
>>>> series works as-is and no changes would be required. Please let me
>>>> know if my understanding is correct:
>>>>
>>>> Assumption: Order of aggression: 3D > Media > Compute
>>>>
>>>> - Job 1: requests mode compute: PP changed to compute, ref count 1
>>>> - Job 2: requests mode media: PP changed to media, ref count 2
>>>> - Job 3: requests mode 3D: PP changed to 3D, ref count 3
>>>> - Job 1 finishes, drops ref count to 2, doesn't reset the PP as
>>>> ref > 0, PP still 3D
>>>> - Job 3 finishes, drops ref count to 1, doesn't reset the PP as
>>>> ref > 0, PP still 3D
>>>> - Job 2 finishes, drops ref count to 0, PP changed to NONE
>>>>
>>>> In this way, every job will be operating in the power profile of the
>>>> desired aggression level or higher, and this API guarantees execution
>>>> at least in the desired power profile.
>>>
>>> I'm not entirely sure about the relative levels of aggression, but I
>>> believe the SMU prioritizes them by index, e.g.:
>>> #define WORKLOAD_PPLIB_DEFAULT_BIT 0
>>> #define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
>>> #define WORKLOAD_PPLIB_POWER_SAVING_BIT 2
>>> #define WORKLOAD_PPLIB_VIDEO_BIT 3
>>> #define WORKLOAD_PPLIB_VR_BIT 4
>>> #define WORKLOAD_PPLIB_COMPUTE_BIT 5
>>> #define WORKLOAD_PPLIB_CUSTOM_BIT 6
>>>
>>> 3D < video < VR < compute < custom
>>>
>>> VR and compute are the most aggressive. Custom takes preference
>>> because it's user customizable.
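>>>
>>> FWIW, if the driver ever had to collapse such a mask to a single
>>> profile itself, picking the highest set bit would match that ordering
>>> (sketch only):
>>>
>>> /* Most aggressive profile in a workload mask, by bit index */
>>> static int highest_priority_workload(u32 workload_mask)
>>> {
>>>         if (!workload_mask)
>>>                 return WORKLOAD_PPLIB_DEFAULT_BIT;
>>>         return fls(workload_mask) - 1;  /* fls() is 1-based */
>>> }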
>>>
>>> Alex
>>>
>>
>> Thanks, so this UAPI will guarantee execution of the job in at least
>> the requested power profile, or a more aggressive one.
>>
>
> Hi Shashank,
>
> This is not how the API works in the driver PM subsystem. In the final
> interface with the PMFW, the driver sets only one profile bit and
> doesn't set any mask. So it doesn't work the way Felix explained.
I was not looking at the implementation but at the API:

    int amdgpu_dpm_switch_power_profile(struct amdgpu_device *adev,
                                        enum PP_SMC_POWER_PROFILE type,
                                        bool en)
This API suggests that we can enable and disable individual profiles.
E.g. disabling PP_SMC_POWER_PROFILE_VIDEO should not change whether
PP_SMC_POWER_PROFILE_COMPUTE is enabled. What we actually send to the HW
when multiple profiles are enabled through this API is a different
question. We have to choose one profile or the other. This can happen in
the driver or the firmware; I don't care.

But if disabling PP_SMC_POWER_PROFILE_VIDEO makes us forget that we ever
enabled PP_SMC_POWER_PROFILE_COMPUTE, then this API is broken and useless
as an abstraction.
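
To illustrate the semantics I would expect (illustrative call sequence
only, not actual driver code):

    amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_COMPUTE, true);
    amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, true);
    amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);
    /* COMPUTE should still be enabled at this point */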
Regards,
Felix
> If there is more than one profile bit set, the PMFW looks at the mask
> and picks the one with the highest priority. Note that for each update
> of the workload mask, the PMFW should get a message.
>
> The driver currently sets only one bit, as Alex explained earlier. For
> our current driver implementation, you can check this as an example -
>
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1753
>
>
> Also, the PM layer already stores the current workload profile for a
> *get* API (which also means a new pm workload variable is not needed).
> But that API only works as long as the driver sets a single profile
> bit; that way the driver is sure of the current profile mode -
>
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1628
>
>
> When there is more than one, the driver is not sure of the PMFW's
> internal priority, though we can follow the bit order which Alex
> suggested (but sometimes the FW carries workarounds inside, which means
> it doesn't necessarily follow the same order).
>
> There is an existing sysfs interface which allows changing the profile
> mode and adding custom settings. In summary, any handling of the change
> from a single bit to a mask needs to be done at the lower layer.
>
> The problem is that this behavior has been there throughout all legacy
> ASICs. Not sure how much effort it would take and what all needs to be
> modified.
>
> Thanks,
> Lijo
>
>> I will do the one change required and send the updated one.
>>
>> - Shashank
>>
>>>
>>>
>>>
>>>>
>>>> - Shashank
>>>>
>>>>>
>>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Or anything else?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Shashank
>>>>>>>>
>>>>>>>>
>>>>>>>>> Or you have multiple VCN contexts. When context1 finishes a
>>>>>>>>> job, it
>>>>>>>>> disables the VIDEO profile. But context2 still has a job on
>>>>>>>>> the other
>>>>>>>>> VCN engine and wants the VIDEO profile to still be enabled.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Felix
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
>>>>>>>>>>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>> index 5e53a5293935..1caed319a448 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>>>>>>>> @@ -34,6 +34,7 @@
>>>>>>>>>>>  #include "amdgpu_ras.h"
>>>>>>>>>>>  #include "amdgpu_umc.h"
>>>>>>>>>>>  #include "amdgpu_reset.h"
>>>>>>>>>>> +#include "amdgpu_ctx_workload.h"
>>>>>>>>>>>
>>>>>>>>>>>  /* Total memory size in system memory and all GPU VRAM. Used to
>>>>>>>>>>>   * estimate worst case amount of memory to reserve for page tables
>>>>>>>>>>> @@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
>>>>>>>>>>>  void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev,
>>>>>>>>>>>                                      bool idle)
>>>>>>>>>>>  {
>>>>>>>>>>> -       amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>>>> -                                       PP_SMC_POWER_PROFILE_COMPUTE,
>>>>>>>>>>> -                                       !idle);
>>>>>>>>>>> +       int ret;
>>>>>>>>>>> +
>>>>>>>>>>> +       if (idle)
>>>>>>>>>>> +               ret = amdgpu_clear_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>>>>>> +       else
>>>>>>>>>>> +               ret = amdgpu_set_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>>>>>>>>> +
>>>>>>>>>>> +       if (ret)
>>>>>>>>>>> +               drm_warn(&adev->ddev, "Failed to %s power profile to compute mode\n",
>>>>>>>>>>> +                        idle ? "reset" : "set");
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>>  bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
>>>>>>>>>>