[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
Felix Kuehling
felix.kuehling at amd.com
Tue Sep 27 15:23:07 UTC 2022
Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:
> Hello Felix,
>
> Thank for the review comments.
>
> On 9/27/2022 4:48 PM, Felix Kuehling wrote:
>> Am 2022-09-27 um 02:12 schrieb Christian König:
>>> Am 26.09.22 um 23:40 schrieb Shashank Sharma:
>>>> This patch switches the GPU workload mode to/from
>>>> compute mode, while submitting compute workload.
>>>>
>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>
>>> Feel free to add my acked-by, but Felix should probably take a look
>>> as well.
>>
>> This look OK purely from a compute perspective. But I'm concerned
>> about the interaction of compute with graphics or multiple graphics
>> contexts submitting work concurrently. They would constantly override
>> or disable each other's workload hints.
>>
>> For example, you have an amdgpu_ctx with
>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
>> process that also wants the compute profile. Those could be different
>> processes belonging to different users. Say, KFD enables the compute
>> profile first. Then the graphics context submits a job. At the start
>> of the job, the compute profile is enabled. That's a no-op because
>> KFD already enabled the compute profile. When the job finishes, it
>> disables the compute profile for everyone, including KFD. That's
>> unexpected.
>>
>
> In this case, it will not disable the compute profile, as the
> reference counter will not be zero. The reset_profile() will only act
> if the reference counter is 0.
OK, I missed the reference counter.
>
> But I would be happy to get any inputs about a policy which can be
> more sustainable and gets better outputs, for example:
> - should we not allow a profile change, if a PP mode is already
> applied and keep it Early bird basis ?
>
> For example: Policy A
> - Job A sets the profile to compute
> - Job B tries to set profile to 3D, but we do not allow it as job A is
> not finished it yet.
>
> Or Policy B: Current one
> - Job A sets the profile to compute
> - Job B tries to set profile to 3D, and we allow it. Job A also runs
> in PP 3D
> - Job B finishes, but does not reset PP as reference count is not zero
> due to compute
> - Job A finishes, profile reset to NONE
I think this won't work. As I understand it, the
amdgpu_dpm_switch_power_profile enables and disables individual
profiles. Disabling the 3D profile doesn't disable the compute profile
at the same time. I think you'll need one refcount per profile.
Regards,
Felix
>
>
> Or anything else ?
>
> REgards
> Shashank
>
>
>> Or you have multiple VCN contexts. When context1 finishes a job, it
>> disables the VIDEO profile. But context2 still has a job on the other
>> VCN engine and wants the VIDEO profile to still be enabled.
>>
>> Regards,
>> Felix
>>
>>
>>>
>>> Christian.
>>>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
>>>> 1 file changed, 11 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> index 5e53a5293935..1caed319a448 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> @@ -34,6 +34,7 @@
>>>> #include "amdgpu_ras.h"
>>>> #include "amdgpu_umc.h"
>>>> #include "amdgpu_reset.h"
>>>> +#include "amdgpu_ctx_workload.h"
>>>> /* Total memory size in system memory and all GPU VRAM. Used to
>>>> * estimate worst case amount of memory to reserve for page tables
>>>> @@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct
>>>> amdgpu_device *adev,
>>>> void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev,
>>>> bool idle)
>>>> {
>>>> - amdgpu_dpm_switch_power_profile(adev,
>>>> - PP_SMC_POWER_PROFILE_COMPUTE,
>>>> - !idle);
>>>> + int ret;
>>>> +
>>>> + if (idle)
>>>> + ret = amdgpu_clear_workload_profile(adev,
>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>> + else
>>>> + ret = amdgpu_set_workload_profile(adev,
>>>> AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
>>>> +
>>>> + if (ret)
>>>> + drm_warn(&adev->ddev, "Failed to %s power profile to
>>>> compute mode\n",
>>>> + idle ? "reset" : "set");
>>>> }
>>>> bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32
>>>> vmid)
>>>
More information about the amd-gfx
mailing list