[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

Fri Sep 30 09:13:51 UTC 2022

On 9/30/2022 2:07 PM, Sharma, Shashank wrote:
> 
> 
> On 9/30/2022 7:08 AM, Lazar, Lijo wrote:
>>
>>
>> On 9/30/2022 12:02 AM, Alex Deucher wrote:
>>> On Thu, Sep 29, 2022 at 10:14 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 9/29/2022 7:30 PM, Sharma, Shashank wrote:
>>>>>
>>>>>
>>>>> On 9/29/2022 3:37 PM, Lazar, Lijo wrote:
>>>>>> To be clear your understanding -
>>>>>>
>>>>>> Nothing is automatic in PMFW. PMFW picks a priority based on the
>>>>>> actual mask sent by driver.
>>>>>>
>>>>>> Assuming lower bits corresponds to highest priority -
>>>>>>
>>>>>> If driver sends a mask with Bit3 and Bit 0 set, PMFW will chose
>>>>>> profile that corresponds to Bit0. If driver sends a mask with Bit4
>>>>>> Bit2 set and rest unset, PMFW will chose profile that corresponds to
>>>>>> Bit2. However if driver sends a mask only with a single bit set, it
>>>>>> chooses the profile regardless of whatever was the previous 
>>>>>> profile. t
>>>>>> doesn't check if the existing profile > newly requested one. That is
>>>>>> the behavior.
>>>>>>
>>>>>> So if a job send chooses a profile that corresponds to Bit0, driver
>>>>>> will send that. Next time if another job chooses a profile that
>>>>>> corresponds to Bit1, PMFW will receive that as the new profile and
>>>>>> switch to that. It trusts the driver to send the proper workload 
>>>>>> mask.
>>>>>>
>>>>>> Hope that gives the picture.
>>>>>>
>>>>>
>>>>>
>>>>> Thanks, my understanding is also similar, referring to the core power
>>>>> switch profile function here:
>>>>> amd_powerplay.c::pp_dpm_switch_power_profile()
>>>>> *snip code*
>>>>> hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]);
>>>>>           index = fls(hwmgr->workload_mask);
>>>>>           index = index <= Workload_Policy_Max ? index - 1 : 0;
>>>>>           workload = hwmgr->workload_setting[index];
>>>>> *snip_code*
>>>>> hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, &workload, 0);
>>>>>
>>>>> Here I can see that the new workload mask is appended into the 
>>>>> existing
>>>>> workload mask (not overwritten). So if we keep sending new
>>>>> workload_modes, they would be appended into the workload flags and
>>>>> finally the PM will pick the most aggressive one of all these 
>>>>> flags, as
>>>>> per its policy.
>>>>>
>>>>
>>>> Actually it's misleading -
>>>>
>>>> The path for sienna is -
>>>> set_power_profile_mode -> sienna_cichlid_set_power_profile_mode
>>>>
>>>>
>>>> This code here is a picking one based on lookup table.
>>>>
>>>>    workload_type = smu_cmn_to_asic_specific_index(smu,
>>>>
>>>> CMN2ASIC_MAPPING_WORKLOAD,
>>>>
>>>> smu->power_profile_mode);
>>>>
>>>> This is that lookup table -
>>>>
>>>> static struct cmn2asic_mapping
>>>> sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = {
>>>>           WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT,
>>>> WORKLOAD_PPLIB_DEFAULT_BIT),
>>>>           WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D,
>>>> WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT),
>>>>           WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING,
>>>> WORKLOAD_PPLIB_POWER_SAVING_BIT),
>>>>           WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO,
>>>> WORKLOAD_PPLIB_VIDEO_BIT),
>>>>           WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR,
>>>> WORKLOAD_PPLIB_VR_BIT),
>>>>           WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE,
>>>> WORKLOAD_PPLIB_COMPUTE_BIT),
>>>>           WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM,
>>>> WORKLOAD_PPLIB_CUSTOM_BIT),
>>>> };
>>>>
>>>>
>>>> And this is the place of interaction with PMFW. (1 << workload_type) is
>>>> the mask being sent.
>>>>
>>>>          smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
>>>>                                       1 << workload_type, NULL);
>>>>
>>>> In the end, driver implementation expects only one bit to be set.
>>>
>>> Shashank and I had a discussion about this today.  I think there are a
>>> few thing we can do to handle this better:
>>>
>>> 1. Set a flag that if the user changes the default via sysfs that
>>> overrides any runtime setting via an application since presumably that
>>> is what the user wants and we won't change the hint at runtime.
>>> 2. Drop the GET API.  There's no need for this, the hint is just a hint.
>>
>> Double checked again based on Felix's comments on API definition. 
>> Driver decides the priority instead of FW. That way we can still keep 
>> Get API.
>>
>>> 2. Have the driver arbitrate between the available workload profiles
>>> based on the numeric value of the hint (e.g., default < 3D < video <
>>> VR < compute) as the higher values are more aggressive in most cases.
>>> If requests come in for 3D and compute at the same time, the driver
>>> will select compute because it's value is highest.  Each hint type
>>> would be ref counted so we'd know what state to be in every time we go
>>> to set the state.  If all of the clients requesting compute go away,
>>> and only 3D requestors remain, we can switch to 3D.  If all refcounts
>>> go to 0, we go back to default.  This will not require any change to
>>> the current workload API in the SMU code.
>>
>> Since PM layer decides priority, refcount can be kept at powerplay and 
>> swsmu layer instead of any higher level API.
>>
>> User API may keep something like req_power_profile (for any 
>> logging/debug purpose) for the job preference.
> 
> No, I think there has been enough confusion around this implementation 
> so far, we will implement this just as Alex/Felix suggested:
> - No change will be done in pm/SMU layer.

Well, a confusion doesn't justify bad implementation. You could just 
keep the refcount in workload_setting.

Another API that uses power profile indirectly also will need to take 
care of refcount and we don't need every other API to do that separately 
without knowing what is the final outcome.

Thanks,
Lijo

> - The amdgpu_context_workload layer will keep the ref_counting and 
> user_workload_hint management, and it will just call and consume the 
> pm_switch_workload profile() like any other client.

> - We will add a force flag for calls coming from sysfs() interface, and 
> it will take the highest priority. No state machine will be managed for 
> sysfs, and it will work as it is working today.
> 
> - Shashank
> 
>>
>> Thanks,
>> Lijo
>>
>>>
>>> Alex
>>>
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> Now, when we have a single workload:
>>>>> -> Job1: requests profile P1 via UAPI, ref count = 1
>>>>> -> driver sends flags for p1
>>>>> -> PM FW applies profile P1
>>>>> -> Job executes in profile P1
>>>>> -> Job goes to reset function, ref_count = 0,
>>>>> -> Power profile resets
>>>>>
>>>>> Now, we have conflicts only when we see multiple workloads (Job1 
>>>>> and Job 2)
>>>>> -> Job1: requests profile P1 via UAPI, ref count = 1
>>>>> -> driver sends flags for p1
>>>>> -> PM FW applies profile P1
>>>>> -> Job executes in profile P1
>>>>> -> Job2: requests profile P2 via UAPI, refcount = 2
>>>>> -> driver sends flags for (P1|P2)
>>>>> -> PM FW picks the more aggressive of the two (Say P1, stays in P1)
>>>>> -> Job1 goes to reset function, ref_count = 1, job1 does not reset 
>>>>> power
>>>>> profile
>>>>> -> Job2 goes to reset function, ref_counter = 2, job 2 resets Power 
>>>>> profile
>>>>> -> Power profile resets to None
>>>>>
>>>>> So this state machine looks like if there is only 1 job, it will be
>>>>> executed in desired mode. But if there are multiple, the most 
>>>>> aggressive
>>>>> profile will be picked, and every job will be executed in atleast the
>>>>> requested power profile mode or higher.
>>>>>
>>>>> Do you find any problem so far ?
>>>>>
>>>>> - Shashank
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Lijo