[Intel-xe] [PATCH 2/2] RFC drm/xe: Disable GuC CT communication for D3Hot Transition

Thu Oct 12 07:39:18 UTC 2023

Hi Rodrigo

Thanks for the review.

On 9/22/2023 12:07 AM, Rodrigo Vivi wrote:
> On Thu, Aug 31, 2023 at 03:21:18PM -0400, Rodrigo Vivi wrote:
>> On Thu, Aug 31, 2023 at 12:02:31PM -0700, Ceraolo Spurio, Daniele wrote:
>>>
>>>
>>> On 8/22/2023 10:09 PM, Riana Tauro wrote:
>>>> During Runtime suspend, GuC is reset for both D0->D3hot/D3Cold
>>>> transistions. It is not necessary for GuC to reset for D0 -> D3hot,
>>>> only enable/disable ctb communication.
>>>>
>>>> Add a function that enables/disables CT communication when d3cold
>>>> is not allowed.
>>>>
>>>> Signed-off-by: Riana Tauro <riana.tauro at intel.com>
>>>
>>> xe_gt_suspend and xe_gt_resume do more things than just resetting the GuC
>>> (e.g. marking submission as disabled). Shouldn't we need at least some of
>>> that in the runtime suspend/resume scenario as well?
>>
>> I was asking myself the same thing...
> 
> I take it back. on the current flow, if we have a chance of receiving a command
> submission we shouldn't allow the runtime suspend. so it should be totally safe
> to leave it there.
> 
> Given the latency data Riana shared with me offline we can see around 60ms of
> latency saved here.
> 
> I was also wondering about the overall package power savings difference here.
> Did you checked that?

Measured power using the hwmon library

Before patch and after patch (34 ~ 35W) on DG2
Don't see much difference

> 
> Only thing we need to change is that now we have the pmu in the suspend/resume
> path and we probably want that on runtime variants as well.
Will add this in next rev

Thanks
Riana Tauro

> 
> Anyway, let's move forward with this.
> 
> Acked-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> 
>>
>>>
>>> Also, which paths and how much we're actually looking to optimize? Are we
>>> interested in optimizing the full suspend as well?
>>> To elaborate, we're not actually killing the GuC in the suspend flow, we're
>>> just resetting its SW state (i.e. putting it back as if it had just been
>>> loaded); the actual reset happens in the resume path in xe_uc_init_hw. If
>>> the GuC and the LMEM have survived the suspend/resume flow, we could
>>
>> On S3/S2idle/S4, GuC and LMEM will lose power.
>>
>>> theoretically just restart the SW side of things (re-register the CTBs and
>>> start SLPC) in the resume path instead of resetting and reloading the FW;
>>> this would be a lesser benefit to the runtime path compared to what you're
>>> proposing, but it could benefit the full resume path as well and it'd have
>>> the added benefit of keeping the 2 paths the same. I'm not sure though if
>>> the LMEM management could end up breaking this approach by moving the memory
>>> around during suspend.
>>
>> this memory movement is indeed the hardest part with all the locking...
>>
>>>
>>>
>>> A couple of minor comments inline
>>>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_gt.c | 56 ++++++++++++++++++++++++++++++++++++++
>>>>    drivers/gpu/drm/xe/xe_gt.h |  2 ++
>>>>    drivers/gpu/drm/xe/xe_pm.c | 10 +++++--
>>>>    drivers/gpu/drm/xe/xe_uc.c | 18 ++++++++++++
>>>>    drivers/gpu/drm/xe/xe_uc.h |  2 ++
>>>>    5 files changed, 85 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>>> index 13320af4ddd3..0d52621ce64d 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>>>> @@ -676,6 +676,62 @@ int xe_gt_resume(struct xe_gt *gt)
>>>>    	return err;
>>>>    }
>>>> +int xe_gt_runtime_suspend(struct xe_gt *gt)
>>>> +{
>>>> +	struct xe_device *xe = gt_to_xe(gt);
>>>> +	int err;
>>>> +
>>>> +	if (xe->d3cold.allowed)
>>>> +		return xe_gt_suspend(gt);
>>>> +
>>>> +	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>> +	if (err)
>>>> +		return err;
>>>> +
>>>> +	err = xe_uc_disable_communication(&gt->uc);
>>>> +	if (err)
>>>> +		goto err_force_wake;
>>>
>>> uc_stop() might already cover what's needed (or could be expanded to do so),
>>> although unfortunately uc_start seems to not be matching and therefore not
>>> usable as-is for the resume side.
>>>
>>>> +
>>>> +	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>> +	xe_gt_info(gt, "suspended\n");
>>>> +
>>>> +	return 0;
>>>> +
>>>> +err_force_wake:
>>>> +	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>> +	xe_gt_err(gt, "suspend failed (%pe)\n", ERR_PTR(err));
>>>> +
>>>> +	return err;
>>>> +}
>>>> +
>>>> +int xe_gt_runtime_resume(struct xe_gt *gt)
>>>> +{
>>>> +	struct xe_device *xe = gt_to_xe(gt);
>>>> +	int err;
>>>> +
>>>> +	if (xe->d3cold.allowed)
>>>> +		return xe_gt_resume(gt);
>>>> +
>>>> +	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>> +	if (err)
>>>> +		return err;
>>>> +
>>>> +	err = xe_uc_resume(&gt->uc);
>>>> +	if (err)
>>>> +		goto err_force_wake;
>>>> +
>>>> +	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>> +	xe_gt_info(gt, "resumed\n");
>>>> +
>>>> +	return 0;
>>>> +
>>>> +err_force_wake:
>>>> +	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>>>> +	xe_gt_err(gt, "resume failed (%pe)\n", ERR_PTR(err));
>>>> +
>>>> +	return err;
>>>> +}
>>>> +
>>>>    struct xe_hw_engine *xe_gt_hw_engine(struct xe_gt *gt,
>>>>    				     enum xe_engine_class class,
>>>>    				     u16 instance, bool logical)
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
>>>> index caded203a8a0..e6574e51004f 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt.h
>>>> +++ b/drivers/gpu/drm/xe/xe_gt.h
>>>> @@ -37,6 +37,8 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt);
>>>>    void xe_gt_suspend_prepare(struct xe_gt *gt);
>>>>    int xe_gt_suspend(struct xe_gt *gt);
>>>>    int xe_gt_resume(struct xe_gt *gt);
>>>> +int xe_gt_runtime_suspend(struct xe_gt *gt);
>>>> +int xe_gt_runtime_resume(struct xe_gt *gt);
>>>>    void xe_gt_reset_async(struct xe_gt *gt);
>>>>    void xe_gt_sanitize(struct xe_gt *gt);
>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>>>> index 0f06d8304e17..6bc01bb45fc2 100644
>>>> --- a/drivers/gpu/drm/xe/xe_pm.c
>>>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>>>> @@ -245,7 +245,7 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
>>>>    	}
>>>>    	for_each_gt(gt, xe, id) {
>>>> -		err = xe_gt_suspend(gt);
>>>> +		err = xe_gt_runtime_suspend(gt);
>>>>    		if (err)
>>>>    			goto out;
>>>>    	}
>>>> @@ -294,14 +294,18 @@ int xe_pm_runtime_resume(struct xe_device *xe)
>>>>    	xe_irq_resume(xe);
>>>> -	for_each_gt(gt, xe, id)
>>>> -		xe_gt_resume(gt);
>>>> +	for_each_gt(gt, xe, id) {
>>>> +		err = xe_gt_runtime_resume(gt);
>>>> +		if (err)
>>>> +			goto out;
>>>> +	}
>>>>    	if (xe->d3cold.allowed && xe->d3cold.power_lost) {
>>>>    		err = xe_bo_restore_user(xe);
>>>>    		if (err)
>>>>    			goto out;
>>>>    	}
>>>> +
>>>>    out:
>>>>    	lock_map_release(&xe_device_mem_access_lockdep_map);
>>>>    	xe_pm_write_callback_task(xe, NULL);
>>>> diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
>>>> index addd6f2681b9..b5b53c8c3edb 100644
>>>> --- a/drivers/gpu/drm/xe/xe_uc.c
>>>> +++ b/drivers/gpu/drm/xe/xe_uc.c
>>>> @@ -216,6 +216,15 @@ static void uc_reset_wait(struct xe_uc *uc)
>>>>    		goto again;
>>>>    }
>>>> +int xe_uc_disable_communication(struct xe_uc *uc)
>>>> +{
>>>> +	/* GuC submission not enabled, nothing to do */
>>>> +	if (!xe_device_guc_submission_enabled(uc_to_xe(uc)))
>>>> +		return 0;
>>>> +
>>>> +	return xe_guc_disable_communication(&uc->guc);
>>>> +}
>>>> +
>>>>    int xe_uc_suspend(struct xe_uc *uc)
>>>>    {
>>>>    	int ret;
>>>> @@ -232,3 +241,12 @@ int xe_uc_suspend(struct xe_uc *uc)
>>>>    	return xe_guc_suspend(&uc->guc);
>>>>    }
>>>> +
>>>> +int xe_uc_resume(struct xe_uc *uc)
>>>
>>> This should be called runtime_resume.
>>>
>>> Daniele
>>>
>>>> +{
>>>> +	/* GuC submission not enabled, nothing to do */
>>>> +	if (!xe_device_guc_submission_enabled(uc_to_xe(uc)))
>>>> +		return 0;
>>>> +
>>>> +	return xe_guc_enable_communication(&uc->guc);
>>>> +}
>>>> diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
>>>> index 42219b361df5..29bd692d8800 100644
>>>> --- a/drivers/gpu/drm/xe/xe_uc.h
>>>> +++ b/drivers/gpu/drm/xe/xe_uc.h
>>>> @@ -12,8 +12,10 @@ int xe_uc_init(struct xe_uc *uc);
>>>>    int xe_uc_init_hwconfig(struct xe_uc *uc);
>>>>    int xe_uc_init_post_hwconfig(struct xe_uc *uc);
>>>>    int xe_uc_init_hw(struct xe_uc *uc);
>>>> +int xe_uc_disable_communication(struct xe_uc *uc);
>>>>    void xe_uc_gucrc_disable(struct xe_uc *uc);
>>>>    int xe_uc_reset_prepare(struct xe_uc *uc);
>>>> +int xe_uc_resume(struct xe_uc *uc);
>>>>    void xe_uc_stop_prepare(struct xe_uc *uc);
>>>>    int xe_uc_stop(struct xe_uc *uc);
>>>>    int xe_uc_start(struct xe_uc *uc);
>>>