[Intel-xe] [PATCH v3 4/5] xe/drm/pm: Toggle d3cold_allowed using vram_usages

Riana Tauro riana.tauro at intel.com
Wed Jul 5 11:10:06 UTC 2023



On 7/5/2023 12:32 PM, Gupta, Anshuman wrote:
> 
> 
>> -----Original Message-----
>> From: Tauro, Riana <riana.tauro at intel.com>
>> Sent: Tuesday, July 4, 2023 11:34 AM
>> To: Gupta, Anshuman <anshuman.gupta at intel.com>; intel-
>> xe at lists.freedesktop.org
>> Cc: Nilawar, Badal <badal.nilawar at intel.com>; Vivi, Rodrigo
>> <rodrigo.vivi at intel.com>; Sundaresan, Sujaritha
>> <sujaritha.sundaresan at intel.com>; Brost, Matthew
>> <matthew.brost at intel.com>
>> Subject: Re: [PATCH v3 4/5] xe/drm/pm: Toggle d3cold_allowed using
>> vram_usages
>>
>> Hi Anshuman
>>
>> On 6/27/2023 5:26 PM, Anshuman Gupta wrote:
>>> Adding support to control d3cold by using the vram_usages metric from
>>> the ttm resource manager.
>>> When the root port is capable of d3cold but xe has disallowed d3cold
>>> because vram_usages is above vram_d3cold_threshold, it is required to
>>> disable d3cold to avoid any resume failure, because the root port can
>>> still transition to d3cold when all of the pcie endpoints and
>>> {upstream, virtual} switch ports transition to d3hot.
>>> Also cleaning up the TODO code comment.
>>>
>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>> Signed-off-by: Anshuman Gupta <anshuman.gupta at intel.com>
>>> Reviewed-by: Badal Nilawar <badal.nilawar at intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_pci.c | 27 ++++++++++++++++++++++++---
>>>    drivers/gpu/drm/xe/xe_pm.c  | 29 +++++++++++++++++++++++++++++
>>>    drivers/gpu/drm/xe/xe_pm.h  |  1 +
>>>    3 files changed, 54 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>>> index 848de5dcdaa5..78e906607188 100644
>>> --- a/drivers/gpu/drm/xe/xe_pci.c
>>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>>> @@ -746,6 +746,24 @@ static int xe_pci_resume(struct device *dev)
>>>    	return 0;
>>>    }
>>>
>>> +static void d3cold_toggle(struct pci_dev *pdev, bool enable)
>>> +{
>>> +	struct xe_device *xe = pdev_to_xe_device(pdev);
>>> +	struct pci_dev *root_pdev;
>>> +
>>> +	if (!xe->d3cold.capable)
>>> +		return;
>>> +
>>> +	root_pdev = pcie_find_root_port(pdev);
>>> +	if (!root_pdev)
>>> +		return;
>>> +
>>> +	if (enable)
>>> +		pci_d3cold_enable(root_pdev);
>>> +	else
>>> +		pci_d3cold_disable(root_pdev);
>>> +}
>>> +
>>>    static int xe_pci_runtime_suspend(struct device *dev)
>>>    {
>>>    	struct pci_dev *pdev = to_pci_dev(dev);
>>> @@ -763,6 +781,7 @@ static int xe_pci_runtime_suspend(struct device *dev)
>>>    		pci_ignore_hotplug(pdev);
>>>    		pci_set_power_state(pdev, PCI_D3cold);
>>>    	} else {
>>> +		d3cold_toggle(pdev, false);
>>>    		pci_set_power_state(pdev, PCI_D3hot);
>>>    	}
>>>
>>> @@ -787,6 +806,8 @@ static int xe_pci_runtime_resume(struct device *dev)
>>>    			return err;
>>>
>>>    		pci_set_master(pdev);
>>> +	} else {
>>> +		d3cold_toggle(pdev, true);
>>>    	}
>>>
>>>    	return xe_pm_runtime_resume(xe);
>>> @@ -800,15 +821,15 @@ static int xe_pci_runtime_idle(struct device *dev)
>>>    	if (!xe->d3cold.capable) {
>>>    		xe->d3cold.allowed = false;
>>>    	} else {
>>> +		xe->d3cold.allowed = xe_pm_vram_d3cold_allowed(xe);
>>> +
>>>    		/*
>>>    		 * TODO: d3cold should be allowed (true) if
>>>    		 * (IS_DGFX(xe) && !xe_device_mem_access_ongoing(xe))
>>>    		 * but maybe include some other conditions. So, before
>>>    		 * we can re-enable the D3cold, we need to:
>>>    		 * 1. rewrite the VRAM save / restore to avoid buffer object locks
>>> -		 * 2. block D3cold if we have a big amount of device memory in use
>>> -		 *    in order to reduce the latency.
>>> -		 * 3. at resume, detect if we really lost power and avoid memory
>>> +		 * 2. at resume, detect if we really lost power and avoid memory
>>>    		 *    restoration if we were only up to d3cold
>>>    		 */
>>>    		xe->d3cold.allowed = false;
>>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>>> index 7028c9b6e94c..4db4e5a1b051 100644
>>> --- a/drivers/gpu/drm/xe/xe_pm.c
>>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>>> @@ -277,3 +277,32 @@ int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold)
>>>
>>>    	return 0;
>>>    }
>>> +
>>> +bool xe_pm_vram_d3cold_allowed(struct xe_device *xe)
>>> +{
>>> +	struct ttm_resource_manager *man;
>>> +	u32 total_vram_used_mb = 0;
>>> +	bool allowed;
>>> +	u64 vram_used;
>>> +	int i;
>>> +
>>> +	/* TODO: Extend the logic to beyond XE_PL_VRAM1 */
>>> +	for (i = XE_PL_VRAM0; i <= XE_PL_VRAM1; ++i) {
>>> +		man = ttm_manager_type(&xe->ttm, i);
>>> +		if (man) {
>>> +			vram_used = ttm_resource_manager_usage(man);
>>> +			total_vram_used_mb += DIV_ROUND_UP_ULL(vram_used, 1024 * 1024);
>>> +		}
>>> +	}
>>> +
>>> +	mutex_lock(&xe->d3cold.lock);
>>> +
>>> +	if (total_vram_used_mb <= xe->d3cold.vram_threshold)
>>> +		allowed = true;
>> Can't xe->d3cold.allowed be assigned directly here? There's already a
>> lock around the code.
> The patch takes the lock to protect the vram_threshold condition, not to
> protect d3cold.allowed. I can assign d3cold.allowed here, but that would
> require changing the function name to xe_pm_vram_toggle_d3cold_allow().
Not required to change.

LGTM
Reviewed-by: Riana Tauro <riana.tauro at intel.com>
> Br,
> Anshuman Gupta
>>
>> Thanks
>> Riana
>>> +	else
>>> +		allowed = false;
>>> +
>>> +	mutex_unlock(&xe->d3cold.lock);
>>> +
>>> +	return allowed;
>>> +}
>>> diff --git a/drivers/gpu/drm/xe/xe_pm.h b/drivers/gpu/drm/xe/xe_pm.h
>>> index b50ec8bdce6f..08612cf3e67b 100644
>>> --- a/drivers/gpu/drm/xe/xe_pm.h
>>> +++ b/drivers/gpu/drm/xe/xe_pm.h
>>> @@ -24,5 +24,6 @@ int xe_pm_runtime_put(struct xe_device *xe);
>>>    bool xe_pm_runtime_resume_if_suspended(struct xe_device *xe);
>>>    int xe_pm_runtime_get_if_active(struct xe_device *xe);
>>>    int xe_pm_set_vram_threshold(struct xe_device *xe, u32 threshold);
>>> +bool xe_pm_vram_d3cold_allowed(struct xe_device *xe);
>>>
>>>    #endif