[Intel-gfx] [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation

Wed Dec 9 11:35:26 PST 2015

On 09/12/15 12:46, ankitprasad.r.sharma at intel.com wrote:
> From: Chris Wilson <chris at chris-wilson.co.uk>
>
> Ville reminded us that stolen memory is not preserved across
> hibernation, and a result of this was that context objects now being
> allocated from stolen were being corrupted on S4 and promptly hanging
> the GPU on resume.
>
> We want to utilise stolen for as much as possible (nothing else will use
> that wasted memory otherwise), so we need a strategy for handling
> general objects allocated from stolen and hibernation. A simple solution
> is to do a CPU copy through the GTT of the stolen object into a fresh
> shmemfs backing store and thenceforth treat it as a normal object. This
> can be refined in future to either use a GPU copy to avoid the slow
> uncached reads (though it's hibernation!) or recreate stolen objects
> upon resume/first-use. For now, a simple approach should suffice for
> testing the object migration.
>
> v2:
> Swap PTE for pinned bindings over to the shmemfs. This adds a
> complicated dance, but is required as many stolen objects are likely to
> be pinned for use by the hardware. Swapping the PTEs should not result
> in externally visible behaviour, as each PTE update should be atomic and
> the two pages identical. (danvet)
>
> safe-by-default, or the principle of least surprise. We need a new flag
> to mark objects that we can wilfully discard and recreate across
> hibernation. (danvet)
>
> Just use the global_list rather than invent a new stolen_list. This is
> the slowpath hibernate and so adding a new list and the associated
> complexity isn't worth it.
>
> v3: Rebased on drm-intel-nightly (Ankit)
>
> v4: Use insert_page to map stolen memory backed pages for migration to
> shmem (Chris)
>
> v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma at intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
>   drivers/gpu/drm/i915/i915_drv.h         |   7 +
>   drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_display.c    |   3 +
>   drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
>   drivers/gpu/drm/i915/intel_pm.c         |   2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
>   7 files changed, 261 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9f55209..2bb9e9e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
>   	return i915_drm_suspend(drm_dev);
>   }
>
> +static int i915_pm_freeze(struct device *dev)
> +{
> +	int ret;
> +
> +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> +	if (ret)
> +		return ret;
> +
> +	ret = i915_pm_suspend(dev);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
>   static int i915_pm_suspend_late(struct device *dev)
>   {
>   	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> @@ -1700,7 +1715,7 @@ static const struct dev_pm_ops i915_pm_ops = {
>   	 * @restore, @restore_early : called after rebooting and restoring the
>   	 *                            hibernation image [PMSG_RESTORE]
>   	 */
> -	.freeze = i915_pm_suspend,
> +	.freeze = i915_pm_freeze,
>   	.freeze_late = i915_pm_suspend_late,
>   	.thaw_early = i915_pm_resume_early,
>   	.thaw = i915_pm_resume,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e0b09b0..0d18b07 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2080,6 +2080,12 @@ struct drm_i915_gem_object {
>   	 * Advice: are the backing pages purgeable?
>   	 */
>   	unsigned int madv:2;
> +	/**
> +	 * Whereas madv is for userspace, there are certain situations
> +	 * where we want I915_MADV_DONTNEED behaviour on internal objects
> +	 * without conflating the userspace setting.
> +	 */
> +	unsigned int internal_volatile:1;

Does this new flag need to be examined by other code that currently
checks 'madv', e.g. put_pages()? Or does it mean the object is not
really volatile in normal use, only across hibernation?
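
To make the two readings concrete, here is a rough userspace sketch, not
kernel code. The struct only models the two bitfields from the hunk above;
the MODEL_* names and both helper functions are hypothetical and exist
purely for illustration. The enum values are assumed to mirror
I915_MADV_WILLNEED and I915_MADV_DONTNEED.

```c
#include <stdbool.h>

/* Assumed to mirror I915_MADV_WILLNEED / I915_MADV_DONTNEED. */
enum { MODEL_MADV_WILLNEED = 0, MODEL_MADV_DONTNEED = 1 };

/* Userspace model of the two bitfields in the quoted hunk only. */
struct model_gem_object {
	unsigned int madv:2;              /* userspace advice */
	unsigned int internal_volatile:1; /* kernel-internal flag */
};

/*
 * Reading A (hypothetical): the flag is an in-kernel synonym for
 * DONTNEED, so every site that tests madv would need to test it too.
 */
static bool model_discardable_always(const struct model_gem_object *obj)
{
	return obj->madv == MODEL_MADV_DONTNEED || obj->internal_volatile;
}

/*
 * Reading B (hypothetical): the flag only matters on the hibernation
 * path; in normal use only the userspace advice is consulted.
 */
static bool model_discardable(const struct model_gem_object *obj,
			      bool hibernating)
{
	return obj->madv == MODEL_MADV_DONTNEED ||
	       (hibernating && obj->internal_volatile);
}
```

Under reading A an internal object with the flag set could be dropped at
any time; under reading B it survives normal operation and is discarded
only when freezing. Which of the two is intended decides whether
put_pages() and friends need touching.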

.Dave.