[Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend

Vivi, Rodrigo rodrigo.vivi at intel.com
Fri Sep 1 21:16:26 UTC 2023


On Fri, 2023-09-01 at 03:04 +0000, Gupta, Anshuman wrote:
> 
> 
> > -----Original Message-----
> > From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > Sent: Friday, September 1, 2023 2:49 AM
> > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > Cc: Nilawar, Badal <badal.nilawar at intel.com>; intel-xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
> > Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend
> > 
> > On Thu, Aug 31, 2023 at 01:23:40AM -0400, Gupta, Anshuman wrote:
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > > Sent: Thursday, August 31, 2023 2:35 AM
> > > > To: Nilawar, Badal <badal.nilawar at intel.com>
> > > > Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; intel-xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
> > > > Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend
> > > > 
> > > > On Mon, Aug 28, 2023 at 10:00:31PM +0530, Nilawar, Badal wrote:
> > > > > 
> > > > > 
> > > > > On 28-08-2023 17:46, Gupta, Anshuman wrote:
> > > > > > 
> > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: Nilawar, Badal <badal.nilawar at intel.com>
> > > > > > > Sent: Thursday, August 24, 2023 11:16 PM
> > > > > > > To: intel-xe at lists.freedesktop.org
> > > > > > > Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; Auld, Matthew <matthew.auld at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > > > > > Subject: [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend
> > > > > > > 
> > > > > > > Release all mmap mappings for all vram objects which are
> > > > > > > associated with a userfault, such that while the PCIe function
> > > > > > > is in D3hot, any access to those memory mappings will raise a
> > > > > > > userfault.
> > > > > > > 
> > > > > > > Upon a userfault, in order to access the memory mappings, if the
> > > > > > > graphics function is in D3 then a runtime resume of the dgpu will
> > > > > > > be triggered to transition it to D0.
> > > > > > IMO we need a configurable threshold to control the behavior of
> > > > > > mmap mapping invalidation: if vram usage crosses a certain
> > > > > > threshold, disable runtime PM for the entire lifetime of the
> > > > > > mapping.
> > > > > Agreed. Another option could be to disable rpm on server discrete
> > > > > graphics for the entire lifetime of the mapping. But maintaining a
> > > > > threshold is more promising and gives control to the user.
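
(For illustration, a minimal sketch of such a threshold check; the
xe_vram_used_bytes() helper and the vram_userfault_threshold field are
assumptions for illustration, not part of the posted patch:)

/* Sketch only: decide whether vram usage is high enough that mmap
 * mappings should pin the device awake instead of being invalidated
 * on runtime suspend. All names here are hypothetical. */
static bool xe_vram_mmap_should_block_rpm(struct xe_device *xe)
{
	u64 used = xe_vram_used_bytes(xe);	/* hypothetical helper */

	/* a threshold of 0 means "never block runtime PM for mappings" */
	if (!xe->vram_userfault_threshold)
		return false;

	return used > xe->vram_userfault_threshold;
}
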
> > > > 
> > > > what use cases do we have here for this?
> > > > I believe that for discrete we could entirely block rpm if we have
> > > > display or if we have a shared dma_buf. any other case we should
> > > > handle?
> > > If discrete is used for display then the display is going to block
> > > runtime PM completely anyway (be it PSR or non-PSR).
> > > The use case is with the display turned off, or the hybrid gpu case.
> > > Currently on Xe we are missing a mem access ref count on the mmap
> > > mapping, and therefore mmap for a vram bo is broken.
> > > dma-buf will also be a use case in the hybrid gpu scenario.
> > 
> > right. unfortunately we don't have the unmap callback.
> > We could maybe get the reference on a new xe_gem_object_mmap and
> > just release at xe_ttm_bo_destroy or maybe at
> > xe_ttm_bo_delete_mem_notify?
> > 
> > for the dma_buf we could probably hook the get/put to the
> > attach/detach?
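
(Roughly, as a sketch: xe_gem_object_mmap below is a hypothetical hook
name, while drm_gem_ttm_mmap() and xe_device_mem_access_get()/put()
already exist; the matching put would then live in xe_ttm_bo_destroy:)

/* Sketch only: take the mem_access reference when the object gets
 * mmapped, and drop it again when the bo is destroyed, since there is
 * no unmap callback to pair with. */
static int xe_gem_object_mmap(struct drm_gem_object *obj,
			      struct vm_area_struct *vma)
{
	struct xe_device *xe = to_xe_device(obj->dev);
	int err;

	/* held for the whole lifetime of the mapping */
	xe_device_mem_access_get(xe);

	err = drm_gem_ttm_mmap(obj, vma);
	if (err)
		xe_device_mem_access_put(xe);

	return err;
}
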
> Hi Rodrigo,
> attach/detach is already protected by mem access get/put, but in my
> opinion that is going to burn more power on the dgpu in the hybrid case,
> as explained below:
> an idle display left on (display pipeline from the igpu, framebuffer
> imported from the dgpu) is going to burn more power on the dgpu even when
> everything is completely idle (with the igpu idle too, if it uses a PSR
> panel to refresh then the platform can go to s0ix).
> Is the above behavior expected? If not, then we will need this patch.

well, I'm trying to step back here and think about the use case.
PSR is only a thing on eDP... so this needs to be a laptop with both
integrated and discrete graphics, where the OEM likely wired the eDP
to the iGFX to get better power savings.

but then, the user wants to play some high-end game and uses prime
to forward the discrete output to the eDP.

But then the user leaves the screen idle most of the time instead of
playing the game, and burns power? :/

likely this informed user could also revert the screen redirection
when not really using the discrete for gaming? otherwise this user
is burning power one way or another, right?!

Also, in this PSR case, would we go idle, allow D3cold on the
discrete after every idle second, then do the trick of taking the
page-fault and powering everything back up to bring the 'primed'
display back on in time?

I can see so many issues coming out of this scenario that I tend to
prefer simply blocking RPM when the user has explicitly redirected
the screen.

We should keep it simple.
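
Something as simple as holding the wakeref for as long as the buffer
is attached, roughly (sketch only, hung off the dma_buf_ops
attach/detach callbacks; the actual xe hooks may differ):

/* Sketch: block runtime PM while a buffer is primed out by holding
 * the mem_access reference across the whole attach lifetime. */
static int xe_dma_buf_attach(struct dma_buf *dmabuf,
			     struct dma_buf_attachment *attach)
{
	struct drm_gem_object *obj = dmabuf->priv;

	xe_device_mem_access_get(to_xe_device(obj->dev));
	return 0;
}

static void xe_dma_buf_detach(struct dma_buf *dmabuf,
			      struct dma_buf_attachment *attach)
{
	struct drm_gem_object *obj = dmabuf->priv;

	xe_device_mem_access_put(to_xe_device(obj->dev));
}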

> Thanks,
> Anshuman Gupta.
> > 
> > > 
> > > Thanks,
> > > Anshuman Gupta.
> > > > 
> > > > > 
> > > > > Regards,
> > > > > Badal
> > > > > > Thanks,
> > > > > > Anshuman Gupta
> > > > > > > 
> > > > > > > Cc: Matthew Auld <matthew.auld at intel.com>
> > > > > > > Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> > > > > > > Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
> > > > > > > ---
> > > > > > >   drivers/gpu/drm/xe/xe_bo.c           | 53 ++++++++++++++++++++++++++--
> > > > > > >   drivers/gpu/drm/xe/xe_bo.h           |  2 ++
> > > > > > >   drivers/gpu/drm/xe/xe_bo_types.h     |  6 ++++
> > > > > > >   drivers/gpu/drm/xe/xe_device_types.h | 20 +++++++++++
> > > > > > >   drivers/gpu/drm/xe/xe_pm.c           |  7 ++++
> > > > > > >   5 files changed, 85 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > index 1ab682d61e3c..4192bfcd8013 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > @@ -776,6 +776,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > > > > >                 dma_fence_put(fence);
> > > > > > >         }
> > > > > > > 
> > > > > > > +       /*
> > > > > > > +        * TTM has already nuked the mmap for us (see ttm_bo_unmap_virtual),
> > > > > > > +        * so if we moved from VRAM make sure to unlink this from the
> > > > > > > +        * userfault tracking.
> > > > > > > +        */
> > > > > > > +       if (mem_type_is_vram(old_mem_type)) {
> > > > > > > +               spin_lock(&xe->mem_access.vram_userfault_lock);
> > > > > > > +               if (!list_empty(&bo->vram_userfault_link))
> > > > > > > +                       list_del_init(&bo->vram_userfault_link);
> > > > > > > +               spin_unlock(&xe->mem_access.vram_userfault_lock);
> > > > > > > +       }
> > > > > > > +
> > > > > > >         xe_device_mem_access_put(xe);
> > > > > > >         trace_printk("new_mem->mem_type=%d\n", new_mem->mem_type);
> > > > > > > 
> > > > > > > @@ -1100,6 +1112,8 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > > > > >   {
> > > > > > >         struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
> > > > > > >         struct drm_device *ddev = tbo->base.dev;
> > > > > > > +       struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > > > > > +       struct xe_device *xe = to_xe_device(ddev);
> > > > > > >         vm_fault_t ret;
> > > > > > >         int idx, r = 0;
> > > > > > > 
> > > > > > > @@ -1107,9 +1121,10 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > > > > >         if (ret)
> > > > > > >                 return ret;
> > > > > > > 
> > > > > > > -       if (drm_dev_enter(ddev, &idx)) {
> > > > > > > -               struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > > > > > +       if (tbo->resource->bus.is_iomem)
> > > > > > > +               xe_device_mem_access_get(xe);
> > > > > > > 
> > > > > > > +       if (drm_dev_enter(ddev, &idx)) {
> > > > > > >                 trace_xe_bo_cpu_fault(bo);
> > > > > > > 
> > > > > > >                 if (should_migrate_to_system(bo)) {
> > > > > > > @@ -1127,10 +1142,25 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > > > > >         } else {
> > > > > > >                 ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
> > > > > > >         }
> > > > > > > +
> > > > > > >         if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > > > > > > -               return ret;
> > > > > > > +               goto out_rpm;
> > > > > > > +       /*
> > > > > > > +        * ttm_bo_vm_reserve() already has dma_resv_lock.
> > > > > > > +        * vram_userfault_count is protected by dma_resv lock and rpm wakeref.
> > > > > > > +        */
> > > > > > > +       if (ret == VM_FAULT_NOPAGE && xe_device_mem_access_ongoing(xe) && !bo->vram_userfault_count) {
> > > > > > > +               bo->vram_userfault_count = 1;
> > > > > > > +               spin_lock(&xe->mem_access.vram_userfault_lock);
> > > > > > > +               list_add(&bo->vram_userfault_link, &xe->mem_access.vram_userfault_list);
> > > > > > > +               spin_unlock(&xe->mem_access.vram_userfault_lock);
> > > > > > > +
> > > > > > > +               XE_WARN_ON(!tbo->resource->bus.is_iomem);
> > > > > > > +       }
> > > > > > >         dma_resv_unlock(tbo->base.resv);
> > > > > > > +out_rpm:
> > > > > > > +       if (tbo->resource->bus.is_iomem && xe_device_mem_access_ongoing(xe))
> > > > > > > +               xe_device_mem_access_put(xe);
> > > > > > >         return ret;
> > > > > > >   }
> > > > > > > 
> > > > > > > @@ -2108,6 +2138,23 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
> > > > > > >         return err;
> > > > > > >   }
> > > > > > > 
> > > > > > > +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo)
> > > > > > > +{
> > > > > > > +       struct ttm_buffer_object *tbo = &bo->ttm;
> > > > > > > +       struct ttm_device *bdev = tbo->bdev;
> > > > > > > +
> > > > > > > +       drm_vma_node_unmap(&tbo->base.vma_node, bdev->dev_mapping);
> > > > > > > +
> > > > > > > +       /*
> > > > > > > +        * We have exclusive access here via runtime suspend. All other
> > > > > > > +        * callers must first grab the rpm wakeref.
> > > > > > > +        */
> > > > > > > +       XE_WARN_ON(!bo->vram_userfault_count);
> > > > > > > +       list_del(&bo->vram_userfault_link);
> > > > > > > +       bo->vram_userfault_count = 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >   #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
> > > > > > >   #include "tests/xe_bo.c"
> > > > > > >   #endif
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > > > > > index 0823dda0f31b..6b86f326c700 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > > > > > @@ -247,6 +247,8 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > > > >                         struct drm_file *file);
> > > > > > >   int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
> > > > > > >                              struct drm_file *file);
> > > > > > > +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo);
> > > > > > > +
> > > > > > >   int xe_bo_dumb_create(struct drm_file *file_priv,
> > > > > > >                       struct drm_device *dev,
> > > > > > >                       struct drm_mode_create_dumb *args);
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > > > index f6ee920303af..cdca91a378c4 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > > > @@ -68,6 +68,12 @@ struct xe_bo {
> > > > > > >         struct llist_node freed;
> > > > > > >         /** @created: Whether the bo has passed initial creation */
> > > > > > >         bool created;
> > > > > > > +       /**
> > > > > > > +        * Whether the object is currently in fake offset mmap backed by vram.
> > > > > > > +        */
> > > > > > > +       unsigned int vram_userfault_count;
> > > > > > > +       struct list_head vram_userfault_link;
> > > > > > > +
> > > > > > >   };
> > > > > > > 
> > > > > > >   #endif
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > index 750e1f0d3339..c345fb483af1 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > @@ -328,6 +328,26 @@ struct xe_device {
> > > > > > >         struct {
> > > > > > >                 /** @ref: ref count of memory accesses */
> > > > > > >                 atomic_t ref;
> > > > > > > +               /*
> > > > > > > +                * Protects access to the vram userfault list.
> > > > > > > +                * Outside of the runtime suspend path, access to
> > > > > > > +                * @vram_userfault_list always requires first grabbing the
> > > > > > > +                * runtime pm, to ensure we can't race against runtime
> > > > > > > +                * suspend. Once we have that we also need to grab
> > > > > > > +                * @vram_userfault_lock, at which point we have exclusive
> > > > > > > +                * access. The runtime suspend path is special since it
> > > > > > > +                * doesn't really hold any locks, but instead has exclusive
> > > > > > > +                * access by virtue of all other accesses requiring holding
> > > > > > > +                * the runtime pm wakeref.
> > > > > > > +                */
> > > > > > > +               spinlock_t vram_userfault_lock;
> > > > > > > +
> > > > > > > +               /*
> > > > > > > +                * Keep a list of userfaulted gem objects, which need to
> > > > > > > +                * release their mmap mappings in the runtime suspend path.
> > > > > > > +                */
> > > > > > > +               struct list_head vram_userfault_list;
> > > > > > > +
> > > > > > > +               bool vram_userfault_ongoing;
> > > > > > >         } mem_access;
> > > > > > > 
> > > > > > >         /** @d3cold: Encapsulate d3cold related stuff */
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > > > > index 0f06d8304e17..51cde1db930e 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > > > > @@ -172,6 +172,8 @@ void xe_pm_init(struct xe_device *xe)
> > > > > > >         }
> > > > > > > 
> > > > > > >         xe_pm_runtime_init(xe);
> > > > > > > +       INIT_LIST_HEAD(&xe->mem_access.vram_userfault_list);
> > > > > > > +       spin_lock_init(&xe->mem_access.vram_userfault_lock);
> > > > > > >   }
> > > > > > > 
> > > > > > >   void xe_pm_runtime_fini(struct xe_device *xe)
> > > > > > > @@ -205,6 +207,7 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> > > > > > > 
> > > > > > >   int xe_pm_runtime_suspend(struct xe_device *xe)
> > > > > > >   {
> > > > > > > +       struct xe_bo *bo, *on;
> > > > > > >         struct xe_gt *gt;
> > > > > > >         u8 id;
> > > > > > >         int err = 0;
> > > > > > > @@ -238,6 +241,10 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > > > > > >          */
> > > > > > >         lock_map_acquire(&xe_device_mem_access_lockdep_map);
> > > > > > > 
> > > > > > > +       list_for_each_entry_safe(bo, on, &xe->mem_access.vram_userfault_list,
> > > > > > > +                                vram_userfault_link)
> > > > > > > +               xe_bo_runtime_pm_release_mmap_offset(bo);
> > > > > > > +
> > > > > > >         if (xe->d3cold.allowed) {
> > > > > > >                 err = xe_bo_evict_all(xe);
> > > > > > >                 if (err)
> > > > > > > --
> > > > > > > 2.25.1
> > > > > > 


