[Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend

Gupta, Anshuman anshuman.gupta at intel.com
Mon Sep 4 06:33:05 UTC 2023



> -----Original Message-----
> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> Sent: Saturday, September 2, 2023 2:46 AM
> To: Gupta, Anshuman <anshuman.gupta at intel.com>
> Cc: Nilawar, Badal <badal.nilawar at intel.com>; intel-
> xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
> Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings
> on rpm suspend
> 
> On Fri, 2023-09-01 at 03:04 +0000, Gupta, Anshuman wrote:
> >
> >
> > > -----Original Message-----
> > > From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > Sent: Friday, September 1, 2023 2:49 AM
> > > To: Gupta, Anshuman <anshuman.gupta at intel.com>
> > > Cc: Nilawar, Badal <badal.nilawar at intel.com>; intel-
> > > xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
> > > Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap
> > > mappings on rpm suspend
> > >
> > > On Thu, Aug 31, 2023 at 01:23:40AM -0400, Gupta, Anshuman wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > > > Sent: Thursday, August 31, 2023 2:35 AM
> > > > > To: Nilawar, Badal <badal.nilawar at intel.com>
> > > > > Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; intel-
> > > > > xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
> > > > > Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap
> > > > > mappings on rpm suspend
> > > > >
> > > > > On Mon, Aug 28, 2023 at 10:00:31PM +0530, Nilawar, Badal wrote:
> > > > > >
> > > > > >
> > > > > > On 28-08-2023 17:46, Gupta, Anshuman wrote:
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Nilawar, Badal <badal.nilawar at intel.com>
> > > > > > > > Sent: Thursday, August 24, 2023 11:16 PM
> > > > > > > > To: intel-xe at lists.freedesktop.org
> > > > > > > > Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; Auld, Matthew
> > > > > > > > <matthew.auld at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>
> > > > > > > > Subject: [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend
> > > > > > > >
> > > > > > > > Release all mmap mappings for all vram objects which are
> > > > > > > > associated with userfault such that, while the pcie function
> > > > > > > > is in D3hot, any access to the memory mappings will raise a
> > > > > > > > userfault.
> > > > > > > >
> > > > > > > > Upon userfault, in order to access the memory mappings, if the
> > > > > > > > graphics function is in D3 then a runtime resume of the dgpu
> > > > > > > > will be triggered to transition it to D0.
> > > > > > > IMO we need a configurable threshold to control the behavior
> > > > > > > of mmap mapping invalidation: if vram usage crosses a certain
> > > > > > > threshold, disable runtime PM for the entire lifetime of the
> > > > > > > mapping.
> > > > > > Agreed. Another option could be to disable rpm on server discrete
> > > > > > graphics for the entire lifetime of the mapping. But maintaining a
> > > > > > threshold is more promising and gives the user control.
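For illustration, the threshold policy being discussed could look roughly like the sketch below. This is a self-contained model, not kernel code: the struct, the function name, and the percentage knob are all invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative sketch of the proposed policy: if too much VRAM sits behind
 * userfault mmap mappings, keep runtime PM blocked for the lifetime of the
 * mappings instead of invalidating them on every suspend. All names and
 * the percentage knob are hypothetical, not from the patch.
 */
struct vram_rpm_policy {
	size_t vram_total;          /* total device VRAM, in bytes */
	size_t userfault_bytes;     /* VRAM behind userfault mmap mappings */
	unsigned int threshold_pct; /* user-configurable cutoff, in percent */
};

/*
 * Returns true when runtime PM should stay blocked for the lifetime of the
 * mapping: re-faulting and re-mapping this much VRAM on every runtime
 * resume would likely cost more than the idle power saved.
 */
static bool vram_userfault_blocks_rpm(const struct vram_rpm_policy *p)
{
	/* compare in integer math to avoid floating point */
	return p->userfault_bytes * 100 > (size_t)p->threshold_pct * p->vram_total;
}
```

With a 25% threshold, 300 of 1000 units mapped keeps rpm blocked, while 200 of 1000 does not.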
> > > > >
> > > > > what use cases do we have here for this?
> > > > > I believe that for discrete we could entirely block rpm if we
> > > > > have display or if we have shared dma_buf. any other case we
> > > > > should handle?
> > > > If discrete is used for display, then the display is going to block
> > > > runtime PM completely anyway (be it PSR or non-PSR).
> > > > The use case is with the display turned off, or the hybrid gpu use
> > > > case.
> > > > Currently on Xe we are missing a mem access ref count on the mmap
> > > > mapping, and therefore mmap for a vram bo is broken.
> > > > dma-buf will also be a use case in the hybrid gpu scenario.
> > >
> > > right. unfortunately we don't have the unmap callback.
> > > We could maybe get the reference on a new xe_gem_object_mmap and
> > > just release at xe_ttm_bo_destroy or maybe at
> > > xe_ttm_bo_delete_mem_notify?
> > >
> > > for the dma_buf we could probably hook the get/put to the
> > > attach/detach?
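For illustration, the get/put pairing on attach/detach amounts to tying a wakeref to the attachment's lifetime. A minimal self-contained model of that pattern follows; every name here is invented for the example, this is not the actual xe or dma-buf API:

```c
#include <assert.h>

/*
 * Minimal model of pairing a mem-access (runtime PM) reference with a
 * dma-buf attachment's lifetime, so the exporter cannot runtime-suspend
 * while an importer may still touch its VRAM. All names are hypothetical.
 */
struct fake_xe_device {
	int mem_access_ref; /* stands in for xe->mem_access.ref */
};

static void mem_access_get(struct fake_xe_device *xe)
{
	xe->mem_access_ref++; /* device must stay awake while held */
}

static void mem_access_put(struct fake_xe_device *xe)
{
	xe->mem_access_ref--; /* at zero, runtime suspend may proceed */
}

/* Would be called from the exporter's dma-buf attach callback. */
static void dma_buf_attach_hook(struct fake_xe_device *xe)
{
	mem_access_get(xe); /* held for the whole attachment lifetime */
}

/* Would be called from the matching detach callback. */
static void dma_buf_detach_hook(struct fake_xe_device *xe)
{
	mem_access_put(xe);
}
```

The trade-off raised below is exactly that this reference is held for as long as the attachment exists, even when both GPUs are otherwise idle.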
> > Hi Rodrigo,
> > attach/detach is already protected by mem access get/put, but in my
> > opinion that is going to burn more power on the dgpu in the hybrid case.
> > As explained below, an idle display left on (display pipeline from the
> > igpu, with the framebuffer imported from the dgpu) is going to burn more
> > power on the dgpu when it is completely idle (even the igpu is idle; if
> > it uses a PSR panel to refresh, the platform can go to s0ix).
> > Is the above behavior expected? If not, then we will need this patch.
> 
> well, I'm trying to step back here and think about the use case.
> PSR is only a thing in eDP... so, it needs to be a laptop with both integrated
> and discrete where likely OEM plugged the eDP on iGFX to get better power
> savings.
> 
> but then, the user wants to play some high-end gaming and use prime to fwd
> the discrete output to the eDP.
> 
> But then the user leaves the screen idle most of the time instead of
> playing the game and burning power? :/
> 
> likely this informed user could also revert the screen redirection when not
> really using the discrete for gaming? otherwise this user is burning power
> one way or another right?!
> 
> Also, in this PSR case, would we go idle, allow D3cold on the discrete
> after every idle second, and do the trick of taking the page fault and
> enabling everything to bring the 'primed' display back on in time?
AFAIU in most cases, e.g. on DG2, a wake-up from d3 (d3hot/d3cold) will be caused by migration (with an explicit mem ref count), or by the respective submission on render.
This use case covers both d3hot and d3cold; it is not specific to d3cold.
Thanks,
Anshuman Gupta.
> 
> I can see so many issues coming out of this scenario that I tend to prefer
> to simply avoid RPM if the user has explicitly redirected the screen.
> 
> We should keep it simple.
> 
> > Thanks,
> > Anshuman Gupta.
> > >
> > > >
> > > > Thanks,
> > > > Anshuman Gupta.
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Badal
> > > > > > > Thanks,
> > > > > > > Anshuman Gupta
> > > > > > > >
> > > > > > > > Cc: Matthew Auld <matthew.auld at intel.com>
> > > > > > > > Cc: Anshuman Gupta <anshuman.gupta at intel.com>
> > > > > > > > Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
> > > > > > > > ---
> > > > > > > >   drivers/gpu/drm/xe/xe_bo.c           | 53 ++++++++++++++++++++++++++--
> > > > > > > >   drivers/gpu/drm/xe/xe_bo.h           |  2 ++
> > > > > > > >   drivers/gpu/drm/xe/xe_bo_types.h     |  6 ++++
> > > > > > > >   drivers/gpu/drm/xe/xe_device_types.h | 20 +++++++++++
> > > > > > > >   drivers/gpu/drm/xe/xe_pm.c           |  7 ++++
> > > > > > > >   5 files changed, 85 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > index 1ab682d61e3c..4192bfcd8013 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > @@ -776,6 +776,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
> > > > > > > >  		dma_fence_put(fence);
> > > > > > > >  	}
> > > > > > > >
> > > > > > > > +	/*
> > > > > > > > +	 * TTM has already nuked the mmap for us (see ttm_bo_unmap_virtual),
> > > > > > > > +	 * so if we moved from VRAM make sure to unlink this from the
> > > > > > > > +	 * userfault tracking.
> > > > > > > > +	 */
> > > > > > > > +	if (mem_type_is_vram(old_mem_type)) {
> > > > > > > > +		spin_lock(&xe->mem_access.vram_userfault_lock);
> > > > > > > > +		if (!list_empty(&bo->vram_userfault_link))
> > > > > > > > +			list_del_init(&bo->vram_userfault_link);
> > > > > > > > +		spin_unlock(&xe->mem_access.vram_userfault_lock);
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > >  	xe_device_mem_access_put(xe);
> > > > > > > >  	trace_printk("new_mem->mem_type=%d\n", new_mem->mem_type);
> > > > > > > >
> > > > > > > > @@ -1100,6 +1112,8 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > > > > > >  {
> > > > > > > >  	struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
> > > > > > > >  	struct drm_device *ddev = tbo->base.dev;
> > > > > > > > +	struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > > > > > > +	struct xe_device *xe = to_xe_device(ddev);
> > > > > > > >  	vm_fault_t ret;
> > > > > > > >  	int idx, r = 0;
> > > > > > > >
> > > > > > > > @@ -1107,9 +1121,10 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > > > > > >  	if (ret)
> > > > > > > >  		return ret;
> > > > > > > >
> > > > > > > > -	if (drm_dev_enter(ddev, &idx)) {
> > > > > > > > -		struct xe_bo *bo = ttm_to_xe_bo(tbo);
> > > > > > > > +	if (tbo->resource->bus.is_iomem)
> > > > > > > > +		xe_device_mem_access_get(xe);
> > > > > > > >
> > > > > > > > +	if (drm_dev_enter(ddev, &idx)) {
> > > > > > > >  		trace_xe_bo_cpu_fault(bo);
> > > > > > > >
> > > > > > > >  		if (should_migrate_to_system(bo)) {
> > > > > > > > @@ -1127,10 +1142,25 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
> > > > > > > >  	} else {
> > > > > > > >  		ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
> > > > > > > >  	}
> > > > > > > > +
> > > > > > > >  	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > > > > > > > -		return ret;
> > > > > > > > +		goto out_rpm;
> > > > > > > > +	/*
> > > > > > > > +	 * ttm_bo_vm_reserve() already has dma_resv_lock.
> > > > > > > > +	 * vram_userfault_count is protected by dma_resv lock and rpm wakeref.
> > > > > > > > +	 */
> > > > > > > > +	if (ret == VM_FAULT_NOPAGE && xe_device_mem_access_ongoing(xe) &&
> > > > > > > > +	    !bo->vram_userfault_count) {
> > > > > > > > +		bo->vram_userfault_count = 1;
> > > > > > > > +		spin_lock(&xe->mem_access.vram_userfault_lock);
> > > > > > > > +		list_add(&bo->vram_userfault_link, &xe->mem_access.vram_userfault_list);
> > > > > > > > +		spin_unlock(&xe->mem_access.vram_userfault_lock);
> > > > > > > > +
> > > > > > > > +		XE_WARN_ON(!tbo->resource->bus.is_iomem);
> > > > > > > > +	}
> > > > > > > >  	dma_resv_unlock(tbo->base.resv);
> > > > > > > > +out_rpm:
> > > > > > > > +	if (tbo->resource->bus.is_iomem && xe_device_mem_access_ongoing(xe))
> > > > > > > > +		xe_device_mem_access_put(xe);
> > > > > > > >  	return ret;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -2108,6 +2138,23 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
> > > > > > > >  	return err;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo)
> > > > > > > > +{
> > > > > > > > +	struct ttm_buffer_object *tbo = &bo->ttm;
> > > > > > > > +	struct ttm_device *bdev = tbo->bdev;
> > > > > > > > +
> > > > > > > > +	drm_vma_node_unmap(&tbo->base.vma_node, bdev->dev_mapping);
> > > > > > > > +
> > > > > > > > +	/*
> > > > > > > > +	 * We have exclusive access here via runtime suspend. All other
> > > > > > > > +	 * callers must first grab the rpm wakeref.
> > > > > > > > +	 */
> > > > > > > > +	XE_WARN_ON(!bo->vram_userfault_count);
> > > > > > > > +	list_del(&bo->vram_userfault_link);
> > > > > > > > +	bo->vram_userfault_count = 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +
> > > > > > > >  #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
> > > > > > > >  #include "tests/xe_bo.c"
> > > > > > > >  #endif
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > > > > > > index 0823dda0f31b..6b86f326c700 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > > > > > > @@ -247,6 +247,8 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > > > > >  			struct drm_file *file);
> > > > > > > >  int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
> > > > > > > >  			     struct drm_file *file);
> > > > > > > > +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo);
> > > > > > > > +
> > > > > > > >  int xe_bo_dumb_create(struct drm_file *file_priv,
> > > > > > > >  		      struct drm_device *dev,
> > > > > > > >  		      struct drm_mode_create_dumb *args);
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > > > > index f6ee920303af..cdca91a378c4 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > > > > @@ -68,6 +68,12 @@ struct xe_bo {
> > > > > > > >  	struct llist_node freed;
> > > > > > > >  	/** @created: Whether the bo has passed initial creation */
> > > > > > > >  	bool created;
> > > > > > > > +	/**
> > > > > > > > +	 * Whether the object currently has a fake offset mmap backed by vram.
> > > > > > > > +	 */
> > > > > > > > +	unsigned int vram_userfault_count;
> > > > > > > > +	struct list_head vram_userfault_link;
> > > > > > > > +
> > > > > > > >  };
> > > > > > > >
> > > > > > > >  #endif
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > > index 750e1f0d3339..c345fb483af1 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > > > > > > @@ -328,6 +328,26 @@ struct xe_device {
> > > > > > > >  	struct {
> > > > > > > >  		/** @ref: ref count of memory accesses */
> > > > > > > >  		atomic_t ref;
> > > > > > > > +		/*
> > > > > > > > +		 * Protects access to the vram userfault list.
> > > > > > > > +		 * Outside of the runtime suspend path, access to
> > > > > > > > +		 * @vram_userfault_list always requires first grabbing
> > > > > > > > +		 * the runtime pm, to ensure we can't race against
> > > > > > > > +		 * runtime suspend. Once we have that we also need to
> > > > > > > > +		 * grab @vram_userfault_lock, at which point we have
> > > > > > > > +		 * exclusive access. The runtime suspend path is special
> > > > > > > > +		 * since it doesn't really hold any locks, but instead
> > > > > > > > +		 * has exclusive access by virtue of all other accesses
> > > > > > > > +		 * requiring holding the runtime pm wakeref.
> > > > > > > > +		 */
> > > > > > > > +		spinlock_t vram_userfault_lock;
> > > > > > > > +
> > > > > > > > +		/*
> > > > > > > > +		 * Keep a list of userfaulted gem objects, which need to
> > > > > > > > +		 * release their mmap mappings in the runtime suspend path.
> > > > > > > > +		 */
> > > > > > > > +		struct list_head vram_userfault_list;
> > > > > > > > +
> > > > > > > > +		bool vram_userfault_ongoing;
> > > > > > > >  	} mem_access;
> > > > > > > >
> > > > > > > >  	/** @d3cold: Encapsulate d3cold related stuff */
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
> > > > > > > > index 0f06d8304e17..51cde1db930e 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_pm.c
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_pm.c
> > > > > > > > @@ -172,6 +172,8 @@ void xe_pm_init(struct xe_device *xe)
> > > > > > > >  	}
> > > > > > > >
> > > > > > > >  	xe_pm_runtime_init(xe);
> > > > > > > > +	INIT_LIST_HEAD(&xe->mem_access.vram_userfault_list);
> > > > > > > > +	spin_lock_init(&xe->mem_access.vram_userfault_lock);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  void xe_pm_runtime_fini(struct xe_device *xe)
> > > > > > > > @@ -205,6 +207,7 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
> > > > > > > >
> > > > > > > >  int xe_pm_runtime_suspend(struct xe_device *xe)
> > > > > > > >  {
> > > > > > > > +	struct xe_bo *bo, *on;
> > > > > > > >  	struct xe_gt *gt;
> > > > > > > >  	u8 id;
> > > > > > > >  	int err = 0;
> > > > > > > > @@ -238,6 +241,10 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
> > > > > > > >  	 */
> > > > > > > >  	lock_map_acquire(&xe_device_mem_access_lockdep_map);
> > > > > > > >
> > > > > > > > +	list_for_each_entry_safe(bo, on, &xe->mem_access.vram_userfault_list,
> > > > > > > > +				 vram_userfault_link)
> > > > > > > > +		xe_bo_runtime_pm_release_mmap_offset(bo);
> > > > > > > > +
> > > > > > > >  	if (xe->d3cold.allowed) {
> > > > > > > >  		err = xe_bo_evict_all(xe);
> > > > > > > >  		if (err)
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > >


