[Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend

Nilawar, Badal badal.nilawar at intel.com
Mon Sep 18 13:04:44 UTC 2023



On 05-09-2023 11:02, Ghimiray, Himal Prasad wrote:
> Hi Badal,
> 
>> -----Original Message-----
>> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Gupta,
>> Anshuman
>> Sent: 04 September 2023 12:03
>> To: Vivi, Rodrigo <rodrigo.vivi at intel.com>
>> Cc: intel-xe at lists.freedesktop.org; Auld, Matthew
>> <matthew.auld at intel.com>
>> Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings on
>> rpm suspend
>>
>>
>>
>>> -----Original Message-----
>>> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
>>> Sent: Saturday, September 2, 2023 2:46 AM
>>> To: Gupta, Anshuman <anshuman.gupta at intel.com>
>>> Cc: Nilawar, Badal <badal.nilawar at intel.com>; intel-
>>> xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
>>> Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap mappings
>>> on rpm suspend
>>>
>>> On Fri, 2023-09-01 at 03:04 +0000, Gupta, Anshuman wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
>>>>> Sent: Friday, September 1, 2023 2:49 AM
>>>>> To: Gupta, Anshuman <anshuman.gupta at intel.com>
>>>>> Cc: Nilawar, Badal <badal.nilawar at intel.com>; intel-
>>>>> xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
>>>>> Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap
>>>>> mappings on rpm suspend
>>>>>
>>>>> On Thu, Aug 31, 2023 at 01:23:40AM -0400, Gupta, Anshuman wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Vivi, Rodrigo <rodrigo.vivi at intel.com>
>>>>>>> Sent: Thursday, August 31, 2023 2:35 AM
>>>>>>> To: Nilawar, Badal <badal.nilawar at intel.com>
>>>>>>> Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; intel-
>>>>>>> xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>
>>>>>>> Subject: Re: [Intel-xe] [RFC PATCH] drm/xe/dgfx: Release mmap
>>>>>>> mappings on rpm suspend
>>>>>>>
>>>>>>> On Mon, Aug 28, 2023 at 10:00:31PM +0530, Nilawar, Badal wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28-08-2023 17:46, Gupta, Anshuman wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Nilawar, Badal <badal.nilawar at intel.com>
>>>>>>>>>> Sent: Thursday, August 24, 2023 11:16 PM
>>>>>>>>>> To: intel-xe at lists.freedesktop.org
>>>>>>>>>> Cc: Gupta, Anshuman <anshuman.gupta at intel.com>; Auld, Matthew
>>>>>>>>>> <matthew.auld at intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>
>>>>>>>>>> Subject: [RFC PATCH] drm/xe/dgfx: Release mmap mappings on rpm suspend
>>>>>>>>>>
>>>>>>>>>> Release all mmap mappings for all vram objects which are
>>>>>>>>>> associated with userfault such that, while the pcie function is
>>>>>>>>>> in D3hot, any access to memory mappings will raise a userfault.
>>>>>>>>>>
>>>>>>>>>> Upon userfault, in order to access memory mappings, if the
>>>>>>>>>> graphics function is in D3 then a runtime resume of the dgpu
>>>>>>>>>> will be triggered to transition it to D0.
>>>>>>>>> IMO we need a configurable threshold to control the
>>>>>>>>> behavior of mmap mapping invalidation: if vram usage
>>>>>>>>> crosses a certain threshold, disable runtime PM for the
>>>>>>>>> entire lifetime of the mapping.
>>>>>>>> Agreed. Another option could be to disable rpm on server discrete
>>>>>>>> graphics for the entire lifetime of the mapping. But maintaining a
>>>>>>>> threshold is more promising and gives control to the user.
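>>>>>>>>
>>>>>>>> Roughly what I have in mind, as a sketch only (the usage query and
>>>>>>>> the modparam name below are made up, not existing code):
>>>>>>>>
>>>>>>>> static bool xe_vram_mmap_rpm_allowed(struct xe_device *xe)
>>>>>>>> {
>>>>>>>>         u64 used = xe_vram_used_bytes(xe);          /* hypothetical usage query */
>>>>>>>>         u64 limit = xe_modparam.vram_rpm_threshold; /* hypothetical knob */
>>>>>>>>
>>>>>>>>         /*
>>>>>>>>          * Below the threshold: allow runtime PM and pay the fault +
>>>>>>>>          * resume cost on the next CPU access. Above it: keep the
>>>>>>>>          * device awake for the lifetime of the mapping.
>>>>>>>>          */
>>>>>>>>         return used < limit;
>>>>>>>> }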
>>>>>>>
>>>>>>> what use cases do we have here for this?
>>>>>>> I believe that for discrete we could entirely block rpm if we
>>>>>>> have display or if we have a shared dma_buf. any other case we
>>>>>>> should handle?
>>>>>> If discrete is used for display then anyhow display is going to
>>>>>> block runtime PM completely (be it PSR or non-PSR).
>>>>>> The use case is with the display turned off, or the hybrid gpu use
>>>>>> case.
>>>>>> Currently on Xe we are missing a mem access ref count on the mmap
>>>>>> mapping, and therefore mmap for a vram bo is broken.
>>>>>> dma-buf will also be a use case in the hybrid gpu scenario.
>>>>>
>>>>> right. unfortunately we don't have the unmap callback.
>>>>> We could maybe get the reference on a new xe_gem_object_mmap and
>>>>> just release at xe_ttm_bo_destroy or maybe at
>>>>> xe_ttm_bo_delete_mem_notify?
>>>>>
>>>>> for the dma_buf we could probably hook the get/put to the
>>>>> attach/detach?
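>>>>>
>>>>> roughly something like this (untested sketch, just to illustrate the
>>>>> idea, not the exact code):
>>>>>
>>>>> static int xe_dma_buf_attach(struct dma_buf *dmabuf,
>>>>>                              struct dma_buf_attachment *attach)
>>>>> {
>>>>>         struct drm_gem_object *obj = dmabuf->priv;
>>>>>
>>>>>         /* hold an rpm wakeref for as long as the attachment lives */
>>>>>         xe_device_mem_access_get(to_xe_device(obj->dev));
>>>>>
>>>>>         return 0;
>>>>> }
>>>>>
>>>>> static void xe_dma_buf_detach(struct dma_buf *dmabuf,
>>>>>                               struct dma_buf_attachment *attach)
>>>>> {
>>>>>         struct drm_gem_object *obj = dmabuf->priv;
>>>>>
>>>>>         xe_device_mem_access_put(to_xe_device(obj->dev));
>>>>> }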
>>>> Hi Rodrigo,
>>>> attach/detach is already protected by mem access get/put, but in my
>>>> opinion that is going to burn more power on the dgpu in the hybrid
>>>> case, as explained below: an idle display (display pipeline from the
>>>> igpu, with the framebuffer imported from the dgpu) is going to burn
>>>> power on the dgpu even when it is completely idle (even the igpu is
>>>> idle; if it uses a PSR panel to refresh then the platform can go to
>>>> s0ix).
>>>> Is the above behavior expected? If not, then we will need this patch.
>>>
>>> well, I'm trying to step back here and think about the use case.
>>> PSR is only a thing in eDP... so, it needs to be a laptop with both
>>> integrated and discrete where likely OEM plugged the eDP on iGFX to
>>> get better power savings.
>>>
>>> but then, the user wants to play some high-end gaming and use prime to
>>> fwd the discrete output to the eDP.
>>>
>>> But then the user leaves the screen idle most of the time instead of
>>> playing the game, and burns power? :/
>>>
>>> likely this informed user could also revert the screen redirection
>>> when not really using the discrete for gaming? otherwise this user is
>>> burning power one way or another right?!
>>>
>>> Also, in this PSR case, would we go idle, allow D3cold on the
>>> discrete every idle second, and do the trick to get the page-fault
>>> and enable everything to bring the 'primed' display back on in time?
>> AFAIU in most cases, like on DG2, a wake-up from d3 (d3hot/d3cold)
>> will be caused by migration (with an explicit mem ref count), or by
>> the respective submission on render.
>> This use case is about both d3hot and d3cold, not only d3cold.
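>>
>> i.e. the submission-side wake-up is just the usual mem access get
>> around the job, roughly like this (simplified sketch, the helper names
>> here are made up):
>>
>> static void submit_job(struct xe_device *xe, struct xe_sched_job *job)
>> {
>>         /* resumes the device from d3hot/d3cold if needed */
>>         xe_device_mem_access_get(xe);
>>         push_job_to_hw(job);    /* placeholder for the real submission */
>>         xe_device_mem_access_put(xe);
>> }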
>> Thanks,
>> Anshuman Gupta.
>>>
>>> I can see so many issues coming out of this scenario that I tend to
>>> prefer to simply avoid the RPM if the user explicitly redirected the
>>> screen.
>>>
>>> We should keep it simple.
>>>
>>>> Thanks,
>>>> Anshuman Gupta.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Anshuman Gupta.
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Badal
>>>>>>>>> Thanks,
>>>>>>>>> Anshuman Gupta
>>>>>>>>>>
>>>>>>>>>> Cc: Matthew Auld <matthew.auld at intel.com>
>>>>>>>>>> Cc: Anshuman Gupta <anshuman.gupta at intel.com>
>>>>>>>>>> Signed-off-by: Badal Nilawar <badal.nilawar at intel.com>
>>>>>>>>>> ---
>>>>>>>>>>    drivers/gpu/drm/xe/xe_bo.c           | 53 ++++++++++++++++++++++++++--
>>>>>>>>>>    drivers/gpu/drm/xe/xe_bo.h           |  2 ++
>>>>>>>>>>    drivers/gpu/drm/xe/xe_bo_types.h     |  6 ++++
>>>>>>>>>>    drivers/gpu/drm/xe/xe_device_types.h | 20 +++++++++++
>>>>>>>>>>    drivers/gpu/drm/xe/xe_pm.c           |  7 ++++
>>>>>>>>>>    5 files changed, 85 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>>>>> index 1ab682d61e3c..4192bfcd8013 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>>>>>>>>> @@ -776,6 +776,18 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
>>>>>>>>>>                  dma_fence_put(fence);
>>>>>>>>>>          }
>>>>>>>>>>
>>>>>>>>>> +       /*
>>>>>>>>>> +        * TTM has already nuked the mmap for us (see ttm_bo_unmap_virtual),
>>>>>>>>>> +        * so if we moved from VRAM make sure to unlink this from the
>>>>>>>>>> +        * userfault tracking.
>>>>>>>>>> +        */
>>>>>>>>>> +       if (mem_type_is_vram(old_mem_type)) {
>>>>>>>>>> +               spin_lock(&xe->mem_access.vram_userfault_lock);
>>>>>>>>>> +               if (!list_empty(&bo->vram_userfault_link))
>>>>>>>>>> +                       list_del_init(&bo->vram_userfault_link);
>>>>>>>>>> +               spin_unlock(&xe->mem_access.vram_userfault_lock);
>>>>>>>>>> +       }
>>>>>>>>>> +
>>>>>>>>>>          xe_device_mem_access_put(xe);
>>>>>>>>>>          trace_printk("new_mem->mem_type=%d\n", new_mem->mem_type);
>>>>>>>>>>
>>>>>>>>>> @@ -1100,6 +1112,8 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>>>>>>>>>>  {
>>>>>>>>>>          struct ttm_buffer_object *tbo = vmf->vma->vm_private_data;
>>>>>>>>>>          struct drm_device *ddev = tbo->base.dev;
>>>>>>>>>> +       struct xe_bo *bo = ttm_to_xe_bo(tbo);
>>>>>>>>>> +       struct xe_device *xe = to_xe_device(ddev);
>>>>>>>>>>          vm_fault_t ret;
>>>>>>>>>>          int idx, r = 0;
>>>>>>>>>>
>>>>>>>>>> @@ -1107,9 +1121,10 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>>>>>>>>>>          if (ret)
>>>>>>>>>>                  return ret;
>>>>>>>>>>
>>>>>>>>>> -       if (drm_dev_enter(ddev, &idx)) {
>>>>>>>>>> -               struct xe_bo *bo = ttm_to_xe_bo(tbo);
>>>>>>>>>> +       if (tbo->resource->bus.is_iomem)
>>>>>>>>>> +               xe_device_mem_access_get(xe);
>>>>>>>>>>
>>>>>>>>>> +       if (drm_dev_enter(ddev, &idx)) {
>>>>>>>>>>                  trace_xe_bo_cpu_fault(bo);
>>>>>>>>>>
>>>>>>>>>>                  if (should_migrate_to_system(bo)) {
>>>>>>>>>> @@ -1127,10 +1142,25 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
>>>>>>>>>>          } else {
>>>>>>>>>>                  ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
>>>>>>>>>>          }
>>>>>>>>>> +
>>>>>>>>>>          if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
>>>>>>>>>> -               return ret;
>>>>>>>>>> +               goto out_rpm;
>>>>>>>>>> +       /*
>>>>>>>>>> +        * ttm_bo_vm_reserve() already has dma_resv_lock.
>>>>>>>>>> +        * vram_userfault_count is protected by dma_resv lock and rpm wakeref.
>>>>>>>>>> +        */
>>>>>>>>>> +       if (ret == VM_FAULT_NOPAGE && xe_device_mem_access_ongoing(xe) &&
>>>>>>>>>> +           !bo->vram_userfault_count) {
>>>>>>>>>> +               bo->vram_userfault_count = 1;
>>>>>>>>>> +               spin_lock(&xe->mem_access.vram_userfault_lock);
>>>>>>>>>> +               list_add(&bo->vram_userfault_link, &xe->mem_access.vram_userfault_list);
>>>>>>>>>> +               spin_unlock(&xe->mem_access.vram_userfault_lock);
>>>>>>>>>> +
>>>>>>>>>> +               XE_WARN_ON(!tbo->resource->bus.is_iomem);
>>>>>>>>>> +       }
>>>>>>>>>>          dma_resv_unlock(tbo->base.resv);
>>>>>>>>>> +out_rpm:
>>>>>>>>>> +       if (tbo->resource->bus.is_iomem && xe_device_mem_access_ongoing(xe))
>>>>>>>>>> +               xe_device_mem_access_put(xe);
>>>>>>>>>>          return ret;
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>> @@ -2108,6 +2138,23 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>>>>>>>>>>          return err;
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>> +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo)
>>>>>>>>>> +{
>>>>>>>>>> +       struct ttm_buffer_object *tbo = &bo->ttm;
>>>>>>>>>> +       struct ttm_device *bdev = tbo->bdev;
>>>>>>>>>> +
>>>>>>>>>> +       drm_vma_node_unmap(&tbo->base.vma_node, bdev->dev_mapping);
>>>>>>>>>> +
>>>>>>>>>> +       /*
>>>>>>>>>> +        * We have exclusive access here via runtime suspend. All
>>>>>>>>>> +        * other callers must first grab the rpm wakeref.
>>>>>>>>>> +        */
>>>>>>>>>> +       XE_WARN_ON(!bo->vram_userfault_count);
>>>>>>>>>> +       list_del(&bo->vram_userfault_link);
>>>>>>>>>> +       bo->vram_userfault_count = 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +
>>>>>>>>>>    #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
>>>>>>>>>>    #include "tests/xe_bo.c"
>>>>>>>>>>    #endif
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>>>>>>>>>> index 0823dda0f31b..6b86f326c700 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_bo.h
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>>>>>>>>>> @@ -247,6 +247,8 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>>>>>>>>>                          struct drm_file *file);
>>>>>>>>>>    int xe_gem_mmap_offset_ioctl(struct drm_device *dev, void *data,
>>>>>>>>>>                               struct drm_file *file);
>>>>>>>>>> +void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo);
>>>>>>>>>> +
>>>>>>>>>>    int xe_bo_dumb_create(struct drm_file *file_priv,
>>>>>>>>>>                        struct drm_device *dev,
>>>>>>>>>>                        struct drm_mode_create_dumb *args);
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
>>>>>>>>>> index f6ee920303af..cdca91a378c4 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_bo_types.h
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
>>>>>>>>>> @@ -68,6 +68,12 @@ struct xe_bo {
>>>>>>>>>>          struct llist_node freed;
>>>>>>>>>>          /** @created: Whether the bo has passed initial creation */
>>>>>>>>>>          bool created;
>>>>>>>>>> +       /**
>>>>>>>>>> +        * Whether the object is currently in fake offset mmap backed by vram.
>>>>>>>>>> +        */
>>>>>>>>>> +       unsigned int vram_userfault_count;
>>>>>>>>>> +       struct list_head vram_userfault_link;
>>>>>>>>>> +
>>>>>>>>>>    };
>>>>>>>>>>
>>>>>>>>>>    #endif
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>>>>>> index 750e1f0d3339..c345fb483af1 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>>>>>>>>>> @@ -328,6 +328,26 @@ struct xe_device {
>>>>>>>>>>          struct {
>>>>>>>>>>                  /** @ref: ref count of memory accesses */
>>>>>>>>>>                  atomic_t ref;
>>>>>>>>>> +               /*
>>>>>>>>>> +                *  Protects access to the vram userfault list.
>>>>>>>>>> +                *  If we are outside of the runtime suspend path, access to
>>>>>>>>>> +                *  @vram_userfault_list always requires first grabbing the
>>>>>>>>>> +                *  runtime pm, to ensure we can't race against runtime suspend.
>>>>>>>>>> +                *  Once we have that we also need to grab @vram_userfault_lock,
>>>>>>>>>> +                *  at which point we have exclusive access.
>>>>>>>>>> +                *  The runtime suspend path is special since it doesn't really
>>>>>>>>>> +                *  hold any locks, but instead has exclusive access by virtue of
>>>>>>>>>> +                *  all other accesses requiring holding the runtime pm wakeref.
>>>>>>>>>> +                */
>>>>>>>>>> +               spinlock_t vram_userfault_lock;
>>>>>>>>>> +
>>>>>>>>>> +               /*
>>>>>>>>>> +                *  Keep a list of userfaulted gem objs, which need to release
>>>>>>>>>> +                *  their mmap mappings in the runtime suspend path.
>>>>>>>>>> +                */
>>>>>>>>>> +               struct list_head vram_userfault_list;
>>>>>>>>>> +
>>>>>>>>>> +               bool vram_userfault_ongoing;
>>>>>>>>>>          } mem_access;
>>>>>>>>>>
>>>>>>>>>>          /** @d3cold: Encapsulate d3cold related stuff */
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
>>>>>>>>>> index 0f06d8304e17..51cde1db930e 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_pm.c
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_pm.c
>>>>>>>>>> @@ -172,6 +172,8 @@ void xe_pm_init(struct xe_device *xe)
>>>>>>>>>>          }
>>>>>>>>>>
>>>>>>>>>>          xe_pm_runtime_init(xe);
>>>>>>>>>> +       INIT_LIST_HEAD(&xe->mem_access.vram_userfault_list);
>>>>>>>>>> +       spin_lock_init(&xe->mem_access.vram_userfault_lock);
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>>    void xe_pm_runtime_fini(struct xe_device *xe)
>>>>>>>>>> @@ -205,6 +207,7 @@ struct task_struct *xe_pm_read_callback_task(struct xe_device *xe)
>>>>>>>>>>
>>>>>>>>>>    int xe_pm_runtime_suspend(struct xe_device *xe)
>>>>>>>>>>    {
>>>>>>>>>> +       struct xe_bo *bo, *on;
>>>>>>>>>>          struct xe_gt *gt;
>>>>>>>>>>          u8 id;
>>>>>>>>>>          int err = 0;
>>>>>>>>>> @@ -238,6 +241,10 @@ int xe_pm_runtime_suspend(struct xe_device *xe)
>>>>>>>>>>           */
>>>>>>>>>>
>>>>>>>>>>          lock_map_acquire(&xe_device_mem_access_lockdep_map);
>>>>>>>>>>
>>>>>>>>>> +       list_for_each_entry_safe(bo, on, &xe->mem_access.vram_userfault_list,
>>>>>>>>>> +                                vram_userfault_link)
>>>>>>>>>> +               xe_bo_runtime_pm_release_mmap_offset(bo);
> Why don't we always evict the mmap'ed buffer objects?
When will these BOs be restored? Especially the non-pinned ones.

Regards,
Badal
> BR
> Himal
>>>>>>>>>> +
>>>>>>>>>>          if (xe->d3cold.allowed) {
>>>>>>>>>>                  err = xe_bo_evict_all(xe);
>>>>>>>>>>                  if (err)
>>>>>>>>>> --
>>>>>>>>>> 2.25.1
>>>>>>>>>
> 

