<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<div class="ContentPasted0">>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com></div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Informal commit message for now.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> I got a bit impatient and curious to see if the idea we discussed would</div>
<div class="ContentPasted0">>>> work so sketched something out. I think it is what I was describing back</div>
<div class="ContentPasted0">>>> then..</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> So high level idea is to teach the driver what caching modes are hidden</div>
<div class="ContentPasted0">>>> behind PAT indices. Given you already had that in static tables, if we</div>
<div class="ContentPasted0">>>> just turn the tables a bit around and add a driver abstraction of caching</div>
<div class="ContentPasted0">>>> modes this is what happens:</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  * We can lose the ugly runtime i915_gem_get_pat_index.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  * We can have a smarter i915_gem_object_has_cache_level, which now can</div>
<div class="ContentPasted0">>>>    use the above mentioned table to understand the caching modes and so</div>
<div class="ContentPasted0">>>>    does not have to pessimistically return true for _any_ input when user</div>
<div class="ContentPasted0">>>>    has set the PAT index. This may improve things even for MTL.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  * We can simplify the debugfs printout to be platform agnostic.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  * We are perhaps opening the door to un-regress the dodgy addition</div>
<div class="ContentPasted0">>>>    made to i915_gem_object_can_bypass_llc? See QQQ/FIXME in the patch.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> I hope I did not forget anything, but anyway, please have a read and see</div>
<div class="ContentPasted0">>>> what you think. I think it has potential.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Proper commit message can come later.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com></div>
<div class="ContentPasted0">>>> Cc: Fei Yang <fei.yang@intel.com></div>
<div class="ContentPasted0">>>> ---</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/Makefile                 |   1 +</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/display/intel_dpt.c      |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/display/intel_fb_pin.c   |   2 +-</div>
<div class="ContentPasted0">>>>  .../drm/i915/display/intel_plane_initial.c    |   4 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.c    |  66 +++++-----</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.h    |   7 +-</div>
<div class="ContentPasted0">>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  13 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   6 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  10 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 116 +++++++++--------</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   9 +-</div>
<div class="ContentPasted0">>>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 117 +++---------------</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c     |  10 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  13 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  43 ++++---</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-</div>
<div class="ContentPasted0">>>>  .../drm/i915/gem/selftests/huge_gem_object.c  |   6 +-</div>
<div class="ContentPasted0">>>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   8 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   4 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  19 +--</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c          |  33 ++---</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c     |   4 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_gtt.c           |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_migrate.c       |  11 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   6 +-</div>
<div class="ContentPasted0">>>>  .../gpu/drm/i915/gt/intel_ring_submission.c   |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/intel_timeline.c      |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/selftest_migrate.c    |   9 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/selftest_reset.c      |  14 +--</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |   1 +</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/selftest_tlb.c        |   5 +-</div>
<div class="ContentPasted0">>>>  .../gpu/drm/i915/gt/selftest_workarounds.c    |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |   8 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_cache.c             |  72 +++++++++++</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_cache.h             |  49 ++++++++</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_debugfs.c           |  65 +++-------</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_driver.c            |   5 +</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_drv.h               |   3 +</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_gem.c               |  21 +---</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_gpu_error.c         |   7 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_pci.c               |  83 +++++++------</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/i915_perf.c              |   2 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/intel_device_info.h      |   6 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/selftests/i915_gem.c     |   5 +-</div>
<div class="ContentPasted0">>>>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |   8 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  13 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |   2 +-</div>
<div class="ContentPasted0">>>>  .../drm/i915/selftests/intel_memory_region.c  |   4 +-</div>
<div class="ContentPasted0">>>>  .../gpu/drm/i915/selftests/mock_gem_device.c  |  12 +-</div>
<div class="ContentPasted0">>>>  drivers/gpu/drm/i915/selftests/mock_region.c  |   2 +-</div>
<div class="ContentPasted0">>>>  54 files changed, 440 insertions(+), 485 deletions(-)</div>
<div class="ContentPasted0">>>>  create mode 100644 drivers/gpu/drm/i915/i915_cache.c</div>
<div class="ContentPasted0">>>>  create mode 100644 drivers/gpu/drm/i915/i915_cache.h</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile</div>
<div class="ContentPasted0">>>> index 2be9dd960540..2c3da8f0c78e 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/Makefile</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/Makefile</div>
<div class="ContentPasted0">>>> @@ -30,6 +30,7 @@ subdir-ccflags-y += -I$(srctree)/$(src)</div>
<div class="ContentPasted0">>>>  # core driver code</div>
<div class="ContentPasted0">>>>  i915-y += i915_driver.o \</div>
<div class="ContentPasted0">>>>            i915_drm_client.o \</div>
<div class="ContentPasted0">>>> +         i915_cache.o \</div>
<div class="ContentPasted0">>>>            i915_config.o \</div>
<div class="ContentPasted0">>>>            i915_getparam.o \</div>
<div class="ContentPasted0">>>>            i915_ioctl.o \</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>> index 7c5fddb203ba..5baf0d27a288 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>> @@ -268,7 +268,7 @@ intel_dpt_create(struct intel_framebuffer *fb)</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>          ret = i915_gem_object_lock_interruptible(dpt_obj, NULL);</div>
<div class="ContentPasted0">>>>          if (!ret) {</div>
<div class="ContentPasted0">>>> -               ret = i915_gem_object_set_cache_level(dpt_obj, I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> +               ret = i915_gem_object_set_cache(dpt_obj, I915_CACHE(UC));</div>
<div class="ContentPasted0">>>>                  i915_gem_object_unlock(dpt_obj);</div>
<div class="ContentPasted0">>>>          }</div>
<div class="ContentPasted0">>>>          if (ret) {</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c</div>
<div class="ContentPasted0">>>> index fffd568070d4..dcf54b354a74 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/display/intel_fb_pin.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c</div>
<div class="ContentPasted0">>>> @@ -66,7 +66,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,</div>
<div class="ContentPasted0">>>>                                  continue;</div>
<div class="ContentPasted0">>>>                  }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> -               ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> +               ret = i915_gem_object_set_cache(obj, I915_CACHE(UC));</div>
<div class="ContentPasted0">>>>                  if (ret)</div>
<div class="ContentPasted0">>>>                          continue;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c</div>
<div class="ContentPasted0">>>> index 736072a8b2b0..9988f6562392 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c</div>
<div class="ContentPasted0">>>> @@ -121,8 +121,8 @@ initial_plane_vma(struct drm_i915_private *i915,</div>
<div class="ContentPasted0">>>>           * cache_level during fbdev initialization. The</div>
<div class="ContentPasted0">>>>           * unbind there would get stuck waiting for rcu.</div>
<div class="ContentPasted0">>>>           */</div>
<div class="ContentPasted0">>>> -       i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?</div>
<div class="ContentPasted0">>>> -                                           I915_CACHE_WT : I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> +       i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ? I915_CACHE(WT) :</div>
<div class="ContentPasted0">>>> +                                                               I915_CACHE(UC));</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>          switch (plane_config->tiling) {</div>
<div class="ContentPasted0">>>>          case I915_TILING_NONE:</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>> index dfaaa8b66ac3..f899da2c622a 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>> @@ -8,6 +8,7 @@</div>
<div class="ContentPasted0">>>>  #include "display/intel_frontbuffer.h"</div>
<div class="ContentPasted0">>>>  #include "gt/intel_gt.h"</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> +#include "i915_cache.h"</div>
<div class="ContentPasted0">>>>  #include "i915_drv.h"</div>
<div class="ContentPasted0">>>>  #include "i915_gem_clflush.h"</div>
<div class="ContentPasted0">>>>  #include "i915_gem_domain.h"</div>
<div class="ContentPasted0">>>> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>          if (IS_DGFX(i915))</div>
<div class="ContentPasted0">>>>                  return false;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> -       /*</div>
<div class="ContentPasted0">>>> -        * For objects created by userspace through GEM_CREATE with pat_index</div>
<div class="ContentPasted0">>>> -        * set by set_pat extension, i915_gem_object_has_cache_level() will</div>
<div class="ContentPasted0">>>> -        * always return true, because the coherency of such object is managed</div>
<div class="ContentPasted0">>>> -        * by userspace. Othereise the call here would fall back to checking</div>
<div class="ContentPasted0">>>> -        * whether the object is un-cached or write-through.</div>
<div class="ContentPasted0">>>> -        */</div>
<div class="ContentPasted0">>>> -       return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>> -                i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">>>> +       return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 &&</div>
<div class="ContentPasted0">>>> +              i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This logic was changed for objects with pat index set by user here. It</div>
<div class="ContentPasted0">>> used to return false</div>
<div class="ContentPasted0">>> regardless the cache mode. But now it returns true if its cache mode is</div>
<div class="ContentPasted0">>> neither UC nor WT.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Yes, that was half of the motivation of the refactory. Before the PAT index series code was like this:</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>        return !(obj->cache_level == I915_CACHE_NONE ||</div>
<div class="ContentPasted0">>                 obj->cache_level == I915_CACHE_WT);</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> So kernel knew it needs to flush only if caching mode is neither UC or WT. With the PAT index series you changed it to:</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>        return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>                 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> And as i915_gem_object_has_cache_level was changed to always return true if PAT was set by user, that made the check meaningless for such objects.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">But the point is that the KMD should not flush the cache for objects with</div>
<div class="ContentPasted0">PAT set by user because UMD is handling the cache conherency. That is the</div>
<div class="ContentPasted0">design. Well doing so wouldn't cause functional issue, but it's unecessary</div>
<div class="ContentPasted0">and might have performance impact.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">> With my refactoring it becomes meaningful again and restores to the original behaviour. That's the intent at least.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>>>  bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>> @@ -255,9 +249,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)</div>
<div class="ContentPasted0">>>>  }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  /**</div>
<div class="ContentPasted0">>>> - * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.</div>
<div class="ContentPasted0">>>> + * i915_gem_object_set_cache - Changes the cache-level of an object across all VMA.</div>
<div class="ContentPasted0">>>>   * @obj: object to act on</div>
<div class="ContentPasted0">>>> - * @cache_level: new cache level to set for the object</div>
<div class="ContentPasted0">>>> + * @cache: new caching mode to set for the object</div>
<div class="ContentPasted0">>>>   *</div>
<div class="ContentPasted0">>>>   * After this function returns, the object will be in the new cache-level</div>
<div class="ContentPasted0">>>>   * across all GTT and the contents of the backing storage will be coherent,</div>
<div class="ContentPasted0">>>> @@ -269,19 +263,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)</div>
<div class="ContentPasted0">>>>   * cache coherency) and all non-MOCS GPU access will also be uncached so</div>
<div class="ContentPasted0">>>>   * that all direct access to the scanout remains coherent.</div>
<div class="ContentPasted0">>>>   */</div>
<div class="ContentPasted0">>>> -int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>> -                                   enum i915_cache_level cache_level)</div>
<div class="ContentPasted0">>>> +int i915_gem_object_set_cache(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>> +                             i915_cache_t cache)</div>
<div class="ContentPasted0">>>>  {</div>
<div class="ContentPasted0">>>> +       struct drm_i915_private *i915 = to_i915(obj->base.dev);</div>
<div class="ContentPasted0">>>>          int ret;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> -       /*</div>
<div class="ContentPasted0">>>> -        * For objects created by userspace through GEM_CREATE with pat_index</div>
<div class="ContentPasted0">>>> -        * set by set_pat extension, simply return 0 here without touching</div>
<div class="ContentPasted0">>>> -        * the cache setting, because such objects should have an immutable</div>
<div class="ContentPasted0">>>> -        * cache setting by desgin and always managed by userspace.</div>
<div class="ContentPasted0">>>> -        */</div>
<div class="ContentPasted0">>>> -       if (i915_gem_object_has_cache_level(obj, cache_level))</div>
<div class="ContentPasted0">>>> +       ret = i915_cache_find_pat(i915, cache);</div>
<div class="ContentPasted0">>>> +       if (ret < 0) {</div>
<div class="ContentPasted0">>>> +               struct drm_printer p =</div>
<div class="ContentPasted0">>>> +                       drm_err_printer("Attempting to use unknown caching mode ");</div>
<div class="ContentPasted0">>>> +</div>
<div class="ContentPasted0">>>> +               i915_cache_print(&p, cache);</div>
<div class="ContentPasted0">>>> +               drm_puts(&p, "!\n");</div>
<div class="ContentPasted0">>>> +</div>
<div class="ContentPasted0">>>> +               return -EINVAL;</div>
<div class="ContentPasted0">>>> +       } else if (ret == obj->pat_index) {</div>
<div class="ContentPasted0">>>>                  return 0;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> We will have to do this conversion and checking again later in this</div>
<div class="ContentPasted0">>> function when calling i915_gem_object_set_cache_coherency().</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Yes the double lookup part is a bit naff. It is not super required</div>
<div class="ContentPasted0">> apart for validating internal callers (could be a debug build only</div>
<div class="ContentPasted0">> feature perhaps) and to see if PAT index has changed and so whether</div>
<div class="ContentPasted0">> it needs to call i915_gem_object_wait before proceeding to</div>
<div class="ContentPasted0">> i915_gem_object_set_cache_coherency...</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>> My thought was to simply remove the use of cache_level/cache and replace</div>
<div class="ContentPasted0">>> it with pat_idex. Conversion from cache modes to pat index should be done</div>
<div class="ContentPasted0">>> before calling any function used to consume cache_level/cache.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> ... I could probably call the setter which takes PAT index instead of</div>
<div class="ContentPasted0">> i915_gem_object_set_cache_coherency few lines below. That would skip the</div>
<div class="ContentPasted0">> double lookup and make you happy(-ier)?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">Do you see any problem just letting these functions take pat_index as the</div>
<div class="ContentPasted0">second argument? These functions are currently called with a constant</div>
<div class="ContentPasted0">cache_level/mode, if we have INTEL_INFO(i915)->pat_uc/wt/wb set properly,</div>
<div class="ContentPasted0">using pat index makes no difference, right?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>> +       } else if (obj->pat_set_by_user) {</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> Shouldn't this if-statement be placed at the beginning of this function?</div>
<div class="ContentPasted0">>> the original</div>
<div class="ContentPasted0">>> i915_gem_object_has_cache_level() would return true if</div>
<div class="ContentPasted0">>> (obj->pat_set_by_user), so the</div>
<div class="ContentPasted0">>> function returns right away.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>> +               drm_notice_once(&i915->drm,</div>
<div class="ContentPasted0">>>> +                               "Attempting to change caching mode on an object with fixed PAT!\n");</div>
<div class="ContentPasted0">>>> +               return -EINVAL;</div>
<div class="ContentPasted0">>>> +       }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>          ret = i915_gem_object_wait(obj,</div>
<div class="ContentPasted0">>>>                                     I915_WAIT_INTERRUPTIBLE |</div>
<div class="ContentPasted0">>>> @@ -291,7 +294,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>                  return ret;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>          /* Always invalidate stale cachelines */</div>
<div class="ContentPasted0">>>> -       i915_gem_object_set_cache_coherency(obj, cache_level);</div>
<div class="ContentPasted0">>>> +       i915_gem_object_set_cache_coherency(obj, cache);</div>
<div class="ContentPasted0">>>>          obj->cache_dirty = true;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>          /* The cache-level will be applied when each vma is rebound. */</div>
<div class="ContentPasted0">>>> @@ -326,10 +329,9 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>                  goto out;</div>
<div class="ContentPasted0">>>>          }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> -       if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||</div>
<div class="ContentPasted0">>>> -           i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))</div>
<div class="ContentPasted0">>>> +       if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB))</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This looks wrong, at least for MTL. Prior to MTL the GPU automatically snoop</div>
<div class="ContentPasted0">>> CPU cache, but from MTL onward you have to specify if WB or WB + 1-WAY</div>
<div class="ContentPasted0">>> COH is needed. And for KMD, cacheable mode means WB + 1-WAY COH for MTL to keep the</div>
<div class="ContentPasted0">>> behavior consistent.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This used to be taken care of by i915_gem_get_pat_index() call. With</div>
<div class="ContentPasted0">>> that being replaced by i915_cache_find_pat(), you would need to do something there.</div>
<div class="ContentPasted0">>> But, without cachelevel_to_pat[], you might end up hard coding something</div>
<div class="ContentPasted0">>> directly in the function, and that is platform dependent. hmm..., I don't really</div>
<div class="ContentPasted0">>> like this idea.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> That's why I commented in v1 that we should use INTEL_INFO(i915)->pat_uc/wb/wt</div>
<div class="ContentPasted0">>> instead of enum i915_cache_level or i915_cache_t.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> I think I get it. I hope so.. So if I made the tables like this:</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> #define LEGACY_CACHE_MODES \</div>
<div class="ContentPasted0">>        .cache_modes = { \</div>
<div class="ContentPasted0">>                [0] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">>                [1] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">>                [2] = _I915_CACHE(WB, COH1W | L3), \ // 2way??</div>
<div class="ContentPasted0">>                [3] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">>         }</div>
<div class="ContentPasted0">> </div>
<div class="ContentPasted0">> #define GEN12_CACHE_MODES \</div>
<div class="ContentPasted0">>        .cache_modes = { \</div>
<div class="ContentPasted0">>                [0] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">>                [1] = I915_CACHE(WC), \</div>
<div class="ContentPasted0">>                [2] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">>                [3] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">>         }</div>
<div class="ContentPasted0">> </div>
<div class="ContentPasted0">> #define MTL_CACHE_MODES \</div>
<div class="ContentPasted0">>        .cache_modes = { \</div>
<div class="ContentPasted0">>                [0] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">>                [1] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">>                [2] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">>                [3] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">>                [4] = _I915_CACHE(WB, COH2W), \</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> And made i915->pat_wc look up _I915_CACHE(WB, COH1W) would that work?</div>
<div class="ContentPasted0">> Hmm and also all "has cache level" call sites would need to look not just</div>
<div class="ContentPasted0">> for WB but WB+COH1W.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Would it work? Too ugly?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">I don't think this would work. The cache_modes needs to be aligned with BSpec,</div>
<div class="ContentPasted0">otherwise checkings for INTEL_INFO(i915)->cache_modes[obj->pat_index] might</div>
<div class="ContentPasted0">become invalid. Also, COH1W/2W were not even there for platforms prior to MTL.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">I still think setting INTEL_INFO(i915)->pat_uc/wt/wb is the best solution.</div>
<div class="ContentPasted0">With that we can also eliminate the use of I915_CACHE({UC|WT|WB}).</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>>                  args->caching = I915_CACHING_CACHED;</div>
<div class="ContentPasted0">>>> -       else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))</div>
<div class="ContentPasted0">>>> +       else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))</div>
<div class="ContentPasted0">>>>                  args->caching = I915_CACHING_DISPLAY;</div>
<div class="ContentPasted0">>>>          else</div>
<div class="ContentPasted0">>>>                  args->caching = I915_CACHING_NONE;</div>
<div class="ContentPasted0">>>> @@ -344,7 +346,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>          struct drm_i915_private *i915 = to_i915(dev);</div>
<div class="ContentPasted0">>>>          struct drm_i915_gem_caching *args = data;</div>
<div class="ContentPasted0">>>>          struct drm_i915_gem_object *obj;</div>
<div class="ContentPasted0">>>> -       enum i915_cache_level level;</div>
<div class="ContentPasted0">>>> +       i915_cache_t cache;</div>
<div class="ContentPasted0">>>>          int ret = 0;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>          if (IS_DGFX(i915))</div>
<div class="ContentPasted0">>>> @@ -355,7 +357,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>          switch (args->caching) {</div>
<div class="ContentPasted0">>>>          case I915_CACHING_NONE:</div>
<div class="ContentPasted0">>>> -               level = I915_CACHE_NONE;</div>
<div class="ContentPasted0">>>> +               cache = I915_CACHE(UC);</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> For code like this, my thought was</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> -               level = I915_CACHE_NONE;</div>
<div class="ContentPasted0">>> +               pat_index = INTEL_INFO(i915)->pat_uc;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> And later the set_cache call should take pat_index as argument instead</div>
<div class="ContentPasted0">>> of cache mode.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>>                  break;</div>
<div class="ContentPasted0">>>>          case I915_CACHING_CACHED:</div>
<div class="ContentPasted0">>>>                  /*</div>
<div class="ContentPasted0">>>> @@ -367,10 +369,10 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>                  if (!HAS_LLC(i915) && !HAS_SNOOP(i915))</div>
<div class="ContentPasted0">>>>                          return -ENODEV;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> -               level = I915_CACHE_LLC;</div>
<div class="ContentPasted0">>>> +               cache = I915_CACHE(WB);</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> -               level = I915_CACHE_LLC;</div>
<div class="ContentPasted0">>> +               pat_index = INTEL_INFO(i915)->pat_wb;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This should take care of the WB + 1-WAY COH issue for MTL mentioned above,</div>
<div class="ContentPasted0">>> assuming the i915_cache_init() set pat_wb properly, and the</div>
<div class="ContentPasted0">>> i915_gem_object_set_cache() consumes pat_index instead of cache mode.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> That works too yes.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>>                  break;</div>
<div class="ContentPasted0">>>>          case I915_CACHING_DISPLAY:</div>
<div class="ContentPasted0">>>> -               level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;</div>
<div class="ContentPasted0">>>> +               cache = HAS_WT(i915) ? I915_CACHE(WT) : I915_CACHE(UC);</div>
<div class="ContentPasted0">>>>                  break;</div>
<div class="ContentPasted0">>>>          default:</div>
<div class="ContentPasted0">>>>                  return -EINVAL;</div>
<div class="ContentPasted0">>>> @@ -409,7 +411,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>          if (ret)</div>
<div class="ContentPasted0">>>>                  goto out;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> -       ret = i915_gem_object_set_cache_level(obj, level);</div>
<div class="ContentPasted0">>>> +       ret = i915_gem_object_set_cache(obj, cache);</div>
<div class="ContentPasted0">>>>          i915_gem_object_unlock(obj);</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  out:</div>
<div class="ContentPasted0">>>> @@ -448,9 +450,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>           * of uncaching, which would allow us to flush all the LLC-cached data</div>
<div class="ContentPasted0">>>>           * with that bit in the PTE to main memory with just one PIPE_CONTROL.</div>
<div class="ContentPasted0">>>>           */</div>
<div class="ContentPasted0">>>> -       ret = i915_gem_object_set_cache_level(obj,</div>
<div class="ContentPasted0">>>> -                                             HAS_WT(i915) ?</div>
<div class="ContentPasted0">>>> -                                             I915_CACHE_WT : I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> +       ret = i915_gem_object_set_cache(obj,</div>
<div class="ContentPasted0">>>> +                                       HAS_WT(i915) ?</div>
<div class="ContentPasted0">>>> +                                       I915_CACHE(WT) : I915_CACHE(UC));</div>
<div class="ContentPasted0">>>>          if (ret)</div>
<div class="ContentPasted0">>>>                  return ERR_PTR(ret);</div>
<div class="ContentPasted0">>>></div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">[...]</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>> @@ -215,6 +222,7 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>          /*</div>
<div class="ContentPasted0">>>>           * Always flush cache for UMD objects at creation time.</div>
<div class="ContentPasted0">>>>           */</div>
<div class="ContentPasted0">>>> +       /* QQQ/FIXME why? avoidable performance penalty? */</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">This is needed because UMD's assume zero-initialized BO's are really zero'ed out</div>
<div class="ContentPasted0">before getting the handles to the BO's (See VLK-46522). Otherwise UMD's could read</div>
<div class="ContentPasted0">stale data, thus cause security issues.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>          if (obj->pat_set_by_user)</div>
<div class="ContentPasted0">>>>                  return true;</div>
<div class="ContentPasted0">>>></div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">[...]</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>> index dbfe6443457b..f48a21895a85 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>> @@ -27,6 +27,8 @@</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  #include <uapi/drm/i915_drm.h></div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> +#include "i915_cache.h"</div>
<div class="ContentPasted0">>>> +</div>
<div class="ContentPasted0">>>>  #include "intel_step.h"</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>  #include "gt/intel_engine_types.h"</div>
<div class="ContentPasted0">>>> @@ -243,8 +245,8 @@ struct intel_device_info {</div>
<div class="ContentPasted0">>>>           */</div>
<div class="ContentPasted0">>>>          const struct intel_runtime_info __runtime;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> -       u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];</div>
<div class="ContentPasted0">>>> -       u32 max_pat_index;</div>
<div class="ContentPasted0">>>> +       i915_cache_t cache_modes[9];</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> I was commenting on the array size here. It's probably better to make it 16</div>
<div class="ContentPasted0">>> because there are 4 PAT index bits defined in the PTE. Indices above</div>
<div class="ContentPasted0">>> max_pat_index</div>
<div class="ContentPasted0">>> are not used, but theoretically new mode could be added. Well, it's up</div>
<div class="ContentPasted0">>> to you,</div>
<div class="ContentPasted0">>> not likely to happen anyway.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Ah okay. I am not too concerned. Compiler will let us know if it happens.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Unrelated to this comment - what about i915_gem_object_can_bypass_llc?</div>
<div class="ContentPasted0">> Could we do better (less pessimistic) with something like my approach and</div>
<div class="ContentPasted0">> so maybe penalize MTL less?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">The problem is that, for the BO's managed by UMD's, the KMD doesn't know</div>
<div class="ContentPasted0">whether they are going to be mapped as cacheable or uncacheable on the CPU</div>
<div class="ContentPasted0">side. The PAT index controls GPU access only. That's why we make sure all</div>
<div class="ContentPasted0">BO's with PAT set by UMD (which means UMD will take control and managing the</div>
<div class="ContentPasted0">coherency) are clflush'ed.</div>
<div><br class="ContentPasted0">
</div>
<div>-Fei</div>
<div class="ContentPasted0"><br>
</div>
<div class="ContentPasted0">> Regards,</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Tvrtko</div>
<br>
</div>
</body>
</html>