<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
> On Sun, May 07, 2023 at 11:39:18PM -0700, Yang, Fei wrote:
<div class="ContentPasted0">>>> On Wed, May 03, 2023 at 03:50:59PM -0700, fei.yang@intel.com wrote:</div>
<div class="ContentPasted0">>>>> From: Fei Yang <fei.yang@intel.com></div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> Currently the KMD is using enum i915_cache_level to set caching policy for</div>
<div class="ContentPasted0">>>>> buffer objects. This is flaky because the PAT index which really controls</div>
<div class="ContentPasted0">>>>> the caching behavior in PTE has far more levels than what's defined in the</div>
<div class="ContentPasted0">>>>> enum. In addition, the PAT index is platform dependent, having to translate</div>
<div class="ContentPasted0">>>>> between i915_cache_level and PAT index is not reliable, and makes the code</div>
<div class="ContentPasted0">>>>> more complicated.</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> From UMD's perspective there is also a necessity to set caching policy for</div>
<div class="ContentPasted0">>>>> performance fine tuning. It's much easier for the UMD to directly use PAT</div>
<div class="ContentPasted0">>>>> index because the behavior of each PAT index is clearly defined in Bspec.</div>
<div class="ContentPasted0">>>>> Having the abstracted i915_cache_level sitting in between would only cause</div>
<div class="ContentPasted0">>>>> more ambiguity.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> It may be worth mentioning here that PAT is expected to work much like</div>
<div class="ContentPasted0">>>> MOCS already works today --- the contract on the exact platform-specific</div>
<div class="ContentPasted0">>>> meaning of each index is documented in the hardware spec and userspace</div>
<div class="ContentPasted0">>>> is expected to select the index that exactly matches the behavior it</div>
<div class="ContentPasted0">>>> desires.</div>
<div class="ContentPasted0">>> will update.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">Please review https://patchwork.freedesktop.org/series/117480/</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> For these reasons this patch replaces i915_cache_level with PAT index. Also</div>
<div class="ContentPasted0">>>>> note, the cache_level is not completely removed yet, because the KMD still</div>
<div class="ContentPasted0">>>>> has the need of creating buffer objects with simple cache settings such as</div>
<div class="ContentPasted0">>>>> cached, uncached, or writethrough. For such simple cases, using cache_level</div>
<div class="ContentPasted0">>>>> would help simplify the code.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> See my comment farther down, but the implementation of</div>
<div class="ContentPasted0">>>> i915_gem_object_has_cache_level() seems a bit confusing and you may want</div>
<div class="ContentPasted0">>>> to elaborate on it here.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Also somewhat confusing from a high-level skim is if/how we're still</div>
<div class="ContentPasted0">>>> using obj->cache_coherent once userspace has taken direct control of the</div>
<div class="ContentPasted0">>>> cache behavior.  Some PAT indices give coherency and others don't (and</div>
<div class="ContentPasted0">>>> the combinations will likely get more complicated on future platforms).</div>
<div class="ContentPasted0">>>> If obj->cache_coherent is still being considered even once PAT indices</div>
<div class="ContentPasted0">>>> are being controlled by userspace, I think we need some explanation of</div>
<div class="ContentPasted0">>>> how that works in the commit message (and likely in the kerneldoc for</div>
<div class="ContentPasted0">>>> that field too).</div>
<div class="ContentPasted0">>> For the objects with pat_index set by user space, coherency is managed by</div>
<div class="ContentPasted0">>> user space. Things like obj->cache_coherent and obj->cache_dirty are still</div>
<div class="ContentPasted0">>> there for objects created by kernel.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Right, that's the challenge --- userspace is taking over control of this</div>
<div class="ContentPasted0">> stuff, but the fields are still around and still used internally within</div>
<div class="ContentPasted0">> the driver.  How we reconcile those two things needs to be clearly</div>
<div class="ContentPasted0">> explained in the commit message and kerneldoc, otherwise we're going to</div>
<div class="ContentPasted0">> wind up with confusing code that's very difficult to maintain down the</div>
<div class="ContentPasted0">> road.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> All the cache behavior is complicated enough that it probably wouldn't</div>
<div class="ContentPasted0">> be a bad idea to have a dedicated section of the kerneldoc focused on</div>
<div class="ContentPasted0">> cache behavior.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0 elementToProof">Updated with kerneldoc and comments.</div>
<div class="ContentPasted0 elementToProof">As discussed off-line, I have also squashed the pte_encode patch.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com></div>
<div class="ContentPasted0">>>>> Cc: Matt Roper <matthew.d.roper@intel.com></div>
<div class="ContentPasted0">>>>> Signed-off-by: Fei Yang <fei.yang@intel.com></div>
<div class="ContentPasted0">>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com></div>
<div class="ContentPasted0">>>>> ---</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/display/intel_dpt.c      | 12 +--</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gem/i915_gem_domain.c    | 45 ++++++----</div>
<div class="ContentPasted0">>>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 10 ++-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gem/i915_gem_object.c    | 51 +++++++++++-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |  4 +</div>
<div class="ContentPasted0">>>>>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 25 +++++-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |  4 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 16 ++--</div>
<div class="ContentPasted0">>>>>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  2 +-</div>
<div class="ContentPasted0">>>>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-</div>
<div class="ContentPasted0">>>>>  .../drm/i915/gem/selftests/i915_gem_mman.c    |  2 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c          | 10 ++-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 71 ++++++++--------</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |  3 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/intel_ggtt.c          | 82 +++++++++----------</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/intel_gtt.h           | 20 ++---</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/intel_migrate.c       | 47 ++++++-----</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/intel_migrate.h       | 13 ++-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  6 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/selftest_migrate.c    | 47 ++++++-----</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/selftest_reset.c      |  8 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/selftest_tlb.c        |  4 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      | 10 ++-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/i915_debugfs.c           | 52 +++++++++---</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/i915_gem.c               | 16 +++-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/i915_gpu_error.c         |  8 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/i915_vma.c               | 16 ++--</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/i915_vma.h               |  2 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/i915_vma_types.h         |  2 -</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/selftests/i915_gem.c     |  5 +-</div>
<div class="ContentPasted0">>>>>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--</div>
<div class="ContentPasted0">>>>>  .../drm/i915/selftests/intel_memory_region.c  |  4 +-</div>
<div class="ContentPasted0">>>>>  drivers/gpu/drm/i915/selftests/mock_gtt.c     |  8 +-</div>
<div class="ContentPasted0">>>>>  36 files changed, 391 insertions(+), 240 deletions(-)</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>>> index c5eacfdba1a5..7c5fddb203ba 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>>> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)</div>
<div class="ContentPasted0">>>>>  static void dpt_insert_page(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>>                    dma_addr_t addr,</div>
<div class="ContentPasted0">>>>>                    u64 offset,</div>
<div class="ContentPasted0">>>>> -                  enum i915_cache_level level,</div>
<div class="ContentPasted0">>>>> +                  unsigned int pat_index,</div>
<div class="ContentPasted0">>>>>                    u32 flags)</div>
<div class="ContentPasted0">>>>>  {</div>
<div class="ContentPasted0">>>>>    struct i915_dpt *dpt = i915_vm_to_dpt(vm);</div>
<div class="ContentPasted0">>>>>    gen8_pte_t __iomem *base = dpt->iomem;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>    gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,</div>
<div class="ContentPasted0">>>>> -             vm->pte_encode(addr, level, flags));</div>
<div class="ContentPasted0">>>>> +             vm->pte_encode(addr, pat_index, flags));</div>
<div class="ContentPasted0">>>>>  }</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>  static void dpt_insert_entries(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>>                       struct i915_vma_resource *vma_res,</div>
<div class="ContentPasted0">>>>> -                     enum i915_cache_level level,</div>
<div class="ContentPasted0">>>>> +                     unsigned int pat_index,</div>
<div class="ContentPasted0">>>>>                       u32 flags)</div>
<div class="ContentPasted0">>>>>  {</div>
<div class="ContentPasted0">>>>>    struct i915_dpt *dpt = i915_vm_to_dpt(vm);</div>
<div class="ContentPasted0">>>>>    gen8_pte_t __iomem *base = dpt->iomem;</div>
<div class="ContentPasted0">>>>> -  const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);</div>
<div class="ContentPasted0">>>>> +  const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);</div>
<div class="ContentPasted0">>>>>    struct sgt_iter sgt_iter;</div>
<div class="ContentPasted0">>>>>    dma_addr_t addr;</div>
<div class="ContentPasted0">>>>>    int i;</div>
<div class="ContentPasted0">>>>> @@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>>  static void dpt_bind_vma(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>>                 struct i915_vm_pt_stash *stash,</div>
<div class="ContentPasted0">>>>>                 struct i915_vma_resource *vma_res,</div>
<div class="ContentPasted0">>>>> -               enum i915_cache_level cache_level,</div>
<div class="ContentPasted0">>>>> +               unsigned int pat_index,</div>
<div class="ContentPasted0">>>>>                 u32 flags)</div>
<div class="ContentPasted0">>>>>  {</div>
<div class="ContentPasted0">>>>>    u32 pte_flags;</div>
<div class="ContentPasted0">>>>> @@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>>    if (vma_res->bi.lmem)</div>
<div class="ContentPasted0">>>>>          pte_flags |= PTE_LM;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> -  vm->insert_entries(vm, vma_res, cache_level, pte_flags);</div>
<div class="ContentPasted0">>>>> +  vm->insert_entries(vm, vma_res, pat_index, pte_flags);</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>    vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>>> index d2d5a24301b2..ae99b4be5918 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>>> @@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>>    if (IS_DGFX(i915))</div>
<div class="ContentPasted0">>>>>          return false;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> -  return !(obj->cache_level == I915_CACHE_NONE ||</div>
<div class="ContentPasted0">>>>> -         obj->cache_level == I915_CACHE_WT);</div>
<div class="ContentPasted0">>>>> +  return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>>> +         i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">>>>>  }</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>  bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>>  {</div>
<div class="ContentPasted0">>>>>    int ret;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> -  if (obj->cache_level == cache_level)</div>
<div class="ContentPasted0">>>>> +  if (i915_gem_object_has_cache_level(obj, cache_level))</div>
<div class="ContentPasted0">>>>>          return 0;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>    ret = i915_gem_object_wait(obj,</div>
<div class="ContentPasted0">>>>> @@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>>          return ret;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>    /* Always invalidate stale cachelines */</div>
<div class="ContentPasted0">>>>> -  if (obj->cache_level != cache_level) {</div>
<div class="ContentPasted0">>>>> -        i915_gem_object_set_cache_coherency(obj, cache_level);</div>
<div class="ContentPasted0">>>>> -        obj->cache_dirty = true;</div>
<div class="ContentPasted0">>>>> -  }</div>
<div class="ContentPasted0">>>>> +  i915_gem_object_set_cache_coherency(obj, cache_level);</div>
<div class="ContentPasted0">>>>> +  obj->cache_dirty = true;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>    /* The cache-level will be applied when each vma is rebound. */</div>
<div class="ContentPasted0">>>>>    return i915_gem_object_unbind(obj,</div>
<div class="ContentPasted0">>>>> @@ -306,20 +304,22 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>>          goto out;</div>
<div class="ContentPasted0">>>>>    }</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> -  switch (obj->cache_level) {</div>
<div class="ContentPasted0">>>>> -  case I915_CACHE_LLC:</div>
<div class="ContentPasted0">>>>> -  case I915_CACHE_L3_LLC:</div>
<div class="ContentPasted0">>>>> -        args->caching = I915_CACHING_CACHED;</div>
<div class="ContentPasted0">>>>> -        break;</div>
<div class="ContentPasted0">>>>> +  /*</div>
<div class="ContentPasted0">>>>> +   * This ioctl should be disabled for the objects with pat_index</div>
<div class="ContentPasted0">>>>> +   * set by user space.</div>
<div class="ContentPasted0">>>>> +   */</div>
<div class="ContentPasted0">>>>> +  if (obj->pat_set_by_user) {</div>
<div class="ContentPasted0">>>>> +        err = -EOPNOTSUPP;</div>
<div class="ContentPasted0">>>>> +        goto out;</div>
<div class="ContentPasted0">>>>> +  }</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> -  case I915_CACHE_WT:</div>
<div class="ContentPasted0">>>>> +  if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||</div>
<div class="ContentPasted0">>>>> +      i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))</div>
<div class="ContentPasted0">>>>> +        args->caching = I915_CACHING_CACHED;</div>
<div class="ContentPasted0">>>>> +  else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))</div>
<div class="ContentPasted0">>>>>          args->caching = I915_CACHING_DISPLAY;</div>
<div class="ContentPasted0">>>>> -        break;</div>
<div class="ContentPasted0">>>>> -</div>
<div class="ContentPasted0">>>>> -  default:</div>
<div class="ContentPasted0">>>>> +  else</div>
<div class="ContentPasted0">>>>>          args->caching = I915_CACHING_NONE;</div>
<div class="ContentPasted0">>>>> -        break;</div>
<div class="ContentPasted0">>>>> -  }</div>
<div class="ContentPasted0">>>>>  out:</div>
<div class="ContentPasted0">>>>>    rcu_read_unlock();</div>
<div class="ContentPasted0">>>>>    return err;</div>
<div class="ContentPasted0">>>>> @@ -364,6 +364,15 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>>    if (!obj)</div>
<div class="ContentPasted0">>>>>          return -ENOENT;</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> +  /*</div>
<div class="ContentPasted0">>>>> +   * This ioctl should be disabled for the objects with pat_index</div>
<div class="ContentPasted0">>>>> +   * set by user space.</div>
<div class="ContentPasted0">>>>> +   */</div>
<div class="ContentPasted0">>>>> +  if (obj->pat_set_by_user) {</div>
<div class="ContentPasted0">>>>> +        ret = -EOPNOTSUPP;</div>
<div class="ContentPasted0">>>>> +        goto out;</div>
<div class="ContentPasted0">>>>> +  }</div>
<div class="ContentPasted0">>>>> +</div>
<div class="ContentPasted0">>>>>    /*</div>
<div class="ContentPasted0">>>>>     * The caching mode of proxy object is handled by its generator, and</div>
<div class="ContentPasted0">>>>>     * not allowed to be changed by userspace.</div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c</div>
<div class="ContentPasted0">>>>> index 3aeede6aee4d..d42915516636 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c</div>
<div class="ContentPasted0">>>>> @@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>    return (cache->has_llc ||</div>
<div class="ContentPasted0">>>>>          obj->cache_dirty ||</div>
<div class="ContentPasted0">>>>> -        obj->cache_level != I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>>> +        !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));</div>
<div class="ContentPasted0">>>>>  }</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>  static int eb_reserve_vma(struct i915_execbuffer *eb,</div>
<div class="ContentPasted0">>>>> @@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,</div>
<div class="ContentPasted0">>>>>    offset = cache->node.start;</div>
<div class="ContentPasted0">>>>>    if (drm_mm_node_allocated(&cache->node)) {</div>
<div class="ContentPasted0">>>>>          ggtt->vm.insert_page(&ggtt->vm,</div>
<div class="ContentPasted0">>>>> -                         i915_gem_object_get_dma_address(obj, page),</div>
<div class="ContentPasted0">>>>> -                         offset, I915_CACHE_NONE, 0);</div>
<div class="ContentPasted0">>>>> +              i915_gem_object_get_dma_address(obj, page),</div>
<div class="ContentPasted0">>>>> +              offset,</div>
<div class="ContentPasted0">>>>> +              i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),</div>
<div class="ContentPasted0">>>>> +              0);</div>
<div class="ContentPasted0">>>>>    } else {</div>
<div class="ContentPasted0">>>>>          offset += page << PAGE_SHIFT;</div>
<div class="ContentPasted0">>>>>    }</div>
<div class="ContentPasted0">>>>> @@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,</div>
<div class="ContentPasted0">>>>>                reloc_cache_unmap(&eb->reloc_cache);</div>
<div class="ContentPasted0">>>>>                mutex_lock(&vma->vm->mutex);</div>
<div class="ContentPasted0">>>>>                err = i915_vma_bind(target->vma,</div>
<div class="ContentPasted0">>>>> -                              target->vma->obj->cache_level,</div>
<div class="ContentPasted0">>>>> +                              target->vma->obj->pat_index,</div>
<div class="ContentPasted0">>>>>                                PIN_GLOBAL, NULL, NULL);</div>
<div class="ContentPasted0">>>>>                mutex_unlock(&vma->vm->mutex);</div>
<div class="ContentPasted0">>>>>                reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);</div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c</div>
<div class="ContentPasted0">>>>> index 3dbacdf0911a..50c30efa08a3 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c</div>
<div class="ContentPasted0">>>>> @@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)</div>
<div class="ContentPasted0">>>>>    }</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>>    /* Access to snoopable pages through the GTT is incoherent. */</div>
<div class="ContentPasted0">>>>> -  if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {</div>
<div class="ContentPasted0">>>>> +  if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>>> +        HAS_LLC(i915))) {</div>
<div class="ContentPasted0">>>>>          ret = -EFAULT;</div>
<div class="ContentPasted0">>>>>          goto err_unpin;</div>
<div class="ContentPasted0">>>>>    }</div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c</div>
<div class="ContentPasted0">>>>> index 8c70a0ec7d2f..46a19b099ec8 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c</div>
<div class="ContentPasted0">>>>> @@ -54,6 +54,24 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,</div>
<div class="ContentPasted0">>>>>    return INTEL_INFO(i915)->cachelevel_to_pat[level];</div>
<div class="ContentPasted0">>>>>  }</div>
<div class="ContentPasted0">>>>>  </div>
<div class="ContentPasted0">>>>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>> +                         enum i915_cache_level lvl)</div>
<div class="ContentPasted0">>>>> +{</div>
<div class="ContentPasted0">>>>> +  /*</div>
<div class="ContentPasted0">>>>> +   * In case the pat_index is set by user space, this kernel mode</div>
<div class="ContentPasted0">>>>> +   * driver should leave the coherency to be managed by user space,</div>
<div class="ContentPasted0">>>>> +   * simply return true here.</div>
<div class="ContentPasted0">>>>> +   */</div>
<div class="ContentPasted0">>>>> +  if (obj->pat_set_by_user)</div>
<div class="ContentPasted0">>>>> +        return true;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> This function gets called in a few different places; if the question is</div>
<div class="ContentPasted0">>>> "does the object have specific cache behavior XX" then universally</div>
<div class="ContentPasted0">>>> returning "yes" if the user has taken direct control of PAT seems</div>
<div class="ContentPasted0">>>> confusing, especially since we're not looking at what cache level was</div>
<div class="ContentPasted0">>>> being queried nor what PAT index was selected by userspace.  A return</div>
<div class="ContentPasted0">>>> value of 'true' here could be correct in some situations but not others,</div>
<div class="ContentPasted0">>>> depending on how the caller intends to use that information.  I assume</div>
<div class="ContentPasted0">>>> you've gone through all the places this function gets called and ensured</div>
<div class="ContentPasted0">>>> that a return value of true results in the expected behavior; you may</div>
<div class="ContentPasted0">>>> want to elaborate on that in the commit message.</div>
<div class="ContentPasted0">>> There is a patch later in the series disabling {set|get}_caching ioctl for</div>
<div class="ContentPasted0">>> objects with pat_index set by userspace. With that the logic here would</div>
<div class="ContentPasted0">>> make sure kernel won't touch the cache settings for these objects. There</div>
<div class="ContentPasted0">>> is one caller vm_fault_gtt() might be a bit sensitive though, CI doesn't</div>
<div class="ContentPasted0">>> report any problem here, but if further changes in this part of the code</div>
<div class="ContentPasted0">>> were made, the logic here might need to be refined.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> From a quick skim, it looks like there are callsites at:</div>
<div class="ContentPasted0">>  - gpu_write_needs_clflush()</div>
<div class="ContentPasted0">>  - i915_gem_object_set_cache_level()</div>
<div class="ContentPasted0">>  - i915_gem_get_caching_ioctl()</div>
<div class="ContentPasted0">>  - use_cpu_reloc()</div>
<div class="ContentPasted0">>  - vm_fault_gtt()</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> For some of them like the ioctl function, we'll presumably just never</div>
<div class="ContentPasted0">> get to the point in the function where it's called if userspace has</div>
<div class="ContentPasted0">> taken control of PAT indices, but that's the kind of thing that needs to</div>
<div class="ContentPasted0">> be clearly explained.  These changes are a large change to how the</div>
<div class="ContentPasted0">> driver works and people are going to be looking to the commit messages,</div>
<div class="ContentPasted0">> existing kerneldoc, etc. to figure out how things should work down the</div>
<div class="ContentPasted0">> road.  We can't just explain stuff in a mailing list thread and rely on</div>
<div class="ContentPasted0">> people remembering the details months or years from now.  There may be</div>
<div class="ContentPasted0">> new places in the driver where i915_gem_object_has_cache_level() gets</div>
<div class="ContentPasted0">> used in the future and the non-obvious "always returns true" behavior</div>
<div class="ContentPasted0">> needs to be clearly documented to make sure that the new callers are</div>
<div class="ContentPasted0">> able to handle that properly.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">Added comments for gpu_write_needs_clflush(), use_cpu_reloc() and</div>
<div class="ContentPasted0">vm_fault_gtt(). Others have comments explaining when to skip already.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">> So I'm not saying any of the code is wrong today, I'm just saying that</div>
<div class="ContentPasted0">> we need to make sure it's properly documented so that it's maintainable</div>
<div class="ContentPasted0">> in the future.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Matt</div>
<br>
</div>
</body>
</html>