<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
> On Sun, May 07, 2023 at 11:39:18PM -0700, Yang, Fei wrote:
<div class="ContentPasted0">>>> On Wed, May 03, 2023 at 03:50:59PM -0700, fei.yang@intel.com wrote:</div>
<div class="ContentPasted0">>>>> From: Fei Yang <fei.yang@intel.com></div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> Currently the KMD is using enum i915_cache_level to set caching policy for</div>
<div class="ContentPasted0">>>>> buffer objects. This is flaky because the PAT index which really controls</div>
<div class="ContentPasted0">>>>> the caching behavior in PTE has far more levels than what's defined in the</div>
<div class="ContentPasted0">>>>> enum. In addition, the PAT index is platform dependent, having to translate</div>
<div class="ContentPasted0">>>>> between i915_cache_level and PAT index is not reliable, and makes the code</div>
<div class="ContentPasted0">>>>> more complicated.</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> From UMD's perspective there is also a necessity to set caching policy for</div>
<div class="ContentPasted0">>>>> performance fine tuning. It's much easier for the UMD to directly use PAT</div>
<div class="ContentPasted0">>>>> index because the behavior of each PAT index is clearly defined in Bspec.</div>
<div class="ContentPasted0">>>>> Having the abstracted i915_cache_level sitting in between would only cause</div>
<div class="ContentPasted0">>>>> more ambiguity.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> It may be worth mentioning here that PAT is expected to work much like</div>
<div class="ContentPasted0">>>> MOCS already works today --- the contract on the exact platform-specific</div>
<div class="ContentPasted0">>>> meaning of each index is documented in the hardware spec and userspace</div>
<div class="ContentPasted0">>>> is expected to select the index that exactly matches the behavior it</div>
<div class="ContentPasted0">>>> desires.</div>
<div class="ContentPasted0">>> will update.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">Please review https://patchwork.freedesktop.org/series/117480/</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> For these reasons this patch replaces i915_cache_level with PAT index. Also</div>
<div class="ContentPasted0">>>>> note, the cache_level is not completely removed yet, because the KMD still</div>
<div class="ContentPasted0">>>>> has the need of creating buffer objects with simple cache settings such as</div>
<div class="ContentPasted0">>>>> cached, uncached, or writethrough. For such simple cases, using cache_level</div>
<div class="ContentPasted0">>>>> would help simplify the code.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> See my comment farther down, but the implementation of</div>
<div class="ContentPasted0">>>> i915_gem_object_has_cache_level() seems a bit confusing and you may want</div>
<div class="ContentPasted0">>>> to elaborate on it here.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Also somewhat confusing from a high-level skim is if/how we're still</div>
<div class="ContentPasted0">>>> using obj->cache_coherent once userspace has taken direct control of the</div>
<div class="ContentPasted0">>>> cache behavior. Some PAT indices give coherency and others don't (and</div>
<div class="ContentPasted0">>>> the combinations will likely get more complicated on future platforms).</div>
<div class="ContentPasted0">>>> If obj->cache_coherent is still being considered even once PAT indices</div>
<div class="ContentPasted0">>>> are being controlled by userspace, I think we need some explanation of</div>
<div class="ContentPasted0">>>> how that works in the commit message (and likely in the kerneldoc for</div>
<div class="ContentPasted0">>>> that field too).</div>
<div class="ContentPasted0">>> For the objects with pat_index set by user space, coherency is managed by</div>
<div class="ContentPasted0">>> user space. Things like obj->cache_coherent and obj->cache_dirty are still</div>
<div class="ContentPasted0">>> there for objects created by kernel.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Right, that's the challenge --- userspace is taking over control of this</div>
<div class="ContentPasted0">> stuff, but the fields are still around and still used internally within</div>
<div class="ContentPasted0">> the driver. How we reconcile those two things needs to be clearly</div>
<div class="ContentPasted0">> explained in the commit message and kerneldoc, otherwise we're going to</div>
<div class="ContentPasted0">> wind up with confusing code that's very difficult to maintain down the</div>
<div class="ContentPasted0">> road.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> All the cache behavior is complicated enough that it probably wouldn't</div>
<div class="ContentPasted0">> be a bad idea to have a dedicated section of the kerneldoc focused on</div>
<div class="ContentPasted0">> cache behavior.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0 elementToProof">Updated with kerneldoc and comments.</div>
<div class="ContentPasted0 elementToProof">As discussed off-line, I have also squashed the pte_encode patch.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com></div>
<div class="ContentPasted0">>>>> Cc: Matt Roper <matthew.d.roper@intel.com></div>
<div class="ContentPasted0">>>>> Signed-off-by: Fei Yang <fei.yang@intel.com></div>
<div class="ContentPasted0">>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com></div>
<div class="ContentPasted0">>>>> ---</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/display/intel_dpt.c | 12 +--</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 45 ++++++----</div>
<div class="ContentPasted0">>>>> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 ++-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 51 +++++++++++-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 +</div>
<div class="ContentPasted0">>>>> .../gpu/drm/i915/gem/i915_gem_object_types.h | 25 +++++-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 16 ++--</div>
<div class="ContentPasted0">>>>> .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +-</div>
<div class="ContentPasted0">>>>> .../drm/i915/gem/selftests/i915_gem_migrate.c | 2 +-</div>
<div class="ContentPasted0">>>>> .../drm/i915/gem/selftests/i915_gem_mman.c | 2 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 10 ++-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 71 ++++++++--------</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 3 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/intel_ggtt.c | 82 +++++++++----------</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/intel_gtt.h | 20 ++---</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/intel_migrate.c | 47 ++++++-----</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/intel_migrate.h | 13 ++-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/intel_ppgtt.c | 6 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/selftest_migrate.c | 47 ++++++-----</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/selftest_timeline.c | 2 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/selftest_tlb.c | 4 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 10 ++-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/i915_debugfs.c | 52 +++++++++---</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/i915_gem.c | 16 +++-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/i915_gpu_error.c | 8 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/i915_vma.c | 16 ++--</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/i915_vma.h | 2 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/i915_vma_types.h | 2 -</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/selftests/i915_gem.c | 5 +-</div>
<div class="ContentPasted0">>>>> .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--</div>
<div class="ContentPasted0">>>>> .../drm/i915/selftests/intel_memory_region.c | 4 +-</div>
<div class="ContentPasted0">>>>> drivers/gpu/drm/i915/selftests/mock_gtt.c | 8 +-</div>
<div class="ContentPasted0">>>>> 36 files changed, 391 insertions(+), 240 deletions(-)</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>>> index c5eacfdba1a5..7c5fddb203ba 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>>> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)</div>
<div class="ContentPasted0">>>>> static void dpt_insert_page(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>> dma_addr_t addr,</div>
<div class="ContentPasted0">>>>> u64 offset,</div>
<div class="ContentPasted0">>>>> - enum i915_cache_level level,</div>
<div class="ContentPasted0">>>>> + unsigned int pat_index,</div>
<div class="ContentPasted0">>>>> u32 flags)</div>
<div class="ContentPasted0">>>>> {</div>
<div class="ContentPasted0">>>>> struct i915_dpt *dpt = i915_vm_to_dpt(vm);</div>
<div class="ContentPasted0">>>>> gen8_pte_t __iomem *base = dpt->iomem;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,</div>
<div class="ContentPasted0">>>>> - vm->pte_encode(addr, level, flags));</div>
<div class="ContentPasted0">>>>> + vm->pte_encode(addr, pat_index, flags));</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> static void dpt_insert_entries(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>> struct i915_vma_resource *vma_res,</div>
<div class="ContentPasted0">>>>> - enum i915_cache_level level,</div>
<div class="ContentPasted0">>>>> + unsigned int pat_index,</div>
<div class="ContentPasted0">>>>> u32 flags)</div>
<div class="ContentPasted0">>>>> {</div>
<div class="ContentPasted0">>>>> struct i915_dpt *dpt = i915_vm_to_dpt(vm);</div>
<div class="ContentPasted0">>>>> gen8_pte_t __iomem *base = dpt->iomem;</div>
<div class="ContentPasted0">>>>> - const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);</div>
<div class="ContentPasted0">>>>> + const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);</div>
<div class="ContentPasted0">>>>> struct sgt_iter sgt_iter;</div>
<div class="ContentPasted0">>>>> dma_addr_t addr;</div>
<div class="ContentPasted0">>>>> int i;</div>
<div class="ContentPasted0">>>>> @@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>> static void dpt_bind_vma(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>> struct i915_vm_pt_stash *stash,</div>
<div class="ContentPasted0">>>>> struct i915_vma_resource *vma_res,</div>
<div class="ContentPasted0">>>>> - enum i915_cache_level cache_level,</div>
<div class="ContentPasted0">>>>> + unsigned int pat_index,</div>
<div class="ContentPasted0">>>>> u32 flags)</div>
<div class="ContentPasted0">>>>> {</div>
<div class="ContentPasted0">>>>> u32 pte_flags;</div>
<div class="ContentPasted0">>>>> @@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,</div>
<div class="ContentPasted0">>>>> if (vma_res->bi.lmem)</div>
<div class="ContentPasted0">>>>> pte_flags |= PTE_LM;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> - vm->insert_entries(vm, vma_res, cache_level, pte_flags);</div>
<div class="ContentPasted0">>>>> + vm->insert_entries(vm, vma_res, pat_index, pte_flags);</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>>> index d2d5a24301b2..ae99b4be5918 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>>> @@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>> if (IS_DGFX(i915))</div>
<div class="ContentPasted0">>>>> return false;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> - return !(obj->cache_level == I915_CACHE_NONE ||</div>
<div class="ContentPasted0">>>>> - obj->cache_level == I915_CACHE_WT);</div>
<div class="ContentPasted0">>>>> + return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>>> + i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>> {</div>
<div class="ContentPasted0">>>>> int ret;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> - if (obj->cache_level == cache_level)</div>
<div class="ContentPasted0">>>>> + if (i915_gem_object_has_cache_level(obj, cache_level))</div>
<div class="ContentPasted0">>>>> return 0;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> ret = i915_gem_object_wait(obj,</div>
<div class="ContentPasted0">>>>> @@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>> return ret;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> /* Always invalidate stale cachelines */</div>
<div class="ContentPasted0">>>>> - if (obj->cache_level != cache_level) {</div>
<div class="ContentPasted0">>>>> - i915_gem_object_set_cache_coherency(obj, cache_level);</div>
<div class="ContentPasted0">>>>> - obj->cache_dirty = true;</div>
<div class="ContentPasted0">>>>> - }</div>
<div class="ContentPasted0">>>>> + i915_gem_object_set_cache_coherency(obj, cache_level);</div>
<div class="ContentPasted0">>>>> + obj->cache_dirty = true;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> /* The cache-level will be applied when each vma is rebound. */</div>
<div class="ContentPasted0">>>>> return i915_gem_object_unbind(obj,</div>
<div class="ContentPasted0">>>>> @@ -306,20 +304,22 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>> goto out;</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> - switch (obj->cache_level) {</div>
<div class="ContentPasted0">>>>> - case I915_CACHE_LLC:</div>
<div class="ContentPasted0">>>>> - case I915_CACHE_L3_LLC:</div>
<div class="ContentPasted0">>>>> - args->caching = I915_CACHING_CACHED;</div>
<div class="ContentPasted0">>>>> - break;</div>
<div class="ContentPasted0">>>>> + /*</div>
<div class="ContentPasted0">>>>> + * This ioctl should be disabled for the objects with pat_index</div>
<div class="ContentPasted0">>>>> + * set by user space.</div>
<div class="ContentPasted0">>>>> + */</div>
<div class="ContentPasted0">>>>> + if (obj->pat_set_by_user) {</div>
<div class="ContentPasted0">>>>> + err = -EOPNOTSUPP;</div>
<div class="ContentPasted0">>>>> + goto out;</div>
<div class="ContentPasted0">>>>> + }</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> - case I915_CACHE_WT:</div>
<div class="ContentPasted0">>>>> + if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||</div>
<div class="ContentPasted0">>>>> + i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))</div>
<div class="ContentPasted0">>>>> + args->caching = I915_CACHING_CACHED;</div>
<div class="ContentPasted0">>>>> + else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))</div>
<div class="ContentPasted0">>>>> args->caching = I915_CACHING_DISPLAY;</div>
<div class="ContentPasted0">>>>> - break;</div>
<div class="ContentPasted0">>>>> -</div>
<div class="ContentPasted0">>>>> - default:</div>
<div class="ContentPasted0">>>>> + else</div>
<div class="ContentPasted0">>>>> args->caching = I915_CACHING_NONE;</div>
<div class="ContentPasted0">>>>> - break;</div>
<div class="ContentPasted0">>>>> - }</div>
<div class="ContentPasted0">>>>> out:</div>
<div class="ContentPasted0">>>>> rcu_read_unlock();</div>
<div class="ContentPasted0">>>>> return err;</div>
<div class="ContentPasted0">>>>> @@ -364,6 +364,15 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>> if (!obj)</div>
<div class="ContentPasted0">>>>> return -ENOENT;</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> + /*</div>
<div class="ContentPasted0">>>>> + * This ioctl should be disabled for the objects with pat_index</div>
<div class="ContentPasted0">>>>> + * set by user space.</div>
<div class="ContentPasted0">>>>> + */</div>
<div class="ContentPasted0">>>>> + if (obj->pat_set_by_user) {</div>
<div class="ContentPasted0">>>>> + ret = -EOPNOTSUPP;</div>
<div class="ContentPasted0">>>>> + goto out;</div>
<div class="ContentPasted0">>>>> + }</div>
<div class="ContentPasted0">>>>> +</div>
<div class="ContentPasted0">>>>> /*</div>
<div class="ContentPasted0">>>>> * The caching mode of proxy object is handled by its generator, and</div>
<div class="ContentPasted0">>>>> * not allowed to be changed by userspace.</div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c</div>
<div class="ContentPasted0">>>>> index 3aeede6aee4d..d42915516636 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c</div>
<div class="ContentPasted0">>>>> @@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> return (cache->has_llc ||</div>
<div class="ContentPasted0">>>>> obj->cache_dirty ||</div>
<div class="ContentPasted0">>>>> - obj->cache_level != I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>>> + !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> static int eb_reserve_vma(struct i915_execbuffer *eb,</div>
<div class="ContentPasted0">>>>> @@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,</div>
<div class="ContentPasted0">>>>> offset = cache->node.start;</div>
<div class="ContentPasted0">>>>> if (drm_mm_node_allocated(&cache->node)) {</div>
<div class="ContentPasted0">>>>> ggtt->vm.insert_page(&ggtt->vm,</div>
<div class="ContentPasted0">>>>> - i915_gem_object_get_dma_address(obj, page),</div>
<div class="ContentPasted0">>>>> - offset, I915_CACHE_NONE, 0);</div>
<div class="ContentPasted0">>>>> + i915_gem_object_get_dma_address(obj, page),</div>
<div class="ContentPasted0">>>>> + offset,</div>
<div class="ContentPasted0">>>>> + i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),</div>
<div class="ContentPasted0">>>>> + 0);</div>
<div class="ContentPasted0">>>>> } else {</div>
<div class="ContentPasted0">>>>> offset += page << PAGE_SHIFT;</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> @@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,</div>
<div class="ContentPasted0">>>>> reloc_cache_unmap(&eb->reloc_cache);</div>
<div class="ContentPasted0">>>>> mutex_lock(&vma->vm->mutex);</div>
<div class="ContentPasted0">>>>> err = i915_vma_bind(target->vma,</div>
<div class="ContentPasted0">>>>> - target->vma->obj->cache_level,</div>
<div class="ContentPasted0">>>>> + target->vma->obj->pat_index,</div>
<div class="ContentPasted0">>>>> PIN_GLOBAL, NULL, NULL);</div>
<div class="ContentPasted0">>>>> mutex_unlock(&vma->vm->mutex);</div>
<div class="ContentPasted0">>>>> reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);</div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c</div>
<div class="ContentPasted0">>>>> index 3dbacdf0911a..50c30efa08a3 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c</div>
<div class="ContentPasted0">>>>> @@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> /* Access to snoopable pages through the GTT is incoherent. */</div>
<div class="ContentPasted0">>>>> - if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {</div>
<div class="ContentPasted0">>>>> + if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>>> + HAS_LLC(i915))) {</div>
<div class="ContentPasted0">>>>> ret = -EFAULT;</div>
<div class="ContentPasted0">>>>> goto err_unpin;</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c</div>
<div class="ContentPasted0">>>>> index 8c70a0ec7d2f..46a19b099ec8 100644</div>
<div class="ContentPasted0">>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c</div>
<div class="ContentPasted0">>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c</div>
<div class="ContentPasted0">>>>> @@ -54,6 +54,24 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,</div>
<div class="ContentPasted0">>>>> return INTEL_INFO(i915)->cachelevel_to_pat[level];</div>
<div class="ContentPasted0">>>>> }</div>
<div class="ContentPasted0">>>>> </div>
<div class="ContentPasted0">>>>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>>> + enum i915_cache_level lvl)</div>
<div class="ContentPasted0">>>>> +{</div>
<div class="ContentPasted0">>>>> + /*</div>
<div class="ContentPasted0">>>>> + * In case the pat_index is set by user space, this kernel mode</div>
<div class="ContentPasted0">>>>> + * driver should leave the coherency to be managed by user space,</div>
<div class="ContentPasted0">>>>> + * simply return true here.</div>
<div class="ContentPasted0">>>>> + */</div>
<div class="ContentPasted0">>>>> + if (obj->pat_set_by_user)</div>
<div class="ContentPasted0">>>>> + return true;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> This function gets called in a few different places; if the question is</div>
<div class="ContentPasted0">>>> "does the object have specific cache behavior XX" then universally</div>
<div class="ContentPasted0">>>> returning "yes" if the user has taken direct control of PAT seems</div>
<div class="ContentPasted0">>>> confusing, especially since we're not looking at what cache level was</div>
<div class="ContentPasted0">>>> being queried nor what PAT index was selected by userspace. A return</div>
<div class="ContentPasted0">>>> value of 'true' here could be correct in some situations but not others,</div>
<div class="ContentPasted0">>>> depending on how the caller intends to use that information. I assume</div>
<div class="ContentPasted0">>>> you've gone through all the places this function gets called and ensured</div>
<div class="ContentPasted0">>>> that a return value of true results in the expected behavior; you may</div>
<div class="ContentPasted0">>>> want to elaborate on that in the commit message.</div>
<div class="ContentPasted0">>> There is a patch later in the series disabling {set|get}_caching ioctl for</div>
<div class="ContentPasted0">>> objects with pat_index set by userspace. With that the logic here would</div>
<div class="ContentPasted0">>> make sure kernel won't touch the cache settings for these objects. There</div>
<div class="ContentPasted0">>> is one caller vm_fault_gtt() might be a bit sensitive though, CI doesn't</div>
<div class="ContentPasted0">>> report any problem here, but if further changes in this part of the code</div>
<div class="ContentPasted0">>> were made, the logic here might need to be refined.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> From a quick skim, it looks like there are callsites at:</div>
<div class="ContentPasted0">> - gpu_write_needs_clflush()</div>
<div class="ContentPasted0">> - i915_gem_object_set_cache_level()</div>
<div class="ContentPasted0">> - i915_gem_get_caching_ioctl()</div>
<div class="ContentPasted0">> - use_cpu_reloc()</div>
<div class="ContentPasted0">> - vm_fault_gtt()</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> For some of them like the ioctl function, we'll presumably just never</div>
<div class="ContentPasted0">> get to the point in the function where it's called if userspace has</div>
<div class="ContentPasted0">> taken control of PAT indices, but that's the kind of thing that needs to</div>
<div class="ContentPasted0">> be clearly explained. These changes are a large change to how the</div>
<div class="ContentPasted0">> driver works and people are going to be looking to the commit messages,</div>
<div class="ContentPasted0">> existing kerneldoc, etc. to figure out how things should work down the</div>
<div class="ContentPasted0">> road. We can't just explain stuff in a mailing list thread and rely on</div>
<div class="ContentPasted0">> people remembering the details months or years from now. There may be</div>
<div class="ContentPasted0">> new places in the driver where i915_gem_object_has_cache_level() gets</div>
<div class="ContentPasted0">> used in the future and the non-obvious "always returns true" behavior</div>
<div class="ContentPasted0">> needs to be clearly documented to make sure that the new callers are</div>
<div class="ContentPasted0">> able to handle that properly.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">Added comments for gpu_write_needs_clflush(), use_cpu_reloc() and</div>
<div class="ContentPasted0">vm_fault_gtt(). Others have comments explaining when to skip already.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">> So I'm not saying any of the code is wrong today, I'm just saying that</div>
<div class="ContentPasted0">> we need to make sure it's properly documented so that it's maintainable</div>
<div class="ContentPasted0">> in the future.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Matt</div>
<br>
</div>
</body>
</html>