<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
<div class="ContentPasted0">>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com></div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Informal commit message for now.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> I got a bit impatient and curious to see if the idea we discussed would</div>
<div class="ContentPasted0">>>> work so sketched something out. I think it is what I was describing back</div>
<div class="ContentPasted0">>>> then..</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> So high level idea is to teach the driver what caching modes are hidden</div>
<div class="ContentPasted0">>>> behind PAT indices. Given you already had that in static tables, if we</div>
<div class="ContentPasted0">>>> just turn the tables a bit around and add a driver abstraction of caching</div>
<div class="ContentPasted0">>>> modes this is what happens:</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> * We can lose the ugly runtime i915_gem_get_pat_index.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> * We can have a smarter i915_gem_object_has_cache_level, which now can</div>
<div class="ContentPasted0">>>> use the above mentioned table to understand the caching modes and so</div>
<div class="ContentPasted0">>>> does not have to pessimistically return true for _any_ input when user</div>
<div class="ContentPasted0">>>> has set the PAT index. This may improve things even for MTL.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> * We can simplify the debugfs printout to be platform agnostic.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> * We are perhaps opening the door to un-regress the dodgy addition</div>
<div class="ContentPasted0">>>> made to i915_gem_object_can_bypass_llc? See QQQ/FIXME in the patch.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> I hope I did not forget anything, but anyway, please have a read and see</div>
<div class="ContentPasted0">>>> what you think. I think it has potential.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Proper commit message can come later.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com></div>
<div class="ContentPasted0">>>> Cc: Fei Yang <fei.yang@intel.com></div>
<div class="ContentPasted0">>>> ---</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/Makefile | 1 +</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/display/intel_dpt.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/display/intel_fb_pin.c | 2 +-</div>
<div class="ContentPasted0">>>> .../drm/i915/display/intel_plane_initial.c | 4 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 66 +++++-----</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_domain.h | 7 +-</div>
<div class="ContentPasted0">>>> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 13 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_internal.c | 6 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 10 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 116 +++++++++--------</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_object.h | 9 +-</div>
<div class="ContentPasted0">>>> .../gpu/drm/i915/gem/i915_gem_object_types.h | 117 +++---------------</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 10 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 13 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 43 ++++---</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +-</div>
<div class="ContentPasted0">>>> .../drm/i915/gem/selftests/huge_gem_object.c | 6 +-</div>
<div class="ContentPasted0">>>> .../gpu/drm/i915/gem/selftests/huge_pages.c | 8 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 4 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 19 +--</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_ggtt.c | 33 ++---</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c | 4 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_gtt.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_gtt.h | 3 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_migrate.c | 11 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_ppgtt.c | 6 +-</div>
<div class="ContentPasted0">>>> .../gpu/drm/i915/gt/intel_ring_submission.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/intel_timeline.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/selftest_migrate.c | 9 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/selftest_reset.c | 14 +--</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/selftest_timeline.c | 1 +</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/selftest_tlb.c | 5 +-</div>
<div class="ContentPasted0">>>> .../gpu/drm/i915/gt/selftest_workarounds.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/uc/intel_guc.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 8 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_cache.c | 72 +++++++++++</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_cache.h | 49 ++++++++</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_debugfs.c | 65 +++-------</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_driver.c | 5 +</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_drv.h | 3 +</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_gem.c | 21 +---</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_gpu_error.c | 7 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_pci.c | 83 +++++++------</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/i915_perf.c | 2 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/intel_device_info.h | 6 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/selftests/i915_gem.c | 5 +-</div>
<div class="ContentPasted0">>>> .../gpu/drm/i915/selftests/i915_gem_evict.c | 8 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 13 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/selftests/igt_spinner.c | 2 +-</div>
<div class="ContentPasted0">>>> .../drm/i915/selftests/intel_memory_region.c | 4 +-</div>
<div class="ContentPasted0">>>> .../gpu/drm/i915/selftests/mock_gem_device.c | 12 +-</div>
<div class="ContentPasted0">>>> drivers/gpu/drm/i915/selftests/mock_region.c | 2 +-</div>
<div class="ContentPasted0">>>> 54 files changed, 440 insertions(+), 485 deletions(-)</div>
<div class="ContentPasted0">>>> create mode 100644 drivers/gpu/drm/i915/i915_cache.c</div>
<div class="ContentPasted0">>>> create mode 100644 drivers/gpu/drm/i915/i915_cache.h</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile</div>
<div class="ContentPasted0">>>> index 2be9dd960540..2c3da8f0c78e 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/Makefile</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/Makefile</div>
<div class="ContentPasted0">>>> @@ -30,6 +30,7 @@ subdir-ccflags-y += -I$(srctree)/$(src)</div>
<div class="ContentPasted0">>>> # core driver code</div>
<div class="ContentPasted0">>>> i915-y += i915_driver.o \</div>
<div class="ContentPasted0">>>> i915_drm_client.o \</div>
<div class="ContentPasted0">>>> + i915_cache.o \</div>
<div class="ContentPasted0">>>> i915_config.o \</div>
<div class="ContentPasted0">>>> i915_getparam.o \</div>
<div class="ContentPasted0">>>> i915_ioctl.o \</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>> index 7c5fddb203ba..5baf0d27a288 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c</div>
<div class="ContentPasted0">>>> @@ -268,7 +268,7 @@ intel_dpt_create(struct intel_framebuffer *fb)</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> ret = i915_gem_object_lock_interruptible(dpt_obj, NULL);</div>
<div class="ContentPasted0">>>> if (!ret) {</div>
<div class="ContentPasted0">>>> - ret = i915_gem_object_set_cache_level(dpt_obj, I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> + ret = i915_gem_object_set_cache(dpt_obj, I915_CACHE(UC));</div>
<div class="ContentPasted0">>>> i915_gem_object_unlock(dpt_obj);</div>
<div class="ContentPasted0">>>> }</div>
<div class="ContentPasted0">>>> if (ret) {</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb_pin.c b/drivers/gpu/drm/i915/display/intel_fb_pin.c</div>
<div class="ContentPasted0">>>> index fffd568070d4..dcf54b354a74 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/display/intel_fb_pin.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/display/intel_fb_pin.c</div>
<div class="ContentPasted0">>>> @@ -66,7 +66,7 @@ intel_pin_fb_obj_dpt(struct drm_framebuffer *fb,</div>
<div class="ContentPasted0">>>> continue;</div>
<div class="ContentPasted0">>>> }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> - ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> + ret = i915_gem_object_set_cache(obj, I915_CACHE(UC));</div>
<div class="ContentPasted0">>>> if (ret)</div>
<div class="ContentPasted0">>>> continue;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c</div>
<div class="ContentPasted0">>>> index 736072a8b2b0..9988f6562392 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c</div>
<div class="ContentPasted0">>>> @@ -121,8 +121,8 @@ initial_plane_vma(struct drm_i915_private *i915,</div>
<div class="ContentPasted0">>>> * cache_level during fbdev initialization. The</div>
<div class="ContentPasted0">>>> * unbind there would get stuck waiting for rcu.</div>
<div class="ContentPasted0">>>> */</div>
<div class="ContentPasted0">>>> - i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ?</div>
<div class="ContentPasted0">>>> - I915_CACHE_WT : I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> + i915_gem_object_set_cache_coherency(obj, HAS_WT(i915) ? I915_CACHE(WT) :</div>
<div class="ContentPasted0">>>> + I915_CACHE(UC));</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> switch (plane_config->tiling) {</div>
<div class="ContentPasted0">>>> case I915_TILING_NONE:</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>> index dfaaa8b66ac3..f899da2c622a 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c</div>
<div class="ContentPasted0">>>> @@ -8,6 +8,7 @@</div>
<div class="ContentPasted0">>>> #include "display/intel_frontbuffer.h"</div>
<div class="ContentPasted0">>>> #include "gt/intel_gt.h"</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> +#include "i915_cache.h"</div>
<div class="ContentPasted0">>>> #include "i915_drv.h"</div>
<div class="ContentPasted0">>>> #include "i915_gem_clflush.h"</div>
<div class="ContentPasted0">>>> #include "i915_gem_domain.h"</div>
<div class="ContentPasted0">>>> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>> if (IS_DGFX(i915))</div>
<div class="ContentPasted0">>>> return false;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> - /*</div>
<div class="ContentPasted0">>>> - * For objects created by userspace through GEM_CREATE with pat_index</div>
<div class="ContentPasted0">>>> - * set by set_pat extension, i915_gem_object_has_cache_level() will</div>
<div class="ContentPasted0">>>> - * always return true, because the coherency of such object is managed</div>
<div class="ContentPasted0">>>> - * by userspace. Othereise the call here would fall back to checking</div>
<div class="ContentPasted0">>>> - * whether the object is un-cached or write-through.</div>
<div class="ContentPasted0">>>> - */</div>
<div class="ContentPasted0">>>> - return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>> - i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">>>> + return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 &&</div>
<div class="ContentPasted0">>>> + i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This logic was changed for objects with pat index set by user here. It</div>
<div class="ContentPasted0">>> used to return false</div>
<div class="ContentPasted0">>> regardless the cache mode. But now it returns true if its cache mode is</div>
<div class="ContentPasted0">>> neither UC nor WT.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Yes, that was half of the motivation of the refactory. Before the PAT index series code was like this:</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> return !(obj->cache_level == I915_CACHE_NONE ||</div>
<div class="ContentPasted0">> obj->cache_level == I915_CACHE_WT);</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> So kernel knew it needs to flush only if caching mode is neither UC or WT. With the PAT index series you changed it to:</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">> i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> And as i915_gem_object_has_cache_level was changed to always return true if PAT was set by user, that made the check meaningless for such objects.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">But the point is that the KMD should not flush the cache for objects with</div>
<div class="ContentPasted0">PAT set by user because UMD is handling the cache conherency. That is the</div>
<div class="ContentPasted0">design. Well doing so wouldn't cause functional issue, but it's unecessary</div>
<div class="ContentPasted0">and might have performance impact.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">> With my refactoring it becomes meaningful again and restores to the original behaviour. That's the intent at least.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>>> bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>> @@ -255,9 +249,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)</div>
<div class="ContentPasted0">>>> }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> /**</div>
<div class="ContentPasted0">>>> - * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.</div>
<div class="ContentPasted0">>>> + * i915_gem_object_set_cache - Changes the cache-level of an object across all VMA.</div>
<div class="ContentPasted0">>>> * @obj: object to act on</div>
<div class="ContentPasted0">>>> - * @cache_level: new cache level to set for the object</div>
<div class="ContentPasted0">>>> + * @cache: new caching mode to set for the object</div>
<div class="ContentPasted0">>>> *</div>
<div class="ContentPasted0">>>> * After this function returns, the object will be in the new cache-level</div>
<div class="ContentPasted0">>>> * across all GTT and the contents of the backing storage will be coherent,</div>
<div class="ContentPasted0">>>> @@ -269,19 +263,28 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)</div>
<div class="ContentPasted0">>>> * cache coherency) and all non-MOCS GPU access will also be uncached so</div>
<div class="ContentPasted0">>>> * that all direct access to the scanout remains coherent.</div>
<div class="ContentPasted0">>>> */</div>
<div class="ContentPasted0">>>> -int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>> - enum i915_cache_level cache_level)</div>
<div class="ContentPasted0">>>> +int i915_gem_object_set_cache(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>> + i915_cache_t cache)</div>
<div class="ContentPasted0">>>> {</div>
<div class="ContentPasted0">>>> + struct drm_i915_private *i915 = to_i915(obj->base.dev);</div>
<div class="ContentPasted0">>>> int ret;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> - /*</div>
<div class="ContentPasted0">>>> - * For objects created by userspace through GEM_CREATE with pat_index</div>
<div class="ContentPasted0">>>> - * set by set_pat extension, simply return 0 here without touching</div>
<div class="ContentPasted0">>>> - * the cache setting, because such objects should have an immutable</div>
<div class="ContentPasted0">>>> - * cache setting by desgin and always managed by userspace.</div>
<div class="ContentPasted0">>>> - */</div>
<div class="ContentPasted0">>>> - if (i915_gem_object_has_cache_level(obj, cache_level))</div>
<div class="ContentPasted0">>>> + ret = i915_cache_find_pat(i915, cache);</div>
<div class="ContentPasted0">>>> + if (ret < 0) {</div>
<div class="ContentPasted0">>>> + struct drm_printer p =</div>
<div class="ContentPasted0">>>> + drm_err_printer("Attempting to use unknown caching mode ");</div>
<div class="ContentPasted0">>>> +</div>
<div class="ContentPasted0">>>> + i915_cache_print(&p, cache);</div>
<div class="ContentPasted0">>>> + drm_puts(&p, "!\n");</div>
<div class="ContentPasted0">>>> +</div>
<div class="ContentPasted0">>>> + return -EINVAL;</div>
<div class="ContentPasted0">>>> + } else if (ret == obj->pat_index) {</div>
<div class="ContentPasted0">>>> return 0;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> We will have to do this conversion and checking again later in this</div>
<div class="ContentPasted0">>> function when calling i915_gem_object_set_cache_coherency().</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Yes the double lookup part is a bit naff. It is not super required</div>
<div class="ContentPasted0">> apart for validating internal callers (could be a debug build only</div>
<div class="ContentPasted0">> feature perhaps) and to see if PAT index has changed and so whether</div>
<div class="ContentPasted0">> it needs to call i915_gem_object_wait before proceeding to</div>
<div class="ContentPasted0">> i915_gem_object_set_cache_coherency...</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>> My thought was to simply remove the use of cache_level/cache and replace</div>
<div class="ContentPasted0">>> it with pat_idex. Conversion from cache modes to pat index should be done</div>
<div class="ContentPasted0">>> before calling any function used to consume cache_level/cache.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> ... I could probably call the setter which takes PAT index instead of</div>
<div class="ContentPasted0">> i915_gem_object_set_cache_coherency few lines below. That would skip the</div>
<div class="ContentPasted0">> double lookup and make you happy(-ier)?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">Do you see any problem just letting these functions take pat_index as the</div>
<div class="ContentPasted0">second argument? These functions are currently called with a constant</div>
<div class="ContentPasted0">cache_level/mode, if we have INTEL_INFO(i915)->pat_uc/wt/wb set properly,</div>
<div class="ContentPasted0">using pat index makes no difference, right?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>> + } else if (obj->pat_set_by_user) {</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> Shouldn't this if-statement be placed at the beginning of this function?</div>
<div class="ContentPasted0">>> the original</div>
<div class="ContentPasted0">>> i915_gem_object_has_cache_level() would return true if</div>
<div class="ContentPasted0">>> (obj->pat_set_by_user), so the</div>
<div class="ContentPasted0">>> function returns right away.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>> + drm_notice_once(&i915->drm,</div>
<div class="ContentPasted0">>>> + "Attempting to change caching mode on an object with fixed PAT!\n");</div>
<div class="ContentPasted0">>>> + return -EINVAL;</div>
<div class="ContentPasted0">>>> + }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> ret = i915_gem_object_wait(obj,</div>
<div class="ContentPasted0">>>> I915_WAIT_INTERRUPTIBLE |</div>
<div class="ContentPasted0">>>> @@ -291,7 +294,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>> return ret;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> /* Always invalidate stale cachelines */</div>
<div class="ContentPasted0">>>> - i915_gem_object_set_cache_coherency(obj, cache_level);</div>
<div class="ContentPasted0">>>> + i915_gem_object_set_cache_coherency(obj, cache);</div>
<div class="ContentPasted0">>>> obj->cache_dirty = true;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> /* The cache-level will be applied when each vma is rebound. */</div>
<div class="ContentPasted0">>>> @@ -326,10 +329,9 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>> goto out;</div>
<div class="ContentPasted0">>>> }</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> - if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||</div>
<div class="ContentPasted0">>>> - i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))</div>
<div class="ContentPasted0">>>> + if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB))</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This looks wrong, at least for MTL. Prior to MTL the GPU automatically snoop</div>
<div class="ContentPasted0">>> CPU cache, but from MTL onward you have to specify if WB or WB + 1-WAY</div>
<div class="ContentPasted0">>> COH is needed. And for KMD, cacheable mode means WB + 1-WAY COH for MTL to keep the</div>
<div class="ContentPasted0">>> behavior consistent.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This used to be taken care of by i915_gem_get_pat_index() call. With</div>
<div class="ContentPasted0">>> that being replaced by i915_cache_find_pat(), you would need to do something there.</div>
<div class="ContentPasted0">>> But, without cachelevel_to_pat[], you might end up hard coding something</div>
<div class="ContentPasted0">>> directly in the function, and that is platform dependent. hmm..., I don't really</div>
<div class="ContentPasted0">>> like this idea.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> That's why I commented in v1 that we should use INTEL_INFO(i915)->pat_uc/wb/wt</div>
<div class="ContentPasted0">>> instead of enum i915_cache_level or i915_cache_t.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> I think I get it. I hope so.. So if I made the tables like this:</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> #define LEGACY_CACHE_MODES \</div>
<div class="ContentPasted0">> .cache_modes = { \</div>
<div class="ContentPasted0">> [0] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">> [1] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">> [2] = _I915_CACHE(WB, COH1W | L3), \ // 2way??</div>
<div class="ContentPasted0">> [3] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">> }</div>
<div class="ContentPasted0">> </div>
<div class="ContentPasted0">> #define GEN12_CACHE_MODES \</div>
<div class="ContentPasted0">> .cache_modes = { \</div>
<div class="ContentPasted0">> [0] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">> [1] = I915_CACHE(WC), \</div>
<div class="ContentPasted0">> [2] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">> [3] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">> }</div>
<div class="ContentPasted0">> </div>
<div class="ContentPasted0">> #define MTL_CACHE_MODES \</div>
<div class="ContentPasted0">> .cache_modes = { \</div>
<div class="ContentPasted0">> [0] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">> [1] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">> [2] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">> [3] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">> [4] = _I915_CACHE(WB, COH2W), \</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> And made i915->pat_wc look up _I915_CACHE(WB, COH1W) would that work?</div>
<div class="ContentPasted0">> Hmm and also all "has cache level" call sites would need to look not just</div>
<div class="ContentPasted0">> for WB but WB+COH1W.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Would it work? Too ugly?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">I don't think this would work. The cache_modes needs to be aligned with BSpec,</div>
<div class="ContentPasted0">otherwise checkings for INTEL_INFO(i915)->cache_modes[obj->pat_index] might</div>
<div class="ContentPasted0">become invalid. Also, COH1W/2W were not even there for platforms prior to MTL.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">I still think setting INTEL_INFO(i915)->pat_uc/wt/wb is the best solution.</div>
<div class="ContentPasted0">With that we can also eliminate the use of I915_CACHE({UC|WT|WB}).</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>> args->caching = I915_CACHING_CACHED;</div>
<div class="ContentPasted0">>>> - else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))</div>
<div class="ContentPasted0">>>> + else if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT))</div>
<div class="ContentPasted0">>>> args->caching = I915_CACHING_DISPLAY;</div>
<div class="ContentPasted0">>>> else</div>
<div class="ContentPasted0">>>> args->caching = I915_CACHING_NONE;</div>
<div class="ContentPasted0">>>> @@ -344,7 +346,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>> struct drm_i915_private *i915 = to_i915(dev);</div>
<div class="ContentPasted0">>>> struct drm_i915_gem_caching *args = data;</div>
<div class="ContentPasted0">>>> struct drm_i915_gem_object *obj;</div>
<div class="ContentPasted0">>>> - enum i915_cache_level level;</div>
<div class="ContentPasted0">>>> + i915_cache_t cache;</div>
<div class="ContentPasted0">>>> int ret = 0;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> if (IS_DGFX(i915))</div>
<div class="ContentPasted0">>>> @@ -355,7 +357,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> switch (args->caching) {</div>
<div class="ContentPasted0">>>> case I915_CACHING_NONE:</div>
<div class="ContentPasted0">>>> - level = I915_CACHE_NONE;</div>
<div class="ContentPasted0">>>> + cache = I915_CACHE(UC);</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> For code like this, my thought was</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> - level = I915_CACHE_NONE;</div>
<div class="ContentPasted0">>> + pat_index = INTEL_INFO(i915)->pat_uc;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> And later the set_cache call should take pat_index as argument instead</div>
<div class="ContentPasted0">>> of cache mode.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>> break;</div>
<div class="ContentPasted0">>>> case I915_CACHING_CACHED:</div>
<div class="ContentPasted0">>>> /*</div>
<div class="ContentPasted0">>>> @@ -367,10 +369,10 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>> if (!HAS_LLC(i915) && !HAS_SNOOP(i915))</div>
<div class="ContentPasted0">>>> return -ENODEV;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> - level = I915_CACHE_LLC;</div>
<div class="ContentPasted0">>>> + cache = I915_CACHE(WB);</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> - level = I915_CACHE_LLC;</div>
<div class="ContentPasted0">>> + pat_index = INTEL_INFO(i915)->pat_wb;</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> This should take care of the WB + 1-WAY COH issue for MTL mentioned above,</div>
<div class="ContentPasted0">>> assuming the i915_cache_init() set pat_wb properly, and the</div>
<div class="ContentPasted0">>> i915_gem_object_set_cache() consumes pat_index instead of cache mode.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> That works too yes.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>>> break;</div>
<div class="ContentPasted0">>>> case I915_CACHING_DISPLAY:</div>
<div class="ContentPasted0">>>> - level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;</div>
<div class="ContentPasted0">>>> + cache = HAS_WT(i915) ? I915_CACHE(WT) : I915_CACHE(UC);</div>
<div class="ContentPasted0">>>> break;</div>
<div class="ContentPasted0">>>> default:</div>
<div class="ContentPasted0">>>> return -EINVAL;</div>
<div class="ContentPasted0">>>> @@ -409,7 +411,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>> if (ret)</div>
<div class="ContentPasted0">>>> goto out;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> - ret = i915_gem_object_set_cache_level(obj, level);</div>
<div class="ContentPasted0">>>> + ret = i915_gem_object_set_cache(obj, cache);</div>
<div class="ContentPasted0">>>> i915_gem_object_unlock(obj);</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> out:</div>
<div class="ContentPasted0">>>> @@ -448,9 +450,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,</div>
<div class="ContentPasted0">>>> * of uncaching, which would allow us to flush all the LLC-cached data</div>
<div class="ContentPasted0">>>> * with that bit in the PTE to main memory with just one PIPE_CONTROL.</div>
<div class="ContentPasted0">>>> */</div>
<div class="ContentPasted0">>>> - ret = i915_gem_object_set_cache_level(obj,</div>
<div class="ContentPasted0">>>> - HAS_WT(i915) ?</div>
<div class="ContentPasted0">>>> - I915_CACHE_WT : I915_CACHE_NONE);</div>
<div class="ContentPasted0">>>> + ret = i915_gem_object_set_cache(obj,</div>
<div class="ContentPasted0">>>> + HAS_WT(i915) ?</div>
<div class="ContentPasted0">>>> + I915_CACHE(WT) : I915_CACHE(UC));</div>
<div class="ContentPasted0">>>> if (ret)</div>
<div class="ContentPasted0">>>> return ERR_PTR(ret);</div>
<div class="ContentPasted0">>>></div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">[...]</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>> @@ -215,6 +222,7 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>> /*</div>
<div class="ContentPasted0">>>> * Always flush cache for UMD objects at creation time.</div>
<div class="ContentPasted0">>>> */</div>
<div class="ContentPasted0">>>> + /* QQQ/FIXME why? avoidable performance penalty? */</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">This is needed because UMD's assume zero-initialized BO's are really zero'ed out</div>
<div class="ContentPasted0">before getting the handles to the BO's (See VLK-46522). Otherwise UMD's could read</div>
<div class="ContentPasted0">stale data, thus cause security issues.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>> if (obj->pat_set_by_user)</div>
<div class="ContentPasted0">>>> return true;</div>
<div class="ContentPasted0">>>></div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">[...]</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>> index dbfe6443457b..f48a21895a85 100644</div>
<div class="ContentPasted0">>>> --- a/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>> +++ b/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>> @@ -27,6 +27,8 @@</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> #include <uapi/drm/i915_drm.h></div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> +#include "i915_cache.h"</div>
<div class="ContentPasted0">>>> +</div>
<div class="ContentPasted0">>>> #include "intel_step.h"</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> #include "gt/intel_engine_types.h"</div>
<div class="ContentPasted0">>>> @@ -243,8 +245,8 @@ struct intel_device_info {</div>
<div class="ContentPasted0">>>> */</div>
<div class="ContentPasted0">>>> const struct intel_runtime_info __runtime;</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> - u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];</div>
<div class="ContentPasted0">>>> - u32 max_pat_index;</div>
<div class="ContentPasted0">>>> + i915_cache_t cache_modes[9];</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> I was commenting on the array size here. It's probably better to make it 16</div>
<div class="ContentPasted0">>> because there are 4 PAT index bits defined in the PTE. Indices above</div>
<div class="ContentPasted0">>> max_pat_index</div>
<div class="ContentPasted0">>> are not used, but theoretically new mode could be added. Well, it's up</div>
<div class="ContentPasted0">>> to you,</div>
<div class="ContentPasted0">>> not likely to happen anyway.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Ah okay. I am not too concerned. Compiler will let us know if it happens.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Unrelated to this comment - what about i915_gem_object_can_bypass_llc?</div>
<div class="ContentPasted0">> Could we do better (less pessimistic) with something like my approach and</div>
<div class="ContentPasted0">> so maybe penalize MTL less?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">The problem is that, for the BO's managed by UMD's, the KMD doesn't know</div>
<div class="ContentPasted0">whether they are going to be mapped as cacheable or uncacheable on the CPU</div>
<div class="ContentPasted0">side. The PAT index controls GPU access only. That's why we make sure all</div>
<div class="ContentPasted0">BO's with PAT set by UMD (which means UMD will take control and managing the</div>
<div class="ContentPasted0">coherency) are clflush'ed.</div>
<div><br class="ContentPasted0">
</div>
<div>-Fei</div>
<div class="ContentPasted0"><br>
</div>
<div class="ContentPasted0">> Regards,</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Tvrtko</div>
<br>
</div>
</body>
</html>