<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof ContentPasted0">
>>>>> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
<div class="ContentPasted0">>>>>> if (IS_DGFX(i915))</div>
<div class="ContentPasted0">>>>>> return false;</div>
<div class="ContentPasted0">>>>>> - /*</div>
<div class="ContentPasted0">>>>>> - * For objects created by userspace through GEM_CREATE with pat_index</div>
<div class="ContentPasted0">>>>>> - * set by set_pat extension, i915_gem_object_has_cache_level() will</div>
<div class="ContentPasted0">>>>>> - * always return true, because the coherency of such object is managed</div>
<div class="ContentPasted0">>>>>> - * by userspace. Othereise the call here would fall back to checking</div>
<div class="ContentPasted0">>>>>> - * whether the object is un-cached or write-through.</div>
<div class="ContentPasted0">>>>>> - */</div>
<div class="ContentPasted0">>>>>> - return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>>>> - i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">>>>>> + return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 &&</div>
<div class="ContentPasted0">>>>>> + i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1;</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> This logic was changed for objects with pat index set by user here. It</div>
<div class="ContentPasted0">>>>> used to return false regardless the cache mode. But now it returns true</div>
<div class="ContentPasted0">>>>> if its cache mode is neither UC nor WT.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Yes, that was half of the motivation of the refactory. Before the PAT</div>
<div class="ContentPasted0">>>> index series code was like this:</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> return !(obj->cache_level == I915_CACHE_NONE ||</div>
<div class="ContentPasted0">>>> obj->cache_level == I915_CACHE_WT);</div>
<div class="ContentPasted0">>>> So kernel knew it needs to flush only if caching mode is neither UC or WT.</div>
<div class="ContentPasted0">>>> With the PAT index series you changed it to:</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||</div>
<div class="ContentPasted0">>>> i915_gem_object_has_cache_level(obj, I915_CACHE_WT));</div>
<div class="ContentPasted0">>>> And as i915_gem_object_has_cache_level was changed to always return true</div>
<div class="ContentPasted0">>>> if PAT was set by user, that made the check meaningless for such objects.</div>
<div class="ContentPasted0">>></div>
<div class="ContentPasted0">>> But the point is that the KMD should not flush the cache for objects
</div>
<div class="ContentPasted0">>> with PAT set by user because UMD is handling the cache conherency.
</div>
<div class="ContentPasted0">>> That is the design. Well doing so wouldn't cause functional issue, but
</div>
<div class="ContentPasted0">>> it's unecessary and might have performance impact.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Not all i915_gem_object_has_cache_level() are even about flushing the cache</div>
<div class="ContentPasted0">> and if the kernel doesn't know what is behind a PAT index</div>
<div class="ContentPasted0">> (i915_gem_object_has_cache_level lies by always returning true) are we 100%</div>
<div class="ContentPasted0">> sure everything is functionally correct?</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> flush_write_domain() for instance uses it to determine whether to set
</div>
<div class="ContentPasted0">> obj->cache_dirty after GPU activity. How does the UMD manage that?</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Then use_cpu_reloc(). Another pointless/misleading question.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Finally vm_fault_gtt() rejects access based on it.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Perhaps the question is moot since the set pat extension is restricted to</div>
<div class="ContentPasted0">> MTL so some other conditions used in the above checks, like HAS_LLC and such,</div>
<div class="ContentPasted0">> make for no practical difference. Even if so, what if the extension was allowed</div>
<div class="ContentPasted0">> on other platforms as it was the plan until it was discovered there is no</div>
<div class="ContentPasted0">> userspace code for other platforms. Would the plan work on all platforms? And</div>
<div class="ContentPasted0">> even if it would I think the implementation is very non-obvious.</div>
<div class="ContentPasted0">></div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">Understand your point, perhaps we should let i915_gem_object_has_cache_mode()</div>
<div class="ContentPasted0">do what it supposed to do, and add a separate check for obj->pat_set_by_user</div>
<div class="ContentPasted0">in functions like gpu_write_needs_clflush(), use_cpu_reloc(), etc. Anyway,</div>
<div class="ContentPasted0">the design is to let UMD handle coherency for objects with pat set by user.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>> With my refactoring it becomes meaningful again and restores to the
</div>
<div class="ContentPasted0">>>> original behaviour. That's the intent at least.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>>> bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>>> @@ -255,9 +249,9 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)</div>
<div class="ContentPasted0">>>>>> }</div>
<div class="ContentPasted0">>>>>></div>
<div class="ContentPasted0">>>>>> /**</div>
<div class="ContentPasted0">>>>>> - * i915_gem_object_set_cache_level - Changes the cache-level of an object across all VMA.</div>
<div class="ContentPasted0">>>>>> + * i915_gem_object_set_cache - Changes the cache-level of an object across all VMA.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">[...]</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>>> - if (i915_gem_object_has_cache_level(obj, cache_level)) </div>
<div class="ContentPasted0">>>>>> + ret = i915_cache_find_pat(i915, cache);</div>
<div class="ContentPasted0">>>>>> + if (ret < 0) {</div>
<div class="ContentPasted0">>>>>> + struct drm_printer p =</div>
<div class="ContentPasted0">>>>>> + drm_err_printer("Attempting to use unknown caching mode ");</div>
<div class="ContentPasted0">>>>>> + </div>
<div class="ContentPasted0">>>>>> + i915_cache_print(&p, cache);</div>
<div class="ContentPasted0">>>>>> + drm_puts(&p, "!\n");</div>
<div class="ContentPasted0">>>>>> +</div>
<div class="ContentPasted0">>>>>> + return -EINVAL;</div>
<div class="ContentPasted0">>>>>> + } else if (ret == obj->pat_index) {</div>
<div class="ContentPasted0">>>>>> return 0;</div>
<div class="ContentPasted0">>>>> We will have to do this conversion and checking again later in this</div>
<div class="ContentPasted0">>>>> function when calling i915_gem_object_set_cache_coherency().</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Yes the double lookup part is a bit naff. It is not super required </div>
<div class="ContentPasted0">>>> apart for validating internal callers (could be a debug build only</div>
<div class="ContentPasted0">>>> feature perhaps) and to see if PAT index has changed and so whether </div>
<div class="ContentPasted0">>>> it needs to call i915_gem_object_wait before proceeding to</div>
<div class="ContentPasted0">>>> i915_gem_object_set_cache_coherency...</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>> My thought was to simply remove the use of cache_level/cache and replace</div>
<div class="ContentPasted0">>>>> it with pat_idex. Conversion from cache modes to pat index should be done</div>
<div class="ContentPasted0">>>>> before calling any function used to consume cache_level/cache.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> ... I could probably call the setter which takes PAT index instead of</div>
<div class="ContentPasted0">>>> i915_gem_object_set_cache_coherency few lines below. That would skip the</div>
<div class="ContentPasted0">>>> double lookup and make you happy(-ier)?</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>> Do you see any problem just letting these functions take pat_index as
</div>
<div class="ContentPasted0">>> the second argument? These functions are currently called with a
</div>
<div class="ContentPasted0">>> constant cache_level/mode, if we have INTEL_INFO(i915)->pat_uc/wt/wb
</div>
<div class="ContentPasted0">>> set properly, using pat index makes no difference, right?</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Which ones?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">i915_gem_object_set_cache_level() and i915_gem_object_set_cache_coherency()</div>
<div class="ContentPasted0">are both being called with cache_level as of now. That is not necessary if</div>
<div class="ContentPasted0">platform specific INTEL_INFO(i915)->pat_uc/wt/wb are there. we can simply</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">s/I915_CACHE_NONE/INTEL_INFO(i915)->pat_uc</div>
<div class="ContentPasted0">s/I915_CACHE_WT/INTEL_INFO(i915)->pat_wt</div>
<div class="ContentPasted0">s/I915_CACHE_LLC/INTEL_INFO(i915)->pat_wb</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">[...]</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>>> if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB))</div>
<div class="ContentPasted0">>>>> This looks wrong, at least for MTL. Prior to MTL the GPU automatically
</div>
<div class="ContentPasted0">>>>> snoop CPU cache, but from MTL onward you have to specify if WB or
</div>
<div class="ContentPasted0">>>>> WB + 1-WAY COH is needed. And for KMD, cacheable mode means WB +
</div>
<div class="ContentPasted0">>>>> 1-WAY COH for MTL to keep the behavior consistent.</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> This used to be taken care of by i915_gem_get_pat_index() call.
</div>
<div class="ContentPasted0">>>>> With that being replaced by i915_cache_find_pat(), you would need
</div>
<div class="ContentPasted0">>>>> to do something there.</div>
<div class="ContentPasted0">>>>> But, without cachelevel_to_pat[], you might end up hard coding
</div>
<div class="ContentPasted0">>>>> something directly in the function, and that is platform
</div>
<div class="ContentPasted0">>>>> dependent. hmm..., I don't really like this idea.</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> That's why I commented in v1 that we should use </div>
<div class="ContentPasted0">>>>> INTEL_INFO(i915)->pat_uc/wb/wt instead of enum i915_cache_level or
</div>
<div class="ContentPasted0">>>>> i915_cache_t.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> I think I get it. I hope so.. So if I made the tables like this:</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> #define LEGACY_CACHE_MODES \</div>
<div class="ContentPasted0">>>> .cache_modes = { \</div>
<div class="ContentPasted0">>>> [0] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">>>> [1] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">>>> [2] = _I915_CACHE(WB, COH1W | L3), \ // 2way??</div>
<div class="ContentPasted0">>>> [3] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">>>> }</div>
<div class="ContentPasted0">>>> #define GEN12_CACHE_MODES \</div>
<div class="ContentPasted0">>>> .cache_modes = { \</div>
<div class="ContentPasted0">>>> [0] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">>>> [1] = I915_CACHE(WC), \ </div>
<div class="ContentPasted0">>>> [2] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">>>> [3] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">>>> }</div>
<div class="ContentPasted0">>>> #define MTL_CACHE_MODES \</div>
<div class="ContentPasted0">>>> .cache_modes = { \</div>
<div class="ContentPasted0">>>> [0] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>This was a brain fart, should have just been WB.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>>> [1] = I915_CACHE(WT), \</div>
<div class="ContentPasted0">>>> [2] = I915_CACHE(UC), \</div>
<div class="ContentPasted0">>>> [3] = _I915_CACHE(WB, COH1W), \</div>
<div class="ContentPasted0">>>> [4] = _I915_CACHE(WB, COH2W), \</div>
<div class="ContentPasted0">>>> And made i915->pat_wc look up _I915_CACHE(WB, COH1W) would that work?</div>
<div class="ContentPasted0">>>> Hmm and also all "has cache level" call sites would need to look
</div>
<div class="ContentPasted0">>>> not just for WB but WB+COH1W.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Would it work? Too ugly?</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>> I don't think this would work. The cache_modes needs to be aligned
</div>
<div class="ContentPasted0">>> with BSpec, otherwise checkings for </div>
<div class="ContentPasted0">>> INTEL_INFO(i915)->cache_modes[obj->pat_index] might become invalid.
</div>
<div class="ContentPasted0">>> Also, COH1W/2W were not even there for platforms prior to MTL.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Not sure what would become invalid?</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">What if we want to check for a particular pat_index whether it means</div>
<div class="ContentPasted0">cached or uncached, whether it's 1-way coherent or not? if the cache_modes[]</div>
<div class="ContentPasted0">misaligned with bspec, then we would fail such check.
</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">> COH1W/2W are perhaps names associated</div>
<div class="ContentPasted0">> with MTL - but is Gen12 PAT 0 identical in behaviour to PAT 3 or PAT 4 on</div>
<div class="ContentPasted0">> MTL? If yes then we can introduce an i915 specific name for that coherency</div>
<div class="ContentPasted0">> mode and apply it to both platforms.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>> I still think setting INTEL_INFO(i915)->pat_uc/wt/wb is the best solution.</div>
<div class="ContentPasted0">>> With that we can also eliminate the use of I915_CACHE({UC|WT|WB}).</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> How for the call sites which are asking about caching mode characteristics?</div>
<div class="ContentPasted0">> We can't ask if something has PAT index X from the source code since that is</div>
<div class="ContentPasted0">> platform dependent.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">We can compare pat index directly for exact match. Even for the case we just</div>
<div class="ContentPasted0">want to distinguish cached or uncached, we can check the bit field of</div>
<div class="ContentPasted0">INTEL_INFO(i915)->cache_modes[obj->pat_index].</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">>>>>> +357,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>>> switch (args->caching) {</div>
<div class="ContentPasted0">>>>>> case I915_CACHING_NONE:</div>
<div class="ContentPasted0">>>>>> - level = I915_CACHE_NONE;</div>
<div class="ContentPasted0">>>>>> + cache = I915_CACHE(UC);</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> For code like this, my thought was </div>
<div class="ContentPasted0">>>>> - level = I915_CACHE_NONE;</div>
<div class="ContentPasted0">>>>> + pat_index = INTEL_INFO(i915)->pat_uc;</div>
<div class="ContentPasted0">>>>> And later the set_cache call should take pat_index as argument instead</div>
<div class="ContentPasted0">>>>> of cache mode.</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>>> break;</div>
<div class="ContentPasted0">>>>>> case I915_CACHING_CACHED:</div>
<div class="ContentPasted0">>>>>> /*</div>
<div class="ContentPasted0">>>>>> @@ -367,10 +369,10 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,</div>
<div class="ContentPasted0">>>>>> if (!HAS_LLC(i915) && !HAS_SNOOP(i915))</div>
<div class="ContentPasted0">>>>>> return -ENODEV; </div>
<div class="ContentPasted0">>>>>> - level = I915_CACHE_LLC;</div>
<div class="ContentPasted0">>>>>> + cache = I915_CACHE(WB);</div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>> - level = I915_CACHE_LLC;</div>
<div class="ContentPasted0">>>>> + pat_index = INTEL_INFO(i915)->pat_wb;</div>
<div class="ContentPasted0">>>>> This should take care of the WB + 1-WAY COH issue for MTL mentioned above,</div>
<div class="ContentPasted0">>>>> assuming the i915_cache_init() set pat_wb properly, and the</div>
<div class="ContentPasted0">>>>> i915_gem_object_set_cache() consumes pat_index instead of cache mode.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> That works too yes.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>>></div>
<div class="ContentPasted0">>>>>> break;</div>
<div class="ContentPasted0">>>>>> case I915_CACHING_DISPLAY:</div>
<div class="ContentPasted0">>>>>> - level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;</div>
<div class="ContentPasted0">>>>>> + cache = HAS_WT(i915) ? I915_CACHE(WT) : I915_CACHE(UC);</div>
<div class="ContentPasted0">>>>>> break;</div>
<div class="ContentPasted0">>>>>> default:</div>
<div class="ContentPasted0">>>>>> return -EINVAL;</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>> [...]</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>>>>></div>
<div class="ContentPasted0">>>>>> bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>>> @@ -215,6 +222,7 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)</div>
<div class="ContentPasted0">>>>>> /*</div>
<div class="ContentPasted0">>>>>> * Always flush cache for UMD objects at creation time.</div>
<div class="ContentPasted0">>>>>> */</div>
<div class="ContentPasted0">>>>>> + /* QQQ/FIXME why? avoidable performance penalty? */</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>> This is needed because UMD's assume zero-initialized BO's are really
</div>
<div class="ContentPasted0">>> zero'ed out before getting the handles to the BO's (See VLK-46522).
</div>
<div class="ContentPasted0">>> Otherwise UMD's could read stale data, thus cause security issues.</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Hah this comes exactly to my point from above. So it looks my propsal</div>
<div class="ContentPasted0">> would exactly solve this. Because i915 would know the caching mode and</div>
<div class="ContentPasted0">> know to flush if not coherent. And it would be better than flushing for</div>
<div class="ContentPasted0">> every obj->pat_set_by_user because that approach pessimistically flushes</div>
<div class="ContentPasted0">> even when it is not needed.</div>
<div><br class="ContentPasted0">
</div>
<div class="ContentPasted0">hmm..., This is only called at BO creation time. We do need to clflush all</div>
<div class="ContentPasted0">objects with pat_set_by_user, otherwise the user would get access to stale</div>
<div class="ContentPasted0">data.</div>
<div><br class="ContentPasted0">
</div>
<div>-Fei</div>
<div><br>
</div>
<div class="ContentPasted0">> Regards,</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">> Tvrtko</div>
<div class="ContentPasted0">></div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>>>>> if (obj->pat_set_by_user)</div>
<div class="ContentPasted0">>>>>> return true;</div>
<div class="ContentPasted0">>>>>></div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>> [...]</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>>>>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>>>> index dbfe6443457b..f48a21895a85 100644</div>
<div class="ContentPasted0">>>>>> --- a/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>>>> +++ b/drivers/gpu/drm/i915/intel_device_info.h</div>
<div class="ContentPasted0">>>>>> @@ -27,6 +27,8 @@</div>
<div class="ContentPasted0">>>>>></div>
<div class="ContentPasted0">>>>>> #include <uapi/drm/i915_drm.h></div>
<div class="ContentPasted0">>>>>></div>
<div class="ContentPasted0">>>>>> +#include "i915_cache.h"</div>
<div class="ContentPasted0">>>>>> +</div>
<div class="ContentPasted0">>>>>> #include "intel_step.h"</div>
<div class="ContentPasted0">>>>>></div>
<div class="ContentPasted0">>>>>> #include "gt/intel_engine_types.h"</div>
<div class="ContentPasted0">>>>>> @@ -243,8 +245,8 @@ struct intel_device_info { >>> */ </div>
<div class="ContentPasted0">>>>>> const struct intel_runtime_info __runtime;</div>
<div class="ContentPasted0">>>>>> - u32 cachelevel_to_pat[I915_MAX_CACHE_LEVEL];</div>
<div class="ContentPasted0">>>>>> - u32 max_pat_index;</div>
<div class="ContentPasted0">>>>>> + i915_cache_t cache_modes[9];</div>
<div class="ContentPasted0">>>>> I was commenting on the array size here. It's probably better to make</div>
<div class="ContentPasted0">>>>> it 16 because there are 4 PAT index bits defined in the PTE. Indices</div>
<div class="ContentPasted0">>>>> above max_pat_index are not used, but theoretically new mode could be</div>
<div class="ContentPasted0">>>>> added. Well, it's up to you, not likely to happen anyway.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Ah okay. I am not too concerned. Compiler will let us know if it happens.</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Unrelated to this comment - what about i915_gem_object_can_bypass_llc?</div>
<div class="ContentPasted0">>>> Could we do better (less pessimistic) with something like my approach and</div>
<div class="ContentPasted0">>>> so maybe penalize MTL less?</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>> The problem is that, for the BO's managed by UMD's, the KMD doesn't
</div>
<div class="ContentPasted0">>> know whether they are going to be mapped as cacheable or uncacheable
</div>
<div class="ContentPasted0">>> on the CPU side. The PAT index controls GPU access only. That's why we
</div>
<div class="ContentPasted0">>> make sure all BO's with PAT set by UMD (which means UMD will take
</div>
<div class="ContentPasted0">>> control and managing the</div>
<div class="ContentPasted0">>> coherency) are clflush'ed.</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>> -Fei</div>
<div class="ContentPasted0">>> </div>
<div class="ContentPasted0">>>> Regards,</div>
<div class="ContentPasted0">>>></div>
<div class="ContentPasted0">>>> Tvrtko</div>
<br>
</div>
</body>
</html>