[Intel-gfx] [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.

Wed Jan 11 12:40:05 UTC 2017

Chris Wilson <chris at chris-wilson.co.uk> writes:

> On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
>> Daniel Vetter <daniel at ffwll.ch> writes:
>> 
>> > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>> >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> >> disabling an L3SQ optimization that has huge performance implications
>> >> and is unlikely to be necessary for the correct functioning of usual
>> >> graphics workloads.  Userspace is free to re-enable the workaround on
>> >> demand, and is generally in a better position to determine whether the
>> >> workaround is necessary than the DRM is (e.g. only during the
>> >> execution of compute kernels that rely on both L3 fences and HDC R/W
>> >> requests).
>> >> 
>> >> The same workaround seems to apply to BDW (at least to production
>> >> stepping G1) and SKL as well (the internal workaround database claims
>> >> that it does for all steppings, while the BSpec workaround table only
>> >> mentions pre-production steppings), but the DRM doesn't do anything
>> >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> >> when it sees fit.  Do the same on KBL platforms.
>> >> 
>> >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> >> This is followed by a regression of 35% and 10% respectively for the
>> >> same benchmarks and platform caused by my recent patch series
>> >> switching userspace to use the dataport constant cache instead of the
>> >> sampler to implement uniform pull constant loads, which caused us to
>> >> hit more heavily the L3 cache (and on platforms other than KBL had the
>> >> opposite effect of improving performance of the same two benchmarks).
>> >> The overall effect on KBL of this change combined with the recent
>> >> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
>> >> was affected by the constant cache changes (though it improved as it
>> >> did on other platforms rather than regressing), but is not
>> >> significantly affected by this patch (with statistical significance of
>> >> 5% and sample size 20).
>> >> 
>> >> v2: Drop some more code to avoid unused variable warning.
>> >> 
>> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> >> Signed-off-by: Francisco Jerez <currojerez at riseup.net>
>> >> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
>> >> Cc: Jani Nikula <jani.nikula at intel.com>
>> >> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>> >> Cc: beignet at lists.freedesktop.org
>> >
>> > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
>> > for compute kernels? Are the patches for mesa compute/beignet
>> > ready&reviewed?
>> 
>> This is an explicit setting on kbl/E0 only. So one could argue
>> that unless they filter based on PCI-IDs, things would already
>> blow up across the skl/kbl population if they forgot
>> to set it. The whitelisting is in place and looks sane,
>> so this E0 exception is a wart that got in by me reading the wa
>> database slavishly without thinking.
>
> Add Fixes then?

Fixes: a4106a782d11 ("drm/i915/gen9: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround")

Looking at the beignet source, they don't care about this register/bit (yet).

Also we need to get rid of KBL_REVID_E0, as there is no such thing.
Oddly, kbl doesn't follow the logical x0->rev mapping but leaves
holes. Were they afraid of running out of revids or what...

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre