[Intel-gfx] [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.

Thu Jan 12 13:58:47 UTC 2017

Francisco Jerez <currojerez at riseup.net> writes:

> Daniel Vetter <daniel at ffwll.ch> writes:
>
>> On Wed, Jan 11, 2017 at 12:24:59PM +0000, Chris Wilson wrote:
>>> On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
>>> > Daniel Vetter <daniel at ffwll.ch> writes:
>>> > 
>>> > > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>>> > >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>>> > >> disabling an L3SQ optimization that has huge performance implications
>>> > >> and is unlikely to be necessary for the correct functioning of typical
>>> > >> graphics workloads.  Userspace is free to re-enable the workaround on
>>> > >> demand, and is generally in a better position to determine whether the
>>> > >> workaround is necessary than the DRM is (e.g. only during the
>>> > >> execution of compute kernels that rely on both L3 fences and HDC R/W
>>> > >> requests).
>>> > >> 
>>> > >> The same workaround seems to apply to BDW (at least to production
>>> > >> stepping G1) and SKL as well (the internal workaround database claims
>>> > >> that it does for all steppings, while the BSpec workaround table only
>>> > >> mentions pre-production steppings), but the DRM doesn't do anything
>>> > >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>>> > >> when it sees fit.  Do the same on KBL platforms.
>>> > >> 
>>> > >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>>> > >> and gl_4 (AKA car chase) by 14%, on a KBL GT2 running Mesa master.
>>> > >> This follows a regression of 35% and 10% respectively for the
>>> > >> same benchmarks and platform caused by my recent patch series
>>> > >> switching userspace to use the dataport constant cache instead of the
>>> > >> sampler to implement uniform pull constant loads, which made us hit
>>> > >> the L3 cache more heavily (and on platforms other than KBL had the
>>> > >> opposite effect of improving performance of the same two benchmarks).
>>> > >> The overall effect on KBL of this change combined with the recent
>>> > >> userspace change is an improvement of 4.6% and 2.6% respectively.
>>> > >> SynMark2 OglShMapPcf was affected by the constant cache changes
>>> > >> (though it improved, as it did on other platforms, rather than
>>> > >> regressing), but is not significantly affected by this patch (at a
>>> > >> 5% significance level with a sample size of 20).
>>> > >> 
>>> > >> v2: Drop some more code to avoid an unused variable warning.
>>> > >> 
>>> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>>> > >> Signed-off-by: Francisco Jerez <currojerez at riseup.net>
>>> > >> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
>>> > >> Cc: Jani Nikula <jani.nikula at intel.com>
>>> > >> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>>> > >> Cc: beignet at lists.freedesktop.org
>>> > >
>>> > > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
>>> > > for compute kernels? Are the patches for Mesa compute/Beignet
>>> > > ready and reviewed?
>>> > 
>>> > This is an explicit setting on KBL E0 only. So one could argue
>>> > that unless they filter based on PCI IDs, things would already
>>> > blow up across the SKL/KBL population if they forgot
>>> > to set it. The whitelisting is in place and looks sane,
>>> > so this E0 exception is a wart that got in because I read the WA
>>> > database slavishly without thinking.
>>> 
>>> Add Fixes then?
>>
>> Yeah, cc: stable would be good to make sure it shows up in all supported
>> kernels, fast. Otherwise we'll get some good wtf bug reports.
>
> Agreed, it would be nice for this to get to the stable kernel branches.
>

Added Fixes and stable tags and pushed to drm-intel-next-queued.

Thanks for the patch,
-Mika

>> -Daniel
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch