[Intel-gfx] [PATCH] drm/i915: Update MOCS settings for gen 9

David Weinehall david.weinehall at linux.intel.com
Thu May 4 14:47:21 UTC 2017


On Thu, May 04, 2017 at 10:35:33AM +0200, Arkadiusz Hiler wrote:
> On Thu, Apr 27, 2017 at 05:23:16PM +0100, Chris Wilson wrote:
> > On Thu, Apr 27, 2017 at 06:30:42PM +0300, David Weinehall wrote:
> > > On Thu, Apr 27, 2017 at 04:55:20PM +0200, Arkadiusz Hiler wrote:
> > > > On Wed, Apr 26, 2017 at 06:00:41PM +0300, David Weinehall wrote:
> > > > > Add a bunch of MOCS entries for gen 9 that were missing from intel_mocs.
> > > > > Some of these are used by media-sdk; if these entries are missing
> > > > > the default will instead be to do everything uncached.
> > > > > 
> > > > > This patch improves media-sdk performance by up to 60%
> > > > > with the (admittedly synthetic) benchmarks we use in our nightly
> > > > > testing, without regressing any other benchmarks.
> > > > 
> > > > Hey David,
> > > > 
> > > > I am testing some of the extended MOCS entries with Mesa and the differences I
> > > > see fall within the margin of statistical error.
> > > > 
> > > > Odd, I thought, so to make sure I haven't messed up anything in the
> > > > process of compiling, setting LD_LIBRARY_PATH and benchmarking I turned
> > > > everything to UNCACHED - and I saw severe performance drop.
> > > > 
> > > > So here is the question it induced:
> > > > 
> > > > Have you used the "closest neighbour" from the available entries, or did you
> > > > default to the UNCACHED ones? That could be the culprit.
> > > > 
> > > > Note: I have tested MOCS for VB and Render Target only, and only in a
> > > > few synthetic cases - it will require much more fine-tuning and
> > > > benchmarking before any final conclusions.
> > > 
> > > As I mentioned in the commit message, the improvements only manifest
> > > themselves for media-sdk workloads (and presumably other workloads
> > > that use the same hardware); if you see any performance regressions
> > > with these additional entries I'd be interested to know.
> > 
> > But what is being counter-suggested is that there is no reason for these
> > mocs entries. If the sdk is just using mocs registers without first
> > programming them outside of the kernel abi, then it will be hitting
> > uncached memory - and then the only benefit is from simply enabling
> > cached access. The kernel ABI is minimalist for a reason, and we want to
> > know why we should be adding tables that we need to maintain forever
> > (bonus points for making that a consistent interface for hardware for
> > years to come).
> > -Chris
> 
> Thanks for rephrasing - that's exactly what I am concerned with.
> 
> Did you just use the MediaSDK as it is - meaning that MOCS entries
> beyond the set of the 3 we have defined were naively used?
> 
> If that's the case it is probably the cause of the performance
> difference - everything beyond "the 3" means UNCACHED.
> 
> Can you try changing MediaSDK to only use entries that are already defined?
> How does the performance differ in that case?

We're benchmarking using upstream MediaSDK without changes, since that's
the only thing that's relevant. Customising benchmarks to get better
results isn't really an acceptable solution :)

Fixing MediaSDK upstream is obviously a different story, in case one of
the three pre-defined entries we have turns out to be the best possible
MOCS setting for that workload.
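To make the point about "everything beyond the 3 means UNCACHED" concrete, here is a minimal sketch (not the actual i915 code) of how a kernel could program a per-engine MOCS table when only three entries are defined and every remaining index is padded with the uncached entry. The entry count of 62, the struct layout, the function name `program_mocs()` and the control values are assumptions chosen for illustration; only the index order (uncached, follow-PTE, cached) is meant to mirror the three kernel-defined entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed number of usable MOCS registers per engine on gen9. */
#define NUM_MOCS_ENTRIES 62

/* Hypothetical, simplified entry: real entries encode cacheability,
 * target cache, age etc. in specific bit fields. */
struct mocs_entry {
	uint32_t control_value;
	uint16_t l3cc_value;
};

/* Placeholder encodings, NOT the real hardware bit layouts. */
enum { MOCS_UNCACHED_CTRL = 0x1, MOCS_PTE_CTRL = 0x2, MOCS_CACHED_CTRL = 0x3 };

/* The three defined indices: 0 = uncached, 1 = follow PTE, 2 = cached. */
static const struct mocs_entry defined_table[] = {
	{ MOCS_UNCACHED_CTRL, 0 },
	{ MOCS_PTE_CTRL, 0 },
	{ MOCS_CACHED_CTRL, 0 },
};

/* Hypothetical helper: write the defined entries first, then fill every
 * remaining register with entry 0 (uncached). A userspace driver that
 * picks an index >= n_defined therefore silently gets uncached access. */
static void program_mocs(uint32_t regs[NUM_MOCS_ENTRIES],
			 const struct mocs_entry *table, size_t n_defined)
{
	size_t i;

	assert(n_defined >= 1 && n_defined <= NUM_MOCS_ENTRIES);
	for (i = 0; i < n_defined; i++)
		regs[i] = table[i].control_value;
	for (; i < NUM_MOCS_ENTRIES; i++)
		regs[i] = table[0].control_value;
}
```

With this model, adding entries to the kernel table only helps workloads that were already using indices past the defined ones; a workload that sticks to index 2 sees no difference, which is the distinction being debated in this thread.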


Kind regards, David

