[Intel-gfx] [PATCH 1/6] drm/i915: Implement L3 partitioning set-up from the workaround list.

Thu Oct 8 02:17:11 PDT 2015

On Wed, Oct 07, 2015 at 04:30:35PM +0300, Francisco Jerez wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > On Wed, Oct 07, 2015 at 02:44:00PM +0300, Francisco Jerez wrote:
> >> This programs the L3 configuration based on the sizes given for each
> >> partition as arguments.  The relevant register writes are added to the
> >> workaround list so that they are re-applied to each context while it's
> >> initialized, preventing state leaks from other userspace processes
> >> which may have modified the L3 partitioning from its boot-up state,
> >> since all relevant registers are part of the software and hardware
> >> command checker whitelists.
> >> 
> >> Some userspace clients (DDX and current versions of Mesa not patched
> >> with my L3 partitioning series [1]) assume that the L3 configuration,
> >> in particular the URB size, comes up in certain state when a context
> >> is created, but nothing in the kernel guarantees this assumption, the
> >> registers that control the partitioning of the L3 cache were being
> >> left untouched.
> >> 
> >> Note that the VLV_L3SQCREG1_SQGHPCI_DEFAULT macro defined here has the
> >> same value as the previously defined VLV_B0_WA_L3SQCREG1_VALUE, but
> >> the latter will be removed in a future commit.
> >> 
> >> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-September/093550.html
> >
> > Merge this with 1 + 5 (or was it 4?)- not only does it introduce a
> > temporary unused function, but we have to jump between patches to see
> > whether the function is safe (especially given those BUGs), and you add
> > all w/a in the same patch so bisection is not improved.
> >
> What do you want me to merge it with?  This *is* PATCH 1, and PATCH 5 is
> largely unrelated.  I wouldn't have any objection to squashing PATCH 4
> into this commit though.

1+4. That just highlights the issue of having two very tightly coupled
patches arbitrary split - the split here doesn't even improve bisection.

> > Looking at the function itself, I am not convinced that it actually adds
> > anything over calling setting up the WA from the vfuncs - at least the
> > bdw/gen7 split is redundant (we split at the vfunc then call one
> > function where we replit again, but with nasty constraints on the
> > interface of the function for different gen, it's not a function for the
> > faint of heart).
> >
> The constraints are just the hardware's constraints on the L3
> partitioning.  The function is meant to implement the details of
> programming the L3 configuration, which is different for different gens
> even though the overall structure of the L3 partitioning is the same.
> Of course the constraints set by specific hardware on the partition
> sizes cannot be abstracted out.
> 
> I guess that splitting this out into two functions (one for gen7 and
> another for gen8) wouldn't hurt much, but open-coding the function in
> all its uses (5) would involve duplicating quite some code.

Having slept on it, and I think a gen7/gen8 split is best.

My issue with the current combined function is that the interface is
different based on generation, which says to me that it is multiple
functions. It scores very low on the hard-to-misuse ranking

http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html

And instead of the multiline multiple ternary operators, just have
an if-else-chain for hsw/vlv/ivb (for setting the cmd value).

> Assuming I split the function into gen7 and gen8 variants, would you
> like them to implement a common consistent interface (i.e. the same
> prototype that my init_l3_partitioning_workarounds() implements), or
> would you like the variant for each gen to implement an ad-hoc interface
> according to the partition configs actually needed on each gen?  I
> suspect the former would be cleaner.

No, the BUGs tell me that the registers do not have a consistent
interface.

> >> +static int init_l3_partitioning_workarounds(struct intel_engine_cs *ring,
> >> +					    unsigned int n_urb,
> >> +					    unsigned int n_ro,
> >> +					    unsigned int n_dc,
> >> +					    unsigned int n_all)
> >> +{
> >> +	struct drm_device *dev = ring->dev;
> >> +	struct drm_i915_private *dev_priv = dev->dev_private;
> >> +
> >> +	if (INTEL_INFO(dev)->gen >= 8) {
> >
> > Why use dev here? You already have dev_priv, so why chase the pointer
> > again?
> 
> Sorry?  Does INTEL_INFO() take a drm_device or a drm_i915_private
> pointer as argument?  The two types don't seem to be related by an
> inheritance relationship or anything similar AFAICT, and most other uses
> in this file seem to pass a drm_device as argument even where a
> drm_i915_private is available.

drm_i915_private is our primary internal pointer. We can shave over 8KiB
of object code by removing the pointer dance we have from still passing
around drm_device and jumping to drm_i915_private. New code is supposed
to be using drm_i915_private consistently for internal interfaces (we
can then clearly see the external entry points).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre