[PATCH] Remove spurious use of MI_SET_CONTEXT.

Mon Dec 3 15:50:33 PST 2007

Kristian Høgsberg wrote:
> On Dec 3, 2007 4:20 PM, Keith Whitwell <keith at tungstengraphics.com> wrote:
> ...
>>>> Just emit MI_SET_CONTEXT in place of the current batchbuffer preamble.
>>>> Nothing can interrupt a batchbuffer half-way through in our system.
>>> MI_SET_CONTEXT has to go in a ring buffer, it can't go in a batch
>>> buffer.
>> I had trouble understanding this -- but indeed the address for the
>> context is a *physical* address in system memory...  Fabulous.
> 
> No, it can be a GTT offset too, but it has to be present when the context
> is swapped out.  That is not under our control, because it happens
> when the next context is swapped in.  So it has to be a no-evict
> buffer.  We don't want to fragment the GTT with no-evict buffers for
> this stuff, so the physical address option looks more interesting.

OK.

>> That makes the whole scheme less interesting to me as it does rely on
>> the kernel doing the state management, and at this point I don't really
>> see this as being hugely worthwhile.
> 
> What did you have in mind?  The context buffers are opaque, it's just
> a cookie you hold on to for the hardware.  You can't peek into them and
> manipulate the state anyway, so it doesn't seem to me that there's a
> big difference in whether the kernel or user space track the state
> buffers and issue the MI_SET_CONTEXT.

I don't want to peek into the contexts (though if you do, you'll find 
they aren't all that opaque).

I'm just saying I'd like to avoid adding complexity to the kernel module
unless there's real evidence 1) that there's a performance gain and 2)
that we can't achieve the same effect from userspace.

>> Before you do this, it would be worthwhile to disable the preamble emit
>> from the i915 driver and see if there is any measurable benefit.  As
>> long as you only run a single context and disable EXA, nothing bad
>> should happen.
> 
> I'll give that a try, and the thing to compare against is state
> emission on every batch buffer, which is what we'll be doing without
> the DRI lock.  

It's how the driver works at present.

> No matter how it turns out, I doubt it will be a
> performance regression, and will help with the EXA state tracking too.

The smaller the average batchbuffer size is, the more it's likely to
help.  If EXA is doing large numbers of tiny command streams, it could
be a win.  3D tends to do large sequences of state changes and rendering
commands, so optimizing one away per batchbuffer isn't likely to be a
huge win.
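
To put rough numbers on that argument, here is a minimal sketch that
computes how much of a batchbuffer the preamble accounts for.  The
function name and the dword counts are hypothetical, chosen only for
illustration; they are not measurements from the i915 driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of a batchbuffer taken up by the preamble, in parts per thousand.
 * Both arguments are hypothetical dword counts, not measured values.
 */
static uint32_t preamble_share_permille(uint32_t preamble_dwords,
                                        uint32_t payload_dwords)
{
    uint64_t total = (uint64_t)preamble_dwords + payload_dwords;

    assert(total > 0);  /* an empty batch has no meaningful share */
    return (uint32_t)(((uint64_t)preamble_dwords * 1000) / total);
}
```

With an assumed 50-dword preamble, a 50-dword payload gives 500 (half
the batch is preamble), while a 4950-dword payload gives 10 (one
percent), which is why dropping the preamble matters mostly for tiny
command streams.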

The i915 swz code incorporates a state-differencer in the 3D driver that
eliminates a large number of redundant state changes within the
command stream in normal rendering -- it didn't seem to make a huge
difference to performance in classic mode rendering.
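
For readers unfamiliar with the idea, a state differencer can be
sketched roughly as below.  This is not the i915 swz code; the slot
count, struct layout, and function names are all assumptions made up
for the example, and the "command" written out is just a (slot, value)
pair rather than a real hardware packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS 8  /* hypothetical number of tracked state words */

struct state_cache {
    uint32_t value[NUM_SLOTS];  /* last value emitted for each slot */
    uint8_t  valid[NUM_SLOTS];  /* 0 until the slot has been emitted once */
};

/* Forget all cached state, e.g. when the hardware state may have been
 * changed by someone else and nothing can be assumed any more. */
static void state_cache_reset(struct state_cache *c)
{
    memset(c, 0, sizeof(*c));
}

/*
 * Write a (slot, value) pair to out only if it differs from what was
 * last emitted for that slot.  Returns the number of words written:
 * 2 if the state change was emitted, 0 if it was redundant.
 */
static size_t emit_state(struct state_cache *c, unsigned slot,
                         uint32_t value, uint32_t *out)
{
    assert(slot < NUM_SLOTS);

    if (c->valid[slot] && c->value[slot] == value)
        return 0;  /* redundant: hardware already has this value */

    c->value[slot] = value;
    c->valid[slot] = 1;
    out[0] = slot;
    out[1] = value;
    return 2;
}
```

Setting the same slot to the same value twice in a row emits only the
first time; after a reset every slot is emitted again on first use.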

Most often it seems the hardware is gated on some rasterization 
bottleneck - either pure bandwidth in and out of the chip, or execution 
resources for the pixel shader.

Keith