[Intel-gfx] [PATCH] drm/i915: Update write_domains on active list after flush.

Daniel Vetter daniel at ffwll.ch
Mon Feb 1 15:01:45 CET 2010


On Mon, Feb 01, 2010 at 01:06:49PM +0000, Chris Wilson wrote:
> On Mon, 1 Feb 2010 13:26:14 +0100, Daniel Vetter <daniel at ffwll.ch> wrote:
> > btw, I've thought a bit about why you want to do such crazy tricks with
> > flushes in the middle of batchbuffers.
> 
> If we were to be strict, then the 2D drivers would effectively have to
> emit one op per batch, as we often draw to a mask and then immediately use
> that in a composite operation. As such, we want to use pipelined flushes
> (rather than MI_FLUSH) so that we can actually utilize the GPU
> effectively. In the extreme, a sequence of 2D operations is still
> typically very short.

Gosh, I didn't know that batchbuffers for composite ops could be _that_
short. Of course, in that case pipelined flushes are the way to go ...
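
Just to check that I've understood the trick: on the ddx side that would
mean keeping both ops in one batch and emitting the flush inline between
them, roughly like the (completely untested) sketch below. The emit_*
helpers are placeholders I've made up, and I'm quoting the
BEGIN_BATCH/OUT_BATCH macros and type names from memory, so take it with
a grain of salt:

static void
render_mask_then_composite(ScrnInfoPtr scrn)
{
        intel_screen_private *intel = intel_get_screen_private(scrn);

        emit_mask_render(scrn);         /* placeholder: draw into the mask bo */

        /* pipelined flush: stays inside the batch, no batch split and no
         * extra flush + interrupt round-trip through the kernel */
        BEGIN_BATCH(1);
        OUT_BATCH(MI_FLUSH);
        ADVANCE_BATCH();

        emit_composite_from_mask(scrn); /* placeholder: now sample the mask */
}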

[snip]

> It is easy enough to roughly halve the number of interrupt commands we
> make, but I haven't actually measured a performance difference to justify
> the patch. The critical thing is not to wait on the GPU in the first
> place. :)

I agree that as long as we still commonly stall, reducing interrupts may
even be counter-productive: we'd have to wait _longer_ because the gpu
would have to finish more work first. At least on my i855, synchronous
fence reg management is the biggest offender in day-to-day use (composited
kde4 desktop). I'll look into pipelined fence reg setup next;
MI_LOAD_REGISTER_IMM looks like it could help there.
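
Roughly what I have in mind (completely untested, and assuming the chip
actually accepts LRI writes to the fence registers, which I still need to
check in the docs): instead of stalling and banging the fence register via
MMIO, queue the update through the ring. The helper name is made up and
the ring plumbing is hand-waved along the lines of i915_gem_flush():

static void
i915_pipelined_fence_write(struct drm_device *dev, int regnum, uint32_t val)
{
        drm_i915_private_t *dev_priv = dev->dev_private;

        BEGIN_LP_RING(4);
        OUT_RING(MI_INSTR(0x22, 1));            /* MI_LOAD_REGISTER_IMM, 1 pair */
        OUT_RING(FENCE_REG_830_0 + regnum * 4); /* pre-965 fence reg offset */
        OUT_RING(val);
        OUT_RING(MI_NOOP);                      /* pad to an even dword count */
        ADVANCE_LP_RING();
}

Tracking when the old fence contents are no longer needed is of course the
other half of the problem, but that's the usual request/seqno bookkeeping.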

Of course, there's also gtt access (due to fallbacks), which usually
completely stalls everything. But that's a worst-case scenario, anyway.

> > 3) Per execbuf overhead for the gpu due to the need to re-emit all the
> > necessary state (shaders, tex-units, ...). My idea to fix this:
> 
> [snip cookies]
> 
> Similar ideas here, but I was thinking of using hardware contexts and
> specifying in execbuffer2 the active context. Then the kernel can do
> save/restore from the ringbuffer as necessary.

Of course, we can transparently beef this up with nice hardware support
(like intel hw contexts) later on. But in the end, userspace only needs to
know whether it has to do a full state re-emit or not, and the cookie
thingy is enough as a kernel<->userspace interface for that, I think.
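
To be concrete, the interface I have in mind is tiny, something like the
following (purely hypothetical, nothing like this exists today and all the
names are made up):

struct drm_i915_gem_state_cookie {
        __u64 my_cookie;        /* in:  this client's state generation */
        __u64 last_cookie;      /* out: cookie of whoever owned the hw last */
};

/* userspace, when building the next batch: */
if (cookie.last_cookie != cookie.my_cookie)
        emit_full_state(batch);         /* someone else clobbered the hw state */
else
        emit_state_delta(batch);        /* our state survived, emit only changes */

Whether that ends up as a new field in execbuffer2 or as a separate query
ioctl doesn't really matter for the concept.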

> As for the problem of a GPU hog, we need an in-kernel GPU scheduler to
> prevent such DoS. Be evil and construct a circular chain of batch
> buffer(s).

Can you really preemptively kick a batchbuffer off the gpu? I've strolled
through the docs and haven't found anything. IIRC radeon can't do this
either. So for a gpu scheduler we're essentially stuck with punishing
misbehaving tasks (by forcing them to wait longer) only _after_ they've
issued at least one batchbuffer that took too long to complete. Or, if we
know how to reset the chip, we could try that after a long enough
time-out. Another nice problem is resuming batchbuffer execution ... ;)
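
The only reset scheme I can come up with is the usual watchdog approach:
periodically check whether the gpu still makes forward progress, and if it
doesn't for a few ticks, declare it hung and try the reset. Very rough
sketch, with all the helper and field names made up:

static void
gpu_watchdog_tick(struct drm_device *dev)
{
        drm_i915_private_t *dev_priv = dev->dev_private;
        uint32_t acthd = I915_READ(ACTHD);      /* current execution address */

        if (!gpu_has_outstanding_work(dev))     /* made-up helper */
                return;

        if (acthd == dev_priv->watchdog_last_acthd) {
                /* no forward progress since the last tick */
                if (++dev_priv->watchdog_ticks > 3)
                        try_gpu_reset(dev);     /* made-up helper, if we can */
        } else {
                dev_priv->watchdog_ticks = 0;
        }
        dev_priv->watchdog_last_acthd = acthd;
}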

-Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48
