[Intel-gfx] [PATCH] drm/i915: Avoid using mappable space for relocation processing through the CPU

Tue Nov 29 18:15:02 CET 2011

On Tue, 29 Nov 2011 18:03:53 +0100, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Tue, Nov 29, 2011 at 04:48:15PM +0000, Chris Wilson wrote:
> > On Tue, 29 Nov 2011 16:34:41 +0100, Daniel Vetter <daniel at ffwll.ch> wrote:
> > > On Tue, Nov 29, 2011 at 03:12:40PM +0000, Chris Wilson wrote:
> > > > We try to avoid writing the relocations through the uncached GTT, if the
> > > > buffer is currently in the CPU write domain and so will be flushed out to
> > > > main memory afterwards anyway. Also on SandyBridge we can safely write
> > > > to the pages in cacheable memory, so long as the buffer is LLC mapped.
> > > > In either of these cases, we therefore do not need to force the
> > > > reallocation of the buffer into the mappable region of the GTT, reducing
> > > > the aperture pressure.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > 
> > > The error_state capture currently relies on us pinning buffers as mappable
> > > when they contain relocations (and on userspace always submitting
> > > batchbuffers containing relocations). You break that guarantee without
> > > fixing up the error capture code. Otherwise I like this.
> > 
> > I may have sent that patch a little earlier. ;-)
> 
> Yes, I know. My gripe is that this will reduce our chances of successfully
> capturing the error_state, because now we expect to hit that case in the
> error capture code whereas up to now it would have been a bug somewhere.
> So either
> - fixup the error_capture to fall back to cpu reads (needs the usual
>   clflush dance if the object is not llc cached)
> - or drop the pin mappable change in this patch.

Ah you forget, I volunteered you to do the error-state capture from a
workqueue so that we could add further complexity... 

We would then be able to allocate enough memory to capture auxiliary
buffers as well, etc.

In the meantime, the paths that hit this code are during warmup (before
any batches have been retired into the userspace bo cache), slow steady
state behaviour (when the caches are being reaped and repopulated), and
when thrashing the hardware. It also requires that the whole mappable
range had already been allocated.

Whilst not negligible, the risk imo is small and all will be solved with
the next generation i915_capture_error().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
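
The condition described in the patch summary above can be sketched roughly
as follows. This is only an illustration of the reasoning in the thread, not
the actual i915 code: struct fake_obj, the DOMAIN_* and CACHE_* constants,
and both helper functions are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real i915 definitions. */
enum cache_level { CACHE_NONE, CACHE_LLC };

#define DOMAIN_CPU 0x1u
#define DOMAIN_GTT 0x2u

struct fake_obj {
	uint32_t write_domain;        /* domain the buffer was last written from */
	enum cache_level cache_level; /* CACHE_LLC: coherent with the CPU cache */
	bool has_relocs;              /* buffer carries relocation entries */
};

/*
 * Relocations can be written through the CPU when the buffer is already in
 * the CPU write domain (it will be flushed to main memory later anyway) or
 * when it is LLC cached (cacheable writes are safe).
 */
static bool use_cpu_reloc(const struct fake_obj *obj)
{
	return obj->write_domain == DOMAIN_CPU ||
	       obj->cache_level == CACHE_LLC;
}

/*
 * Only buffers whose relocations must go through the GTT need to be placed
 * in the mappable region of the aperture.
 */
static bool need_mappable(const struct fake_obj *obj)
{
	return obj->has_relocs && !use_cpu_reloc(obj);
}
```

Under this sketch, a buffer with relocations that sits in the CPU write
domain or is LLC cached no longer needs a mappable pin, which is exactly the
case the error capture code would then have to handle without a GTT mapping.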