[Intel-gfx] [PATCH 2/2] intel: Use I915_EXEC_NO_RELOC when available

Daniel Vetter daniel at ffwll.ch
Wed Jan 21 01:11:32 PST 2015


On Tue, Jan 20, 2015 at 10:10:57PM -0800, Kristian Høgsberg wrote:
> On Tue, Jan 20, 2015 at 1:46 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > On Tue, Jan 20, 2015 at 12:53:35PM -0800, Kristian Høgsberg wrote:
> >> On Tue, Jan 20, 2015 at 12:42 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
> >> > My idea for all this would have been to create a per-thread execbuf
> >> > relocation context with a hashtab to map buffer pointers to execbuf index
> >> > and a bunch of arrays to prepare the reloc entry tables. If you do it
> >> > correctly all the per-reloc work should be a O(1) streaming writes to a
> >> > few arrays plus the hashtab lookup. With no code run at execbuf time
> >> > (except the ioctl ofc). Even the libdrm_bo->presumed_offset update after
> >> > execbuf could be done lockless (as long as readers are careful to never
> >> > reload it by using something similar to the kernel's READ_ONCE macro).
> >> >
> >> > But that means a completely new reloc api, so a lot more work. Also I
> >> > think it only makes sense do that for drivers that really care about the
> >> > last bit of performance, and then do it within the driver so that there's
> >> > no constraints about abi.
> >>
> >> Indeed, I moved it into mesa so I could rework that. bo_emit_reloc()
> >> is showing up in profiles. The patch below along with NO_RELOC and
> >> HANDLE_LUT flags gives me 5-10% improement on CPU bound benchmarks, so
> >> it's certainly worth it.  I'm skeptical that a hashtable lookup per
> >> reloc emit is going to perform better than just fixing up the relocs
> >> at execbuf2 time though.  It would be nice to not do any work at ioctl
> >> time, but for that you need a very fast way to map from bo to
> >> per-thread bo state as you go. Maybe a per-thread array mapping from
> >> gem handle to exec_object could work...
> >>
> >> WIP Patch is here:
> >>
> >> http://cgit.freedesktop.org/~krh/mesa/commit/?h=b0e4ce7bbce2a79ad37d6de460af88b9581ea1d7
> >
> > Hmm, that is actually pretty neat. My idle thought was to create
> > per-context batchmgr with its own view of the bo (to counter the
> > multithreaded free-for-all). In your patch, you neatly demonstrate that
> > you don't need per-context view of the bo, only of the relocations. And
> > it will make drm_intel_bo_emit_reloc() fixed cost, which should produce
> > most of your CPU overhead saving.
> >
> > However, I think if you do take it a step further with a batchmgr_bo,
> > you can make the drm_intel_bo_references() very cheap as well.
> 
> I did think about a per-thread/context bo wrapper that could track
> validation list index and presumed_offset, but the problem is that
> there's only one intel_mipmap_tree/intel_buffer_object struct to store
> it in.  We'd have to a per-thread wrapper for those too, and I'm not
> sure that's feasible.

Another thing to consider is that with full ppgtt reloc offset will be per
context. So as soon as you have multiple contexts with some shared bo you
most likely will always fallback to need_reloc=1 with your code. Not sure
whether we care.

And I guess your obj->flags = 0 is just because it's WIP? You'd need
something like

	if (relocs[i].write_domain)
		obj->flasg |= EXEC_OBJECT_WRITE;
	if (IS_GEN6 && relocs[i].write_domain == INSTRUCTION)
		obj->flasg |= EXEC_OBJECT_NEEDS_GTT;
	
before the continue.
-Daniel

> However, drm_intel_bo_reference/unreference() is showing up on the
> profiles now that relocations are cheaper, but I think the better way
> to make those cheaper is to move the ref count to the public struct
> and make the ref/unref inline functions (calling out to a destructor
> for the refcount==0 case, of course).  On that note, do you know why
> drm_intel_gem_bo_unreference() is so convoluted? What's the deal with
> the atomic_add_unless?
> 
> > Looks good.
> > -Chris
> >
> > --
> > Chris Wilson, Intel Open Source Technology Centre

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the Intel-gfx mailing list