[Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

Chris Wilson chris at chris-wilson.co.uk
Mon Jul 6 13:19:50 PDT 2015


Just skipping to interesting comments for the moment.

On Mon, Jul 06, 2015 at 12:34:00PM -0700, Kenneth Graunke wrote:
> While I really like this idea in principle, the current patch is rather
> huge, making it difficult to review; bisectability would also suffer.
> Would it be possible to split it up into several smaller patches?

First time I tried I ended up with a series of patches of equal churn
and I wasn't happy with the results. Honestly, I find mechanical noise
in patches easy to tune out, you quickly recognise the pattern and
deviations should stand out.

Some of the churn I think is important as it highlight the relative
emphasis on different coding patterns. In particular, how the relocation
must always emit the address in the batch, so in quite a few functions
the code had to be rearranged in order to make that happen
 
> One idea I had for how you might accomplish that is to introduce the
> brw_bo struct and related functions, but make it simply contain a
> drm_intel_bo * field, and fall back to the existing libdrm-based code.

Sure, what I had in mind was a changeover to a struct brw_bo that just
wrapped drm_intel_bo for typesafefty (so the compiler was there to catch
mistakes and missing pieces). I felt the patch was still about 500 lines
of mechcanical changes, 1000 lines removing the intel_batchbuffer.c and
2000 lines adding brw_bo (it has grown somewhat with the comments!). The
api to review was in one place, and the bits that required fine changes
(mostly intel_bufferobj.c) were equally obvious.

But what you really need is a slow progression in feature development of
brw_batch. Splitting the mechanical changes off with a touch more churn,
and even converting functions over to new API in stages is relatively
trivial compared to building up the feature set required to do
batchbuffer construction from scratch.

- relocations come first, as request tracking cannot happen with
integration into the relocations.
- execbuf
- request tracking
- mmap functions
- exporting busyness

Honestly though the design has never existed in a piecemeal fashion, it
has always been requests first and that drove the infrastructure
required to enable the request tracking.

Anyway splitting out the mechanical changes is worth it if it gets more
eyes onto brw_batch.c itself and some of its knock-on effects.

> That way, you could put all the mechanical renaming and refactoring
> across the entire code-base in one patch that should have no functional
> change.  You could then replace the contents or brw_bo and those
> functions with your new improved batch manager.
> 
> We've talked about moving away from libdrm for a while, so having our
> own functions and structures makes a lot of sense.
> 
> I'm also curious about the performance on non-LLC platforms (CHV or
> BYT)?  You've dropped a number of non-LLC paths - which is probably
> fine, honestly...they were always of dubious value - but it'd be nice
> to know the record the effects of this change on non-LLC in the commit
> message.

Ah, I didn't actually remove non-LLC paths, I added equivalent detiled
CPU access for non-LLC (strictly non cache coherent bo, incl scanouts) as
for LLC. So I guess you mean the few places that lost the brw->has_llc
differentiation?
 
> > @@ -279,11 +254,6 @@ retry:
> >     brw->ctx.NewDriverState = ~0ull;
> >     brw->no_depth_or_stencil = false;
> >     brw->ib.type = -1;
> > -
> > -   /* Flush the sampler cache so any texturing from the destination is
> > -    * coherent.
> > -    */
> > -   brw_emit_mi_flush(brw);
> 
> It looks like you moved this flush earlier, so it's in the section of
> code that retries?  That's probably reasonable.  Seems worth keeping the
> comment, and this could be done as a separate patch...

True. The reason why it has to be earlier is that I am much stricter on
having all BEGIN_BATCH/ADVANCE_BATCH chunks being inside a
brw_batch_begin/brw_batch_end - since brw_emit_mi_flush() itself doesn't
start a new brw_batch_begin section, it had to go in blorp's. The comment
was removed because we now have accurate dirty texture cache tracking and
that itself is not the reason wy flush has to exist. (The flush is for a
change in hiz mode iirc, or some similar change in GPU state that
requires a flush between primitives.)
 
> [snip]
> > +static void
> > +load_sized_register_mem(struct brw_context *brw,
> > +                        uint32_t reg,
> > +                        struct brw_bo *bo,
> > +                        uint32_t read_domains, uint32_t write_domain,
> > +                        uint32_t offset,
> > +                        int size)
> > +{
> > +   int i;
> > +
> > +   /* MI_LOAD_REGISTER_MEM only exists on Gen7+. */
> > +   assert(brw->gen >= 7);
> > +
> > +   if (brw->gen >= 8) {
> > +      BEGIN_BATCH(4 * size);
> > +      for (i = 0; i < size; i++) {
> > +         OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (4 - 2));
> > +         OUT_BATCH(reg + i * 4);
> > +         OUT_RELOC64(bo, read_domains, write_domain, offset + i * 4);
> > +      }
> > +      ADVANCE_BATCH();
> > +   } else {
> > +      BEGIN_BATCH(3 * size);
> > +      for (i = 0; i < size; i++) {
> > +         OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2));
> > +         OUT_BATCH(reg + i * 4);
> > +         OUT_RELOC(bo, read_domains, write_domain, offset + i * 4);
> > +      }
> > +      ADVANCE_BATCH();
> > +   }
> > +}
> > +
> > +void
> > +brw_load_register_mem(struct brw_context *brw,
> > +                      uint32_t reg,
> > +                      struct brw_bo *bo,
> > +                      uint32_t read_domains, uint32_t write_domain,
> > +                      uint32_t offset)
> > +{
> > +   load_sized_register_mem(brw, reg, bo, read_domains, write_domain, offset, 1);
> > +}
> > +
> > +void
> > +brw_load_register_mem64(struct brw_context *brw,
> > +                        uint32_t reg,
> > +                        struct brw_bo *bo,
> > +                        uint32_t read_domains, uint32_t write_domain,
> > +                        uint32_t offset)
> > +{
> > +   load_sized_register_mem(brw, reg, bo, read_domains, write_domain, offset, 2);
> > +}
> 
> Maybe just make these static inlines in brw_state.h or somewhere?

Sure, I thought they might have been wider used and stuck them in
brw_context.c because of the leading argument, but if we can make them
static that would be a good improvement.

Thanks for taking the time to wade into such a huge patch Kenneth!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the mesa-dev mailing list