<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Dec 21, 2014 at 9:11 PM, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote:<br>
> From: Ian Romanick <<a href="mailto:ian.d.romanick@intel.com">ian.d.romanick@intel.com</a>><br>
><br>
> Instead of having an extra pointer indirection in one of the hottest<br>
> loops in the driver.<br>
><br>
> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic<br>
> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects<br>
> Gl32Batch7:<br>
><br>
> 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)<br>
> 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)<br>
><br>
> Signed-off-by: Ian Romanick <<a href="mailto:ian.d.romanick@intel.com">ian.d.romanick@intel.com</a>><br>
> ---<br>
> src/mesa/drivers/dri/i965/brw_context.h | 2 +-<br>
> src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ++++++++++++++++---<br>
> 2 files changed, 17 insertions(+), 4 deletions(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h<br>
> index 92eb022..5f5f807 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_context.h<br>
> +++ b/src/mesa/drivers/dri/i965/brw_context.h<br>
> @@ -1384,7 +1384,7 @@ struct brw_context<br>
> } perfmon;<br>
><br>
> int num_atoms;<br>
> - const struct brw_tracked_state **atoms;<br>
> + const struct brw_tracked_state atoms[64];<br>
<br>
</span>Making this atoms[57] would save memory, and the static asserts make it easy<br>
to remember to increase the size if we ever add additional atoms.<br>
<br>
Honestly, it seems like there should be a better way to do this. Currently,<br>
the atoms list only depends on the generation number, and thus is identical<br>
for all GL contexts. Storing an additional 1.5 kB of redundant data in each<br>
GL context seems a bit wasteful.<br>
<br>
Would it help to make brw_upload_state cache the array and size?<br>
<br>
void<br>
brw_upload_state(struct brw_context *brw)<br>
{<br>
static const struct brw_tracked_state *atoms = brw->atoms;<br>
static const int num_atoms = brw->num_atoms;<br>
<br>
...<br>
<br>
for (i = 0; i < num_atoms; i++) {<br>
...use atoms[i]...<br>
}<br>
<br>
...<br>
}<br>
<br>
It seems like that would achieve the same effect - one indirection.<br></blockquote><div><br></div><div>Yeah, but then we have deal with threading issues. It might be ok, but I think we'd have to be a bit more careful than that.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
One downside to that approach is that we couldn't have separate atom<br>
lists for API_CORE vs. API_COMPAT (while we could using your approach).<br>
(We could skip stippling for core contexts, and skip GS/HS/DS for legacy<br>
contexts, which might help a tiny bit...)<br></blockquote><div> <br></div><div>Another option would be to make the atoms list a screen thing and do the copy at screen creation time. That would allow it to be shared across contexts without the above threading issues. If we wanted, we could store one core version and one compat version and have the context point to whichever one it needs. The problem with this is that it's wasteful in the one-context case where we would have both core an compat lists. I guess you can't win them all.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> /* If (INTEL_DEBUG & DEBUG_BATCH) */<br>
> struct {<br>
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c<br>
> index a579781..9a5d49a 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c<br>
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c<br>
> @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw )<br>
> const struct brw_tracked_state **atoms;<br>
> int num_atoms;<br>
><br>
> + STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms));<br>
> + STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms));<br>
> + STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms));<br>
> + STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms));<br>
<br>
</span>I do like the static asserts.<br>
<span class=""><br>
> +<br>
> brw_init_caches(brw);<br>
><br>
> if (brw->gen >= 8) {<br>
> @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw )<br>
> num_atoms = ARRAY_SIZE(gen4_atoms);<br>
> }<br>
><br>
> - brw->atoms = atoms;<br>
> brw->num_atoms = num_atoms;<br>
><br>
> + /* This is to work around brw_context::atoms being declared const. We want<br>
> + * it to be const, but it needs to be initialized somehow!<br>
> + */<br>
> + struct brw_tracked_state *context_atoms =<br>
> + (struct brw_tracked_state *) &brw->atoms[0];<br>
<br>
</span>Ugly, but... not sure what else to suggest.<br></blockquote><div><br></div><div>I don't like it either, but I did exactly the same thing when I wrote it, so I can't blame Ian.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I suppose this patch is fine. It does offer a small CPU overhead reduction in<br>
the hottest path, for a small memory usage penalty. I think we can do better,<br>
but I don't know that it makes sense to block your patch over nebulous ideas<br>
for future improvements. We can do those in the future...<br>
<br>
Reviewed-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
<div class="HOEnZb"><div class="h5"><br>
> +<br>
> + for (int i = 0; i < num_atoms; i++)<br>
> + context_atoms[i] = *atoms[i];<br>
> +<br>
> while (num_atoms--) {<br>
> assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw);<br>
> assert((*atoms)->emit);<br>
> @@ -607,7 +620,7 @@ void brw_upload_state(struct brw_context *brw)<br>
> prev = *state;<br>
><br>
> for (i = 0; i < brw->num_atoms; i++) {<br>
> - const struct brw_tracked_state *atom = brw->atoms[i];<br>
> + const struct brw_tracked_state *atom = &brw->atoms[i];<br>
> struct brw_state_flags generated;<br>
><br>
> if (check_state(state, &atom->dirty)) {<br>
> @@ -627,7 +640,7 @@ void brw_upload_state(struct brw_context *brw)<br>
> }<br>
> else {<br>
> for (i = 0; i < brw->num_atoms; i++) {<br>
> - const struct brw_tracked_state *atom = brw->atoms[i];<br>
> + const struct brw_tracked_state *atom = &brw->atoms[i];<br>
><br>
> if (check_state(state, &atom->dirty)) {<br>
> atom->emit(brw);</div></div><br>_______________________________________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
<br></blockquote></div></div></div>