[Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

Matt Turner mattst88 at gmail.com
Fri Jul 21 18:41:48 UTC 2017


On 07/21, aravindan.muthukumar at intel.com wrote:
>From: Aravindan Muthukumar <aravindan.muthukumar at intel.com>
>
>This patch improves CPI Rate(Cycles per Instruction) and branch miss predict
>for i965. The function check_state() was showing CPI retired rate.
>
>Performance stats with android:
>- CPI retired lowered by 28% (lower is better)
>- Branch missprediction lowered by 13% (lower is better)
>- 3DMark improved by 2%
>
>The dissassembly doesn't show difference, although above results were observed
>with patch.

If there's no difference in the assembly then whatever measurements you
have taken must be noise.

I applied the patch and inspected brw_state_upload.o. There are assembly
differences. I can produce the same assembly as this patch by just
pulling the if statement out of check_and_emit_atom() and into the
caller. The replacement of check_state() with a macro is completely
unnecessary.


diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index acaa97ee7d..b163e1490d 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -439,14 +439,12 @@ merge_ctx_state(struct brw_context *brw,
 }
 
 static inline void
-check_and_emit_atom(struct brw_context *brw,
-                    struct brw_state_flags *state,
-                    const struct brw_tracked_state *atom)
+emit_atom(struct brw_context *brw,
+          struct brw_state_flags *state,
+          const struct brw_tracked_state *atom)
 {
-   if (check_state(state, &atom->dirty)) {
-      atom->emit(brw);
-      merge_ctx_state(brw, state);
-   }
+   atom->emit(brw);
+   merge_ctx_state(brw, state);
 }
 
 static inline void
@@ -541,7 +539,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
 	 const struct brw_tracked_state *atom = &atoms[i];
 	 struct brw_state_flags generated;
 
-         check_and_emit_atom(brw, &state, atom);
+         if (check_state(&state, &atom->dirty)) {
+            emit_atom(brw, &state, atom);
+         }
 
 	 accumulate_state(&examined, &atom->dirty);
 
@@ -558,7 +558,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
       for (i = 0; i < num_atoms; i++) {
 	 const struct brw_tracked_state *atom = &atoms[i];
 
-         check_and_emit_atom(brw, &state, atom);
+         if (check_state(&state, &atom->dirty)) {
+            emit_atom(brw, &state, atom);
+         }
       }
    }


With that said, the assembly differences are not understandable to me.
Why should extracting an if statement from a static inline function into
the caller of that function cause any difference whatsoever?

I'm viewing the assembly differences with:

wdiff -n \
    -w $'\033[30;41m' -x $'\033[0m' \
    -y $'\033[30;42m' -z $'\033[0m' \
    <(objdump -d brw_state_upload.o.before | sed -e 's/^.*\t//') \
    <(objdump -d brw_state_upload.o.wtf    | sed -e 's/^.*\t//') | less -R

and the only real difference is the movement of some call and jmp
instructions.

I cannot take the patch without some reasonable explanation for the
change.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170721/73223268/attachment.sig>


More information about the mesa-dev mailing list