[Mesa-dev] glsl: Improvements to lower_jumps.cpp

Wed Jul 6 15:03:15 PDT 2011

On 6 July 2011 12:18, Ian Romanick <idr at freedesktop.org> wrote:
>
> The only other unit tests in Mesa are for the old matrix math routines
> (fixed function).  See src/mesa/math/m_debug*.c.
>
> This is an area that I've been thinking about lately.  I noticed that
> XCB uses a framework called check (http://check.sourceforge.net/), and
> I've been wanting to talk to Jamey and Josh about their experience with
> it, but I never seem to get around to it.  Maybe now is the right time. :)

Ok, I'll have a look at check.sourceforge.net and see if the unit
tests I've developed could be reshaped to use it.  My concern is that
since it would require the unit tests to be written in C, they would
take a lot longer to write (and be more prone to error) than tests
written in python.  But I'll give it some thought and let you know if
I come up with anything.

>> [PATCH 09/11] glsl: Use foreach_list in lower_jumps.cpp
>> [PATCH 10/11] glsl: In lower_jumps.cpp, lower both branches of a conditional.
>> [PATCH 11/11] glsl: Lower break instructions when necessary at the end of a loop.
>
> I need to think about these a bit more.  It seems like the first change
> (09/11) should allow the other changes to work, but it still makes me
> nervous.  All of these passes track some aspects of the CFG and do
> damage to it.  My concern is that assumptions about the state of the CFG
> may not hold after it gets modified.
>
> I don't really care if this means that we have to run multiple passes to
> get them all.  We don't have any indication that this is causing a
> particular CPU bottleneck.

I understand where you are coming from.  I believe the code is better
than it was (and I have the unit test results to back it up), but it's
still quite fragile and bugs may remain.  To clear up a
misunderstanding, though, the reason I am trying to avoid running
multiple passes is not to avoid a CPU bottleneck.  The problem is that
for every pass, lower_jumps creates a fresh set of temporary boolean
variables to simulate the control flow statements it is lowering.  So
if it takes multiple passes to catch all control flow statements, the
resulting IR gets bloated with multiple independent boolean temporary
variables.  Here's an example from the unit tests.  Without patch
11/11, the following IR:

((declare (in) float a) (declare (in) float cb) (declare (in) float ca)
 (declare (in) float ba)
 (declare (in) float bb)
 (function main
  (signature void (parameters)
   ((loop () () () ()
     ((if (expression bool > (var_ref a) (constant float (0.000000)))
       ((if (expression bool > (var_ref ba) (constant float (0.000000)))
         ((if (expression bool > (var_ref bb) (constant float (0.000000)))
           (continue)
           ()))
         ())
        (if (expression bool > (var_ref ca) (constant float (0.000000)))
         ((if (expression bool > (var_ref cb) (constant float (0.000000)))
           (break)
           ()))
         ()))
       ())
      break))))))

Gets lowered to this:

((declare (in) float bb) (declare (in) float ba)
 (declare (in) float ca)
 (declare (in) float cb)
 (declare (in) float a)
 (function main
  (signature void (parameters)
   ((declare (temporary) bool break_flag)
    (assign (x) (var_ref break_flag) (constant bool (0)))
    (declare (temporary) bool break_flag at 2)
    (assign (x) (var_ref break_flag at 2) (constant bool (0)))
    (loop () () () ()
     ((declare (temporary) bool execute_flag)
      (assign (x) (var_ref execute_flag) (constant bool (1)))
      (declare (temporary) bool execute_flag at 3)
      (assign (x) (var_ref execute_flag at 3) (constant bool (1)))
      (if (expression bool > (var_ref a) (constant float (0.000000)))
       ((if (expression bool > (var_ref ba) (constant float (0.000000)))
         ((if (expression bool > (var_ref bb) (constant float (0.000000)))
           ((assign (x) (var_ref execute_flag at 3) (constant bool (0))))
           ()))
         ())
        (if (var_ref execute_flag at 3)
         ((if (expression bool > (var_ref ca) (constant float (0.000000)))
           ((if (expression bool > (var_ref cb) (constant float (0.000000)))
             ((assign (x) (var_ref break_flag) (constant bool (1)))
              (assign (x) (var_ref execute_flag at 3) (constant bool (0))))
             ()))
           ()))
         ()))
       ())
      (if (var_ref execute_flag at 3)
       ((assign (x) (var_ref break_flag at 2) (constant bool (1)))
        (assign (x) (var_ref execute_flag) (constant bool (0))))
       ((if (var_ref break_flag)
         ((assign (x) (var_ref break_flag at 2) (constant bool (1)))
          (assign (x) (var_ref execute_flag) (constant bool (0))))
         ())))
      (if (var_ref break_flag at 2) (break) ())))))))

As you can see, lower_jumps created two break flags and two execute
flags, with one set of flags lowering one of the breaks, and the other
set of flags lowering the other break.  In principle a sufficiently
advanced set of optimization passes could clean up this mess, but it's
not clear to me that the optimization passes we have are sufficient to
do the trick.

With patch 11/11, it gets lowered to this, which is not as good as
hand-optimized code, but still much more sensible:

((declare (in) float a) (declare (in) float cb) (declare (in) float ca)
 (declare (in) float ba)
 (declare (in) float bb)
 (function main
  (signature void (parameters)
   ((declare (temporary) bool break_flag)
    (assign (x) (var_ref break_flag) (constant bool (0)))
    (loop () () () ()
     ((declare (temporary) bool execute_flag)
      (assign (x) (var_ref execute_flag) (constant bool (1)))
      (if (expression bool > (var_ref a) (constant float (0.000000)))
       ((if (expression bool > (var_ref ba) (constant float (0.000000)))
         ((if (expression bool > (var_ref bb) (constant float (0.000000)))
           ((assign (x) (var_ref execute_flag) (constant bool (0))))
           ()))
         ())
        (if (var_ref execute_flag)
         ((if (expression bool > (var_ref ca) (constant float (0.000000)))
           ((if (expression bool > (var_ref cb) (constant float (0.000000)))
             ((assign (x) (var_ref break_flag) (constant bool (1)))
              (assign (x) (var_ref execute_flag) (constant bool (0))))
             ()))
           ()))
         ()))
       ())
      (if (var_ref execute_flag)
       ((assign (x) (var_ref break_flag) (constant bool (1))))
       ())
      (if (var_ref break_flag) (break) ())))))))

If you are concerned about patches 10/11 and 11/11 because of code
fragility, then the alternative I'd favor is to modify lower_jumps so
that it isn't so fragile anymore.  Here's what I have in mind:

(1) Modify lower_jumps so that it can run in multiple passes without
having to generate an extra set of temporary variables on each pass.
(2) Simplify lower_jumps so that each time it modifies the IR, instead
of trying to fix up the control flow analysis and continue on to do
the right thing, it discards the now-defunct control flow analysis and
loops back to take another pass.

I'd be glad to pursue this for a while if you think it would be
worthwhile.  Let me know if you'd like me to give it a shot.

>
> One other question: does this series cause any piglit regressions on
> platforms that require flow-control lowering (e.g., i915 or r300)?

I don't know.  I will try to get ahold of a test platform when I'm in
the office tomorrow and find out.