[Mesa-dev] [PATCH 4/4] nv50: add PostRADualIssue Pass

Sat Aug 13 19:26:05 UTC 2016

So, I don't know much about how nv50 ir works, but to me this just
seems like a pretty slow implementation of a very limited instruction
scheduler. In addition to the runtime complexity problems you
mentioned, you're going to get a lot more benefit even from a very
simple list scheduler compared to this, and it generally only takes a
few hundred lines to write one. I'd send you some references, but I
don't have any good explanations handy -- any decent compiler book
should cover this, though. You could start at the wikipedia page for
instruction scheduling to give you some idea of the main concepts [1],
although it isn't great; for example, it doesn't mention the crucial
difference between top-down and bottom-up scheduling. To give a
concrete example, here's the skeleton of a basic list scheduler I
wrote for freedreno a while ago [2]. The i965 scheduler right now
isn't really great, and although I had a patch series to improve it a
bit, it's a bit more complex than you need right now since it's
designed to run pre-RA as well as post-RA so it needs to worry about
register pressure. Nevertheless, you can find the end-result of my
i965 scheduler series here [3]. If you ignore all the !post_ra stuff,
it's a good example of what your scheduling heuristic should look
like, particularly these lines [4]. Also, I'm here if you have any
questions!

[1] https://en.wikipedia.org/wiki/Instruction_scheduling
[2] https://cgit.freedesktop.org/~cwabbott0/mesa/tree/src/gallium/drivers/freedreno/ir3/ir3_sched.c?h=ir3-sched
[3] https://cgit.freedesktop.org/~cwabbott0/mesa/tree/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp?h=i965-sched-v3&id=f5ff9c87b06a7eae0a08d381c02d0e0e131212ad
[4] https://cgit.freedesktop.org/~cwabbott0/mesa/tree/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp?h=i965-sched-v3&id=f5ff9c87b06a7eae0a08d381c02d0e0e131212ad#n1544

On Sat, Aug 13, 2016 at 6:02 AM, Karol Herbst <karolherbst at gmail.com> wrote:
> this paths reorders instructions in a way, so that the hardware is able to
> dual issue more instructions leading to higher performance.
>
> with all previos commits put together
> changes for ./GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
> /benchmark_duration_ms=60000 /width=1024 /height=640:
>
> inst_executed: 1.03G
> inst_issued1: 614M -> 500M
> inst_issued2: 213M -> 271M
>
> score: 1021 -> 1056
>
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 59 ++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index a9172f8..88354c9 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -3385,6 +3385,62 @@ DeadCodeElim::checkSplitLoad(Instruction *ld1)
>
>  // =============================================================================
>
> +// Tries to improve dual issueing trivially
> +//
> +// the PASS checks for every instruction A,B with A->next == B and
> +// !canDualIssue(A,B), if there is a Instruction C in the same BB which can be
> +// moved between A and B and canDualIssue(A,C) is true
> +class PostRADualIssue : public Pass
> +{
> +private:
> +   virtual bool visit(BasicBlock *);
> +};
> +
> +static bool
> +isChainedCommutationLegal(Instruction *a, Instruction *b)
> +{
> +   Instruction *check = a;
> +   do {
> +      if (!check->isCommutationLegal(b) || (check->op == OP_TEX && check->op == OP_TEX))
> +         return false;
> +      check = check->next;
> +   } while (check != b);
> +   return true;
> +}
> +
> +bool
> +PostRADualIssue::visit(BasicBlock *bb)
> +{
> +   const Target *target = prog->getTarget();
> +   Instruction *i, *next, *check;
> +
> +   for (i = bb->getEntry(); i; i = next) {
> +      next = i->next;
> +
> +      // check next
> +      if (!next || next->fixed || next->join || next->asFlow() || target->canDualIssue(i, next))
> +         continue;
> +
> +      check = next->next;
> +      // we can't move fixed, flow instructions and instruction marked as join
> +      while (check && !check->fixed && !check->join && !check->asFlow() && check->prev->bb == check->bb && isChainedCommutationLegal(next, check)) {
> +         if (target->canDualIssue(i, check)) {
> +            // move the instruction after i step by step
> +            while (check->prev != i)
> +               bb->permuteAdjacent(check->prev, check);
> +
> +            // we have to always continue with the currently next instruction
> +            next = i->next;
> +            break;
> +         }
> +         check = check->next;
> +      }
> +   }
> +   return true;
> +}
> +
> +// =============================================================================
> +
>  #define RUN_PASS(l, n, f)                       \
>     if (level >= (l)) {                          \
>        if (dbgFlags & NV50_IR_DEBUG_VERBOSE)     \
> @@ -3420,6 +3476,9 @@ Program::optimizePostRA(int level)
>     RUN_PASS(2, FlatteningPass, run);
>     if (getTarget()->getChipset() < 0xc0)
>        RUN_PASS(2, NV50PostRaConstantFolding, run);
> +   // should be last
> +   if (getTarget()->hasDualIssueing())
> +      RUN_PASS(2, PostRADualIssue, run);
>
>     return true;
>  }
> --
> 2.9.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev