Mesa (main): 22 new commits

Thu Jul 8 16:22:46 UTC 2021

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=266d3d58146cf2b6406344e91da4518726eb377b
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon Sep 14 12:30:12 2020 +0200

    tu: Update subgroup properties
    
    Everything should be in place for this to actually work. Support a size
    of 128, unlike the blob. I've also plumbed through ballot support, so
    enable that.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=75516e0595ffa9ba3f42a5145c1158d13aa30722
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon Jun 28 13:22:59 2021 +0200

    ir3/legalize: Fix loop convergence behavior
    
    This prevents the previous commit from being undone by the jump
    optimizations in legalize, and fixes another potential case where
    instead of a continue we have an if/else at the end of a loop.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=0fa93fb6626b4270ec0086d34a336046addd4462
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon Jun 28 12:56:15 2021 +0200

    ir3: Fix convergence behavior for loops with continues
    
    When loops have continue statements, it's expected that when we execute
    a divergent continue (i.e. a continue where not all of the threads
    active at the start take it) we keep going with the rest of the loop
    body and then reconverge at the start of the next iteration. However the
    Adreno ISA seems to always take a branch that jumps backwards, assuming
    it's the bottom of a loop, so we get a different, undesired convergence
    behavior. There's no way I know of to control this behavior in the
    instruction set, so we have to instead insert a "continue block" at the
    end of the loop where continue statements reconverge which then jumps
    back to the top of the loop. Since this doesn't correspond 1:1 with any
    NIR block we have to make control flow handling in NIR->ir3 a bit more
    complicated, unfortunately.
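
    A minimal sketch of the CFG rewrite described above, not the actual ir3
    code. It assumes the loop occupies a contiguous range of block indices
    (header through last_body); struct edge and insert_continue_block() are
    hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical CFG edge between two blocks, identified by index. */
struct edge {
    int from;
    int to;
};

/*
 * Redirect every edge that jumps back to `header` from inside the loop
 * (assumed to be blocks header..last_body) to the new block `cont`, then
 * append the single remaining back edge cont -> header.
 * The caller must leave room for one extra edge. Returns the new count.
 */
static size_t
insert_continue_block(struct edge *edges, size_t count,
                      int header, int last_body, int cont)
{
    for (size_t i = 0; i < count; i++) {
        int in_loop = edges[i].from >= header && edges[i].from <= last_body;
        if (in_loop && edges[i].to == header)
            edges[i].to = cont;
    }
    edges[count].from = cont;
    edges[count].to = header;
    return count + 1;
}
```

    After the rewrite the only backwards jump to the loop header comes from
    the continue block, so every continue reconverges there first.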
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=b1b80c06a78e62b2d8477b07f12b0153435b66a8
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon May 31 12:58:26 2021 +0200

    ir3: Implement nir subgroup intrinsics
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=5d5d7523194df66eb1159c99a646082b570b4729
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon May 31 12:21:29 2021 +0200

    ir3: Handle shared registers in lower_parallelcopy
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=17f7453d45c79bb8c52c4f9d491a3b63c1fcb76a
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon May 31 12:09:42 2021 +0200

    ir3: Add subgroup pseudoinstructions
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=232ec710fd0fbeccc4836d05caafce5b7adbf857
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri Sep 11 13:17:40 2020 +0200

    ir3: Support any/all/getone branches
    
    This plumbs through the support in the IR.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=7a8e0b15e2f87589c58695bd6557a1cc8fd8aaa3
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Tue Sep 1 15:22:14 2020 +0200

    ir3: Cleanup ir3_legalize jump optimization
    
    Do the optimization parts in their own loop, and be more robust when
    detecting the useless jumps.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=43e926a3afadf1503ad1d00b62b10e78520d79ac
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri May 28 17:31:48 2021 +0200

    ir3/sched: Handle branch condition in split_pred()
    
    Before this, if there was a block with multiple things writing p0.x,
    it was a tossup whether the right one would be used as the branch
    condition. Found by inspection.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=bb3212dd4d80fd945654f3ca5380fd472fba92a4
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon Jun 28 18:41:41 2021 +0200

    ir3: Fix infinite loop in scheduler when splitting
    
    When we go to split e.g. a p0.x producer, the other instructions
    ready to schedule are often only p0.x producers. It could happen that
    they all have a lower priority than the split instruction. Then we would
    immediately schedule the split instruction again, then again try to
    schedule one of the other producers, be blocked, and split it, around
    and around again, leading to an infinite loop. The following commit
    triggered this with
    dEQP-GLES3.functional.shaders.discard.dynamic_loop_always on a3xx.
    
    Fixes: d2f4d33 ("freedreno/ir3: new pre-RA scheduler")
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=2ff3ab0aed747cbb59d3b71ef459e70e9d346cdd
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri May 28 16:03:16 2021 +0200

    ir3: Make MOVMSK use repeat
    
    MOVMSK is a bit of a special case, because it takes multiple cycles (and
    therefore reduces the nops needed if it's between some other assigner
    and consumer). However, weird things happen if you try to start reading
    the first component while it isn't finished yet. On balance, making it
    use repeat seems to result in fewer special cases.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=66a275d50f8f7a431e3d9e6c38b64fa73a7e55ba
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri May 28 15:41:56 2021 +0200

    ir3: Fix shared reg delay
    
    Based on computerator experiments, this is actually 6, including for
    movmsk.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=b1b4ce7be2c5025a6b4d1169a1b12e99fcfd4390
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri Jun 25 16:51:22 2021 +0200

    ir3: Actually allow shared reg moves to be folded
    
    I realized that shared registers were never actually getting folded,
    even after adding them to valid_flags, because the move wasn't even
    being considered.
    
    I looked at the other uses of is_same_type_mov(), and they should be ok
    with this.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=b32188cdba4bc3c1beec8957f48e29b960408227
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri Jun 25 17:50:47 2021 +0200

    ir3: Better valid flags for shared regs
    
    Shared registers seem to use the same port as consts, so the same
    restrictions for cat2/cat3 apply to them.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=590efd180be05817163d1b70990273b535a82afe
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Thu May 27 17:28:09 2021 +0200

    ir3: Prevent propagating shared regs out of loops
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=394c597b1b31842b3943e30ab7f21359b0076b13
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon Jun 28 14:48:08 2021 +0200

    ir3: Handle unreachable blocks
    
    This fixes a pre-existing bug in ir3, but it showed up even more due to
    other changes in this series and it interacts with the logical/physical
    CFG split. When both sides of an if end with a jump, a block may become
    unreachable via the logical CFG, which can cause problems because it has
    no predecessors to figure out the location of live-in non-shared
    values. In this case we assume that nir_opt_if has removed any code in
    these blocks and just skip processing live-ins for these blocks,
    pretending that they aren't live.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=22ae91b28405b121cbb94badcb9381db94358a0e
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Thu May 27 16:54:20 2021 +0200

    ir3: Handle shared register liveness correctly
    
    As explained in the comments added, we need to add extra edges to the
    CFG which are ignored except for shared registers. This plumbs through
    support for this.
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8176657ead5a05022d8f93a6bcd31b22e5b40504
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri Sep 11 13:31:44 2020 +0200

    ir3/nir: Call nir_lower_subgroups
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=68b8b9e9e13b358ae43f967e84e4e3c1eef5f48d
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon May 31 14:21:04 2021 +0200

    tu, ir3: Plumb through support for CS subgroup size/id
    
    The way that the blob obtains the subgroup id on compute shaders is by
    just and'ing gl_LocalInvocationIndex with 63, since it advertises a
    subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
    expose a subgroupSize of 128, we'll have to do something a little more
    flexible. Sometimes we have to fall back to a subgroup size of 64 due to
    various constraints, and in that case we have to fake a subgroup size of
    128 while actually using 64 under the hood, by just pretending that the
    upper 64 invocations are all disabled. However when computing the
    subgroup id we need to use the "real" subgroup size. For this purpose we
    plumb through a driver param which exposes the real subgroup size. If
    the user forces a particular subgroup size then we lower
    load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
    and we assume when translating to ir3 that load_subgroup_size means
    "give me the *actual* subgroup size that you decided in RA" and give you
    the driver param.
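
    A rough sketch of the index arithmetic this implies, assuming the real
    subgroup size is a power of two (64 or 128). It reads "subgroup id" as
    the invocation's index within its subgroup, which is what masking with
    63 yields for a size of 64. Both helper names are hypothetical and this
    is not the code added by the commit.

```c
#include <stdint.h>

/*
 * Index of an invocation within its subgroup. With a real subgroup size
 * of 64 this is the same as masking with 63. Assumes a power-of-two size.
 */
static uint32_t
invocation_in_subgroup(uint32_t local_invocation_index,
                       uint32_t real_subgroup_size)
{
    return local_invocation_index & (real_subgroup_size - 1u);
}

/* Index of the subgroup that the invocation belongs to. */
static uint32_t
subgroup_index(uint32_t local_invocation_index, uint32_t real_subgroup_size)
{
    return local_invocation_index / real_subgroup_size;
}
```

    Using the advertised size of 128 here while the hardware actually runs
    at 64 would give the wrong answer for invocation 70 (70 instead of 6),
    which is why the real size has to be plumbed through.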
    
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=cc514bfa0e29a46498f88ffa4a9e6dd92b3e3d58
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Mon Sep 14 10:14:55 2020 +0200

    nir: Add read_invocation_cond_ir3 intrinsic
    
    On Qualcomm, we have shared registers similar to SGPRs on AMD. However,
    there is no readlane or readfirstlane primitive. Shared registers can
    only be written to when just one lane is active. This means that we have
    to lower readInvocation(val, id) to something like:
    
    if (gl_SubgroupInvocation == id) {
        scalar_reg = val;
    }
    
    return scalar_reg;
    
    However it's a bit difficult to actually get the value of
    gl_SubgroupInvocation in the backend, because for compute it requires
    some calculations and we don't have any CSE support in the backend. This
    intrinsic lets us turn it into
    "readInvocationCond(val, id == gl_SubgroupInvocation)" in NIR at which
    point the backend code generation is a lot easier.
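
    A scalar C simulation of the lowering above, as a sketch only: lanes
    are modelled as array elements, and MAX_LANES and both function names
    are assumptions made for illustration rather than NIR or ir3 API.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LANES 128 /* assumed upper bound on the subgroup size */

/*
 * Model of readInvocationCond(val, cond): the shared register is written
 * only by the lane whose condition is true, then read back by every lane.
 */
static uint32_t
read_invocation_cond(const uint32_t *val, const bool *cond, unsigned num_lanes)
{
    uint32_t scalar_reg = 0;
    for (unsigned lane = 0; lane < num_lanes; lane++) {
        if (cond[lane])
            scalar_reg = val[lane];
    }
    return scalar_reg;
}

/* readInvocation(val, id) expressed through the conditional form. */
static uint32_t
read_invocation(const uint32_t *val, unsigned num_lanes, unsigned id)
{
    bool cond[MAX_LANES];
    for (unsigned lane = 0; lane < num_lanes && lane < MAX_LANES; lane++)
        cond[lane] = (lane == id);
    return read_invocation_cond(val, cond, num_lanes);
}
```

    The comparison against the lane index happens outside
    read_invocation_cond(), mirroring how the intrinsic leaves that
    computation in NIR.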
    
    Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=e4e79de2a420128190b28b39b87f6de39b1b7060
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Thu Sep 10 18:48:04 2020 +0200

    nir/subgroups: Support > 1 ballot components
    
    Qualcomm has a mode with a subgroup size of 128, so just emitting larger
    integer operations and then lowering them later isn't an option. This
    makes the pass able to handle the lowering itself, so that we don't have
    to go down to 64-thread wavefronts when ballots are used.
    
    (The GLSL and legacy SPIR-V extensions only support a maximum of 64
    threads, but I guess we'll cross that bridge when we come to it...)
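
    For illustration, a ballot covering 128 invocations can be held in
    four 32-bit components. The sketch below assumes that layout;
    struct ballot and the helper names are hypothetical, not what the pass
    emits.

```c
#include <stdbool.h>
#include <stdint.h>

#define BALLOT_COMPONENTS 4 /* assumed: 128 invocations / 32 bits each */

struct ballot {
    uint32_t comp[BALLOT_COMPONENTS];
};

/* Build a multi-component ballot from one boolean vote per invocation. */
static struct ballot
ballot_from_votes(const bool *votes, unsigned num_invocations)
{
    struct ballot b = {{0}};
    for (unsigned i = 0; i < num_invocations && i < 32 * BALLOT_COMPONENTS; i++) {
        if (votes[i])
            b.comp[i / 32] |= UINT32_C(1) << (i % 32);
    }
    return b;
}

/* Test the bit belonging to one invocation (must be below 128). */
static bool
ballot_bit(const struct ballot *b, unsigned invocation)
{
    return (b->comp[invocation / 32] >> (invocation % 32)) & 1u;
}
```

    Each operation picks the component with invocation / 32 and the bit
    with invocation % 32, so no integer wider than 32 bits is needed.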
    
    Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>

URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=90819b9b0ea0ea8ffe4bd34100ee12dce8f63ebf
Author: Connor Abbott <cwabbott0 at gmail.com>
Date:   Fri Sep 11 13:07:48 2020 +0200

    nir/subgroups: Replace lower_vote_eq_to_ballot with lower_vote_eq
    
    Lower it to a vote instead of a ballot. This was only used for AMD, and
    in that case they're pretty much the same. However Qualcomm has a vote
    builtin, which we want to use instead of ballots.
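
    One way to express vote_ieq through a vote, sketched as a scalar C
    simulation: compare each active invocation's value with the first
    active invocation's value and take vote_all of the result. Whether the
    pass emits exactly this sequence is an assumption; MAX_INVOCATIONS and
    the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_INVOCATIONS 128 /* assumed upper bound on the subgroup size */

/* True if the condition holds for every active invocation. */
static bool
vote_all(const bool *cond, const bool *active, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        if (active[i] && !cond[i])
            return false;
    }
    return true;
}

/* Value held by the lowest-numbered active invocation (0 if none). */
static uint32_t
read_first_invocation(const uint32_t *val, const bool *active, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        if (active[i])
            return val[i];
    }
    return 0;
}

/* vote_ieq lowered to a per-invocation compare followed by a vote. */
static bool
vote_ieq(const uint32_t *val, const bool *active, unsigned n)
{
    bool cond[MAX_INVOCATIONS];
    uint32_t first = read_first_invocation(val, active, n);
    for (unsigned i = 0; i < n && i < MAX_INVOCATIONS; i++)
        cond[i] = (val[i] == first);
    return vote_all(cond, active, n);
}
```

    Inactive invocations never affect the result, which matches how a
    hardware vote only considers the lanes that are currently executing.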
    
    Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
    Acked-by: Rhys Perry <pendingchaos02 at gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>