[Mesa-dev] [PATCH v2 00/78] i965: A new vec4 backend based on NIR

Mon Aug 3 09:51:18 PDT 2015

On Mon, Aug 3, 2015 at 8:54 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
> On 08/03/2015 04:57 PM, Jason Ekstrand wrote:
>>
>> On Aug 3, 2015 4:17 AM, "Eduardo Lima Mitev" <elima at igalia.com
>> <mailto:elima at igalia.com>> wrote:
>>>
>>> On 08/01/2015 02:59 PM, Eduardo Lima Mitev wrote:
>>> > On 07/31/2015 10:48 AM, Eduardo Lima Mitev wrote:
>>> >> On 07/30/2015 09:48 PM, Jason Ekstrand wrote:
>>> >>>
>>> >>> On Jul 27, 2015 3:39 PM, "Jason Ekstrand" <jason at jlekstrand.net
>> <mailto:jason at jlekstrand.net>
>>> >>> <mailto:jason at jlekstrand.net <mailto:jason at jlekstrand.net>>> wrote:
>>> >>>>
>>> >>>> On Mon, Jul 27, 2015 at 2:07 PM, Eduardo Lima Mitev
>> <elima at igalia.com <mailto:elima at igalia.com>
>>> >>> <mailto:elima at igalia.com <mailto:elima at igalia.com>>> wrote:
>>> >>>>> On 07/25/2015 03:08 AM, Jason Ekstrand wrote:
>>> >>>>>> Alright, I got through it again...
>>> >>>>>>
>>> >>>>>> I asked for a few trivial changes on a few of the patches.
>> With those
>>> >>>>>> fixed, everything except patch 65 and 66 are
>>> >>>>>>
>>> >>>>>
>>> >>>>> Thanks! we have fixed most of the patches already.
>>> >>>>>
>>> >>>>>> Reviewed-by: Jason Ekstrand <jason.ekstrand at intel.com
>> <mailto:jason.ekstrand at intel.com>
>>> >>> <mailto:jason.ekstrand at intel.com <mailto:jason.ekstrand at intel.com>>>
>>> >>>>>>
>>> >>>>>> While the requested changes on the texturing patches are not
>>> >>>>>> complicated, I would like to see the updated version of those two
>>> >>>>>> patches before I give a formal R-B.
>>> >>>>>>
>>> >>>>>
>>> >>>>> Sure, we will send new versions of patch 65 and 66 as soon as we
>> have
>>> >>>>> them ready and tested.
>>> >>>>>
>>> >>>>>> There is one caveat to the above review: Something is badly
>> broken on
>>> >>>>>> HSW.  I'm not sure what, but one of the patches badly breaks
>> HSW even
>>> >>>>>> in the non-NIR case.  We need to figure out what that is before
>>> >>>>>> pushing anything.  That said, please don't re-send the full series;
>>> >>>>>> just figure out where the bug is, fix it, and re-send the one
>> patch.
>>> >>>>>>
>>> >>>>>
>>> >>>>> We have done extensive testing today, trying to find something
>> strange
>>> >>>>> that explains this breakage you are experiencing. Most of us
>> have HSW
>>> >>>>> laptops so it is our normal development environments.
>>> >>>>>
>>> >>>>> However, we have not been able to reproduce any fail that would
>> suggest
>>> >>>>> our branch breaks something. Using current master as baseline, I
>> have
>>> >>>>> tested both nir-vec4 and vec4_visitor against it (piglit and
>> dEQP), and
>>> >>>>> we see no regressions other that the one we described in the cover
>>> >>> letter.
>>> >>>>>
>>> >>>>> Please, could you be more specific about the symptoms of this
>> breakage
>>> >>>>> and give us any possible info that would help us reproduce it
>> locally?
>>> >>>>> Maybe details of your system, kernel and gcc version; anything that
>>> >>>>> might be different to a standard linux system.
>>> >>>>
>>> >>>> I'm not sure what happened there.  I may have had a bad rebase or
>>> >>>> something.  I pulled your branch, rebased on master, and tried it
>>> >>>> again.  Now it seems to be ok.  On Broadwell and Cherryview, however,
>>> >>>> it's not so ok.  It looks like you may have accidentally broken
>> scalar
>>> >>>> VS or something like that.  I can't even run glxgears on BDW without
>>> >>>> hitting an assert.  If your dev machine is a HSW, you can use
>>> >>>> INTEL_DEVID_OVERRIDE to tell it to compile shaders as if it's a BDW
>>> >>>> and figure out why it's crashing.  The patch titled
>> "nir/nir_lower_io:
>>> >>>> Add vec4 support" makes glxgears render black on BDW.  It seems
>> like a
>>> >>>> later patch is causing the crash.  In any case, you should be able to
>>> >>>> use INTEL_DEVID_OVERRIDE and compare the shader results.
>>> >>>>
>>> >>>> --Jason
>>> >>>
>>> >>> Any progress on this?
>>> >>>
>>> >>
>>> >> Yes, I think I found the issue and fixed it, (at least the only one I
>>> >> was able to find so far); and it was indeed the one causing the assert
>>> >> in glxgears.
>>> >>
>>> >> Basically, it was a silly mistake in one of the patches I authored:
>>> >>
>> https://github.com/Igalia/mesa/commit/2d72ae5f9e26c3c11dce9e779407984ef9774a52
>>> >>
>>> >> You can see that I passed 'false' in 'is_scalar' argument of
>>> >> brw_create_nir() for ARB_vertex_programs, ignoring that in BDW
>> those are
>>> >> scalar too. The fix was to pass 'brw->intelScreen->compiler->scalar_vs'
>>> >> instead of 'false' there.
>>> >>
>>> >> Indirectly, this error was messing up the lowering passes in brw_nir,
>>> >> assuming non-scalar when it was scalar. Specifically, the pass
>>> >> nir_lower_alu_to_scalar() was not called because we disabled it on
>>> >> non-scalar shaders.
>>> >>
>>> >> I will send a new version of the patch I changed, in case you want to
>>> >> have a look. I also updated the git-tree of the branch with the fix and
>>> >> rebased against master:
>>> >>
>>> >
>>> > I sent new versions of patches 09/78 and 74/78, which are the ones
>>> > modified to fix the issue in BDW. As I commented there, the changes are
>>> > trivial, but I send the new versions for the sake of reference and
>>> > completeness.
>>> >
>>> >> https://github.com/Igalia/mesa/tree/nir-vec4-v2
>>> >>
>>> >
>>> > This branch has also been updated with the changes. In general, it is
>>> > safe to assume that this branch always contain the latest version of the
>>> > series we have.
>>> >
>>> >> Please, let us know if you find more issues.
>>> >>
>>> >>
>>> >> Right now we have ~8 new piglit regressions (in HSW) after rebasing
>>> >> master. I'm currently analyzing them and hope to fix them soon.
>> They are
>>> >> under "arb_shader_atomic_counters" set, so that you know in case
>> you see
>>> >> them.
>>> >>
>>> >
>>> > I fixed this regressions already. Patch "i965: Lift the constness
>>> > restriction on surface indices passed to untyped ops." broke our the
>>> > atomics intrinsic code, but the fix was easy to trace.
>>> >
>>> > So now the branch is back at zero regressions at least in 6 <= gen < 8.
>>> >
>>>
>>> Hi Jason,
>>>
>>> I think we are ready to merge the branch to master as soon as you can
>>> confirm that the issues on BDW are gone. As far as we can tell, all the
>>> patches in the series have R-b from you.
>>>
>>> This morning I rebased master again and did another round of testing. No
>>> regressions.
>>
>> I'll run it on or CI system some time this morning.  Once I've verified
>> that it at least doesn't break anything in the non-NIR case, feel free
>> to push. If you'd like, I can push it once the tests come back. Either
>> is fine with me.
>
> Ok, then please push it once you think it is safe. Thanks a lot!

Pushed!  Thanks a bunch!

I also pushed a version to our CI system that switched the
INTEL_USE_NIR default to true and things look pretty good over-all.
On ILK (gen5) and 965 (gen 4), there were a bunch of tests that failed
most of which seemed to be focussed around either variable indexing or
comparison operations.  It looks like it's probably the same problem
(or problems) on both gens, so once we get one fixed, the other will
most likely follow.  On gen8, it looks like integers are broken.

I don't know if any of you have the hardware to solve those problems.
If not, someone from the Oregon team can look into it.
--Jason

> Eduardo
>
>> --Jason
>>
>>> Thanks!
>>>
>>> >> Thank you Jason!
>>> >>
>>> >> Eduardo
>>> >>
>>> >>>>> Thanks again for the review and patience.
>>> >>>>>
>>> >>>>> Eduardo
>>> >>>>>
>>> >>>>>> --Jason
>>> >>>>>>
>>> >>>>>> On Thu, Jul 23, 2015 at 3:16 AM, Eduardo Lima Mitev
>>> >>> <elima at igalia.com <mailto:elima at igalia.com>
>> <mailto:elima at igalia.com <mailto:elima at igalia.com>>> wrote:
>>> >>>>>>> Hi,
>>> >>>>>>>
>>> >>>>>>> This is the second version of the series that adds a new vec4
>>> >>> backend for i965 based on NIR. The first series was sent some
>> weeks ago
>>> >>> [1] and went through review by Jason and Kenneth (thanks a lot!). This
>>> >>> series is a result of addressing issues detected during that feedback.
>>> >>>>>>>
>>> >>>>>>> This series also adds support for the NIR->vec4 pass on geometry
>>> >>> shaders and on ARB_vertex_programs. Both supports were
>> work-in-progress
>>> >>> by the time we sent the first version, and are now completed. The
>>> >>> patch-sets for GS and ARB_vertex_program were added at the end of the
>>> >>> series.
>>> >>>>>>>
>>> >>>>>>> Like the first version, we tested the backend on gen6 and gen7
>>> >>> (specifically, SNB, IVB and HSW); and with both piglit and dEQP. No
>>> >>> regressions observed with piglit. The issues related with register
>>> >>> spilling that we had in the first version have been fixed.
>>> >>>>>>>
>>> >>>>>>> With dEQP, we observe a regression that is causing the following 3
>>> >>> tests to fail:
>>> >>>>>>>
>>> >>>
>> dEQP-GLES3.functional.shaders.loops.for_dynamic_iterations.vector_counter_vertex
>>> >>>>>>>
>>> >>>
>> dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.vector_counter_vertex
>>> >>>>>>>
>>> >>>
>> dEQP-GLES3.functional.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex
>>> >>>>>>>
>>> >>>>>>> We have a patch that solves those specific cases, but we would
>>> >>> like to discuss it independently, before including it in the
>> series. We
>>> >>> believe these regressions should not block the review of the series.
>>> >>>>>>>
>>> >>>>>>> As usual, a git tree is available at
>>> >>> https://github.com/Igalia/mesa/tree/nir-vec4-v2 to ease testing.
>>> >>>>>>>
>>> >>>>>>> cheers,
>>> >>>>>>> Eduardo
>>> >>>>>>>
>>> >>>>>>> [1]
>>> >>> http://lists.freedesktop.org/archives/mesa-dev/2015-June/087448.html
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> Alejandro Piñeiro (9):
>>> >>>>>>>   i965/vec4: Redefine make_reg_for_system_value() to allow
>> reuse in
>>> >>>>>>>     NIR->vec4 pass
>>> >>>>>>>   i965/nir/vec4: Add setup for system values
>>> >>>>>>>   i965/nir/vec4: Implement intrinsics that load system values
>>> >>>>>>>   i965/nir/vec4: Implement atomic counter intrinsics (read, inc
>>> >>> and dec)
>>> >>>>>>>   i965/nir: Disable alu_to_scalar pass on non-scalar shaders
>>> >>>>>>>   i965/vec4: Change vec4_visitor::swizzle_result() method to allow
>>> >>> reuse
>>> >>>>>>>   i965/vec4: Add a new dst_reg constructor accepting a
>> brw_reg_type
>>> >>>>>>>   i965/ir/vec4: Refactor visit(ir_texture *ir)
>>> >>>>>>>   i965/nir/vec4: Add implementation of nir_emit_texture()
>>> >>>>>>>
>>> >>>>>>> Antia Puentes (36):
>>> >>>>>>>   i965/nir/vec4: Implement loading values from an UBO
>>> >>>>>>>   i965/nir/vec4: Prepare source and destination registers for ALU
>>> >>>>>>>     operations
>>> >>>>>>>   i965/nir/vec4: Implement single-element "mov" operations
>>> >>>>>>>   i965/nir/vec4: Lower "vecN" instructions and mark them
>> unreachable
>>> >>>>>>>   i965/nir/vec4: Implement int<->float format conversion ops
>>> >>>>>>>   i965/nir/vec4: Implement the addition operation
>>> >>>>>>>   i965/nir/vec4: Implement multiplication
>>> >>>>>>>   i965/vec4: Return the last emitted instruction in emit_math()
>>> >>>>>>>   i965/nir/vec4: Implement more math operations
>>> >>>>>>>   i965/nir/vec4: Implement carry/borrow for addition/subtraction
>>> >>>>>>>   i965/nir/vec4: Implement various rounding functions
>>> >>>>>>>   i965/vec4: Return the emitted instruction in emit_minmax()
>>> >>>>>>>   i965/nir/vec4: Implement min/max operations
>>> >>>>>>>   i965/nir/vec4: Derivatives are not allowed in VS
>>> >>>>>>>   i965/nir: Add utility method for comparisons
>>> >>>>>>>   i965/nir/vec4: Implement non-vector comparison ops
>>> >>>>>>>   i965/nir/vec4: Implement equality ops on vectors
>>> >>>>>>>   i965/nir/vec4: Implement non-equality ops on vectors
>>> >>>>>>>   i965/nir/vec4: Implement logical operators
>>> >>>>>>>   i965/nir/vec4: Implement "bool<->int,float" format conversion
>>> >>>>>>>   i965/nir/vec4: "noise" ops should already be lowered
>>> >>>>>>>   i965/nir/vec4: Implement pack/unpack operations
>>> >>>>>>>   i965/nir/vec4: Implement bit operations
>>> >>>>>>>   i965/nir/vec4: Implement the "sign" operation
>>> >>>>>>>   i965/nir/vec4: Implement "shift" operations
>>> >>>>>>>   i965/nir/vec4: Implement floating-point fused multiply-add
>>> >>>>>>>   i965/vec4: Return the emitted instruction in emit_lrp()
>>> >>>>>>>   i965/nir/vec4: Implement linear interpolation
>>> >>>>>>>   i965/nir/vec4: Implement conditional select
>>> >>>>>>>   i965/nir/vec4: Implement the dot product operation
>>> >>>>>>>   i965/nir/vec4: Implement vector "any" operation
>>> >>>>>>>   i965/nir/vec4: Mark as unreachable ops that should be already
>>> >>> lowered
>>> >>>>>>>   i965/vec4: Enable NIR-vec4 pass on ARB_vertex_programs
>>> >>>>>>>   i965/vec4: Generate missing NIR for fixed-function vertex
>> programs
>>> >>>>>>>   i965/nir/vec4: Handle uniforms on vertex programs
>>> >>>>>>>   i965/vec4: Handle uniform and GRF array access on vertex
>> programs
>>> >>>>>>>     (NIR)
>>> >>>>>>>
>>> >>>>>>> Eduardo Lima Mitev (20):
>>> >>>>>>>   i965/nir/vec4: Add implementation placeholders for a new
>> NIR->vec4
>>> >>>>>>>     pass
>>> >>>>>>>   i965/nir/vec4: Select between new nir_vec4 or current
>> vec4_visitor
>>> >>>>>>>     code-paths
>>> >>>>>>>   i965/vec4: Move type_size() method to brw_vec4_visitor class
>>> >>>>>>>   i965/nir/vec4: Add setup of input variables in NIR->vec4 pass
>>> >>>>>>>   i965/nir/vec4: Add shader function implementation
>>> >>>>>>>   i965/nir: Pass a is_scalar boolean to brw_create_nir()
>>> >>>>>>>   i965/nir/vec4: Implement load_const intrinsic
>>> >>>>>>>   i965/nir: Move brw_type_for_nir_type() to brw_nir to allow reuse
>>> >>>>>>>   i965/vec4: Add auxiliary func to build a writemask from a
>> component
>>> >>>>>>>     size
>>> >>>>>>>   i965/nir/vec4: Add get_nir_dst() and get_nir_src() methods
>>> >>>>>>>   i965/nir/vec4: Implement loop statements (nir_cf_node_loop)
>>> >>>>>>>   i965/nir/vec4: Implement load_input intrinsic
>>> >>>>>>>   i965/vec4: Make sure that register types always match during
>>> >>>>>>>     emit_urb_slot()
>>> >>>>>>>   nir-lower_io: Store data.location instead, in const_index[0] of
>>> >>>>>>>     store_output
>>> >>>>>>>   i965/nir/vec4: Implement store_output intrinsic
>>> >>>>>>>   i965/nir/vec4: Implement nir_emit_jump
>>> >>>>>>>   i965/nir: Add new utility method
>> brw_glsl_base_type_for_nir_type()
>>> >>>>>>>   i965/vec4: Move is_high_sample() method to vec4_visitor class
>>> >>>>>>>   i965/vec4: Change vec4_visitor::emit_mcs_fetch() method to allow
>>> >>> reuse
>>> >>>>>>>   i965/vec4: Change vec4_visitor::gather_channel() method to allow
>>> >>> reuse
>>> >>>>>>>
>>> >>>>>>> Iago Toral Quiroga (11):
>>> >>>>>>>   i965/nir/vec4: Add setup of uniform variables
>>> >>>>>>>   nir/nir_lower_io: Add vec4 support
>>> >>>>>>>   i965/nir: Dot not assign direct uniform locations first for
>>> >>> vec4-based
>>> >>>>>>>     shaders
>>> >>>>>>>   i965/nir/vec4: Implement conditional statements (nir_cf_node_if)
>>> >>>>>>>   i965/nir/vec4: Implement load_uniform intrinsic
>>> >>>>>>>   i965/nir: Enable NIR-vec4 pass on geometry shaders
>>> >>>>>>>   i965/gs: Refactor ir_emit_vertex and ir_end_primitive
>>> >>>>>>>   i965/nir/gs: Handle geometry shaders inputs
>>> >>>>>>>   i965/nir/gs: Implement EmitVertex and EndPrimitive
>>> >>>>>>>   i965/nir/gs: Implement support for gl_InvocationID system value
>>> >>>>>>>   i965/nir: Do not scalarize phis in non-scalar setups
>>> >>>>>>>
>>> >>>>>>> Samuel Iglesias Gonsalvez (2):
>>> >>>>>>>   nir: Fix output swizzle in get_mul_for_src
>>> >>>>>>>   i965/gs/gen6: Refactor ir_emit_vertex and ir_end_primitive
>> for gen6
>>> >>>>>>>
>>> >>>>>>>  src/glsl/nir/nir.h                                |   18 +-
>>> >>>>>>>  src/glsl/nir/nir_lower_io.c                       |   99 +-
>>> >>>>>>>  src/glsl/nir/nir_opt_peephole_ffma.c              |   13 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/Makefile.sources        |    2 +
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp          |   34 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_ir_vec4.h           |    2 +
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_nir.c               |   93 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_nir.h               |    8 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_program.c           |    6 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_reg.h               |    6 +
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_shader.cpp          |   24 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp            |   53 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4.h              |   82 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp     |  118 ++
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |   27 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   |    9 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp        | 1507
>>> >>> +++++++++++++++++++++
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp    |  470 ++++---
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |    5 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/brw_vs.h                |    3 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp     |   13 +-
>>> >>>>>>>  src/mesa/drivers/dri/i965/gen6_gs_visitor.h       |    2 +
>>> >>>>>>>  22 files changed, 2263 insertions(+), 331 deletions(-)
>>> >>>>>>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
>>> >>>>>>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>>> >>>>>>>
>>> >>>>>>> --
>