[Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

Iago Toral itoral at igalia.com
Wed Dec 7 09:03:00 UTC 2016


On Mon, 2016-12-05 at 15:21 -0800, Matt Turner wrote:
> On 10/11, Iago Toral Quiroga wrote:
> > 
> > It's been some time since
> ... anyone has reviewed your patches. Sorry. :(
> 
> I'm going to review from your rebased i965-fp64-gen7-scalar-vec4-rc2
> branch. There have probably been some reorderings or other changes
> due
> to rebasing since the patches were sent, so I'm going to paste the
> list
> of patches below and then attempt to list any review comments after
> the
> patch name.

Hey Matt, thanks for reviewing this! I am on holidays this week but
I'll go through all your comments starting Monday next week.

Iago

> 
> A couple of patches have an extra newline in the commit message
> between
> *-by: tags. Would be nice to make a pass through and fix that.
> 
> 
> i965/nir: double/dvec2 uniforms only need to be padded to a single
> vec4 slot
> i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
> i965/vec4/nir: allocate two registers for dvec3/dvec4
> i965/vec4/nir: Add bit-size information to types
> i965/vec4/nir: support doubles in ALU operations
> i965/vec4/nir: set the right type for 64-bit registers
> i965/vec4/nir: fix emitting 64-bit immediates
> i965/vec4: add support for printing DF immediates
> i965/vec4: add double/float conversion pseudo-opcodes
> 
> 	I wonder if we should allow MOV F/DF and DF/F operations in the
> 	IR and then have a lowering pass that "legalizes" them. I'm
> 	happy to leave that experiment for after this series lands.
> 
> i965/vec4: translate d2f/f2d
> i965: add brw_vecn_grf()
> i965/vec4: set correct register regions for 32-bit and 64-bit
> i965/disasm: align16 DF source regions have a width of 2
> 
> 	It's actually kind of weird to print width and horizontal
> stride
> 	for align16 sources, since they don't exist in the instruction
> 	word. We should probably print only the vertical stride. I
> don't
> 	care if that's fixed a part of this series.
> 
> i965/vec4: We only support 32-bit integer ALU operations for now
> i965/vec4: add dst_null_df()
> i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
> i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
> 
> 	If I understand correctly, these opcodes map to instructions
> 	like
> 
> 		mov(XXX) dst<1>UD  src<8,4,2>:UD
> 
> 	Is the exec_size 4? I ask, because if it's 8 (and the source
> 	region spans two registers and the dest region spans one)
> that's
> 	not a legal instruction. If it's 4, then it's legal.
> 
> i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
> i965/vec4: don't copy propagate vector opcodes that operate in align1
> mode
> i965/vec4: implement double unpacking
> 
> 	This emits 
> 	
> 		MOV        dvec4_tmp, op[0]
> 		PICK_LO/HI uvec4_tmp, dvec4_tmp
> 		MOV        dst, uvec4_tmp
> 	
> 	I'm confused about the purpose of the MOVs. It seems like op[0]
> 	should already be a dvec and dst should already be a uvec.
> 
> i965/vec4: implement double packing
> 
> 	More or less the same thing here. Looks like we don't need all
> 	of the MOVs.
> 
> i965/vec4/nir: implement double comparisons
> 
> 	Trivial: A newline before the if() would be nice.
> 
> 	I have a memory of Curro telling me that the hardware maps each
> 	32-bit chunk in the dst to a single bit in the flag register.
> 	Maybe that's only on IVB, and maybe I'm misremembering. I'm
> 	concerned that while the PICK_LOW+MOV will properly handle the
> 	result that is written to the destination, the result written
> to
> 	the flag register might be incorrect.
> 
> 	My commit d9b09f8a30 fixed some problems that seems similar in
> 	my mind.
> 
> i965/vec4: fix indentation in get_nir_src()
> i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
> i965/vec4: make opt_vector_float ignore doubles
> i965/vec4: fix register allocation for 64-bit undef sources
> i965/vec4: Rename DF to/from F generator opcodes
> 
> 	I'm not sure replacing "float" with "single" implies that the
> 	opcodes can handle other 32-bit (integer) types, since "single"
> 	is actually the name of the "float" type in some other
> 	programming languages.
> 
> 	Maybe call them VEC4_OPCODE_TO_DOUBLE and
> 	VEC4_OPCODE_FROM_DOUBLE?
> 
> i965/vec4: add helpers for conversions to/from doubles
> 
> 	Same thing here.
> 
> 	Also, same confusion about the purpose of the MOVs.
> 
> i965/vec4: implement hardware workaround for align16 double to float
> conversion
> 
> 	This always seemed like a really strange hardware bug, and one
> 	that no one should ever hit.
> 
> 	I'd prefer that, instead of loading an immediate double and
> then
> 	performing a conversion to float, that we just convert the
> 	double to float in the compiler and emit an instruction to load
> 	that.
> 
> i965/vec4: implement d2i, d2u, i2d and u2d
> i965/vec4: implement d2b
> 
> 	Trivial: s/Curo/Curro/ in commit message.
> 
> 	Trivial: The comment says "predicated MOV", but it's actually a
> 	MOV with conditional_mod.
> 
> i965/vec4: implement fsign() for doubles
> 
> 	Trivial: v2 comment and comment in code say "predicated MOV"
> 	like previous patch
> 
> i965/vec4: fix optimize predicate for doubles
> 
> 	I guess I'm pleasantly surprised that the any/all predicates do
> 	the right thing in the presence of doubles...
> 
> 	(Are you sure? :)
> 
> i965/vec4: add a helper function to create double immediates
> 
> 	Can leave for later: Shouldn't we use the DIM instruction (on
> 	HSW)?
> 
> 	I'm not sure if this should be fixed now or later, but
> shouldn't
> 	we use NibCtrl on these two instructions instead of
> 	force_writemask_all? I think this is a case where NibCtrl is
> 	useful.
> 
> i965/vec4: use the new helper function to create double immediates
> i965: move exec_size from fs_instruction to backend_instruction
> i965/vec4: fix size_written for doubles
> i965/vec4: fix regs_read() for doubles
> i965/vec4: use the IR's execution size
> i965/vec4: dump the instruction execution size
> 
> 
> 
> ==== I made it to here today. ====
> 
> Anything with no or trivial feedback is
> 
> Reviewed-by: Matt Turner <mattst88 at gmail.com>
> 
> I will continue with these tomorrow:
> 
> i965/vec4: handle 32 and 64 bit channels in liveness analysis
> i965/vec4: add a horiz_offset() helper
> i965: move the group field from fs_inst to backend_instruction.
> i965/vec4: add a SIMD lowering pass
> i965/vec4: make the generator set correct NibCtrl for SIMD4 DF
> instructions
> i965/vec4: dump NibCtrl for instructions with execsize != 8
> i965/disasm: print NibCtrl for instructions with execsize < 8
> i965/vec4: teach CSE about exec_size, group and doubles
> i965/vec4: teach cmod propagation about different execution sizes
> i965/vec4: split double-precision bcsel
> i965/vec4: add a scalarization pass for double-precision instructions
> i965/vec4: translate 64-bit swizzles to 32-bit
> i965/vec4: implement access to DF source components Z/W
> i965/disasm: fix subreg for dst in Align16 mode
> i965/vec4: teach register coalescing about 64-bit
> i965/vec4: fix pack_uniform_registers for doubles
> i965/vec4: fix indentation in pack_uniform_registers
> i965/vec4: Skip swizzle to subnr in 3src instructions with DF
> operands
> i965/vec4/nir: do not emit 64-bit MAD
> i965/vec4: do not emit 64-bit MAD
> i965/vec4: support multiple dispatch widths and groups in the IR
> builder.
> i965/vec4: Add a shuffle_64bit_data helper
> i965/vec4: Fix UBO loads for 64-bit data
> i965/vec4: Fix SSBO loads for 64-bit data
> i965/vec4: Fix SSBO stores for 64-bit data
> i965/vec4: don't constant propagate 64-bit immediates
> i965/vec4: prevent copy-propagation from values with a different type
> size
> i965/vec4: Prevent copy propagation from violating pre-gen8
> restrictions
> i965/vec4: don't propagate single-precision uniforms into 4-wide
> instructions
> i965/vec4: don't copy propagate misaligned registers
> i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8
> platforms
> i965/vec4: Do not use DepCtrl with 64-bit instructions
> i965/vec4: do not split scratch read/write opcodes
> i965/vec4: fix scratch offset for 64bit data
> i965/vec4: fix scratch reads for 64bit data
> i965/vec4: fix scratch writes for 64bit data
> i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-
> bit data
> i965/vec4: fix indentation in move_push_constants_to_pull_constants()
> i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit
> data
> i965/vec4: make emit_pull_constant_load support 64-bit loads
> i965/vec4: fix indentation in lower_attributes_to_hw_regs()
> i965/vec4: fix attribute setup for doubles
> i965/vec4: fix store output for 64-bit types
> i965/vec4/gs: fix input loading for 64bit data
> i965/vec4/tcs: fix input loading for 64-bit data
> i965/vec4/tcs: fix outputs for 64-bit data
> i965/vec4/tes: fix input loading for 64bit data types
> i965/vec4/tes: fix setup_payload() for 64bit data types
> i965/vec4/tes: consider register offsets during attribute setup
> i965/vec4: dump subnr for FIXED_GRF
> i965/vec4: split instructions that read 64-bit interleaved attributes
> i965/vec4/scalarize_df: do not scalarize swizzles that we can support
> natively
> i965/vec4/scalarize_df: support more swizzles via vstride=0
> i965/vec4: prevent src/dst hazards during 64-bit register allocation
> i965/vec4: run scalarize_df() after spilling
> i965/vec4: support basic spilling of 64-bit registers
> i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit
> access
> i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination
> i965/vec4: adjust spilling costs for 64-bit registers.
> i965/vec4: enable ARB_gpu_shader_fp64 for Haswell
> i965/gen7: expose OpenGL 4.0 on Haswell


More information about the mesa-dev mailing list