[Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

Matt Turner mattst88 at gmail.com
Mon Dec 5 23:21:45 UTC 2016


On 10/11, Iago Toral Quiroga wrote:
>It's been some time since

... anyone has reviewed your patches. Sorry. :(

I'm going to review from your rebased i965-fp64-gen7-scalar-vec4-rc2
branch. There have probably been some reorderings or other changes due
to rebasing since the patches were sent, so I'm going to paste the list
of patches below and then attempt to list any review comments after the
patch name.


A couple of patches have an extra newline in the commit message between
*-by: tags. Would be nice to make a pass through and fix that.


i965/nir: double/dvec2 uniforms only need to be padded to a single vec4 slot
i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
i965/vec4/nir: allocate two registers for dvec3/dvec4
i965/vec4/nir: Add bit-size information to types
i965/vec4/nir: support doubles in ALU operations
i965/vec4/nir: set the right type for 64-bit registers
i965/vec4/nir: fix emitting 64-bit immediates
i965/vec4: add support for printing DF immediates
i965/vec4: add double/float conversion pseudo-opcodes

	I wonder if we should allow MOV F/DF and DF/F operations in the
	IR and then have a lowering pass that "legalizes" them. I'm
	happy to leave that experiment for after this series lands.

i965/vec4: translate d2f/f2d
i965: add brw_vecn_grf()
i965/vec4: set correct register regions for 32-bit and 64-bit
i965/disasm: align16 DF source regions have a width of 2

	It's actually kind of weird to print width and horizontal stride
	for align16 sources, since they don't exist in the instruction
	word. We should probably print only the vertical stride. I don't
	care if that's fixed a part of this series.

i965/vec4: We only support 32-bit integer ALU operations for now
i965/vec4: add dst_null_df()
i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes

	If I understand correctly, these opcodes map to instructions
	like

		mov(XXX) dst<1>UD  src<8,4,2>:UD

	Is the exec_size 4? I ask, because if it's 8 (and the source
	region spans two registers and the dest region spans one) that's
	not a legal instruction. If it's 4, then it's legal.

i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
i965/vec4: don't copy propagate vector opcodes that operate in align1 mode
i965/vec4: implement double unpacking

	This emits 
	
		MOV        dvec4_tmp, op[0]
		PICK_LO/HI uvec4_tmp, dvec4_tmp
		MOV        dst, uvec4_tmp
	
	I'm confused about the purpose of the MOVs. It seems like op[0]
	should already be a dvec and dst should already be a uvec.

i965/vec4: implement double packing

	More or less the same thing here. Looks like we don't need all
	of the MOVs.

i965/vec4/nir: implement double comparisons

	Trivial: A newline before the if() would be nice.

	I have a memory of Curro telling me that the hardware maps each
	32-bit chunk in the dst to a single bit in the flag register.
	Maybe that's only on IVB, and maybe I'm misremembering. I'm
	concerned that while the PICK_LOW+MOV will properly handle the
	result that is written to the destination, the result written to
	the flag register might be incorrect.

	My commit d9b09f8a30 fixed some problems that seems similar in
	my mind.

i965/vec4: fix indentation in get_nir_src()
i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
i965/vec4: make opt_vector_float ignore doubles
i965/vec4: fix register allocation for 64-bit undef sources
i965/vec4: Rename DF to/from F generator opcodes

	I'm not sure replacing "float" with "single" implies that the
	opcodes can handle other 32-bit (integer) types, since "single"
	is actually the name of the "float" type in some other
	programming languages.

	Maybe call them VEC4_OPCODE_TO_DOUBLE and
	VEC4_OPCODE_FROM_DOUBLE?

i965/vec4: add helpers for conversions to/from doubles

	Same thing here.

	Also, same confusion about the purpose of the MOVs.

i965/vec4: implement hardware workaround for align16 double to float conversion

	This always seemed like a really strange hardware bug, and one
	that no one should ever hit.

	I'd prefer that, instead of loading an immediate double and then
	performing a conversion to float, that we just convert the
	double to float in the compiler and emit an instruction to load
	that.

i965/vec4: implement d2i, d2u, i2d and u2d
i965/vec4: implement d2b

	Trivial: s/Curo/Curro/ in commit message.

	Trivial: The comment says "predicated MOV", but it's actually a
	MOV with conditional_mod.

i965/vec4: implement fsign() for doubles

	Trivial: v2 comment and comment in code say "predicated MOV"
	like previous patch

i965/vec4: fix optimize predicate for doubles

	I guess I'm pleasantly surprised that the any/all predicates do
	the right thing in the presence of doubles...

	(Are you sure? :)

i965/vec4: add a helper function to create double immediates

	Can leave for later: Shouldn't we use the DIM instruction (on
	HSW)?

	I'm not sure if this should be fixed now or later, but shouldn't
	we use NibCtrl on these two instructions instead of
	force_writemask_all? I think this is a case where NibCtrl is
	useful.

i965/vec4: use the new helper function to create double immediates
i965: move exec_size from fs_instruction to backend_instruction
i965/vec4: fix size_written for doubles
i965/vec4: fix regs_read() for doubles
i965/vec4: use the IR's execution size
i965/vec4: dump the instruction execution size



==== I made it to here today. ====

Anything with no or trivial feedback is

Reviewed-by: Matt Turner <mattst88 at gmail.com>

I will continue with these tomorrow:

i965/vec4: handle 32 and 64 bit channels in liveness analysis
i965/vec4: add a horiz_offset() helper
i965: move the group field from fs_inst to backend_instruction.
i965/vec4: add a SIMD lowering pass
i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
i965/vec4: dump NibCtrl for instructions with execsize != 8
i965/disasm: print NibCtrl for instructions with execsize < 8
i965/vec4: teach CSE about exec_size, group and doubles
i965/vec4: teach cmod propagation about different execution sizes
i965/vec4: split double-precision bcsel
i965/vec4: add a scalarization pass for double-precision instructions
i965/vec4: translate 64-bit swizzles to 32-bit
i965/vec4: implement access to DF source components Z/W
i965/disasm: fix subreg for dst in Align16 mode
i965/vec4: teach register coalescing about 64-bit
i965/vec4: fix pack_uniform_registers for doubles
i965/vec4: fix indentation in pack_uniform_registers
i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands
i965/vec4/nir: do not emit 64-bit MAD
i965/vec4: do not emit 64-bit MAD
i965/vec4: support multiple dispatch widths and groups in the IR builder.
i965/vec4: Add a shuffle_64bit_data helper
i965/vec4: Fix UBO loads for 64-bit data
i965/vec4: Fix SSBO loads for 64-bit data
i965/vec4: Fix SSBO stores for 64-bit data
i965/vec4: don't constant propagate 64-bit immediates
i965/vec4: prevent copy-propagation from values with a different type size
i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions
i965/vec4: don't propagate single-precision uniforms into 4-wide instructions
i965/vec4: don't copy propagate misaligned registers
i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms
i965/vec4: Do not use DepCtrl with 64-bit instructions
i965/vec4: do not split scratch read/write opcodes
i965/vec4: fix scratch offset for 64bit data
i965/vec4: fix scratch reads for 64bit data
i965/vec4: fix scratch writes for 64bit data
i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data
i965/vec4: fix indentation in move_push_constants_to_pull_constants()
i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data
i965/vec4: make emit_pull_constant_load support 64-bit loads
i965/vec4: fix indentation in lower_attributes_to_hw_regs()
i965/vec4: fix attribute setup for doubles
i965/vec4: fix store output for 64-bit types
i965/vec4/gs: fix input loading for 64bit data
i965/vec4/tcs: fix input loading for 64-bit data
i965/vec4/tcs: fix outputs for 64-bit data
i965/vec4/tes: fix input loading for 64bit data types
i965/vec4/tes: fix setup_payload() for 64bit data types
i965/vec4/tes: consider register offsets during attribute setup
i965/vec4: dump subnr for FIXED_GRF
i965/vec4: split instructions that read 64-bit interleaved attributes
i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively
i965/vec4/scalarize_df: support more swizzles via vstride=0
i965/vec4: prevent src/dst hazards during 64-bit register allocation
i965/vec4: run scalarize_df() after spilling
i965/vec4: support basic spilling of 64-bit registers
i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit access
i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination
i965/vec4: adjust spilling costs for 64-bit registers.
i965/vec4: enable ARB_gpu_shader_fp64 for Haswell
i965/gen7: expose OpenGL 4.0 on Haswell
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20161205/7dfa6254/attachment.sig>


More information about the mesa-dev mailing list