[Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

Iago Toral itoral at igalia.com
Tue Dec 13 12:24:06 UTC 2016


On Tue, 2016-12-13 at 13:22 +0100, Samuel Iglesias Gonsálvez wrote:
> On Sun, 2016-12-11 at 15:00 -0800, Matt Turner wrote:
> > 
> > i965/vec4: handle 32 and 64 bit channels in liveness analysis
> > 
> > 	Please indent the returned multiline expressions in
> > 	var_from_reg() like we do elsewhere, so that the second line
> > 	begins on the same column as the first line.
> > 
> > 	*/ goes on its own line.
> > 
> > 	I'm having a hard time reviewing this one. The logic is rather
> > 	complex. I'll ask someone to help me review it on Monday at the
> > 	office.
> > 
> OK, no problem!

I think for this one you want to ping Curro, he reviewed parts of it
when Juan wrote it and he suggested changes so he is familiar with the
solution.

Iago

> > 
> > i965/vec4: add a horiz_offset() helper
> > i965: move the group field from fs_inst to backend_instruction.
> > i965/vec4: add a SIMD lowering pass
> > 
> > 	In the commit message, you say
> > 
> > 	    For now the pass only handles the gen7 restriction where
> > any
> > 	    instruction that writes 2 registers also needs to read 2
> > 	    registers.  This affects double-precision instructions
> > 	    reading uniforms, for example. Later patches will extend
> > the
> > 	    lowering pass adding a few more cases.
> > 
> > 	But the rule about if-writing-two-regs, must-read-two-regs
> > 	says that scalar sources are an exception:
> > 
> > 		"When source is scalar, the source registers are not
> > 		 incremented."
> > 
> > 	I don't see any code that allows us to avoid splitting an
> > 	instruction if it's writing two registers but sourcing a scalar
> > 	uniform. Maybe this doesn't apply because we have to use a non
> > 	scalar swizzle (.xy) to access a single fp64 component?
> > 
> Right, however, this is necessary in align16 because uniforms are
> vectors (hstride != 0) so they are not scalars.
> 
> > 
> > i965/vec4: make the generator set correct NibCtrl for SIMD4 DF
> > instructions
> > i965/vec4: dump NibCtrl for instructions with execsize != 8
> > i965/disasm: print NibCtrl for instructions with execsize < 8
> > i965/vec4: teach CSE about exec_size, group and doubles
> > i965/vec4: teach cmod propagation about different execution sizes
> > i965/vec4: split double-precision bcsel
> > 
> > 	bcsel is the NIR opcode. I'd change references to bcsel to SEL.
> > 
> > 	Very interesting find...
> > 
> > i965/vec4: add a scalarization pass for double-precision
> > instructions
> > 
> > 	Don't indent case inside a switch.
> > 
> > i965/vec4: translate 64-bit swizzles to 32-bit
> > i965/vec4: implement access to DF source components Z/W
> > 
> > 	Wow, bien hecho!
> > 
> > i965/disasm: fix subreg for dst in Align16 mode
> > i965/vec4: teach register coalescing about 64-bit
> > i965/vec4: fix pack_uniform_registers for doubles
> > i965/vec4: fix indentation in pack_uniform_registers
> > i965/vec4: Skip swizzle to subnr in 3src instructions with DF
> > operands
> > 
> > 	s/need/needs/ in the comment.
> > 
> > i965/vec4/nir: do not emit 64-bit MAD
> > i965/vec4: do not emit 64-bit MAD
> > 
> > 	I might change the name of this commit to "i965/vec4: Lower
> > 	64-bit MAD" or "i965/vec4: Lower DF MAD"
> > 
> > 	I think I'd change the name of the function as well, maybe to
> > 	lower_64bit_mad[_to_mul_add] or something.
> > 
> OK, we will do the rename.
> 
> > 
> > i965/vec4: support multiple dispatch widths and groups in the IR
> > builder.
> > i965/vec4: Add a shuffle_64bit_data helper
> > 
> > 	I was initially confused by r0.0:DF/r0.1:DF, thinking that .1
> > in
> > 	r0.1:DF was a subreg offset. But I think it's actually the
> > 	register offset (i.e., .offset)?
> > 
> > 	If that's the case, I think it would be clearer just to
> > 	increment the register number in the example:
> > 
> > 		r0.0:DF  x0 y0 z0 w0
> > 		r1.0:DF  x1 y1 z1 w1
> > 
> > 	s/opperation/operation/ in the comment.
> > 
> > 	On the multiline bld.group(...), I think Curro's style is to
> > 	align with the '.'. For instance,
> > 
> > 	inst = bld.group(4, for_write ? 1 : 0)
> > 	          .MOV(writemask(dst, WRITEMASK_ZW),
> > 	               swizzle(byte_offset(src, REG_SIZE),
> > BRW_SWIZZLE_XYXY));
> > 
> > 	so that group and MOV align, with the '.' on the same line as
> > 	the MOV.
> > 
> Regarding the example, yes, you are right. We are going to fix it.
> Thanks for the rest of suggestions, we will do them too :-)
> 
> 
> > 
> > i965/vec4: Fix UBO loads for 64-bit data
> > i965/vec4: Fix SSBO loads for 64-bit data
> > i965/vec4: Fix SSBO stores for 64-bit data
> > i965/vec4: don't constant propagate 64-bit immediates
> > i965/vec4: prevent copy-propagation from values with a different
> > type
> > size
> > i965/vec4: Prevent copy propagation from violating pre-gen8
> > restrictions
> > 
> > 	Similar comment as before about being allowed to write two
> > 	registers while sourcing a scalar. Maybe doesn't apply because
> > 	of the double swizzle.
> > 
> Same reply as to the other question.
> 
> > 
> > i965/vec4: don't propagate single-precision uniforms into 4-wide
> > instructions
> > i965/vec4: don't copy propagate misaligned registers
> > i965/vec4: extend the DWORD multiply DepCtrl restriction to all
> > gen8
> > platforms
> > 
> > 	I don't see this in the BDW PRMs, and the internal
> > documentation
> > 	says that it applies to "CHV, BXT"
> > 
> > 	I suggest dropping this patch (or replacing it with one that
> > 	adds || devinfo->is_broxton).
> > 
> It is present in BDW PRMs, Volume 7: 3D-Media-GPGPU, "Special
> Requirements for Handling Double Precision Data Types",
> page 837:
>     
> "When source or destination datatype is 64b or operation is integer
> DWord multiply, DepCtrl must not be used."
> 
> Regarding Broxton, there is no public PRMs here [0]. So if your
> internal docs say it is needed, then we must fix this patch.
> 
> [0] https://01.org/linuxgraphics/documentation/hardware-specification
> -p
> rms
> 
> 
> > 
> > i965/vec4: Do not use DepCtrl with 64-bit instructions
> > i965/vec4: do not split scratch read/write opcodes
> > i965/vec4: fix scratch offset for 64bit data
> > i965/vec4: fix scratch reads for 64bit data
> > i965/vec4: fix scratch writes for 64bit data
> > i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-
> > bit data
> > i965/vec4: fix indentation in
> > move_push_constants_to_pull_constants()
> > i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit
> > data
> > i965/vec4: make emit_pull_constant_load support 64-bit loads
> > i965/vec4: fix indentation in lower_attributes_to_hw_regs()
> > i965/vec4: fix attribute setup for doubles
> > i965/vec4: fix store output for 64-bit types
> > i965/vec4/gs: fix input loading for 64bit data
> > i965/vec4/tcs: fix input loading for 64-bit data
> > i965/vec4/tcs: fix outputs for 64-bit data
> > i965/vec4/tes: fix input loading for 64bit data types
> > i965/vec4/tes: fix setup_payload() for 64bit data types
> > i965/vec4/tes: consider register offsets during attribute setup
> > i965/vec4: dump subnr for FIXED_GRF
> > i965/vec4: split instructions that read 64-bit interleaved
> > attributes
> > 
> > 	I think you want to make stage_uses_interleaved_attributes
> > 	static.
> > 
> > 	Also don't indent case inside switch.
> > 
> > i965/vec4/scalarize_df: do not scalarize swizzles that we can
> > support
> > natively
> > i965/vec4/scalarize_df: support more swizzles via vstride=0
> > i965/vec4: prevent src/dst hazards during 64-bit register
> > allocation
> > i965/vec4: run scalarize_df() after spilling
> > 
> > 	Calling scalarize_df() on code that's already been scalarized
> > 	seems a little scary...
> > 
> > 	Are you sure that scalarize_df() will not modify any code that
> > 	has already been scalarized?
> > 
> Yes, it is safe. The scalarization pass operates on logical
> channels, 
> so for something like:
> 
> 		mov r0.xy:df r1.zw:df
> 
> it produces:
> 
> 		mov r0.x:df r1.z:dfmov r0.y:df r1.w:df
> 
> and if we run the pass again nothing will change for these
> instructions
> because they only operate on a single channel, so we end up with the
> same thing. It would have been a different story if we also
> translated
> the logical channels into 32-bit channels during the scalarization
> process, but that won't happen until right before codegen precisely
> so
> we can avoid this sort of problems (specifically we do it in
> convert_to_hw_regs).
> 
> > 
> > i965/vec4: support basic spilling of 64-bit registers
> > i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit
> > access
> > i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination
> > i965/vec4: adjust spilling costs for 64-bit registers.
> > i965/vec4: enable ARB_gpu_shader_fp64 for Haswell
> > i965/gen7: expose OpenGL 4.0 on Haswell
> > 
> > 	There's been a lot of discussion about this patch that I
> > haven't
> > 	been involved in. Presumably given my review of the previous
> > 100
> > 	patches, someone previously involved can be bothered to
> > rereview
> > 	this one...
> OK, no problem. Thanks for reviewing all the other patches! We will
> ping Curro about this one since he was the one who triggered the v3.
> 
> Sam


More information about the mesa-dev mailing list