[Mesa-dev] [PATCH] R600/SI: Add pattern for truncating i32 to i1

Tom Stellard tom at stellard.net
Mon Jan 27 07:48:21 PST 2014


On Mon, Jan 27, 2014 at 04:43:14PM +0900, Michel Dänzer wrote:
> On Fri, 2014-01-24 at 07:40 -0800, Tom Stellard wrote:
> > On Fri, Jan 24, 2014 at 01:27:00PM +0900, Michel Dänzer wrote:
> > > From: Michel Dänzer <michel.daenzer at amd.com>
> > > 
> > > Fixes half a dozen piglit tests with radeonsi.
> > > 
> > > Signed-off-by: Michel Dänzer <michel.daenzer at amd.com>
> > > ---
> > >  lib/Target/R600/SIInstructions.td |  5 +++++
> > >  test/CodeGen/R600/trunc.ll        | 10 ++++++++++
> > >  2 files changed, 15 insertions(+)
> > > 
> > > diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> > > index 03e7e32..b7b710f 100644
> > > --- a/lib/Target/R600/SIInstructions.td
> > > +++ b/lib/Target/R600/SIInstructions.td
> > > @@ -2126,6 +2126,11 @@ def : Pat <
> > >    (EXTRACT_SUBREG $a, sub0)
> > >  >;
> > >  
> > > +def : Pat <
> > > +  (i1 (trunc i32:$a)),
> > > +  (V_CMP_EQ_I32_e64 (V_AND_B32_e32 (i32 1), $a), 1)
> > > +>;
> > > +
> > 
> > I'm guessing you added V_CMP_EQ_I32_e64 in order to make the types match.
> 
> Not really. The truncation is used for testing whether the LSB of an i32
> value is set, and storing the resulting boolean as an i1 value. My
> pattern does this for the VGPRs in all threads of a wavefront in
> parallel, storing the resulting boolean bits in a 64-bit SGPR.
> 
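
The per-lane behaviour described above can be modelled in a few lines. This is only an illustrative sketch, not the generated ISA: the function name and the list-of-lanes representation are made up for the example, and it assumes a 64-thread wavefront whose comparison results land in a 64-bit mask, one bit per lane.

```python
WAVEFRONT_SIZE = 64  # assumption: one wavefront is 64 threads (lanes)


def trunc_i32_to_i1_wave(vgpr):
    """Hypothetical model of (V_CMP_EQ_I32 (V_AND_B32 1, a), 1) for one wavefront.

    `vgpr` holds the i32 value of $a for each lane; the return value is the
    64-bit mask that would end up in the SGPR pair.
    """
    assert len(vgpr) == WAVEFRONT_SIZE
    mask = 0
    for lane, a in enumerate(vgpr):
        masked = a & 1         # V_AND_B32: keep only the least significant bit
        if masked == 1:        # V_CMP_EQ_I32: compare the masked value against 1
            mask |= 1 << lane  # record this lane's boolean in its own bit
    return mask
```

For instance, passing `list(range(64))` sets the bit of every odd-numbered lane, since only odd values have their LSB set.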

Ok, I didn't realize this pattern was meant to be used with control flow
instructions.  The pattern is fine as is.  The patch is:

Reviewed-by: Tom Stellard <thomas.stellard at amd.com>

> 
> > Try this pattern instead:
> > 
> > def : Pat <
> >   (i1 (trunc i32:$a)),
> >   (COPY_TO_REGCLASS (V_AND_B32_e32 (i32 1), $a), VReg_32)
> 
> I don't understand the idea behind your suggestion, can you elaborate?
> 

Without the COPY_TO_REGCLASS, LLVM tablegen will complain because the
output register of V_AND_B32_e32 (VReg_32) does not support i1 types.
Adding the COPY_TO_REGCLASS allows tablegen to accept the pattern,
because COPY_TO_REGCLASS is untyped.


> Anyway, it fails for one of the relevant piglit tests:
> 
> amd_vertex_shader_layer-layered-2d-texture-render: /home/daenzer/src/llvm-git/llvm/lib/Target/R600/SIInstrInfo.cpp:133: virtual void llvm::SIInstrInfo::copyPhysReg(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::iterator, llvm::DebugLoc, unsigned int, unsigned int, bool) const: Assertion `AMDGPU::VReg_64RegClass.contains(SrcReg) || AMDGPU::SReg_64RegClass.contains(SrcReg)' failed.
> 
> 
> -- 
> Earthling Michel Dänzer            |                  http://www.amd.com
> Libre software enthusiast          |                Mesa and X developer
> 

