[Mesa-dev] [PATCH] R600/SI: Add pattern for truncating i32 to i1
Tom Stellard
tom at stellard.net
Mon Jan 27 07:48:21 PST 2014
On Mon, Jan 27, 2014 at 04:43:14PM +0900, Michel Dänzer wrote:
> On Fri, 2014-01-24 at 07:40 -0800, Tom Stellard wrote:
> > On Fri, Jan 24, 2014 at 01:27:00PM +0900, Michel Dänzer wrote:
> > > From: Michel Dänzer <michel.daenzer at amd.com>
> > >
> > > Fixes half a dozen piglit tests with radeonsi.
> > >
> > > Signed-off-by: Michel Dänzer <michel.daenzer at amd.com>
> > > ---
> > > lib/Target/R600/SIInstructions.td | 5 +++++
> > > test/CodeGen/R600/trunc.ll | 10 ++++++++++
> > > 2 files changed, 15 insertions(+)
> > >
> > > diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> > > index 03e7e32..b7b710f 100644
> > > --- a/lib/Target/R600/SIInstructions.td
> > > +++ b/lib/Target/R600/SIInstructions.td
> > > @@ -2126,6 +2126,11 @@ def : Pat <
> > > (EXTRACT_SUBREG $a, sub0)
> > > >;
> > >
> > > +def : Pat <
> > > + (i1 (trunc i32:$a)),
> > > + (V_CMP_EQ_I32_e64 (V_AND_B32_e32 (i32 1), $a), 1)
> > > +>;
> > > +
> >
> > I'm guessing you added V_CMP_EQ_I32_e64 in order to make the types match.
>
> Not really. The truncation is used for testing whether the LSB of an i32
> value is set, and storing the resulting boolean as an i1 value. My
> pattern does this for the VGPRs in all threads of a wavefront in
> parallel, storing the resulting boolean bits in a 64-bit SGPR.
>
Ok, I didn't realize this pattern was meant to be used with control flow
instructions. The pattern is fine as is. The patch is:
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
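
For readers following along, here is a rough model of what the quoted
pattern computes, sketched in Python rather than TableGen. It is only an
illustration of the semantics described above, not the backend's
implementation: the function name and the list-of-lanes representation
are made up for this sketch, and the wavefront size of 64 is an
assumption taken from the 64-bit SGPR mentioned in the quoted text.

```python
# Assumption: 64 threads per wavefront, one result bit per thread,
# matching the 64-bit SGPR mentioned in the thread.
WAVEFRONT_SIZE = 64


def trunc_i32_to_i1(vgpr_values):
    """Model (V_CMP_EQ_I32_e64 (V_AND_B32_e32 1, $a), 1) for one wavefront.

    vgpr_values holds one integer per thread (hypothetical representation).
    Returns an integer whose bit N is set if thread N's value has its LSB set.
    """
    if len(vgpr_values) != WAVEFRONT_SIZE:
        raise ValueError("expected one value per thread in the wavefront")

    mask = 0
    for lane, a in enumerate(vgpr_values):
        lsb = a & 1            # V_AND_B32: keep only the least significant bit
        if lsb == 1:           # V_CMP_EQ_I32: compare the masked value against 1
            mask |= 1 << lane  # record this thread's boolean in the 64-bit result
    return mask
```

Calling trunc_i32_to_i1(list(range(64))) sets the bit for every
odd-numbered thread, since only odd values have their LSB set.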
>
> > Try this pattern instead:
> >
> > def : Pat <
> > (i1 (trunc i32:$a)),
> > (COPY_TO_REGCLASS (V_AND_B32_e32 (i32 1), $a), VReg_32)
>
> I don't understand the idea behind your suggestion, can you elaborate?
>
Without the COPY_TO_REGCLASS, LLVM tablegen will complain because the
output register of V_AND_B32_e32 (VReg_32) does not support i1 types.
Adding the COPY_TO_REGCLASS allows tablegen to accept the pattern,
because COPY_TO_REGCLASS is untyped.
> Anyway, it fails for one of the relevant piglit tests:
>
> amd_vertex_shader_layer-layered-2d-texture-render: /home/daenzer/src/llvm-git/llvm/lib/Target/R600/SIInstrInfo.cpp:133: virtual void llvm::SIInstrInfo::copyPhysReg(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::iterator, llvm::DebugLoc, unsigned int, unsigned int, bool) const: Assertion `AMDGPU::VReg_64RegClass.contains(SrcReg) || AMDGPU::SReg_64RegClass.contains(SrcReg)' failed.
>
>
> --
> Earthling Michel Dänzer | http://www.amd.com
> Libre software enthusiast | Mesa and X developer
>