[Mesa-dev] [PATCH] radeon/llvm: Handle TGSI KIL opcode for SI.

Tue Aug 28 11:11:04 PDT 2012

On Tue, 2012-08-28 at 12:07 -0400, Tom Stellard wrote:
> On Tue, Aug 28, 2012 at 04:26:43PM +0200, Michel Dänzer wrote:
> > From: Michel Dänzer <michel.daenzer at amd.com>
> > 
> > Fixes piglit fp-kil with radeonsi.
> >
> 
> > Signed-off-by: Michel Dänzer <michel.daenzer at amd.com>
> > ---
> >  src/gallium/drivers/radeon/SIISelLowering.cpp |   35 +++++++++++++++++++++++++
> >  src/gallium/drivers/radeon/SIISelLowering.h   |    2 ++
> >  src/gallium/drivers/radeon/SIInstructions.td  |    7 +++++
> >  3 files changed, 44 insertions(+)
> > 
> > diff --git a/src/gallium/drivers/radeon/SIISelLowering.cpp b/src/gallium/drivers/radeon/SIISelLowering.cpp
> > index 092c2fa..f5eac16 100644
> > --- a/src/gallium/drivers/radeon/SIISelLowering.cpp
> > +++ b/src/gallium/drivers/radeon/SIISelLowering.cpp
> > @@ -129,6 +129,9 @@ MachineBasicBlock * SITargetLowering::EmitInstrWithCustomInserter(
> >    case AMDGPU::SI_INTERP_CONST:
> >      LowerSI_INTERP_CONST(MI, *BB, I);
> >      break;
> > +  case AMDGPU::SI_KIL:
> > +    LowerSI_KIL(MI, *BB, I, MRI);
> > +    break;
> >    case AMDGPU::SI_V_CNDLT:
> >      LowerSI_V_CNDLT(MI, *BB, I, MRI);
> >      break;
> > @@ -193,6 +196,38 @@ void SITargetLowering::LowerSI_INTERP_CONST(MachineInstr *MI,
> >    MI->eraseFromParent();
> >  }
> >  
> > +void SITargetLowering::LowerSI_KIL(MachineInstr *MI, MachineBasicBlock &BB,
> > +    MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const
> > +{
> > +  /* Clear this pixel from the exec mask if the operand is negative */
> 
> Please use // style comments in the LLVM code.

Okay.

> > +  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::V_CMPX_LE_F32_e32),
> > +          AMDGPU::VCC)
> > +          .addReg(AMDGPU::SREG_LIT_0)
> > +          .addOperand(MI->getOperand(0));
> > +
> > +  /* If the exec mask is non-zero, skip the next two instructions */
> 
> This comment is misleading, because it is branching on the VCC status
> and not the exec mask status.

Right, VCC and the exec mask are identical at this point (V_CMPX writes
its result to both), but I guess using S_CBRANCH_EXECNZ would be clearer.
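
For illustration only, here is a minimal C++ sketch of the per-lane
behaviour discussed above. It assumes a 64-lane wavefront and that
inactive lanes get a zero compare bit. WaveState, cmpx_le_zero and
any_lane_alive are hypothetical names for this model; they are not part
of the patch or of LLVM.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a 64-lane wavefront; names are illustrative only.
struct WaveState {
  uint64_t exec; // one bit per lane, set while the lane is still active
  uint64_t vcc;  // per-lane result of the last vector compare
};

// Models V_CMPX_LE_F32 with src0 = 0.0: each active lane tests 0.0 <= x.
// The result is written to both VCC and EXEC, so a lane whose operand is
// negative drops out of the exec mask (the pixel is killed).
inline void cmpx_le_zero(WaveState &w, const std::array<float, 64> &x) {
  uint64_t result = 0;
  for (unsigned lane = 0; lane < 64; ++lane) {
    bool active = (w.exec >> lane) & 1;
    if (active && 0.0f <= x[lane])
      result |= uint64_t(1) << lane;
  }
  w.vcc = result;
  w.exec = result;
}

// Models the branch condition: true if at least one lane survived.
// Since VCC == EXEC after the compare, testing either gives the same answer.
inline bool any_lane_alive(const WaveState &w) { return w.exec != 0; }
```

In this model the branch is taken as long as a single lane survives,
which is the case Tom asks about below; killed lanes simply stay
cleared in exec for the rest of the program.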

> > +  BuildMI(BB, I, BB.findDebugLoc(I), TII->get(AMDGPU::S_CBRANCH_VCCNZ))
> > +          .addImm(3)
> > +          .addReg(AMDGPU::VCC);
> > +
> 
> I'm a little confused about how this is supposed to work. As I understand
> it, the program will branch even if just one of the threads in the wavefront
> sets its VCC bit (which in this case means the pixel is not killed).
> Do we also need to export the exec_mask in the very last export of the
> program?

Yes, see si_llvm_emit_epilogue():

	/* Specify whether the EXEC mask represents the valid mask */
	last_args[1] = lp_build_const_int32(base->gallivm,
					    si_shader_ctx->type == TGSI_PROCESSOR_FRAGMENT);

I'll follow up with updated patches.

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer