[Mesa-dev] [PATCH] R600: Make sure OQAP defs and uses happen in the same clause

Fri Oct 25 14:42:13 CEST 2013

This patch should work when checking than no OQAP is used before beeing queued, assuming that a value in OQAP is consumed
and cannot be read twice. However I'm not sure I cover all LDS instructions that queues a value, I only use LDS_RET_READ in switch case.

Vincent

----- Mail original -----
> De : Tom Stellard <tom at stellard.net>
> À : Vincent Lejeune <vljn at ovi.com>
> Cc : "llvm-commits at cs.uiuc.edu" <llvm-commits at cs.uiuc.edu>; "mesa-dev at lists.freedesktop.org" <mesa-dev at lists.freedesktop.org>; Tom Stellard <thomas.stellard at amd.com>
> Envoyé le : Mardi 22 octobre 2013 23h20
> Objet : Re: [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
> 
> Hi Vincent,
> 
> Here is an updated patch.  I wasn't sure where to put the assertion to
> check that UnscheduledNoLiveOut{Defs,Uses} is empty when switching to a
> new clause.  I tried adding it to R600SchedStartegy::schedNode() behind
> the if (NextInstKind != CurInstKind) condition, but it always failed.
> Any suggestions on where I should but it?
> 
> -Tom
> 
> 
> On Mon, Oct 21, 2013 at 12:40:28PM -0700, Vincent Lejeune wrote:
>> 
>> 
>> 
>> 
>>  ----- Mail original -----
>>  > De : Tom Stellard <tom at stellard.net>
>>  > À : llvm-commits at cs.uiuc.edu
>>  > Cc : mesa-dev at lists.freedesktop.org; Tom Stellard 
> <thomas.stellard at amd.com>
>>  > Envoyé le : Vendredi 11 octobre 2013 20h10
>>  > Objet : [PATCH] R600: Make sure OQAP defs and uses happen in the same 
> clause
>>  > 
>>  > From: Tom Stellard <thomas.stellard at amd.com>
>>  > 
>>  > Reading the special OQAP register pops the top value off the LDS
>>  > input queue and returns it to the instruction.  This queue is
>>  > invalidated at the end of an ALU clause and leaving values in the 
> queue
>>  > can lead to GPU hangs.  This means that if we load a value into the 
> queue,
>>  > we must use it before the end of the clause.
>>  > 
>>  > This fixes some hangs in the OpenCV test suite.
>>  > ---
>>  > lib/Target/R600/R600MachineScheduler.cpp | 25 
> +++++++++++++------------
>>  > lib/Target/R600/R600MachineScheduler.h   |  4 ++--
>>  > test/CodeGen/R600/lds-input-queue.ll     | 26 
> ++++++++++++++++++++++++++
>>  > 3 files changed, 41 insertions(+), 14 deletions(-)
>>  > create mode 100644 test/CodeGen/R600/lds-input-queue.ll
>>  > 
>>  > diff --git a/lib/Target/R600/R600MachineScheduler.cpp 
>>  > b/lib/Target/R600/R600MachineScheduler.cpp
>>  > index 6c26d9e..611b7f4 100644
>>  > --- a/lib/Target/R600/R600MachineScheduler.cpp
>>  > +++ b/lib/Target/R600/R600MachineScheduler.cpp
>>  > @@ -93,11 +93,12 @@ SUnit* R600SchedStrategy::pickNode(bool 
> &IsTopNode) 
>>  > {
>>  >    }
>>  > 
>>  > 
>>  > -  // We want to scheduled AR defs as soon as possible to make sure 
> they 
>>  > aren't
>>  > -  // put in a different ALU clause from their uses.
>>  > -  if (!SU && !UnscheduledARDefs.empty()) {
>>  > -      SU = UnscheduledARDefs[0];
>>  > -      UnscheduledARDefs.erase(UnscheduledARDefs.begin());
>>  > +  // We want to scheduled defs that cannot be live outside of this 
> clause 
>>  > +  // as soon as possible to make sure they aren't put in a 
> different
>>  > +  // ALU clause from their uses.
>>  > +  if (!SU && !UnscheduledNoLiveOutDefs.empty()) {
>>  > +      SU = UnscheduledNoLiveOutDefs[0];
>>  > +      
> UnscheduledNoLiveOutDefs.erase(UnscheduledNoLiveOutDefs.begin());
>>  >        NextInstKind = IDAlu;
>>  >    }
>>  > 
>>  > @@ -132,9 +133,9 @@ SUnit* R600SchedStrategy::pickNode(bool 
> &IsTopNode) 
>>  > {
>>  > 
>>  >    // We want to schedule the AR uses as late as possible to make sure 
> that
>>  >    // the AR defs have been released.
>>  > -  if (!SU && !UnscheduledARUses.empty()) {
>>  > -      SU = UnscheduledARUses[0];
>>  > -      UnscheduledARUses.erase(UnscheduledARUses.begin());
>>  > +  if (!SU && !UnscheduledNoLiveOutUses.empty()) {
>>  > +      SU = UnscheduledNoLiveOutUses[0];
>>  > +      
> UnscheduledNoLiveOutUses.erase(UnscheduledNoLiveOutUses.begin());
>> 
>>  Can we use std::queue<SUnit*> instead of a std::vector for 
> UnscheduledNoLiveOutUses ?
>>  I had to use a vector because I needed to be able to pop non topmost SUnit 
> in some case
>>  (to fit Instruction Group const read limitation) but I would rather avoid 
> erase(iterator) call
>>  when possible.
>> 
>> 
>>  >        NextInstKind = IDAlu;
>>  >    }
>>  > 
>>  > @@ -217,15 +218,15 @@ void R600SchedStrategy::releaseBottomNode(SUnit 
> *SU) 
>>  > {
>>  > 
>>  >    int IK = getInstKind(SU);
>>  > 
>>  > -  // Check for AR register defines
>>  > +  // Check for registers that do not live across ALU clauses.
>>  >    for (MachineInstr::const_mop_iterator I = 
>>  > SU->getInstr()->operands_begin(),
>>  >                                          E = 
>>  > SU->getInstr()->operands_end();
>>  >                                          I != E; ++I) {
>>  > -    if (I->isReg() && I->getReg() == AMDGPU::AR_X) 
> {
>>  > +    if (I->isReg() && (I->getReg() == AMDGPU::AR_X || 
>>  > I->getReg() == AMDGPU::OQAP)) {
>>  >        if (I->isDef()) {
>>  > -        UnscheduledARDefs.push_back(SU);
>>  > +        UnscheduledNoLiveOutDefs.push_back(SU);
>>  >        } else {
>>  > -        UnscheduledARUses.push_back(SU);
>>  > +        UnscheduledNoLiveOutUses.push_back(SU);
>>  >        }
>>  >        return;
>>  >      }
>>  > diff --git a/lib/Target/R600/R600MachineScheduler.h 
>>  > b/lib/Target/R600/R600MachineScheduler.h
>>  > index 0a6f120..db2e188 100644
>>  > --- a/lib/Target/R600/R600MachineScheduler.h
>>  > +++ b/lib/Target/R600/R600MachineScheduler.h
>>  > @@ -53,8 +53,8 @@ class R600SchedStrategy : public 
> MachineSchedStrategy {
>>  > 
>>  >    std::vector<SUnit *> Available[IDLast], Pending[IDLast];
>>  >    std::vector<SUnit *> AvailableAlus[AluLast];
>>  > -  std::vector<SUnit *> UnscheduledARDefs;
>>  > -  std::vector<SUnit *> UnscheduledARUses;
>>  > +  std::vector<SUnit *> UnscheduledNoLiveOutDefs;
>>  > +  std::vector<SUnit *> UnscheduledNoLiveOutUses;
>>  >    std::vector<SUnit *> PhysicalRegCopy;
>>  > 
>>  >    InstKind CurInstKind;
>>  > diff --git a/test/CodeGen/R600/lds-input-queue.ll 
>>  > b/test/CodeGen/R600/lds-input-queue.ll
>>  > new file mode 100644
>>  > index 0000000..548b41c
>>  > --- /dev/null
>>  > +++ b/test/CodeGen/R600/lds-input-queue.ll
>>  > @@ -0,0 +1,26 @@
>>  > +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
>> 
>>  Does the test work with -verify-machineinstrs flag set ?
>> 
>>  > +;
>>  > +; This test checks that the lds input queue will is empty at the end 
> of
>>  > +; the ALU clause.
>>  > +
>>  > +; CHECK-LABEL: @lds_input_queue
>>  > +; CHECK: LDS_READ_RET * OQAP
>>  > +; CHECK-NOT: ALU clause
>>  > +; CHECK: MOV T{{[0-9]\.[XYZW]}}, OQAP
>>  > +
>>  > + at local_mem = internal addrspace(3) unnamed_addr global [2 x i32] [i32 
> 1, i32 
>>  > 2], align 4
>>  > +
>>  > +define void @lds_input_queue(i32 addrspace(1)* %out, i32 
> addrspace(1)* %in, i32 
>>  > %index) {
>>  > +entry:
>>  > +  %0 = getelementptr inbounds [2 x i32] addrspace(3)* @local_mem, i32 
> 0, i32 
>>  > %index
>>  > +  %1 = load i32 addrspace(3)* %0
>>  > +  call void @llvm.AMDGPU.barrier.local()
>>  > +
>>  > +  ; This will start a new clause for the vertex fetch
>>  > +  %2 = load i32 addrspace(1)* %in
>>  > +  %3 = add i32 %1, %2
>>  > +  store i32 %3, i32 addrspace(1)* %out
>>  > +  ret void
>>  > +}
>>  > +
>>  > +declare void @llvm.AMDGPU.barrier.local()
>>  > -- 
>>  > 1.7.11.4
>>  > 
>>  > _______________________________________________
>>  > llvm-commits mailing list
>>  > llvm-commits at cs.uiuc.edu
>>  > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>  >
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-R600-Uses-Assert-for-OQAP.patch
Type: text/x-patch
Size: 3054 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131025/2dec9009/attachment-0001.bin>