[Mesa-dev] [PATCH] R600: Make sure OQAP defs and uses happen in the same clause

Tom Stellard tom at stellard.net
Tue Oct 22 23:20:50 CEST 2013


Hi Vincent,

Here is an updated patch.  I wasn't sure where to put the assertion to
check that UnscheduledNoLiveOut{Defs,Uses} is empty when switching to a
new clause.  I tried adding it to R600SchedStartegy::schedNode() behind
the if (NextInstKind != CurInstKind) condition, but it always failed.
Any suggestions on where I should but it?

-Tom

On Mon, Oct 21, 2013 at 12:40:28PM -0700, Vincent Lejeune wrote:
> 
> 
> 
> 
> ----- Mail original -----
> > De : Tom Stellard <tom at stellard.net>
> > À : llvm-commits at cs.uiuc.edu
> > Cc : mesa-dev at lists.freedesktop.org; Tom Stellard <thomas.stellard at amd.com>
> > Envoyé le : Vendredi 11 octobre 2013 20h10
> > Objet : [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
> > 
> > From: Tom Stellard <thomas.stellard at amd.com>
> > 
> > Reading the special OQAP register pops the top value off the LDS
> > input queue and returns it to the instruction.  This queue is
> > invalidated at the end of an ALU clause and leaving values in the queue
> > can lead to GPU hangs.  This means that if we load a value into the queue,
> > we must use it before the end of the clause.
> > 
> > This fixes some hangs in the OpenCV test suite.
> > ---
> > lib/Target/R600/R600MachineScheduler.cpp | 25 +++++++++++++------------
> > lib/Target/R600/R600MachineScheduler.h   |  4 ++--
> > test/CodeGen/R600/lds-input-queue.ll     | 26 ++++++++++++++++++++++++++
> > 3 files changed, 41 insertions(+), 14 deletions(-)
> > create mode 100644 test/CodeGen/R600/lds-input-queue.ll
> > 
> > diff --git a/lib/Target/R600/R600MachineScheduler.cpp 
> > b/lib/Target/R600/R600MachineScheduler.cpp
> > index 6c26d9e..611b7f4 100644
> > --- a/lib/Target/R600/R600MachineScheduler.cpp
> > +++ b/lib/Target/R600/R600MachineScheduler.cpp
> > @@ -93,11 +93,12 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) 
> > {
> >    }
> > 
> > 
> > -  // We want to scheduled AR defs as soon as possible to make sure they 
> > aren't
> > -  // put in a different ALU clause from their uses.
> > -  if (!SU && !UnscheduledARDefs.empty()) {
> > -      SU = UnscheduledARDefs[0];
> > -      UnscheduledARDefs.erase(UnscheduledARDefs.begin());
> > +  // We want to scheduled defs that cannot be live outside of this clause 
> > +  // as soon as possible to make sure they aren't put in a different
> > +  // ALU clause from their uses.
> > +  if (!SU && !UnscheduledNoLiveOutDefs.empty()) {
> > +      SU = UnscheduledNoLiveOutDefs[0];
> > +      UnscheduledNoLiveOutDefs.erase(UnscheduledNoLiveOutDefs.begin());
> >        NextInstKind = IDAlu;
> >    }
> > 
> > @@ -132,9 +133,9 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) 
> > {
> > 
> >    // We want to schedule the AR uses as late as possible to make sure that
> >    // the AR defs have been released.
> > -  if (!SU && !UnscheduledARUses.empty()) {
> > -      SU = UnscheduledARUses[0];
> > -      UnscheduledARUses.erase(UnscheduledARUses.begin());
> > +  if (!SU && !UnscheduledNoLiveOutUses.empty()) {
> > +      SU = UnscheduledNoLiveOutUses[0];
> > +      UnscheduledNoLiveOutUses.erase(UnscheduledNoLiveOutUses.begin());
> 
> Can we use std::queue<SUnit*> instead of a std::vector for UnscheduledNoLiveOutUses ?
> I had to use a vector because I needed to be able to pop non topmost SUnit in some case
> (to fit Instruction Group const read limitation) but I would rather avoid erase(iterator) call
> when possible.
> 
> 
> >        NextInstKind = IDAlu;
> >    }
> > 
> > @@ -217,15 +218,15 @@ void R600SchedStrategy::releaseBottomNode(SUnit *SU) 
> > {
> > 
> >    int IK = getInstKind(SU);
> > 
> > -  // Check for AR register defines
> > +  // Check for registers that do not live across ALU clauses.
> >    for (MachineInstr::const_mop_iterator I = 
> > SU->getInstr()->operands_begin(),
> >                                          E = 
> > SU->getInstr()->operands_end();
> >                                          I != E; ++I) {
> > -    if (I->isReg() && I->getReg() == AMDGPU::AR_X) {
> > +    if (I->isReg() && (I->getReg() == AMDGPU::AR_X || 
> > I->getReg() == AMDGPU::OQAP)) {
> >        if (I->isDef()) {
> > -        UnscheduledARDefs.push_back(SU);
> > +        UnscheduledNoLiveOutDefs.push_back(SU);
> >        } else {
> > -        UnscheduledARUses.push_back(SU);
> > +        UnscheduledNoLiveOutUses.push_back(SU);
> >        }
> >        return;
> >      }
> > diff --git a/lib/Target/R600/R600MachineScheduler.h 
> > b/lib/Target/R600/R600MachineScheduler.h
> > index 0a6f120..db2e188 100644
> > --- a/lib/Target/R600/R600MachineScheduler.h
> > +++ b/lib/Target/R600/R600MachineScheduler.h
> > @@ -53,8 +53,8 @@ class R600SchedStrategy : public MachineSchedStrategy {
> > 
> >    std::vector<SUnit *> Available[IDLast], Pending[IDLast];
> >    std::vector<SUnit *> AvailableAlus[AluLast];
> > -  std::vector<SUnit *> UnscheduledARDefs;
> > -  std::vector<SUnit *> UnscheduledARUses;
> > +  std::vector<SUnit *> UnscheduledNoLiveOutDefs;
> > +  std::vector<SUnit *> UnscheduledNoLiveOutUses;
> >    std::vector<SUnit *> PhysicalRegCopy;
> > 
> >    InstKind CurInstKind;
> > diff --git a/test/CodeGen/R600/lds-input-queue.ll 
> > b/test/CodeGen/R600/lds-input-queue.ll
> > new file mode 100644
> > index 0000000..548b41c
> > --- /dev/null
> > +++ b/test/CodeGen/R600/lds-input-queue.ll
> > @@ -0,0 +1,26 @@
> > +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> 
> Does the test work with -verify-machineinstrs flag set ?
> 
> > +;
> > +; This test checks that the lds input queue will is empty at the end of
> > +; the ALU clause.
> > +
> > +; CHECK-LABEL: @lds_input_queue
> > +; CHECK: LDS_READ_RET * OQAP
> > +; CHECK-NOT: ALU clause
> > +; CHECK: MOV T{{[0-9]\.[XYZW]}}, OQAP
> > +
> > + at local_mem = internal addrspace(3) unnamed_addr global [2 x i32] [i32 1, i32 
> > 2], align 4
> > +
> > +define void @lds_input_queue(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 
> > %index) {
> > +entry:
> > +  %0 = getelementptr inbounds [2 x i32] addrspace(3)* @local_mem, i32 0, i32 
> > %index
> > +  %1 = load i32 addrspace(3)* %0
> > +  call void @llvm.AMDGPU.barrier.local()
> > +
> > +  ; This will start a new clause for the vertex fetch
> > +  %2 = load i32 addrspace(1)* %in
> > +  %3 = add i32 %1, %2
> > +  store i32 %3, i32 addrspace(1)* %out
> > +  ret void
> > +}
> > +
> > +declare void @llvm.AMDGPU.barrier.local()
> > -- 
> > 1.7.11.4
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
-------------- next part --------------
>From b995bbac04b3f05564008132a3b8a90cb08662de Mon Sep 17 00:00:00 2001
From: Tom Stellard <thomas.stellard at amd.com>
Date: Thu, 5 Sep 2013 08:59:32 -0700
Subject: [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
 v2

Reading the special OQAP register pops the top value off the LDS
input queue and returns it to the instruction.  This queue is
invalidated at the end of an ALU clause and leaving values in the queue
can lead to GPU hangs.  This means that if we load a value into the queue,
we must use it before the end of the clause.

This fixes some hangs in the OpenCV test suite.

v2:
  - Use a std::queue for tracking OQAP defs and uses
  - Add -verify-machineinstrs to the test case
---
 lib/Target/R600/R600MachineScheduler.cpp | 25 +++++++++++++------------
 lib/Target/R600/R600MachineScheduler.h   |  4 ++--
 test/CodeGen/R600/lds-input-queue.ll     | 26 ++++++++++++++++++++++++++
 3 files changed, 41 insertions(+), 14 deletions(-)
 create mode 100644 test/CodeGen/R600/lds-input-queue.ll

diff --git a/lib/Target/R600/R600MachineScheduler.cpp b/lib/Target/R600/R600MachineScheduler.cpp
index 6c26d9e..d89506f 100644
--- a/lib/Target/R600/R600MachineScheduler.cpp
+++ b/lib/Target/R600/R600MachineScheduler.cpp
@@ -93,11 +93,12 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) {
   }
 
 
-  // We want to scheduled AR defs as soon as possible to make sure they aren't
-  // put in a different ALU clause from their uses.
-  if (!SU && !UnscheduledARDefs.empty()) {
-      SU = UnscheduledARDefs[0];
-      UnscheduledARDefs.erase(UnscheduledARDefs.begin());
+  // We want to scheduled defs that cannot be live outside of this clause 
+  // as soon as possible to make sure they aren't put in a different
+  // ALU clause from their uses.
+  if (!SU && !UnscheduledNoLiveOutDefs.empty()) {
+      SU = UnscheduledNoLiveOutDefs.front();
+      UnscheduledNoLiveOutDefs.pop();
       NextInstKind = IDAlu;
   }
 
@@ -132,9 +133,9 @@ SUnit* R600SchedStrategy::pickNode(bool &IsTopNode) {
 
   // We want to schedule the AR uses as late as possible to make sure that
   // the AR defs have been released.
-  if (!SU && !UnscheduledARUses.empty()) {
-      SU = UnscheduledARUses[0];
-      UnscheduledARUses.erase(UnscheduledARUses.begin());
+  if (!SU && !UnscheduledNoLiveOutUses.empty()) {
+      SU = UnscheduledNoLiveOutUses.front();
+      UnscheduledNoLiveOutUses.pop();
       NextInstKind = IDAlu;
   }
 
@@ -217,15 +218,15 @@ void R600SchedStrategy::releaseBottomNode(SUnit *SU) {
 
   int IK = getInstKind(SU);
 
-  // Check for AR register defines
+  // Check for registers that do not live across ALU clauses.
   for (MachineInstr::const_mop_iterator I = SU->getInstr()->operands_begin(),
                                         E = SU->getInstr()->operands_end();
                                         I != E; ++I) {
-    if (I->isReg() && I->getReg() == AMDGPU::AR_X) {
+    if (I->isReg() && (I->getReg() == AMDGPU::AR_X || I->getReg() == AMDGPU::OQAP)) {
       if (I->isDef()) {
-        UnscheduledARDefs.push_back(SU);
+        UnscheduledNoLiveOutDefs.push(SU);
       } else {
-        UnscheduledARUses.push_back(SU);
+        UnscheduledNoLiveOutUses.push(SU);
       }
       return;
     }
diff --git a/lib/Target/R600/R600MachineScheduler.h b/lib/Target/R600/R600MachineScheduler.h
index 0a6f120..c01decd 100644
--- a/lib/Target/R600/R600MachineScheduler.h
+++ b/lib/Target/R600/R600MachineScheduler.h
@@ -53,8 +53,8 @@ class R600SchedStrategy : public MachineSchedStrategy {
 
   std::vector<SUnit *> Available[IDLast], Pending[IDLast];
   std::vector<SUnit *> AvailableAlus[AluLast];
-  std::vector<SUnit *> UnscheduledARDefs;
-  std::vector<SUnit *> UnscheduledARUses;
+  std::queue<SUnit *> UnscheduledNoLiveOutDefs;
+  std::queue<SUnit *> UnscheduledNoLiveOutUses;
   std::vector<SUnit *> PhysicalRegCopy;
 
   InstKind CurInstKind;
diff --git a/test/CodeGen/R600/lds-input-queue.ll b/test/CodeGen/R600/lds-input-queue.ll
new file mode 100644
index 0000000..d2410d4
--- /dev/null
+++ b/test/CodeGen/R600/lds-input-queue.ll
@@ -0,0 +1,26 @@
+; RUN: llc < %s -march=r600 -mcpu=redwood -verify-machineinstrs | FileCheck %s
+;
+; This test checks that the lds input queue will is empty at the end of
+; the ALU clause.
+
+; CHECK-LABEL: @lds_input_queue
+; CHECK: LDS_READ_RET * OQAP
+; CHECK-NOT: ALU clause
+; CHECK: MOV T{{[0-9]\.[XYZW]}}, OQAP
+
+ at local_mem = internal addrspace(3) unnamed_addr global [2 x i32] [i32 1, i32 2], align 4
+
+define void @lds_input_queue(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %index) {
+entry:
+  %0 = getelementptr inbounds [2 x i32] addrspace(3)* @local_mem, i32 0, i32 %index
+  %1 = load i32 addrspace(3)* %0
+  call void @llvm.AMDGPU.barrier.local()
+
+  ; This will start a new clause for the vertex fetch
+  %2 = load i32 addrspace(1)* %in
+  %3 = add i32 %1, %2
+  store i32 %3, i32 addrspace(1)* %out
+  ret void
+}
+
+declare void @llvm.AMDGPU.barrier.local()
-- 
1.7.11.4



More information about the mesa-dev mailing list