[Mesa-dev] [PATCH 1/1] r600/sb: Fix loop optimization related hangs on eg
Marek Olšák
maraeo at gmail.com
Tue Jan 3 20:59:34 UTC 2017
Pushed, thanks.
Marek
On Sun, Nov 20, 2016 at 2:42 PM, Heiko Przybyl <lil_tux at web.de> wrote:
> Make sure unused ops and their references are removed, prior to entering
> the GCM (global code motion) pass, to stop GCM from breaking the loop
> logic and thus hanging the GPU.
>
> Turns out, that sb has problems with loops and node optimizations
> regarding associative folding:
>
> - the global code motion (gcm) pass moves ops up a loop level/basic block
> until they've fulfilled their total usage count
> - if there are ops folded into others, the usage count won't be
> fulfilled and thus the op moved way up to the top
> - within GCM the op would be visited and their deps would be moved
> alongside it, to fulfill the src constaints
> - in a loop, an unused op is moved out of the loop and GCM would move
> the src value ops up as well
> - now here arises the problem: if the loop counter is one of the src
> values it would get moved up as well, the loop break condition would
> never get hit and the shader turn into an endless loop, resulting in the
> GPU hanging and being reset
>
> A reduced (albeit nonsense) piglit example would be:
>
> [require]
> GLSL >= 1.20
>
> [fragment shader]
>
> uniform int SIZE;
> uniform vec4 lights[512];
>
> void main()
> {
> float x = 0;
> for(int i = 0; i < SIZE; i++)
> x += lights[2*i+1].x;
> }
>
> [test]
> uniform int SIZE 1
> draw rect -1 -1 2 2
>
> Which gets optimized to:
>
> ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
> ===== 42 dw ===== 1 gprs ===== 2 stack =========================================
> ALU 3 @24
> 1 y: MOV R0.y, 0
> t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z
>
> LOOP_START_DX10 @22
> PUSH @6
> ALU 1 @30 KC0[CB0:0-15]
> 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x
> JUMP @14 POP:1
> LOOP_BREAK @20
> POP @14 POP:1
> ALU 2 @32
> 3 x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x
>
> TEX 1 @36
> VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..]
> ALU 1 @40
> 4 y: ADD R0.y, R0.y, R0.x
> LOOP_END @4
> EXPORT_DONE PIXEL 0 R0.____ EOP
> ===== SHADER_END ===============================================================
>
> Notice R0.z being the loop counter/break condition relevant register
> and being never incremented at all. Also some of the loop content
> has been moved out of it, to fulfill the requirements for the one unused
> op.
>
> With a debug build of mesa this would produce an error like
> error at : PRED_SETGE_INT __, __, EM.2, R1.x.2||FP at R0.z, C0.x
> : operand value R1.x.2||FP at R0.z was not previously written to its gpr
> and the compilation would fail due to this. On a release build it gets
> passed to the GPU.
>
> When using this patch, the loop remains intact:
>
> ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
> ===== 48 dw ===== 1 gprs ===== 2 stack =========================================
> ALU 2 @24
> 1 y: MOV R0.y, 0
> z: MOV R0.z, 0
> LOOP_START_DX10 @22
> PUSH @6
> ALU 1 @28 KC0[CB0:0-15]
> 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x
> JUMP @14 POP:1
> LOOP_BREAK @20
> POP @14 POP:1
> ALU 4 @30
> 3 t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z
>
> 4 x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x
>
> TEX 1 @40
> VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..]
> ALU 2 @44
> 5 y: ADD R0.y, R0.y, R0.x
> z: ADD_INT R0.z, R0.z, 1
> LOOP_END @4
> EXPORT_DONE PIXEL 0 R0.____ EOP
> ===== SHADER_END ===============================================================
>
> Piglit: ./piglit summary console -d results/*_gpu_noglx
> name: unpatched_gpu_noglx patched_gpu_noglx
> ---- ------------------- -----------------
> pass: 18016 18021
> fail: 748 743
> crash: 7 7
> skip: 1124 1124
> timeout: 0 0
> warn: 13 13
> incomplete: 0 0
> dmesg-warn: 0 0
> dmesg-fail: 0 0
> changes: 0 5
> fixes: 0 5
> regressions: 0 0
> total: 19908 19908
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900
> Tested-by: Heiko Przybyl <lil_tux at web.de>
> Tested-on: Barts PRO HD6850
> Signed-off-by: Heiko Przybyl <lil_tux at web.de>
> ---
> src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp | 25 ++++++++++++++-
> src/gallium/drivers/r600/sb/sb_gcm.cpp | 7 ++---
> src/gallium/drivers/r600/sb/sb_ir.cpp | 4 +--
> src/gallium/drivers/r600/sb/sb_ir.h | 14 +++++----
> src/gallium/drivers/r600/sb/sb_pass.h | 6 +++-
> src/gallium/drivers/r600/sb/sb_valtable.cpp | 42 ++++++++++++++++----------
> 6 files changed, 68 insertions(+), 30 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
> index 79aef9106a..abae2bf69d 100644
> --- a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
> @@ -30,6 +30,18 @@
>
> namespace r600_sb {
>
> +int dce_cleanup::run() {
> + int r;
> +
> + // Run cleanup for as long as there are unused nodes.
> + do {
> + nodes_changed = false;
> + r = vpass::run();
> + } while (r == 0 && nodes_changed);
> +
> + return r;
> +}
> +
> bool dce_cleanup::visit(node& n, bool enter) {
> if (enter) {
> } else {
> @@ -110,7 +122,18 @@ bool dce_cleanup::visit(region_node& n, bool enter) {
> void dce_cleanup::cleanup_dst(node& n) {
> if (!cleanup_dst_vec(n.dst) && remove_unused &&
> !n.dst.empty() && !(n.flags & NF_DONT_KILL) && n.parent)
> + {
> + // Delete use references to the removed node from the src values.
> + for (vvec::iterator I = n.src.begin(), E = n.src.end(); I != E; ++I) {
> + value* v = *I;
> + if (v && v->def && v->uses.size())
> + {
> + v->remove_use(&n);
> + }
> + }
> n.remove();
> + nodes_changed = true;
> + }
> }
>
> bool dce_cleanup::visit(container_node& n, bool enter) {
> @@ -130,7 +153,7 @@ bool dce_cleanup::cleanup_dst_vec(vvec& vv) {
> if (v->gvn_source && v->gvn_source->is_dead())
> v->gvn_source = NULL;
>
> - if (v->is_dead() || (remove_unused && !v->is_rel() && !v->uses))
> + if (v->is_dead() || (remove_unused && !v->is_rel() && !v->uses.size()))
> v = NULL;
> else
> alive = true;
> diff --git a/src/gallium/drivers/r600/sb/sb_gcm.cpp b/src/gallium/drivers/r600/sb/sb_gcm.cpp
> index 236b2ea003..9c75389ada 100644
> --- a/src/gallium/drivers/r600/sb/sb_gcm.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_gcm.cpp
> @@ -199,10 +199,9 @@ void gcm::td_release_val(value *v) {
> sblog << "\n";
> );
>
> - use_info *u = v->uses;
> - while (u) {
> + for (uselist::iterator I = v->uses.begin(), E = v->uses.end(); I != E; ++I) {
> + use_info *u = *I;
> if (u->op->parent != &pending) {
> - u = u->next;
> continue;
> }
>
> @@ -212,6 +211,7 @@ void gcm::td_release_val(value *v) {
> sblog << "\n";
> );
>
> + assert(uses[u->op] > 0);
> if (--uses[u->op] == 0) {
> GCM_DUMP(
> sblog << "td released : ";
> @@ -222,7 +222,6 @@ void gcm::td_release_val(value *v) {
> pending.remove_node(u->op);
> ready.push_back(u->op);
> }
> - u = u->next;
> }
>
> }
> diff --git a/src/gallium/drivers/r600/sb/sb_ir.cpp b/src/gallium/drivers/r600/sb/sb_ir.cpp
> index 5226893de7..d989dce62c 100644
> --- a/src/gallium/drivers/r600/sb/sb_ir.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_ir.cpp
> @@ -255,7 +255,7 @@ void container_node::expand() {
> void node::remove() {parent->remove_node(this);
> }
>
> -value_hash node::hash_src() {
> +value_hash node::hash_src() const {
>
> value_hash h = 12345;
>
> @@ -269,7 +269,7 @@ value_hash node::hash_src() {
> }
>
>
> -value_hash node::hash() {
> +value_hash node::hash() const {
>
> if (parent && parent->subtype == NST_LOOP_PHI_CONTAINER)
> return 47451;
> diff --git a/src/gallium/drivers/r600/sb/sb_ir.h b/src/gallium/drivers/r600/sb/sb_ir.h
> index 4fc4da2fb2..74c0549a81 100644
> --- a/src/gallium/drivers/r600/sb/sb_ir.h
> +++ b/src/gallium/drivers/r600/sb/sb_ir.h
> @@ -446,15 +446,16 @@ enum use_kind {
> };
>
> struct use_info {
> - use_info *next;
> node *op;
> use_kind kind;
> int arg;
>
> - use_info(node *n, use_kind kind, int arg, use_info* next)
> - : next(next), op(n), kind(kind), arg(arg) {}
> + use_info(node *n, use_kind kind, int arg)
> + : op(n), kind(kind), arg(arg) {}
> };
>
> +typedef std::list< use_info * > uselist;
> +
> enum constraint_kind {
> CK_SAME_REG,
> CK_PACKED_BS,
> @@ -498,7 +499,7 @@ public:
> value_hash ghash;
>
> node *def, *adef;
> - use_info *uses;
> + uselist uses;
>
> ra_constraint *constraint;
> ra_chunk *chunk;
> @@ -585,6 +586,7 @@ public:
> }
>
> void add_use(node *n, use_kind kind, int arg);
> + void remove_use(const node *n);
>
> value_hash hash();
> value_hash rel_hash();
> @@ -790,8 +792,8 @@ public:
> void replace_with(node *n);
> void remove();
>
> - virtual value_hash hash();
> - value_hash hash_src();
> + virtual value_hash hash() const;
> + value_hash hash_src() const;
>
> virtual bool fold_dispatch(expr_handler *ex);
>
> diff --git a/src/gallium/drivers/r600/sb/sb_pass.h b/src/gallium/drivers/r600/sb/sb_pass.h
> index 0346df1b16..e878f8c70c 100644
> --- a/src/gallium/drivers/r600/sb/sb_pass.h
> +++ b/src/gallium/drivers/r600/sb/sb_pass.h
> @@ -124,7 +124,9 @@ class dce_cleanup : public vpass {
> public:
>
> dce_cleanup(shader &s) : vpass(s),
> - remove_unused(s.dce_flags & DF_REMOVE_UNUSED) {}
> + remove_unused(s.dce_flags & DF_REMOVE_UNUSED), nodes_changed(false) {}
> +
> + virtual int run();
>
> virtual bool visit(node &n, bool enter);
> virtual bool visit(alu_group_node &n, bool enter);
> @@ -140,6 +142,8 @@ private:
> void cleanup_dst(node &n);
> bool cleanup_dst_vec(vvec &vv);
>
> + // Did we alter/remove nodes during a single pass?
> + bool nodes_changed;
> };
>
>
> diff --git a/src/gallium/drivers/r600/sb/sb_valtable.cpp b/src/gallium/drivers/r600/sb/sb_valtable.cpp
> index eb242b1c26..a8b7b49cd4 100644
> --- a/src/gallium/drivers/r600/sb/sb_valtable.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_valtable.cpp
> @@ -220,17 +220,33 @@ void value::add_use(node* n, use_kind kind, int arg) {
> dump::dump_op(n);
> sblog << " kind " << kind << " arg " << arg << "\n";
> }
> - uses = new use_info(n, kind, arg, uses);
> + uses.push_back(new use_info(n, kind, arg));
> }
>
> -unsigned value::use_count() {
> - use_info *u = uses;
> - unsigned c = 0;
> - while (u) {
> - ++c;
> - u = u->next;
> +struct use_node_comp {
> + explicit use_node_comp(const node *n) : n(n) {}
> + bool operator() (const use_info *u) {
> + return u->op->hash() == n->hash();
> + }
> +
> + private:
> + const node *n;
> +};
> +
> +void value::remove_use(const node *n) {
> + uselist::iterator it =
> + std::find_if(uses.begin(), uses.end(), use_node_comp(n));
> +
> + if (it != uses.end())
> + {
> + // TODO assert((*it)->kind == kind) ?
> + // TODO assert((*it)->arg == arg) ?
> + uses.erase(it);
> }
> - return c;
> +}
> +
> +unsigned value::use_count() {
> + return uses.size();
> }
>
> bool value::is_global() {
> @@ -274,13 +290,7 @@ bool value::is_prealloc() {
> }
>
> void value::delete_uses() {
> - use_info *u, *c = uses;
> - while (c) {
> - u = c->next;
> - delete c;
> - c = u;
> - }
> - uses = NULL;
> + uses.erase(uses.begin(), uses.end());
> }
>
> void ra_constraint::update_values() {
> @@ -468,7 +478,7 @@ bool r600_sb::sb_value_set::add_vec(vvec& vv) {
> bool r600_sb::sb_value_set::contains(value* v) {
> unsigned b = v->uid - 1;
> if (b < bs.size())
> - return bs.get(v->uid - 1);
> + return bs.get(b);
> else
> return false;
> }
> --
> 2.11.0.rc2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list