<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [SKL]Render error in some games (etqw-demo, nexuiz, portal)"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89058#c27">Comment # 27</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [SKL]Render error in some games (etqw-demo, nexuiz, portal)"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89058">bug 89058</a>
from <span class="vcard"><a class="email" href="mailto:neil@linux.intel.com" title="Neil Roberts <neil@linux.intel.com>"> <span class="fn">Neil Roberts</span></a>
</span></b>
<pre>I think I have a better understanding of what's going on. The original ARBvp
program has three constant array loads with a non-constant index like this:
MOV _R0, _joints[_A0.x+0];
MOV _R1, _joints[_A0.x+1];
MOV _R2, _joints[_A0.x+2];
These get converted to VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 instructions, roughly
as shown below. v17, v20 and v23 are two-register-wide virtual registers: the
first register is reserved for the message header and the second register is
loaded with the indices by some prior instructions.
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 v18, v17
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 v21, v20
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 v24, v23
The register allocator allocates the virtual registers as below. It reuses the
source register from the second load as the destination in the third. It also
uses g11 for both the source and the dest in the first load. Neither of these
should cause a problem.
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 g11, g11
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 g12, g13
VS_OPCODE_PULL_CONSTANT_LOAD_GEN7 g13, g15
The instruction scheduler now kicks in and reorders the last two instructions.
As far as it is concerned, this is safe: the initialisation code before the
loads only writes to the second register of each source pair (g12, g14 and g16)
and never to the first (g11, g13 and g15), so it looks like writing to those
destination registers doesn't cause a collision. However, the problem is that
the generator actually sneaks in a write to the first source register in order
to set up the message header. So the code now looks like this:
mov(4) g11<1>UD g0<4,4,1>UD ; set up the message header
send(8) g11<1>F g11<4,4,1>.xD
sampler (0, 0, 7, 0) mlen 2 rlen 1
mov(4) g15<1>UD g0<4,4,1>UD ; set up the message header
send(8) g13<1>F g15<4,4,1>.xD
sampler (0, 0, 7, 0) mlen 2 rlen 1
mov(4) g13<1>UD g0<4,4,1>UD ; set up the message header
send(8) g12<1>F g13<4,4,1>.xD
sampler (0, 0, 7, 0) mlen 2 rlen 1
This is a problem because the third mov instruction overwrites g13, which holds
the result of the second send instruction. The scheduler had no way of knowing
this was going to happen because there was no dependency set up to let it know
that the PULL_CONSTANT_LOAD instruction writes to one of its sources.
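To illustrate the hazard, here is a toy model in Python. It is only a sketch and
not the real scheduler: the Load class, declared_sets, actual_sets, may_reorder
and run are all hypothetical names, and it assumes the scheduler only sees the
index register as a read and the destination as a write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    name: str
    dest: str    # register that receives the loaded constant
    header: str  # first register of the source pair (message header)
    index: str   # second register of the source pair (array index)


def declared_sets(ld):
    # Assumption: what the scheduler sees. Index is read, dest is written.
    return {ld.index}, {ld.dest}


def actual_sets(ld):
    # What the generated code does: the header register is written as well.
    return {ld.index}, {ld.dest, ld.header}


def may_reorder(a, b, sets):
    """True if a and b have no RAW, WAR or WAW dependency under `sets`."""
    a_reads, a_writes = sets(a)
    b_reads, b_writes = sets(b)
    raw = a_writes.intersection(b_reads)
    war = a_reads.intersection(b_writes)
    waw = a_writes.intersection(b_writes)
    return not (raw or war or waw)


def run(order):
    """Execute the loads in the given order and return the register contents."""
    regs = {}
    for ld in order:
        regs[ld.header] = "header"              # mov(4) header, g0
        regs[ld.dest] = "result of " + ld.name  # send(8) dest, header
    return regs


# The second and third loads after register allocation.
load2 = Load("load2", dest="g12", header="g13", index="g14")
load3 = Load("load3", dest="g13", header="g15", index="g16")

print(may_reorder(load2, load3, declared_sets))  # True: hazard is invisible
print(may_reorder(load2, load3, actual_sets))    # False: both write g13
print(run([load3, load2])["g13"])                # header
```

With only the declared reads and writes, the two loads look independent. Once
the header write is included, the check refuses the swap, and running the
swapped order shows g13 holding the header instead of the third load's result.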
I think this might be a general problem with the way we handle texture sampling,
and I think it would affect normal texture sampling with a header, such as
texelOffset in a fragment shader; it's just a coincidence that it is only hit in
these circumstances. However, this is only a hunch because I still don't really
understand the register allocator and the scheduler very well.
Maybe a good solution would be to add the MOV for the message header outside of
the generator so that the dependencies would be tracked correctly. This might
also allow some better optimisations to take place.</pre>
</div>
</p>
</body>
</html>