<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Implement SSBOs in GLSL front-end and i965"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89597#c25">Comment # 25</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Implement SSBOs in GLSL front-end and i965"
href="https://bugs.freedesktop.org/show_bug.cgi?id=89597">bug 89597</a>
from <span class="vcard"><a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> <span class="fn">Iago Toral</span></a>
</span></b>
<pre>I think the FS bits work well now, so I have switched focus to vec4. Here the
problem is again non-constant offsets that are not 16-byte aligned (this is
actually not working in master for UBOs either). I have a working solution for
this, but I'd like to discuss the details here and see what you think about it:
Instead of using scattered messages like we do for writes in the FS, I am
experimenting with unaligned oword block reads in this case. The solution,
which I describe below, seems to work well, but I had to work around a couple
of issues that could perhaps be dealt with in a better way. It looks like this
(only for non-constant offsets; for constant offsets we can just use a dual
oword read):
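A minimal CPU-side sketch of why the alignment matters, assuming a buffer of
32-bit dwords, an "aligned" oword read that drops the low 4 bits of the byte
offset, and an "unaligned" read that starts at the exact dword-aligned offset.
The function names are hypothetical; this is not Mesa code and does not claim
to reproduce the exact hardware message semantics.

```c
/* Hypothetical CPU model of oword (16-byte, 4-dword) block reads from a
 * buffer of 32-bit dwords. Not Mesa code and not exact hardware semantics. */

#define OWORD_BYTES 16u
#define DWORD_BYTES 4u

/* Aligned read (assumed behavior): the low 4 bits of the byte offset are
 * dropped, so the read always starts on a 16-byte boundary. */
void model_aligned_oword_read(const unsigned *buf, unsigned byte_offset,
                              unsigned out[4])
{
   unsigned first = (byte_offset / OWORD_BYTES) * (OWORD_BYTES / DWORD_BYTES);
   for (unsigned i = 0; i != 4; i++)
      out[i] = buf[first + i];
}

/* Unaligned read: starts at the exact byte offset (assumed dword-aligned). */
void model_unaligned_oword_read(const unsigned *buf, unsigned byte_offset,
                                unsigned out[4])
{
   unsigned first = byte_offset / DWORD_BYTES;
   for (unsigned i = 0; i != 4; i++)
      out[i] = buf[first + i];
}
```

For example, with buf[i] = i and byte_offset = 20, the aligned model returns
dwords 4-7 while the unaligned model returns dwords 5-8.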
1) Since unaligned oword messages are not "dual", we have to handle each of the
two SIMD4x2 vertices separately, so I emit two separate unaligned reads, one
from offset.0 (first vertex) and another from offset.4 (second vertex). I store
the results of these reads in separate virtual registers, read_result0 and
read_result1, respectively.
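Step 1 can be modeled on the CPU as below. This sketch assumes an 8-dword
register layout where vertex 0 occupies channels 0-3 and vertex 1 channels
4-7, and that each vertex's byte offset lives in offset.0 and offset.4; the
model_* names are hypothetical.

```c
/* Hypothetical model of step 1: one unaligned 4-dword read per SIMD4x2
 * vertex, since an unaligned oword message cannot serve both at once. */

/* Unaligned 4-dword read starting at a dword-aligned byte offset. */
void model_unaligned_read(const unsigned *buf, unsigned byte_offset,
                          unsigned out[4])
{
   for (unsigned i = 0; i != 4; i++)
      out[i] = buf[byte_offset / 4 + i];
}

/* offset[8] models the SIMD4x2 offset register; only channels 0 and 4 are
 * used. Each result lands in the lower half of its own 8-dword register;
 * the upper halves are left untouched. */
void model_per_vertex_reads(const unsigned *buf, const unsigned offset[8],
                            unsigned read_result0[8], unsigned read_result1[8])
{
   model_unaligned_read(buf, offset[0], read_result0); /* first vertex */
   model_unaligned_read(buf, offset[4], read_result1); /* second vertex */
}
```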
2) Next, I merge both read results into a single register suitable for SIMD4x2
operation. That is, if we call dst the destination of the SIMD4x2 operation, I
move the lower half of read_result0 to the lower half of dst and the lower half
of read_result1 to the upper half of dst. For this I defined a generator opcode
(let's call it simd4x2_merge), which I initially implemented as two mov(4)
operations, like this:
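/* Lower half of dst (vertex 0) from the lower half of src0 (read_result0). */
brw_MOV(p,
        brw_vec4_reg(dst.file, dst.nr, 0),
        brw_vec4_reg(src0.file, src0.nr, 0));
/* Upper half of dst (vertex 1) from the lower half of src1 (read_result1). */
brw_MOV(p,
        brw_vec4_reg(dst.file, dst.nr, 4),
        brw_vec4_reg(src1.file, src1.nr, 0));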
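A CPU-side model of the merge as first implemented, assuming 8-dword
registers with src0/src1 standing for read_result0/read_result1 (the name
model_simd4x2_merge is hypothetical, not a Mesa function):

```c
/* Hypothetical CPU model of simd4x2_merge: the two mov(4) operations copy
 * the lower half of each read result into the matching half of dst. */
void model_simd4x2_merge(unsigned dst[8], const unsigned src0[8],
                         const unsigned src1[8])
{
   for (unsigned c = 0; c != 4; c++) {
      dst[c] = src0[c];     /* first mov(4): vertex 0 into dst.0-3 */
      dst[4 + c] = src1[c]; /* second mov(4): vertex 1 into dst.4-7 */
   }
}
```

This version always writes all eight channels of dst, which is the assumption
that stops holding once a writemask is introduced.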
The first problem I found with this solution is that in some cases it triggers
the following assertion:
brw_vec4_generator.cpp:1927: void brw::vec4_generator::generate_code(const
cfg_t*): Assertion `p->nr_insn == pre_emit_nr_insn + 1 || !"conditional_mod,
no_dd_check, or no_dd_clear set for IR " "emitting more than 1 instruction"'
failed.
This problem comes from the fact that, in these cases,
opt_set_dependency_control sets no_dd_clear/no_dd_check on the simd4x2_merge
instruction, and the generator does not accept that because this opcode
actually expands to more than one assembly instruction. I did not see an
obvious way to deal with this other than skipping opt_set_dependency_control
for this generator opcode, since the fact that it expands to more than one
assembly instruction will inevitably lead to this problem. It might be possible
to fix this more generally by having the generator set the dependency control
flags on all the instructions emitted by the opcode. Anyway, since there is an
is_dep_ctrl_unsafe() function that seems to be meant for the situations where
we don't want opt_set_dependency_control to kick in, I just added this opcode
there. Is there anything else to this scenario that I am missing?
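The constraint behind the assertion, and the effect of listing the opcode in
is_dep_ctrl_unsafe(), can be sketched as below. All names are hypothetical
stand-ins (not Mesa's enums or functions), and the per-opcode instruction
counts are assumptions matching the two-MOV simd4x2_merge described above.

```c
/* Hypothetical model: an IR instruction carrying no_dd_check/no_dd_clear
 * (the real assertion also covers conditional_mod) must expand to exactly
 * one hardware instruction. Opcode names are invented for illustration. */

enum model_opcode { MODEL_OP_MOV, MODEL_OP_SIMD4X2_MERGE };

/* Assumed number of hardware instructions emitted per opcode. */
unsigned model_emitted_insn_count(enum model_opcode op)
{
   return op == MODEL_OP_SIMD4X2_MERGE ? 2 : 1;
}

/* Analogue of adding the opcode to is_dep_ctrl_unsafe(): never set
 * dependency-control flags on opcodes that expand to several instructions. */
int model_is_dep_ctrl_unsafe(enum model_opcode op)
{
   return model_emitted_insn_count(op) != 1;
}

/* Returns 1 if the generator's one-instruction check would pass. */
int model_dep_ctrl_ok(enum model_opcode op, int dep_ctrl_flags_set)
{
   return !dep_ctrl_flags_set || model_emitted_insn_count(op) == 1;
}
```

A pass that consults model_is_dep_ctrl_unsafe() before setting the flags never
produces the combination that model_dep_ctrl_ok() rejects.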
With that fixed, I found another issue with the register coalescing
optimization pass: it tried to rewrite the simd4x2_merge instruction to write
directly to an MRF register using a writemask. As shown above, my first
approach to the generator opcode was two MOVs that always write the whole dst
register (which made sense in my context), but if other optimization passes
can rewrite the instruction to introduce a writemask, that approach is no
longer valid and I have to honor the writemask on dst. The (small?) problem is
that, as far as I know, I cannot do mov(4) operations that honor
dst.dw1.bits.writemask, so I have to do it manually by emitting a mov(1) for
each enabled channel.
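A sketch of the manual workaround, assuming that in SIMD4x2 mode the 4-bit
writemask (bit 0 = x ... bit 3 = w) applies to each vec4 half of dst. The
function name is hypothetical; it only models which channels get written and
how many mov(1) instructions that takes.

```c
/* Hypothetical model of a writemask-aware simd4x2_merge: one mov(1) per
 * enabled channel per vec4 half. Returns the number of mov(1)s "emitted". */
unsigned model_simd4x2_merge_masked(unsigned dst[8], const unsigned src0[8],
                                    const unsigned src1[8], unsigned writemask)
{
   unsigned movs = 0;
   for (unsigned c = 0; c != 4; c++) {
      if (((writemask >> c) & 1u) == 0)
         continue;           /* channel disabled: dst keeps its old value */
      dst[c] = src0[c];      /* mov(1) for vertex 0, channel c */
      dst[4 + c] = src1[c];  /* mov(1) for vertex 1, channel c */
      movs += 2;
   }
   return movs;
}
```

With writemask 0x3 (XY), only dst.0-1 and dst.4-5 change and four mov(1)
instructions are needed, versus two mov(4) for the unmasked merge; a full XYZW
mask needs eight.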
Is there a better way to handle this?</pre>
</div>
</p>
</body>
</html>