<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Add FP64 support to the i965 shader backends"
href="https://bugs.freedesktop.org/show_bug.cgi?id=92760#c42">Comment # 42</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Add FP64 support to the i965 shader backends"
href="https://bugs.freedesktop.org/show_bug.cgi?id=92760">bug 92760</a>
from <span class="vcard"><a class="email" href="mailto:itoral@igalia.com" title="Iago Toral <itoral@igalia.com>"> <span class="fn">Iago Toral</span></a>
</span></b>
<pre>(In reply to Iago Toral from <a href="show_bug.cgi?id=92760#c41">comment #41</a>)
<span class="quote">> Hi Connor, I have a question about the brw_nir_split_doubles pass that you
> wrote for the vec4 backend. The pass does not lower nir_op_vec3/4 on purpose
> with this comment:
>
> /* These ops are the ones that group up dvec2's and doubles into dvec3's
> * and dvec4's when necessary, so we don't lower them. If they're
> * unnecessary, copy propagation will clean them up.
> */
>
> However, this obviously leads to 64-bit instructions writing to channels ZW,
> which we don't want to have since our Nir->vec4 pass expects that any 64-bit
> operation won't have a writemask including channels other than XY.
>
> Right now, the lower_vec_to_movs pass that we run right after the
> nir_from_ssa pass seems to generate MOVs that write to each channel of the
> vecN instruction dest, so with this, it generates MOVs with 64-bit things
> that write to components Z and W of a dvec3/4.
>
> I suppose your idea was to break up ALU operations, then group them back as
> vec3/vec4 operations so we don't lose track of the original size of the data
> elements involved in the operations. If that is the case, I think we can
> disable lower_vec_to_movs() on dvec3/dvec4 and let the nir-vec4 pass handle
> those. Does this make sense to you? Did you have a different idea about how
> this should work?</span >
Or maybe you expected the MOVs generated by lower_vec_to_movs to always be
coalesced, so that we would never actually emit instructions to build the
vec3/4? That is not happening, because the instructions that consume the
result of the vecN operation use source modifiers. I suppose we could detect
these cases and fix them by inserting a MOV that applies the source modifier
to a temporary, and then rewriting the instruction to consume that temporary
instead of the original value.</pre>
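<pre>To make the idea concrete, here is a rough sketch of that rewrite on a toy
instruction list. It is not Mesa code: Src, Inst and lower_modifiers_on are
hypothetical names made up for illustration, the model ignores swizzles,
writemasks and bit sizes, and whether the resulting MOVs then coalesce as
hoped is exactly the open question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Src:
    """Toy source operand: a register name plus optional source modifiers."""
    reg: str
    negate: bool = False
    absolute: bool = False


@dataclass(frozen=True)
class Inst:
    """Toy instruction: opcode, destination register, tuple of sources."""
    op: str
    dst: str
    srcs: tuple


def lower_modifiers_on(insts, reg):
    """Route modified reads of `reg` through a MOV to a temporary.

    Hypothetical helper. For every non-MOV instruction that reads `reg`
    with a negate/abs modifier, emit a MOV that applies the modifier into
    a fresh temporary and make the instruction read the temporary instead.
    """
    out = []
    count = 0
    for inst in insts:
        if inst.op == "MOV":
            # MOVs are left alone so that rerunning the pass is a no-op.
            out.append(inst)
            continue
        new_srcs = []
        for src in inst.srcs:
            if src.reg == reg and (src.negate or src.absolute):
                tmp = f"tmp{count}"
                count += 1
                # The MOV carries the modifier; the user reads a plain value.
                out.append(Inst("MOV", tmp, (src,)))
                new_srcs.append(Src(tmp))
            else:
                new_srcs.append(src)
        out.append(Inst(inst.op, inst.dst, tuple(new_srcs)))
    return out


program = [
    Inst("VEC4", "v", (Src("a"), Src("b"), Src("c"), Src("d"))),
    Inst("ADD", "r", (Src("v", negate=True), Src("e"))),
]
lowered = lower_modifiers_on(program, "v")
for inst in lowered:
    print(inst)
```

After the rewrite the ADD reads tmp0 without any modifier, and the only
modified read of the vecN result is the new MOV.</pre>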
</div>
</p>
</body>
</html>