<p dir="ltr"><br>
On Aug 14, 2014 9:49 PM, "Kenneth Graunke" <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>> wrote:<br>
><br>
> On Thursday, August 14, 2014 08:51:24 PM Matt Turner wrote:<br>
> > Cuts about 1.5k of text size and reduces the compile time from 23~27 to<br>
> > 19 seconds.<br>
> ><br>
> > text data bss dec hex filename<br>
> > 243337 0 0 243337 3b689 .libs/format_utils.o<br>
> > 241807 0 0 241807 3b08f .libs/format_utils.o<br>
> > ---<br>
> > Numbers from gcc-4.8.2 on an amd64 system. Hopefully this improves<br>
> > compile time on x86 by a bunch more.<br>
> ><br>
> > src/mesa/main/format_utils.c | 20 ++++++++++++--------<br>
> > 1 file changed, 12 insertions(+), 8 deletions(-)<br>
> ><br>
> > diff --git a/src/mesa/main/format_utils.c b/src/mesa/main/format_utils.c<br>
> > index 240e3bc..b24e067 100644<br>
> > --- a/src/mesa/main/format_utils.c<br>
> > +++ b/src/mesa/main/format_utils.c<br>
> > @@ -318,15 +318,19 @@ swizzle_convert_try_memcpy(void *dst, GLenum dst_type, int num_dst_channels,<br>
> > tmp[j] = CONV; \<br>
> > } \<br>
> > \<br>
> > - typed_dst[0] = tmp[swizzle_x]; \<br>
> > - if (DST_CHANS > 1) { \<br>
> > + switch (4 - DST_CHANS) { \<br>
> > + case 3: \<br>
> > + typed_dst[0] = tmp[swizzle_x]; \<br>
> > + /* fallthrough */ \<br>
> > + case 2: \<br>
> > typed_dst[1] = tmp[swizzle_y]; \<br>
> > - if (DST_CHANS > 2) { \<br>
> > - typed_dst[2] = tmp[swizzle_z]; \<br>
> > - if (DST_CHANS > 3) { \<br>
> > - typed_dst[3] = tmp[swizzle_w]; \<br>
> > - } \<br>
> > - } \<br>
> > + /* fallthrough */ \<br>
> > + case 1: \<br>
> > + typed_dst[2] = tmp[swizzle_z]; \<br>
> > + /* fallthrough */ \<br>
> > + case 0: \<br>
> > + typed_dst[3] = tmp[swizzle_w]; \<br>
> > + /* fallthrough */ \<br>
> > } \<br>
> > typed_src += SRC_CHANS; \<br>
> > typed_dst += DST_CHANS; \<br>
> ><br>
><br>
> It doesn't seem like this does the same thing...so your new code is:<br>
><br>
> switch (4 - DST_CHANS) {<br>
> case 3: // DST_CHANS == 1<br>
> typed_dst[0] = tmp[swizzle_x];<br>
> case 2: // DST_CHANS == 2<br>
> typed_dst[1] = tmp[swizzle_y];<br>
> case 1: // DST_CHANS == 3<br>
> typed_dst[2] = tmp[swizzle_z];<br>
> case 0: // DST_CHANS == 4<br>
> typed_dst[3] = tmp[swizzle_w];<br>
> }<br>
><br>
> So when DST_CHANS == 1...your new code would run:<br>
><br>
> typed_dst[0] = tmp[swizzle_x];<br>
> typed_dst[1] = tmp[swizzle_y];<br>
> typed_dst[2] = tmp[swizzle_z];<br>
> typed_dst[3] = tmp[swizzle_w];<br>
><br>
> and when it's 2, it would run...<br>
><br>
> typed_dst[1] = tmp[swizzle_y];<br>
> typed_dst[2] = tmp[swizzle_z];<br>
> typed_dst[3] = tmp[swizzle_w];<br>
><br>
> I think instead you want:<br>
><br>
> switch (DST_CHANS) {<br>
> case 4:<br>
> typed_dst[3] = tmp[swizzle_w];<br>
> /* fallthrough */<br>
> case 3:<br>
> typed_dst[2] = tmp[swizzle_z];<br>
> /* fallthrough */<br>
> case 2:<br>
> typed_dst[1] = tmp[swizzle_y];<br>
> /* fallthrough */<br>
> case 1:<br>
> typed_dst[0] = tmp[swizzle_x];<br>
> /* fallthrough */<br>
> }</p>
<p dir="ltr">The problem that I *think* Matt was cleverly trying to solve was keeping the writes in order. Apparently (according to a comment in the old texture conversion code), out-of-order memory accesses caused performance problems on some systems. That's why I didn't implement it that way in the first place. Unfortunately, I haven't been able to observe this difference on my machine. We could also use a for loop, although that's demonstrably slower.<br>
--Jason<br>
</p>
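<p dir="ltr">To make the ordering trade-off concrete, here is a minimal sketch (not the actual Mesa macro) of the two shapes under discussion. The function names, the <code>uint8_t</code> channel type, and passing the swizzle as an array are all assumptions made for illustration; the real code is a macro parameterised on DST_TYPE and DST_CHANS. Both variants store exactly the first <code>dst_chans</code> channels; they differ only in whether the stores go in ascending or descending address order.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the original nested-if form.
 * Stores are issued in ascending address order: [0], [1], [2], [3]. */
static void write_in_order(uint8_t *dst, const uint8_t tmp[4],
                           const uint8_t swz[4], int dst_chans)
{
    assert(dst_chans >= 1 && dst_chans <= 4);

    dst[0] = tmp[swz[0]];
    if (dst_chans > 1) {
        dst[1] = tmp[swz[1]];
        if (dst_chans > 2) {
            dst[2] = tmp[swz[2]];
            if (dst_chans > 3)
                dst[3] = tmp[swz[3]];
        }
    }
}

/* Hypothetical stand-in for the descending switch from the quoted reply.
 * Same set of stores, but issued in descending address order. */
static void write_descending(uint8_t *dst, const uint8_t tmp[4],
                             const uint8_t swz[4], int dst_chans)
{
    assert(dst_chans >= 1 && dst_chans <= 4);

    switch (dst_chans) {
    case 4:
        dst[3] = tmp[swz[3]];
        /* fallthrough */
    case 3:
        dst[2] = tmp[swz[2]];
        /* fallthrough */
    case 2:
        dst[1] = tmp[swz[1]];
        /* fallthrough */
    case 1:
        dst[0] = tmp[swz[0]];
        break;
    }
}
```

<p dir="ltr">When the channel count is a compile-time constant, as it is in the macro, a compiler can likely fold either form down to straight-line stores, so any difference would come from store order rather than branching. Whether that order matters on a given system is exactly the part I have not been able to measure.</p>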