On Aug 14, 2014 9:49 PM, "Kenneth Graunke" <kenneth@whitecape.org> wrote:
>
> On Thursday, August 14, 2014 08:51:24 PM Matt Turner wrote:
> > Cuts about 1.5k of text size and reduces the compile time from 23~27 to
> > 19 seconds.
> >
> >    text    data     bss     dec     hex filename
> >  243337       0       0  243337   3b689 .libs/format_utils.o
> >  241807       0       0  241807   3b08f .libs/format_utils.o
> > ---
> > Numbers from gcc-4.8.2 on an amd64 system. Hopefully this improves
> > compile time on x86 by a bunch more.
> >
> >  src/mesa/main/format_utils.c | 20 ++++++++++++--------
> >  1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/mesa/main/format_utils.c b/src/mesa/main/format_utils.c
> > index 240e3bc..b24e067 100644
> > --- a/src/mesa/main/format_utils.c
> > +++ b/src/mesa/main/format_utils.c
> > @@ -318,15 +318,19 @@ swizzle_convert_try_memcpy(void *dst, GLenum dst_type, int num_dst_channels,
> >           tmp[j] = CONV; \
> >        } \
> >        \
> > -      typed_dst[0] = tmp[swizzle_x]; \
> > -      if (DST_CHANS > 1) { \
> > +      switch (4 - DST_CHANS) { \
> > +      case 3: \
> > +         typed_dst[0] = tmp[swizzle_x]; \
> > +         /* fallthrough */ \
> > +      case 2: \
> >           typed_dst[1] = tmp[swizzle_y]; \
> > -         if (DST_CHANS > 2) { \
> > -            typed_dst[2] = tmp[swizzle_z]; \
> > -            if (DST_CHANS > 3) { \
> > -               typed_dst[3] = tmp[swizzle_w]; \
> > -            } \
> > -         } \
> > +         /* fallthrough */ \
> > +      case 1: \
> > +         typed_dst[2] = tmp[swizzle_z]; \
> > +         /* fallthrough */ \
> > +      case 0: \
> > +         typed_dst[3] = tmp[swizzle_w]; \
> > +         /* fallthrough */ \
> >        } \
> >        typed_src += SRC_CHANS; \
> >        typed_dst += DST_CHANS; \
> >
>
> It doesn't seem like this does the same thing...so your new code is:
>
> switch (4 - DST_CHANS) {
> case 3: // DST_CHANS == 1
>    typed_dst[0] = tmp[swizzle_x];
> case 2: // DST_CHANS == 2
>    typed_dst[1] = tmp[swizzle_y];
> case 1: // DST_CHANS == 3
>    typed_dst[2] = tmp[swizzle_z];
> case 0: // DST_CHANS == 4
>    typed_dst[3] = tmp[swizzle_w];
> }
>
> So when DST_CHANS == 1...your new code would run:
>
>    typed_dst[0] = tmp[swizzle_x];
>    typed_dst[1] = tmp[swizzle_y];
>    typed_dst[2] = tmp[swizzle_z];
>    typed_dst[3] = tmp[swizzle_w];
>
> and when it's 2, it would run...
>
>    typed_dst[1] = tmp[swizzle_y];
>    typed_dst[2] = tmp[swizzle_z];
>    typed_dst[3] = tmp[swizzle_w];
>
> I think instead you want:
>
> switch (DST_CHANS) {
> case 4:
>    typed_dst[3] = tmp[swizzle_w];
>    /* fallthrough */
> case 3:
>    typed_dst[2] = tmp[swizzle_z];
>    /* fallthrough */
> case 2:
>    typed_dst[1] = tmp[swizzle_y];
>    /* fallthrough */
> case 1:
>    typed_dst[0] = tmp[swizzle_x];
>    /* fallthrough */
> }

The problem that I *think* Matt was cleverly trying to solve was keeping
the writes in order. Apparently (according to a comment in the old
texture conversion code), out-of-order memory accesses caused performance
problems on some systems. That's why I didn't implement it that way in
the first place. Unfortunately, I haven't been able to observe this
difference on my machine. We could also use a for loop, although that's
demonstrably slower.

--Jason
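
To make the difference between the variants in this thread concrete, here is a minimal standalone sketch (not the Mesa macro itself). It assumes an identity swizzle and plain `int` channels for simplicity; the function names, `SENTINEL`, and the `written_mask` helper are hypothetical and exist only for illustration. It records which of the four destination slots each variant touches for a given channel count.

```c
#include <assert.h>

#define SENTINEL (-1)   /* hypothetical marker for "slot never written" */

typedef void (*write_fn)(int *dst, const int *tmp, int dst_chans);

/* Original form: nested ifs, writes happen in ascending order. */
static void write_nested_if(int *dst, const int *tmp, int dst_chans)
{
   dst[0] = tmp[0];
   if (dst_chans > 1) {
      dst[1] = tmp[1];
      if (dst_chans > 2) {
         dst[2] = tmp[2];
         if (dst_chans > 3)
            dst[3] = tmp[3];
      }
   }
}

/* The patch as posted: ascending writes, but the entry point is inverted. */
static void write_switch_posted(int *dst, const int *tmp, int dst_chans)
{
   switch (4 - dst_chans) {
   case 3:
      dst[0] = tmp[0];
      /* fallthrough */
   case 2:
      dst[1] = tmp[1];
      /* fallthrough */
   case 1:
      dst[2] = tmp[2];
      /* fallthrough */
   case 0:
      dst[3] = tmp[3];
   }
}

/* The suggested fix: correct slots, but writes happen in descending order. */
static void write_switch_descending(int *dst, const int *tmp, int dst_chans)
{
   switch (dst_chans) {
   case 4:
      dst[3] = tmp[3];
      /* fallthrough */
   case 3:
      dst[2] = tmp[2];
      /* fallthrough */
   case 2:
      dst[1] = tmp[1];
      /* fallthrough */
   case 1:
      dst[0] = tmp[0];
   }
}

/* The for-loop alternative: ascending writes, correct slots. */
static void write_loop(int *dst, const int *tmp, int dst_chans)
{
   for (int i = 0; i < dst_chans; i++)
      dst[i] = tmp[i];
}

/* Returns a bitmask with bit i set if the variant wrote dst[i]. */
static unsigned written_mask(write_fn fn, int dst_chans)
{
   int dst[4] = { SENTINEL, SENTINEL, SENTINEL, SENTINEL };
   const int tmp[4] = { 10, 11, 12, 13 };
   unsigned mask = 0;

   assert(dst_chans >= 1 && dst_chans <= 4);
   fn(dst, tmp, dst_chans);
   for (int i = 0; i < 4; i++) {
      if (dst[i] != SENTINEL)
         mask |= 1u << i;
   }
   return mask;
}
```

Calling `written_mask(write_switch_posted, 1)` reports all four slots written, which is the overrun described in the review, while the nested-if, descending-switch, and loop variants touch only the first `dst_chans` slots. The sketch says nothing about the relative speed of the variants; that would need measurement on the actual macro.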