[Mesa-dev] [PATCH] mesa: Optimize SWIZZLE_CONVERT_LOOP macro.
Jason Ekstrand
jason at jlekstrand.net
Thu Aug 14 21:55:34 PDT 2014
On Aug 14, 2014 9:49 PM, "Kenneth Graunke" <kenneth at whitecape.org> wrote:
>
> On Thursday, August 14, 2014 08:51:24 PM Matt Turner wrote:
> > Cuts about 1.5k of text size and reduces the compile time from 23~27 to
> > 19 seconds.
> >
> >    text    data     bss     dec     hex filename
> >  243337       0       0  243337   3b689 .libs/format_utils.o
> >  241807       0       0  241807   3b08f .libs/format_utils.o
> > ---
> > Numbers from gcc-4.8.2 on an amd64 system. Hopefully this improves
> > compile time on x86 by a bunch more.
> >
> > src/mesa/main/format_utils.c | 20 ++++++++++++--------
> > 1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/mesa/main/format_utils.c b/src/mesa/main/format_utils.c
> > index 240e3bc..b24e067 100644
> > --- a/src/mesa/main/format_utils.c
> > +++ b/src/mesa/main/format_utils.c
> > @@ -318,15 +318,19 @@ swizzle_convert_try_memcpy(void *dst, GLenum dst_type, int num_dst_channels,
> > tmp[j] = CONV; \
> > } \
> > \
> > - typed_dst[0] = tmp[swizzle_x]; \
> > - if (DST_CHANS > 1) { \
> > + switch (4 - DST_CHANS) { \
> > + case 3: \
> > + typed_dst[0] = tmp[swizzle_x]; \
> > + /* fallthrough */ \
> > + case 2: \
> > typed_dst[1] = tmp[swizzle_y]; \
> > - if (DST_CHANS > 2) { \
> > - typed_dst[2] = tmp[swizzle_z]; \
> > - if (DST_CHANS > 3) { \
> > - typed_dst[3] = tmp[swizzle_w]; \
> > - } \
> > - } \
> > + /* fallthrough */ \
> > + case 1: \
> > + typed_dst[2] = tmp[swizzle_z]; \
> > + /* fallthrough */ \
> > + case 0: \
> > + typed_dst[3] = tmp[swizzle_w]; \
> > + /* fallthrough */ \
> > } \
> > typed_src += SRC_CHANS; \
> > typed_dst += DST_CHANS; \
> >
>
> It doesn't seem like this does the same thing...so your new code is:
>
> switch (4 - DST_CHANS) {
> case 3: // DST_CHANS == 1
> typed_dst[0] = tmp[swizzle_x];
> case 2: // DST_CHANS == 2
> typed_dst[1] = tmp[swizzle_y];
> case 1: // DST_CHANS == 3
> typed_dst[2] = tmp[swizzle_z];
> case 0: // DST_CHANS == 4
> typed_dst[3] = tmp[swizzle_w];
> }
>
> So when DST_CHANS == 1...your new code would run:
>
> typed_dst[0] = tmp[swizzle_x];
> typed_dst[1] = tmp[swizzle_y];
> typed_dst[2] = tmp[swizzle_z];
> typed_dst[3] = tmp[swizzle_w];
>
> and when it's 2, it would run...
>
> typed_dst[1] = tmp[swizzle_y];
> typed_dst[2] = tmp[swizzle_z];
> typed_dst[3] = tmp[swizzle_w];
>
> I think instead you want:
>
> switch (DST_CHANS) {
> case 4:
> typed_dst[3] = tmp[swizzle_w];
> /* fallthrough */
> case 3:
> typed_dst[2] = tmp[swizzle_z];
> /* fallthrough */
> case 2:
> typed_dst[1] = tmp[swizzle_y];
> /* fallthrough */
> case 1:
> typed_dst[0] = tmp[swizzle_x];
> /* fallthrough */
> }
The problem that I *think* Matt was cleverly trying to solve was to keep
the writes in-order. Apparently (according to a comment in the old texture
conversion code), out-of-order memory accesses caused performance problems
on some systems. That's why I didn't implement it that way in the first
place. Unfortunately, I haven't been able to observe this difference on my
machine. We could also use a for loop, although that's demonstrably slower.
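
To make the options concrete, here is a rough sketch of the variants under
discussion, written as standalone functions rather than as the real macro
body. Everything in it is an assumption made for illustration: the function
names, the uint8_t channel type, and passing the swizzle as a swz[4] array
instead of swizzle_x..swizzle_w. It only shows which writes happen and in
what order; it says nothing about which one is faster.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not Mesa code: each one writes dst_chans (1..4)
 * swizzled channels from tmp into dst. swz[i] is an index into tmp. */

/* Ascending writes with nested ifs, like the macro before the patch. */
void write_nested_if(uint8_t *dst, const uint8_t *tmp,
                     const uint8_t swz[4], int dst_chans)
{
    assert(dst_chans >= 1 && dst_chans <= 4);
    dst[0] = tmp[swz[0]];
    if (dst_chans > 1) {
        dst[1] = tmp[swz[1]];
        if (dst_chans > 2) {
            dst[2] = tmp[swz[2]];
            if (dst_chans > 3)
                dst[3] = tmp[swz[3]];
        }
    }
}

/* Fallthrough switch on the channel count: writes exactly dst_chans
 * channels, but starts at the highest index and works down. */
void write_fallthrough_desc(uint8_t *dst, const uint8_t *tmp,
                            const uint8_t swz[4], int dst_chans)
{
    assert(dst_chans >= 1 && dst_chans <= 4);
    switch (dst_chans) {
    case 4:
        dst[3] = tmp[swz[3]];
        /* fallthrough */
    case 3:
        dst[2] = tmp[swz[2]];
        /* fallthrough */
    case 2:
        dst[1] = tmp[swz[1]];
        /* fallthrough */
    case 1:
        dst[0] = tmp[swz[0]];
        break;
    }
}

/* One case per channel count, no fallthrough: keeps the writes in
 * ascending order at the cost of repeating the statements. */
void write_switch_asc(uint8_t *dst, const uint8_t *tmp,
                      const uint8_t swz[4], int dst_chans)
{
    assert(dst_chans >= 1 && dst_chans <= 4);
    switch (dst_chans) {
    case 1:
        dst[0] = tmp[swz[0]];
        break;
    case 2:
        dst[0] = tmp[swz[0]];
        dst[1] = tmp[swz[1]];
        break;
    case 3:
        dst[0] = tmp[swz[0]];
        dst[1] = tmp[swz[1]];
        dst[2] = tmp[swz[2]];
        break;
    case 4:
        dst[0] = tmp[swz[0]];
        dst[1] = tmp[swz[1]];
        dst[2] = tmp[swz[2]];
        dst[3] = tmp[swz[3]];
        break;
    }
}

/* Plain loop: ascending writes, shortest source. */
void write_loop(uint8_t *dst, const uint8_t *tmp,
                const uint8_t swz[4], int dst_chans)
{
    assert(dst_chans >= 1 && dst_chans <= 4);
    for (int i = 0; i < dst_chans; i++)
        dst[i] = tmp[swz[i]];
}
```

All four should leave dst in the same state for any channel count from 1
to 4 and never touch dst[dst_chans] or beyond; they differ only in the
order of the stores and in how much code the compiler has to chew through.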
--Jason