[Mesa-dev] [PATCH 6/6] translate_sse: major rewrite

Fri Aug 13 03:00:49 PDT 2010

On Thu, 2010-08-12 at 10:22 -0700, Luca Barbieri wrote:
> translate_sse is currently very limited to the point of
> being useless in essentially all cases.
> 
> In particular, it only supports some float32 and unorm8
> formats and doesn't work on x86-64.
> 
> This commit rewrites it to support:
> 1. Dumb memory copy for any pair of identical formats
> 2. All formats that are swizzles of each other
> 3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
> 4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
> 5. Support for x86-64 (doesn't take advantage of it in any way though)
> 
> This new translate can even be useful to translate index buffers for
> cards that lack 8-bit index support.
> 
> It passes the testsuite I wrote, but note that this is a major change, and more
> testing would be great.

Luca,

Beyond a few niggles, this is looking great - an impressive body of work...

A couple of comments:

-static void emit_load_R32G32( struct translate_sse *p, 
-			   struct x86_reg data,
-			   struct x86_reg arg0 )
+/* out_chans = 5 means we want 4 channels with 1 in alpha instead of 0 */
+static void emit_load_float32( struct translate_sse *p,
+                                       struct x86_reg data,
+                                       struct x86_reg arg0,
+                                       unsigned out_chans,
+                                       unsigned chans)
 {

Is it possible to use an explicit flag for the (out_chans == 5) case?  
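
A minimal sketch of what an explicit flag might look like. This is a
plain scalar model rather than the code-emitting function in the patch;
load_float32_model and alpha_one are hypothetical names used only for
illustration, and out_chans is restricted to 0..4 so that it no longer
carries the "5" special case:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scalar model of the load, NOT the rtasm-based
 * emit_load_float32() from the patch.
 *
 * chans     - number of channels present in the source
 * out_chans - number of channels wanted in the output (0..4)
 * alpha_one - explicit flag: fill a missing alpha channel with 1
 *             instead of 0 (replaces the out_chans == 5 encoding)
 */
static void load_float32_model(float out[4], const float *src,
                               unsigned chans, unsigned out_chans,
                               bool alpha_one)
{
   unsigned i;

   assert(out_chans <= 4);

   for (i = 0; i < out_chans; i++) {
      if (i < chans)
         out[i] = src[i];        /* channel exists in the source */
      else if (i == 3 && alpha_one)
         out[i] = 1.0f;          /* missing alpha defaults to 1 */
      else
         out[i] = 0.0f;          /* other missing channels default to 0 */
   }
}
```

With this shape a caller that previously passed out_chans = 5 would pass
out_chans = 4 and alpha_one = true, which keeps the channel count and the
alpha default as separate, self-describing arguments.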

....

   case 8:
+#ifdef PIPE_ARCH_X86_64
+      x64_mov64(p->func, dataGPR, src);
+      x64_mov64(p->func, dst, dataGPR);
+#else
+      sse_movlps(p->func, dataXMM, src);
+      sse_movlps(p->func, dst, dataXMM);
+#endif
+      break;
+   case 12:
+#ifdef PIPE_ARCH_X86_64
+      x64_mov64(p->func, dataGPR2, src);
+#else
+      sse_movlps(p->func, dataXMM, src);
+#endif
+      x86_mov(p->func, dataGPR, x86_make_disp(src, 8));
+#ifdef PIPE_ARCH_X86_64
+      x64_mov64(p->func, dst, dataGPR2);
+#else
+      sse_movlps(p->func, dst, dataXMM);
+#endif
+      x86_mov(p->func, x86_make_disp(dst, 8), dataGPR);

Is it possible to do this without all the #ifdefs?  Even ordinary if
statements testing a preprocessor-defined constant would be easier to
read, but better still would be some sort of wrapper function which just
did the right thing on either architecture.
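
A rough sketch of the wrapper idea, under stated assumptions. It does not
use the real rtasm API: struct emit_log, log_op, emit_load64, emit_store64
and HAVE_X86_64 are hypothetical stand-ins, and the "emitter" only records
mnemonic strings so the example stays self-contained. The only name taken
from the patch is PIPE_ARCH_X86_64. The point is that the architecture
test lives in two small helpers and the switch itself has no #ifdefs:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Turn the build-time macro from the patch into an ordinary constant so
 * that it can be tested with a normal if statement. */
#ifdef PIPE_ARCH_X86_64
#define HAVE_X86_64 1
#else
#define HAVE_X86_64 0
#endif

/* Hypothetical stand-in for the code emitter: it just appends text. */
struct emit_log {
   char text[512];
};

static void log_op(struct emit_log *log, const char *op,
                   const char *dst, const char *src)
{
   size_t used = strlen(log->text);

   assert(used < sizeof(log->text));
   snprintf(log->text + used, sizeof(log->text) - used,
            "%s %s, %s\n", op, dst, src);
}

/* Load 64 bits from src into a temporary: a 64-bit GPR on x86-64,
 * the low half of an XMM register otherwise. */
static void emit_load64(struct emit_log *log, const char *src)
{
   if (HAVE_X86_64)
      log_op(log, "mov64", "tmp_gpr64", src);
   else
      log_op(log, "movlps", "tmp_xmm", src);
}

/* Store the 64-bit temporary filled by emit_load64() to dst. */
static void emit_store64(struct emit_log *log, const char *dst)
{
   if (HAVE_X86_64)
      log_op(log, "mov64", dst, "tmp_gpr64");
   else
      log_op(log, "movlps", dst, "tmp_xmm");
}

/* The copy cases from the quoted hunk, with no #ifdefs in the switch. */
static void emit_copy(struct emit_log *log, unsigned size)
{
   switch (size) {
   case 8:
      emit_load64(log, "[src]");
      emit_store64(log, "[dst]");
      break;
   case 12:
      emit_load64(log, "[src]");
      log_op(log, "mov", "tmp_gpr32", "[src+8]");
      emit_store64(log, "[dst]");
      log_op(log, "mov", "[dst+8]", "tmp_gpr32");
      break;
   default:
      break;
   }
}
```

Because HAVE_X86_64 is a compile-time constant, a compiler can discard
the untaken branch, so this form should cost nothing over the #ifdef
version while keeping both paths visible to the compiler on every build.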

Similar comment applies to your x86-64 changes in rtasm.c -- is there a
way to reduce the #ifdef load?

...

+            // TODO: add support for SSE4.1 pmovzx

Probably want to use C-style comments throughout.

Keith