[Mesa-dev] [PATCH] Better GPU program optimization in Mesa front end
Jakob Bornecrantz
jakob at vmware.com
Mon Jul 26 17:19:29 PDT 2010
mov dst.xz src.yyww
The second y and the second w can be anything, since dst.y and dst.w are not written.
Cheers Jakob.
On 26 jul 2010, at 17.14, Segovia, Benjamin wrote:
> So basically,
>
> "mov dst.xz src.yw" is impossible with mesa code?
>
> Ben
>
> ________________________________________
> From: mesa-dev-bounces+benjamin.segovia=intel.com at lists.freedesktop.org
> [mesa-dev-bounces+benjamin.segovia=intel.com at lists.freedesktop.org]
> On Behalf Of Segovia, Benjamin [benjamin.segovia at intel.com]
> Sent: Monday, July 26, 2010 6:13 PM
> To: Eric Anholt; mesa-dev at lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in
> Mesa front end
>
> - Hmm, the first bug is clearly something I misunderstood. You say:
>
> in "mov dst.xz src" src will use .xz and not .xy? But the swizzle is
> still .xyzw, or am I missing something?
>
> - I am not sure, but I think the other tests were skipped with my
> version of piglit + mesa. I have to check that again.
>
> - For bit_scan, bit_count, what would you prefer?
>
> Cheers,
> Ben
>
> ________________________________________
> From: Eric Anholt [eric at anholt.net]
> Sent: Monday, July 26, 2010 6:03 PM
> To: Segovia, Benjamin; mesa-dev at lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in
> Mesa front end
>
> On Mon, 19 Jul 2010 16:01:02 -0700, Benjamin Segovia <benjamin.segovia at intel.com> wrote:
>> - Improved optimization of GPU programs. Now, swizzling is taken into
>> account and swizzles are properly transformed while removing mov
>> instructions. Removals of mov instructions are now much more effective
>>
>> - Analysis of control flows is still very primitive and far too
>> conservative. Shaders using a lot of branches will be less optimized
>> than straightforward ones
>>
>> - Main things to do next are:
>> * instruction merging, for example merging:
>> mul a.x b.x c.x
>> mul a.y b.y c.y
>> mul a.z b.z c.z
>> into
>> mul a.xyz b.xyz c.xyz
>> * register renaming to avoid some still unnecessary movs
>>
>> - Tested with piglit. I ran all the shaders and compared output from
>> the new version with the old one. Also ran openarena, nexuiz and
>> warsow. All games run perfectly and GPU code is clearly improved.
>> Note that I only use my Intel Gen GPU for the backend, so everything
>> was tested using classic Mesa with the Intel i965 driver.
>
> Added two new testcases to piglit, glsl-fs-add-masked and
> glsl-fs-mov-masked, to catch bugs in this that I found while reviewing
> the code after my GLSL demo started failing. Patch included for
> glsl-fs-add-masked (MOV dst.xz, src uses the X and Z channels of src,
> not X and Y), feel free to squash it into a fixed patch.
>
> glsl-vs-arrays, glsl-vs-arrays-2, and glsl-vs-mov-after-deref also
> started failing for me after applying your change. Lots of
> important code seems to have been optimized out. I'm guessing RelAddr
> handling is broken. Everywhere else that bit_count is used looks bogus
> to me, but I haven't constructed tests for them yet.
>
> I'm not a fan of bit_clear[], bit_scan[], or expand_one[] either. They
> obfuscate the code to me.
>
> From 9c7fb7b5b6f299b19d27245f2b68853521499afa Mon Sep 17 00:00:00 2001
> From: Eric Anholt <eric at anholt.net>
> Date: Mon, 26 Jul 2010 16:20:11 -0700
> Subject: [PATCH] mesa: Fix bug in handling of read channel masks
> get_src_arg_mask.
>
> Fixes piglit glsl-fs-add-masked, and my GLSL demo.
> ---
> src/mesa/program/prog_optimize.c | 14 ++++----------
> 1 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/src/mesa/program/prog_optimize.c b/src/mesa/program/prog_optimize.c
> index 9119b10..c853330 100644
> --- a/src/mesa/program/prog_optimize.c
> +++ b/src/mesa/program/prog_optimize.c
> @@ -51,12 +51,11 @@ static const int expand_one[5] = {0,1,3,7,15};
> static GLuint
> get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
> {
> - int read_mask, channel_mask, read_count;
> + int read_mask, channel_mask;
> int comp;
>
> /* Form the dst register, find the written channels */
> if (inst->CondUpdate) {
> - read_count = 4;
> channel_mask = WRITEMASK_XYZW;
> }
> else {
> @@ -67,8 +66,7 @@ get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
> case OPCODE_MAD:
> case OPCODE_MUL:
> case OPCODE_SUB:
> - read_count = bit_count[inst->DstReg.WriteMask & dst_mask];
> - channel_mask = expand_one[read_count];
> + channel_mask = inst->DstReg.WriteMask & dst_mask;
> break;
> case OPCODE_RCP:
> case OPCODE_SIN:
> @@ -76,20 +74,16 @@ get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
> case OPCODE_RSQ:
> case OPCODE_POW:
> case OPCODE_EX2:
> - read_count = 1;
> channel_mask = WRITEMASK_X;
> break;
> case OPCODE_DP2:
> - read_count = 2;
> channel_mask = WRITEMASK_XY;
> break;
> case OPCODE_DP3:
> case OPCODE_XPD:
> - read_count = 3;
> channel_mask = WRITEMASK_XYZ;
> break;
> default:
> - read_count = 4;
> channel_mask = WRITEMASK_XYZW;
> break;
> }
> @@ -99,9 +93,9 @@ get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
> * components are actually read
> */
> read_mask = 0;
> - for (comp = 0; comp < read_count; ++comp) {
> + for (comp = 0; comp < 4; ++comp) {
> const int coord = GET_SWZ(inst->SrcReg[arg].Swizzle, comp);
> - if (coord <= SWIZZLE_W)
> + if (channel_mask & (1 << comp) && coord <= SWIZZLE_W)
> read_mask |= 1 << coord;
> }
>
> --
> 1.7.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev