[Mesa-dev] [PATCH 1/2] draw: simplify (and correct) aaline fallback (v2)

Tue Mar 6 21:42:57 UTC 2018

Am 06.03.2018 um 21:52 schrieb Brian Paul:
> Looks good.  That certainly does simplify things.  Two minor suggestions
> below.
> 
> In any case, for both, Reviewed-by: Brian Paul <brianp at vmware.com>
> 
> On 03/06/2018 01:34 PM, sroland at vmware.com wrote:
>> From: Roland Scheidegger <sroland at vmware.com>
>>
>> The motivation actually was to get rid of the additional tex
>> instruction, since that requires the draw fallback code to intercept
>> all sampler / view calls (even if the fallback is never hit).
>> Basically, the idea is to use coverage of the pixel to calculate
>> the alpha value, and coverage is simply based on the distance
>> to the center of the line (in both line direction, which is useful
>> for wide lines, as well as perpendicular to the line).
>> This is much closer to what hw supporting this natively actually does.
>> It also fixes an issue with line width not quite being correct, as
>> well as endpoints getting stretched too far (in line direction) with
>> wide lines, which is apparent with mesa demo line-width.
>> (For llvmpipe, it would probably make sense to do something like this
>> directly when drawing lines, since rendering two tris is twice as
>> expensive as a line, but it would need some changes with state
>> management.)
>> Since we're no longer relying on mipmapping to get the alpha value,
>> we also don't need to draw 3 rects (6 tris), one is sufficient.
>>
>> There's still issues (as before):
>> - quite sure it's not correct without half_pixel_center, but can't test
>> this with GL.
>> - aaline + line stipple is incorrect (evident with line-width demo).
>> Looking at the spec the stipple pattern should actually be based on
>> distance (not just dx or dy for x/y major lines as without aa).
>> - outputs (other than pos + the one used for line aa) should be
>> reinterpolated since we actually increase line length by half a pixel
>> (but there's no tests which would care).
>>
>> v2: simplify the math (should be equivalent), don't need immediate
>> ---
>>   src/gallium/auxiliary/draw/draw_pipe_aaline.c | 504
>> +++++---------------------
>>   1 file changed, 100 insertions(+), 404 deletions(-)
>>


>> diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
>> b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
>> index a859dbc..591e2a3 100644
>> @@ -210,49 +155,29 @@ aa_transform_prolog(struct
>> tgsi_transform_context *ctx)
>>      struct aa_transform_context *aactx = (struct aa_transform_context
>> *) ctx;
>>      uint i;
>>   -   STATIC_ASSERT(sizeof(aactx->samplersUsed) * 8 >=
>> PIPE_MAX_SAMPLERS);
>> -
>> -   /* find free sampler */
>> -   aactx->freeSampler = free_bit(aactx->samplersUsed);
>> -   if (aactx->freeSampler < 0 || aactx->freeSampler >=
>> PIPE_MAX_SAMPLERS)
>> -      aactx->freeSampler = PIPE_MAX_SAMPLERS - 1;
>>        /* find two free temp regs */
>> -   for (i = 0; i < 32; i++) {
>> +   for (i = 0; i < 64; i++) {
>>         if ((aactx->tempsUsed & (1 << i)) == 0) {
>>         /* found a free temp */
>>         if (aactx->colorTemp < 0)
>>            aactx->colorTemp  = i;
>> -      else if (aactx->texTemp < 0)
>> -         aactx->texTemp  = i;
>> +      else if (aactx->aaTemp < 0)
>> +         aactx->aaTemp  = i;
>>         else
>>            break;
>>         }
>>      }
> 
> The loop could probably be eliminated and replaced with a couple
> ffsll(~tempsUsed) calls.
Right.


>>   @@ -266,13 +191,41 @@ aa_transform_epilog(struct
>> tgsi_transform_context *ctx)
>>      struct aa_transform_context *aactx = (struct aa_transform_context
>> *) ctx;
>>        if (aactx->colorOutput != -1) {
>> -      /* insert texture sampling code for antialiasing. */
>> -
>> -      /* TEX texTemp, input_coord, sampler, 2D */
>> -      tgsi_transform_tex_inst(ctx,
>> -                              TGSI_FILE_TEMPORARY, aactx->texTemp,
>> -                              TGSI_FILE_INPUT, aactx->maxInput + 1,
>> -                              TGSI_TEXTURE_2D, aactx->freeSampler);
>> +      struct tgsi_full_instruction inst;
>> +      /* insert distance-based coverage code for antialiasing. */
>> +
>> +      /* saturate(linewidth - fabs(interpx), linelength -
>> fabs(interpz) */
>> +      inst = tgsi_default_full_instruction();
>> +      inst.Instruction.Saturate = true;
>> +      inst.Instruction.Opcode = TGSI_OPCODE_ADD;
>> +      inst.Instruction.NumDstRegs = 1;
>> +      tgsi_transform_dst_reg(&inst.Dst[0], TGSI_FILE_TEMPORARY,
>> +                             aactx->aaTemp, TGSI_WRITEMASK_XZ);
>> +      inst.Instruction.NumSrcRegs = 2;
>> +      tgsi_transform_src_reg(&inst.Src[1], TGSI_FILE_INPUT,
>> aactx->maxInput + 1,
>> +                             TGSI_SWIZZLE_X, TGSI_SWIZZLE_X,
>> +                             TGSI_SWIZZLE_Z, TGSI_SWIZZLE_Z);
>> +      tgsi_transform_src_reg(&inst.Src[0], TGSI_FILE_INPUT,
>> aactx->maxInput + 1,
>> +                             TGSI_SWIZZLE_Y, TGSI_SWIZZLE_Y,
>> +                             TGSI_SWIZZLE_W, TGSI_SWIZZLE_W);
>> +      inst.Src[1].Register.Absolute = true;
>> +      inst.Src[1].Register.Negate = true;
>> +      ctx->emit_instruction(ctx, &inst);
>> +
>> +      /* MUL width / height alpha */
>> +      inst = tgsi_default_full_instruction();
>> +      inst.Instruction.Opcode = TGSI_OPCODE_MUL;
>> +      inst.Instruction.NumDstRegs = 1;
>> +      tgsi_transform_dst_reg(&inst.Dst[0], TGSI_FILE_TEMPORARY,
>> +                             aactx->aaTemp, TGSI_WRITEMASK_W);
>> +      inst.Instruction.NumSrcRegs = 2;
>> +      tgsi_transform_src_reg(&inst.Src[0], TGSI_FILE_TEMPORARY,
>> aactx->aaTemp,
>> +                             TGSI_SWIZZLE_X, TGSI_SWIZZLE_X,
>> +                             TGSI_SWIZZLE_X, TGSI_SWIZZLE_X);
>> +      tgsi_transform_src_reg(&inst.Src[1], TGSI_FILE_TEMPORARY,
>> aactx->aaTemp,
>> +                             TGSI_SWIZZLE_Z, TGSI_SWIZZLE_Z,
>> +                             TGSI_SWIZZLE_Z, TGSI_SWIZZLE_Z);
>> +      ctx->emit_instruction(ctx, &inst);
> 
> I think you might be able to use tgsi_transform_op2_swz_inst() here.
> 
Ah yes.
(As a side note, while writing this, I actually found the transform code
majorly annoying due to restrictions of the tgsi_transform_opX_xx
instructions. If you need to change it because it wasn't quite right,
there's a missing possibility for abs here, swizzle which can only
handle non-scalar swizzles there etc. - and if you used these simplified
instructions in the first place you have to go back and rewrite it
completely using the full one. Naturally you converge on using the
non-simplified instructions for everything in the end.)

Roland