[Mesa-dev] i965: Need help testing patches to fix bogus antaliasing in Gen4/Gen5

Fri May 23 02:58:34 PDT 2014

Hi,

Short story:

I have two implementations of a patch to fix this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=78679

I would need help testing both on Gen4 (I don't have Gen4 hardware
available) and I would also like to understand why only on of them works
on Gen5. There is a patch making both implementations available for
testing attached to the bug report.

Following are the details of the problem and some issues I see with the
two implementations I have. Hopefully someone can help me address the
questions/problems I mention below.

Long story:

The bug report is related to evaluating whether
c->runtime_check_aads_emit is still needed and in that case put back the
code that dealt with it and that was removed with commit 098acf6c843.

I found that this flag is relevant for correct rendering in Gen < 6 when
the following requirements are met:

- Rendering polygons (not lines)
- Rendering exactly one of the polygon's faces in GL_LINE mode and the
other in GL_FILL mode (and none of them are culled).
- Enable smooth lines (antaliased lines).

I attached a sample program to the bug report that generates this
scenario. Running the program in IronLake shows incorrect rendering of
the GL_FILL face of the polygon and correct rendering of the GL_LINE
face.

The idea here is that with this setup in Gen < 6 the hardware generates
a runtime bit that indicates whether AA data needs to be sent as part of
the frame buffer write SEND message. Particularly, AA data has to be
sent when rendering the GL_LINE face of the polygon, but not when
rendering the GL_FILL face for things to work properly.

The current code always sends AA info which produces the incorrect
rendering results in the sample program I mentioned. Instead,
runtime_check_aads_emit should be checked and if TRUE we should generate
code that tests this bit at runtime and changes the SEND message
accordingly.

The code removed in the mentioned commit apparently addressed this for
Gen4, however rendering in Gen5 was not addressed even before that
commit because the code path that included the code to deal with this
did not run in Gen5, so for IronLake this has always been broken as far
as I can tell and, presumably, it has been broken for Gen < 5 since that
commit.

The original code used a brw_JMPI instruction to handle the runtime
check when runtime_check_aads_emit==TRUE. Adapted to the current code
base it would look something like this:

 brw_set_conditionalmod(p, BRW_CONDITIONAL_NZ);
 brw_AND(p,
         v1_null_ud,
         get_element_ud(brw_vec8_grf(1,0), 6),
         brw_imm_ud(1<<26));
 jmp = brw_JMPI(p, ip, ip, brw_imm_ud(0)) - p->store;
 {
    fire_fb_write(inst, inst->base_mrf+1, implied_header, inst->mlen-1);
 }
 brw_land_fwd_jump(p, jmp);
 fire_fb_write(inst, inst->base_mrf, implied_header, inst->mlen);

Basically the idea here is to shift the message header by one position
to fill in the space that would otherwise be used for AA data (stored in
base_mrf+2) when the runtime test indicates that we should not send AA
info in the SEND message.

However, at least in my IronLake hardware this does not produce correct
rendering results either. The face of the polygon in GL_FILL mode,
although different from before, is still not rendered properly. The
original code included comments suggesting that the resulting program
would be equivalent to an IF/ELSE structure where only one of the
fire_fb_writes will be executed.

However, I found that this other code that in theory should be
equivalent, produces correct results:

 struct brw_reg v1_null_ud =
   vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_UD));
 struct brw_reg tmp_reg_ud = brw_vec1_grf(BRW_MAX_GRF-1, 0);
 brw_AND(p,
         tmp_reg_ud,
         get_element_ud(brw_vec8_grf(1,0), 6),
         brw_imm_ud(1<<26));
 brw_CMP(p,
         v1_null_ud,
         BRW_CONDITIONAL_Z,
         tmp_reg_ud,
         brw_imm_ud(0));
 brw_IF(p, BRW_EXECUTE_1);
 {
    fire_fb_write(inst, inst->base_mrf+1, implied_header, inst->mlen-1);
 }
 brw_ELSE(p);
 {
    fire_fb_write(inst, inst->base_mrf, implied_header, inst->mlen);
 }
 brw_ENDIF(p);

I have no idea why this happens. Also, since the implementation that
follows the strategy from the original code does not work on IronLake, I
wonder if that means that it was not working correctly in Gen < 5
either.

Can anyone explain if the is any difference between both
implementations? If it helps, here is the assembly code produced for
both implementations (notice that fire_fb_write produces a mov and a
send):

IF/ELSE/ENDIF version:

and(1)       g127<1>F  g1.6<0,1,0>UD  0x04000000UD    { align1 nomask };
cmp.e.f0(1)  null      g127<0,1,0>F   0x00000000UD    { align1 nomask };
(+f0) if(1)            ip             6D              { align1 switch };
mov(8)       m3<1>F    g1<8,8,1>F                     { align1 nomask };
send(16) 2   null      g0<8,8,1>UW
  write (0, 8, 4, 0) mlen 10 rlen 0               { align1 nomask EOT };
else(1)                ip             65544D          { align1 switch };
mov(8)       m2<1>F    g1<8,8,1>F                     { align1 nomask };
send(16) 1   null      g0<8,8,1>UW
  write (0, 8, 4, 0) mlen 11 rlen 0               { align1 nomask EOT };
endif(1)

JMPI version:

and.ne.f0(1) null      g1.6<0,1,0>UD  0x04000000UD    { align1 nomask };
(+f0) jmpi(2) 4                                       { align1 nomask };
mov(8)       m3<1>F    g1<8,8,1>F                     { align1 nomask };
send(16) 2   null      g0<8,8,1>UW
  write (0, 8, 4, 0) mlen 10 rlen 0               { align1 nomask EOT };
mov(8)       m2<1>F    g1<8,8,1>F                     { align1 nomask };
send(16) 1   null      g0<8,8,1>UW
  write (0, 8, 4, 0) mlen 11 rlen 0               { align1 nomask EOT };

I think it would be better if we could go with the JMPI implementation
(if someone tells me how to make that work as intended) since the one
with IF/ELSE/ENDIF requires an additional GRF register to deal with the
AND/CMP instructions and I have not found any API to query unused GRF
registers. As you can see I am using BRW_MAX_GRF-1, hoping that this is
unlikely to be assigned in normal situations, but this is obviously not
a good solution.

Iago