[Mesa-dev] [PATCH 3/3] llvmpipe: drop scissor planes early if the tri is fully inside them

Roland Scheidegger sroland at vmware.com
Tue Feb 2 20:12:38 UTC 2016


Am 02.02.2016 um 13:32 schrieb Jose Fonseca:
> On 01/02/16 01:00, sroland at vmware.com wrote:
>> From: Roland Scheidegger <sroland at vmware.com>
>>
>> If the tri is fully inside a scissor edge (or rather, we just use the
>> bounding box of the tri for the comparison), then we can drop these
>> additional scissor "planes" early. We do not even need to allocate
>> space for them in the tri.
>> The math actually appears to be slightly iffy due to bounding boxes
>> being rounded, but it doesn't matter in the end.
>> Those scissor rects are costly - the 4 planes from the scissor are
>> already more expensive to calculate than the 3 planes from the tri
>> itself,
>> and it also prevents us from using the specialized raster code for small
>> tris.
>> This helps openarena performance by about 8% or so. Of course, it helps
>> there that while openarena often enables scissoring (and even moves the
>> scissor rect around) I have not seen a single tri actually hit the
>> scissor rect, ever.
>>
>> v2: drop individual scissor edges, and do it earlier, not even allocating
>> space for them.
>> ---
>>   src/gallium/drivers/llvmpipe/lp_rast_tri.c   | 16 +++++
>>   src/gallium/drivers/llvmpipe/lp_setup_line.c | 81
>> ++++++++++++++---------
>>   src/gallium/drivers/llvmpipe/lp_setup_tri.c  | 98
>> +++++++++++++++++-----------
>>   3 files changed, 126 insertions(+), 69 deletions(-)
>>
>> diff --git a/src/gallium/drivers/llvmpipe/lp_rast_tri.c
>> b/src/gallium/drivers/llvmpipe/lp_rast_tri.c
>> index f4a2f02..bf27900 100644
>> --- a/src/gallium/drivers/llvmpipe/lp_rast_tri.c
>> +++ b/src/gallium/drivers/llvmpipe/lp_rast_tri.c
>> @@ -380,11 +380,27 @@ lp_rast_triangle_32_3_4(struct
>> lp_rasterizer_task *task,
>>       */
>>      dcdx = _mm_sub_epi32(zero, dcdx);
>>
>> +#if 0
>> +   {
>> +   __m128i lbits, fixup, xy, dcdxdy, dcdx16;
>> +   lbits = _mm_and_si128(c, _mm_set1_epi32(FIXED_ONE - 1));
>> +   fixup = _mm_cmpeq_epi32(lbits, _mm_setzero_si128());
>> +   c = _mm_srai_epi32(c, 8);
>> +   c = _mm_add_epi32(c, fixup);
>> +   xy = _mm_set1_epi32(x | y << 16);
>> +   dcdx = _mm_srai_epi32(dcdx, 8);
>> +   dcdy = _mm_srai_epi32(dcdy, 8);
>> +   dcdx16 = _mm_and_si128(_mm_set_epi16(0,-1,0,-1,0,-1,0,-1), dcdx);
>> +   dcdxdy = _mm_or_si128(dcdx16, _mm_slli_epi32(dcdy, 16));
>> +   c = _mm_add_epi32(c, _mm_madd_epi16(dcdxdy, xy));
>> +   }
>> +#else
>>      c = _mm_add_epi32(c, mm_mullo_epi32(dcdx, _mm_set1_epi32(x)));
>>      c = _mm_add_epi32(c, mm_mullo_epi32(dcdy, _mm_set1_epi32(y)));
>>
>>      /* Adjust so we can just check the sign bit (< 0 comparison),
>> instead of having to do a less efficient <= 0 comparison */
>>      c = _mm_sub_epi32(c, _mm_set1_epi32(1));
>> +#endif
>>
>>      dcdx2 = _mm_add_epi32(dcdx, dcdx);
>>      dcdx3 = _mm_add_epi32(dcdx2, dcdx);
>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup_line.c
>> b/src/gallium/drivers/llvmpipe/lp_setup_line.c
>> index f425825..3ec9ac4 100644
>> --- a/src/gallium/drivers/llvmpipe/lp_setup_line.c
>> +++ b/src/gallium/drivers/llvmpipe/lp_setup_line.c
>> @@ -336,13 +336,6 @@ try_setup_line( struct lp_setup_context *setup,
>>         layer = MIN2(layer, scene->fb_max_layer);
>>      }
>>
>> -   if (setup->scissor_test) {
>> -      nr_planes = 8;
>> -   }
>> -   else {
>> -      nr_planes = 4;
>> -   }
>> -
>>      dx = v1[0][0] - v2[0][0];
>>      dy = v1[0][1] - v2[0][1];
>>      area = (dx * dx  + dy * dy);
>> @@ -591,6 +584,20 @@ try_setup_line( struct lp_setup_context *setup,
>>      bbox.x0 = MAX2(bbox.x0, 0);
>>      bbox.y0 = MAX2(bbox.y0, 0);
>>
>> +   nr_planes = 4;
>> +   /*
>> +    * Determine how many scissor planes we need, that is drop scissor
>> +    * edges if the bounding box of the tri is fully inside that edge.
>> +    */
>> +   if (setup->scissor_test) {
>> +      /* why not just use draw_regions */
>> +      struct u_rect *scissor = &setup->scissors[viewport_index];
> 
> 
> Please use 4 bools:
> 
>   bool scissor_left = bbox.x0 < scissor->x0;
> 
> and reused them below.
> 
> I'm concern we could use >= for some of these -- though it might depend
> on the current rasterization rules.
I think it should be ok. We already have adjusted the bounding box for
the tri depending on rasterization rules so this shouldn't matter
anymore. But yes, the math is a bit non-obvious with the rounded
bounding box of the tri (I'd guess with msaa we'd definitely need
something better, but that would be true elsewhere too).

> 
> Anyway, using bools ensure these conditions stay in sync with the code
> bloew.
I already submitted this yesterday after incorporating Brian's feedback.
Those suggestions were actually incompatible with yours, you can have
either faster or nicer code here ;-).


> Maybe an even better alternative is have a "needed_scissor_planes"
> helper function that gets used every where.
I'm going to push something along these lines, making it nicer ;-)

Roland


> 
> 
> 
>> +      if (bbox.x0 < scissor->x0) nr_planes++;
>> +      if (bbox.x1 > scissor->x1) nr_planes++;
>> +      if (bbox.y0 < scissor->y0) nr_planes++;
>> +      if (bbox.y1 > scissor->y1) nr_planes++;
>> +   }
>> +
>>      line = lp_setup_alloc_triangle(scene,
>>                                     key->num_inputs,
>>                                     nr_planes,
>> @@ -708,30 +715,44 @@ try_setup_line( struct lp_setup_context *setup,
>>       * Note that otherwise, the scissor planes only vary in 'C' value,
>>       * and even then only on state-changes.  Could alternatively store
>>       * these planes elsewhere.
>> +    * (Or only store the c value together with a bit indicating which
>> +    * scissor edge this is, so rasterization would treat them
>> differently
>> +    * (easier to evaluate) to ordinary planes.)
>>       */
>> -   if (nr_planes == 8) {
>> -      const struct u_rect *scissor =
>> -         &setup->scissors[viewport_index];
>> -
>> -      plane[4].dcdx = -1 << 8;
>> -      plane[4].dcdy = 0;
>> -      plane[4].c = (1-scissor->x0) << 8;
>> -      plane[4].eo = 1 << 8;
>> -
>> -      plane[5].dcdx = 1 << 8;
>> -      plane[5].dcdy = 0;
>> -      plane[5].c = (scissor->x1+1) << 8;
>> -      plane[5].eo = 0;
>> -
>> -      plane[6].dcdx = 0;
>> -      plane[6].dcdy = 1 << 8;
>> -      plane[6].c = (1-scissor->y0) << 8;
>> -      plane[6].eo = 1 << 8;
>> -
>> -      plane[7].dcdx = 0;
>> -      plane[7].dcdy = -1 << 8;
>> -      plane[7].c = (scissor->y1+1) << 8;
>> -      plane[7].eo = 0;
>> +   if (nr_planes > 4) {
>> +      /* why not just use draw_regions */
>> +      struct u_rect *scissor = &setup->scissors[viewport_index];
>> +      unsigned scis_index = 4;
>> +
>> +      if (bbox.x0 < scissor->x0) {
> 
> And use scissor_left.
> 
>> +         plane[scis_index].dcdx = -1 << 8;
>> +         plane[scis_index].dcdy = 0;
>> +         plane[scis_index].c = (1-scissor->x0) << 8;
>> +         plane[scis_index].eo = 1 << 8;
>> +         scis_index++;
>> +      }
>> +      if (bbox.x1 > scissor->x1) {
>> +         plane[scis_index].dcdx = 1 << 8;
>> +         plane[scis_index].dcdy = 0;
>> +         plane[scis_index].c = (scissor->x1+1) << 8;
>> +         plane[scis_index].eo = 0 << 8;
>> +         scis_index++;
>> +      }
>> +      if (bbox.y0 < scissor->y0) {
>> +         plane[scis_index].dcdx = 0;
>> +         plane[scis_index].dcdy = 1 << 8;
>> +         plane[scis_index].c = (1-scissor->y0) << 8;
>> +         plane[scis_index].eo = 1 << 8;
>> +         scis_index++;
>> +      }
>> +      if (bbox.y1 > scissor->y1) {
>> +         plane[scis_index].dcdx = 0;
>> +         plane[scis_index].dcdy = -1 << 8;
>> +         plane[scis_index].c = (scissor->y1+1) << 8;
>> +         plane[scis_index].eo = 0;
>> +         scis_index++;
>> +      }
>> +      assert(scis_index == nr_planes);
>>      }
>>
>>      return lp_setup_bin_triangle(setup, line, &bbox, nr_planes,
>> viewport_index);
>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
>> b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
>> index 1e3a750..4e9acce 100644
>> --- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
>> +++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
>> @@ -302,13 +302,6 @@ do_triangle_ccw(struct lp_setup_context *setup,
>>         layer = MIN2(layer, scene->fb_max_layer);
>>      }
>>
>> -   if (setup->scissor_test) {
>> -      nr_planes = 7;
>> -   }
>> -   else {
>> -      nr_planes = 3;
>> -   }
>> -
>>      /* Bounding rectangle (in pixels) */
>>      {
>>         /* Yes this is necessary to accurately calculate bounding boxes
>> @@ -347,6 +340,20 @@ do_triangle_ccw(struct lp_setup_context *setup,
>>      bbox.x0 = MAX2(bbox.x0, 0);
>>      bbox.y0 = MAX2(bbox.y0, 0);
>>
>> +   nr_planes = 3;
>> +   /*
>> +    * Determine how many scissor planes we need, that is drop scissor
>> +    * edges if the bounding box of the tri is fully inside that edge.
>> +    */
>> +   if (setup->scissor_test) {
>> +      /* why not just use draw_regions */
>> +      struct u_rect *scissor = &setup->scissors[viewport_index];
>> +      if (bbox.x0 < scissor->x0) nr_planes++;
>> +      if (bbox.x1 > scissor->x1) nr_planes++;
>> +      if (bbox.y0 < scissor->y0) nr_planes++;
>> +      if (bbox.y1 > scissor->y1) nr_planes++;
>> +   }
>> +
>>      tri = lp_setup_alloc_triangle(scene,
>>                                    key->num_inputs,
>>                                    nr_planes,
>> @@ -367,13 +374,11 @@ do_triangle_ccw(struct lp_setup_context *setup,
>>
>>      /* Setup parameter interpolants:
>>       */
>> -   setup->setup.variant->jit_function( v0,
>> -                       v1,
>> -                       v2,
>> -                       frontfacing,
>> -                       GET_A0(&tri->inputs),
>> -                       GET_DADX(&tri->inputs),
>> -                       GET_DADY(&tri->inputs) );
>> +   setup->setup.variant->jit_function(v0, v1, v2,
>> +                                      frontfacing,
>> +                                      GET_A0(&tri->inputs),
>> +                                      GET_DADX(&tri->inputs),
>> +                                      GET_DADY(&tri->inputs));
>>
>>      tri->inputs.frontfacing = frontfacing;
>>      tri->inputs.disable = FALSE;
>> @@ -383,9 +388,9 @@ do_triangle_ccw(struct lp_setup_context *setup,
>>
>>      if (0)
>>         lp_dump_setup_coef(&setup->setup.variant->key,
>> -             (const float (*)[4])GET_A0(&tri->inputs),
>> -             (const float (*)[4])GET_DADX(&tri->inputs),
>> -             (const float (*)[4])GET_DADY(&tri->inputs));
>> +                         (const float (*)[4])GET_A0(&tri->inputs),
>> +                         (const float (*)[4])GET_DADX(&tri->inputs),
>> +                         (const float (*)[4])GET_DADY(&tri->inputs));
>>
>>      plane = GET_PLANES(tri);
>>
>> @@ -672,29 +677,44 @@ do_triangle_ccw(struct lp_setup_context *setup,
>>       * Note that otherwise, the scissor planes only vary in 'C' value,
>>       * and even then only on state-changes.  Could alternatively store
>>       * these planes elsewhere.
>> +    * (Or only store the c value together with a bit indicating which
>> +    * scissor edge this is, so rasterization would treat them
>> differently
>> +    * (easier to evaluate) to ordinary planes.)
>>       */
>> -   if (nr_planes == 7) {
>> -      const struct u_rect *scissor = &setup->scissors[viewport_index];
>> -
>> -      plane[3].dcdx = -1 << 8;
>> -      plane[3].dcdy = 0;
>> -      plane[3].c = (1-scissor->x0) << 8;
>> -      plane[3].eo = 1 << 8;
>> -
>> -      plane[4].dcdx = 1 << 8;
>> -      plane[4].dcdy = 0;
>> -      plane[4].c = (scissor->x1+1) << 8;
>> -      plane[4].eo = 0;
>> -
>> -      plane[5].dcdx = 0;
>> -      plane[5].dcdy = 1 << 8;
>> -      plane[5].c = (1-scissor->y0) << 8;
>> -      plane[5].eo = 1 << 8;
>> -
>> -      plane[6].dcdx = 0;
>> -      plane[6].dcdy = -1 << 8;
>> -      plane[6].c = (scissor->y1+1) << 8;
>> -      plane[6].eo = 0;
>> +   if (nr_planes > 3) {
>> +      /* why not just use draw_regions */
>> +      struct u_rect *scissor = &setup->scissors[viewport_index];
>> +      unsigned scis_index = 3;
>> +
>> +      if (bbox.x0 < scissor->x0) {
>> +         plane[scis_index].dcdx = -1 << 8;
>> +         plane[scis_index].dcdy = 0;
>> +         plane[scis_index].c = (1-scissor->x0) << 8;
>> +         plane[scis_index].eo = 1 << 8;
>> +         scis_index++;
>> +      }
>> +      if (bbox.x1 > scissor->x1) {
>> +         plane[scis_index].dcdx = 1 << 8;
>> +         plane[scis_index].dcdy = 0;
>> +         plane[scis_index].c = (scissor->x1+1) << 8;
>> +         plane[scis_index].eo = 0 << 8;
>> +         scis_index++;
>> +      }
>> +      if (bbox.y0 < scissor->y0) {
>> +         plane[scis_index].dcdx = 0;
>> +         plane[scis_index].dcdy = 1 << 8;
>> +         plane[scis_index].c = (1-scissor->y0) << 8;
>> +         plane[scis_index].eo = 1 << 8;
>> +         scis_index++;
>> +      }
>> +      if (bbox.y1 > scissor->y1) {
>> +         plane[scis_index].dcdx = 0;
>> +         plane[scis_index].dcdy = -1 << 8;
>> +         plane[scis_index].c = (scissor->y1+1) << 8;
>> +         plane[scis_index].eo = 0;
>> +         scis_index++;
>> +      }
>> +      assert(scis_index == nr_planes);
>>      }
>>
>>      return lp_setup_bin_triangle(setup, tri, &bbox, nr_planes,
>> viewport_index);
>>
> 
> 
> 
> Otherwise looks good.
> 
> The other patches of this series looks good too, and are
> 
> Reviwed-by: Jose Fonseca <jfonseca at vmware.com
> 



More information about the mesa-dev mailing list