[Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

Roland Scheidegger sroland at vmware.com
Wed Jun 29 15:44:10 PDT 2011


Actually I ran some numbers here and tried out a optimized struct compare:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps


This is the function I used for that (just enabled in that lp_setup
function):

static INLINE int util_cmp_struct(const void *src1, const void *src2,
unsigned count)
{
  /* hmm pointer casting is evil */
  const uint32_t *src1_ptr = (uint32_t *)src1;
  const uint32_t *src2_ptr = (uint32_t *)src2;
  unsigned i;
  assert(count % 4 == 0);
  for (i = 0; i < count/4; i++) {
    if (*src1_ptr != *src2_ptr) {
      return 1;
    }
    src1_ptr++;
    src2_ptr++;
  }
  return 0;
}

(And no this doesn't use repz cmpsd here.)

So, unless I made some mistake, memcmp is just dead slow (*), most of
the slowness probably coming from the bytewise comparison (and
apparently I was wrong in assuming the comparison there might never be
the same for ipers).
Of course, the optimized struct compare relies on structs really being
dword aligned (I think this is always the case), and additionally it
relies on the struct size being a whole multiple of dwords - likely
struct needs padding to ensure that (at least I don't think this is
always guaranteed for all structs).
But since memcmp is used extensively (cso for one) maybe some
optimization along these lines might be worth it (though of course for
small structs the win isn't going to be as big - and can't beat the repz
cmps in code size...).

Roland

(*) I actually found some references some implementations might be
better they don't just use repz cmpsb but they split this up in parts
which do dword (or qword even - well for really large structs could use
sse2) comparisons for the parts where it's possible and only byte
comparisons for the remaining bytes (and if the compiler does that it
probably would know the size at compile time here hence could leave out
much of the code). Of course memcmp requires that the return value isn't
just a true or false value, hence there's more code needed once an
unequal dword is found, though the compiler could optimize that away too
in case it's not needed. Much the same as memcpy is optimized usually
really, so blame gcc :-).



Am 29.06.2011 20:33, schrieb Roland Scheidegger:
> Ohh that's interesting, you'd think the comparison shouldn't be that
> expensive (though I guess in ipers case the comparison is never true).
> memcmp is quite extensively used everywhere. Maybe we could replace that
> with something faster (since we only ever care if the blocks are the
> same but not care about the lexographic ordering and always compare
> whole structs, should compare dwords instead of bytes for a 4 time
> speedup)? Or isn't that the reason cmpsb instead of cmpsd is used?
> Also I guess it would help if the values which are more likely to be
> unequal are first in the struct (if we can tell that).
> Of course though if it's unlikely to be the same as the compared value
> anyway not comparing at all still might be a win (here).
> 
> Roland
> 
> Am 29.06.2011 19:19, schrieb Adam Jackson:
>> Perversely, do this by eliminating the comparison between stored and
>> current fs state.  On ipers, a perf trace showed try_update_scene_state
>> using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp.
>> Taking that out takes try_update_scene_state down to 6.5% of the
>> profile; more importantly, ipers goes from 10 to 14fps and gears goes
>> from 790 to 830fps.
>>
>> Signed-off-by: Adam Jackson <ajax at redhat.com>
>> ---
>>  src/gallium/drivers/llvmpipe/lp_setup.c |   61 ++++++++++++++-----------------
>>  1 files changed, 27 insertions(+), 34 deletions(-)
>>
>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
>> index cbe06e5..9118db5 100644
>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
>> @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup )
>>        setup->dirty |= LP_SETUP_NEW_FS;
>>     }
>>  
>> -
>>     if (setup->dirty & LP_SETUP_NEW_FS) {
>> -      if (!setup->fs.stored ||
>> -          memcmp(setup->fs.stored,
>> -                 &setup->fs.current,
>> -                 sizeof setup->fs.current) != 0)
>> -      {
>> -         struct lp_rast_state *stored;
>> -         uint i;
>> -         
>> -         /* The fs state that's been stored in the scene is different from
>> -          * the new, current state.  So allocate a new lp_rast_state object
>> -          * and append it to the bin's setup data buffer.
>> -          */
>> -         stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>> -         if (!stored) {
>> -            assert(!new_scene);
>> -            return FALSE;
>> -         }
>> +      struct lp_rast_state *stored;
>> +      uint i;
>> +      
>> +      /* The fs state that's been stored in the scene is different from
>> +       * the new, current state.  So allocate a new lp_rast_state object
>> +       * and append it to the bin's setup data buffer.
>> +       */
>> +      stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>> +      if (!stored) {
>> +         assert(!new_scene);
>> +         return FALSE;
>> +      }
>>  
>> -         memcpy(stored,
>> -                &setup->fs.current,
>> -                sizeof setup->fs.current);
>> -         setup->fs.stored = stored;
>> -         
>> -         /* The scene now references the textures in the rasterization
>> -          * state record.  Note that now.
>> -          */
>> -         for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>> -            if (setup->fs.current_tex[i]) {
>> -               if (!lp_scene_add_resource_reference(scene,
>> -                                                    setup->fs.current_tex[i],
>> -                                                    new_scene)) {
>> -                  assert(!new_scene);
>> -                  return FALSE;
>> -               }
>> +      memcpy(stored,
>> +             &setup->fs.current,
>> +             sizeof setup->fs.current);
>> +      setup->fs.stored = stored;
>> +      
>> +      /* The scene now references the textures in the rasterization
>> +       * state record.  Note that now.
>> +       */
>> +      for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>> +         if (setup->fs.current_tex[i]) {
>> +            if (!lp_scene_add_resource_reference(scene,
>> +                                                 setup->fs.current_tex[i],
>> +                                                 new_scene)) {
>> +               assert(!new_scene);
>> +               return FALSE;
>>              }
>>           }
>>        }
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev



More information about the mesa-dev mailing list