[Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

Corbin Simpson mostawesomedude at gmail.com
Wed Jun 29 16:16:24 PDT 2011


Okay, so maybe I'm failing to recognize the exact situation here, but
wouldn't it be possible to mark the FS state with a serial number and
just compare those? Or are these FS states not CSO-cached?

~ C.

On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Actually I ran some numbers here and tried out a optimized struct compare:
> original ipers: 12.1 fps
> ajax patch: 15.5 fps
> optimized struct compare: 16.8 fps
>
>
> This is the function I used for that (just enabled in that lp_setup
> function):
>
> static INLINE int util_cmp_struct(const void *src1, const void *src2,
> unsigned count)
> {
>  /* hmm pointer casting is evil */
>  const uint32_t *src1_ptr = (uint32_t *)src1;
>  const uint32_t *src2_ptr = (uint32_t *)src2;
>  unsigned i;
>  assert(count % 4 == 0);
>  for (i = 0; i < count/4; i++) {
>    if (*src1_ptr != *src2_ptr) {
>      return 1;
>    }
>    src1_ptr++;
>    src2_ptr++;
>  }
>  return 0;
> }
>
> (And no this doesn't use repz cmpsd here.)
>
> So, unless I made some mistake, memcmp is just dead slow (*), most of
> the slowness probably coming from the bytewise comparison (and
> apparently I was wrong in assuming the comparison there might never be
> the same for ipers).
> Of course, the optimized struct compare relies on structs really being
> dword aligned (I think this is always the case), and additionally it
> relies on the struct size being a whole multiple of dwords - likely
> struct needs padding to ensure that (at least I don't think this is
> always guaranteed for all structs).
> But since memcmp is used extensively (cso for one) maybe some
> optimization along these lines might be worth it (though of course for
> small structs the win isn't going to be as big - and can't beat the repz
> cmps in code size...).
>
> Roland
>
> (*) I actually found some references some implementations might be
> better they don't just use repz cmpsb but they split this up in parts
> which do dword (or qword even - well for really large structs could use
> sse2) comparisons for the parts where it's possible and only byte
> comparisons for the remaining bytes (and if the compiler does that it
> probably would know the size at compile time here hence could leave out
> much of the code). Of course memcmp requires that the return value isn't
> just a true or false value, hence there's more code needed once an
> unequal dword is found, though the compiler could optimize that away too
> in case it's not needed. Much the same as memcpy is optimized usually
> really, so blame gcc :-).
>
>
>
> Am 29.06.2011 20:33, schrieb Roland Scheidegger:
>> Ohh that's interesting, you'd think the comparison shouldn't be that
>> expensive (though I guess in ipers case the comparison is never true).
>> memcmp is quite extensively used everywhere. Maybe we could replace that
>> with something faster (since we only ever care if the blocks are the
>> same but not care about the lexographic ordering and always compare
>> whole structs, should compare dwords instead of bytes for a 4 time
>> speedup)? Or isn't that the reason cmpsb instead of cmpsd is used?
>> Also I guess it would help if the values which are more likely to be
>> unequal are first in the struct (if we can tell that).
>> Of course though if it's unlikely to be the same as the compared value
>> anyway not comparing at all still might be a win (here).
>>
>> Roland
>>
>> Am 29.06.2011 19:19, schrieb Adam Jackson:
>>> Perversely, do this by eliminating the comparison between stored and
>>> current fs state.  On ipers, a perf trace showed try_update_scene_state
>>> using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp.
>>> Taking that out takes try_update_scene_state down to 6.5% of the
>>> profile; more importantly, ipers goes from 10 to 14fps and gears goes
>>> from 790 to 830fps.
>>>
>>> Signed-off-by: Adam Jackson <ajax at redhat.com>
>>> ---
>>>  src/gallium/drivers/llvmpipe/lp_setup.c |   61 ++++++++++++++-----------------
>>>  1 files changed, 27 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> index cbe06e5..9118db5 100644
>>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
>>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup )
>>>        setup->dirty |= LP_SETUP_NEW_FS;
>>>     }
>>>
>>> -
>>>     if (setup->dirty & LP_SETUP_NEW_FS) {
>>> -      if (!setup->fs.stored ||
>>> -          memcmp(setup->fs.stored,
>>> -                 &setup->fs.current,
>>> -                 sizeof setup->fs.current) != 0)
>>> -      {
>>> -         struct lp_rast_state *stored;
>>> -         uint i;
>>> -
>>> -         /* The fs state that's been stored in the scene is different from
>>> -          * the new, current state.  So allocate a new lp_rast_state object
>>> -          * and append it to the bin's setup data buffer.
>>> -          */
>>> -         stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>>> -         if (!stored) {
>>> -            assert(!new_scene);
>>> -            return FALSE;
>>> -         }
>>> +      struct lp_rast_state *stored;
>>> +      uint i;
>>> +
>>> +      /* The fs state that's been stored in the scene is different from
>>> +       * the new, current state.  So allocate a new lp_rast_state object
>>> +       * and append it to the bin's setup data buffer.
>>> +       */
>>> +      stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>>> +      if (!stored) {
>>> +         assert(!new_scene);
>>> +         return FALSE;
>>> +      }
>>>
>>> -         memcpy(stored,
>>> -                &setup->fs.current,
>>> -                sizeof setup->fs.current);
>>> -         setup->fs.stored = stored;
>>> -
>>> -         /* The scene now references the textures in the rasterization
>>> -          * state record.  Note that now.
>>> -          */
>>> -         for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>>> -            if (setup->fs.current_tex[i]) {
>>> -               if (!lp_scene_add_resource_reference(scene,
>>> -                                                    setup->fs.current_tex[i],
>>> -                                                    new_scene)) {
>>> -                  assert(!new_scene);
>>> -                  return FALSE;
>>> -               }
>>> +      memcpy(stored,
>>> +             &setup->fs.current,
>>> +             sizeof setup->fs.current);
>>> +      setup->fs.stored = stored;
>>> +
>>> +      /* The scene now references the textures in the rasterization
>>> +       * state record.  Note that now.
>>> +       */
>>> +      for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>>> +         if (setup->fs.current_tex[i]) {
>>> +            if (!lp_scene_add_resource_reference(scene,
>>> +                                                 setup->fs.current_tex[i],
>>> +                                                 new_scene)) {
>>> +               assert(!new_scene);
>>> +               return FALSE;
>>>              }
>>>           }
>>>        }
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>



-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<MostAwesomeDude at gmail.com>


More information about the mesa-dev mailing list