[Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
Corbin Simpson
mostawesomedude at gmail.com
Wed Jun 29 16:16:24 PDT 2011
Okay, so maybe I'm failing to recognize the exact situation here, but
wouldn't it be possible to mark the FS state with a serial number and
just compare those? Or are these FS states not CSO-cached?
~ C.
On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Actually I ran some numbers here and tried out a optimized struct compare:
> original ipers: 12.1 fps
> ajax patch: 15.5 fps
> optimized struct compare: 16.8 fps
>
>
> This is the function I used for that (just enabled in that lp_setup
> function):
>
> static INLINE int util_cmp_struct(const void *src1, const void *src2,
> unsigned count)
> {
> /* hmm pointer casting is evil */
> const uint32_t *src1_ptr = (uint32_t *)src1;
> const uint32_t *src2_ptr = (uint32_t *)src2;
> unsigned i;
> assert(count % 4 == 0);
> for (i = 0; i < count/4; i++) {
> if (*src1_ptr != *src2_ptr) {
> return 1;
> }
> src1_ptr++;
> src2_ptr++;
> }
> return 0;
> }
>
> (And no this doesn't use repz cmpsd here.)
>
> So, unless I made some mistake, memcmp is just dead slow (*), most of
> the slowness probably coming from the bytewise comparison (and
> apparently I was wrong in assuming the comparison there might never be
> the same for ipers).
> Of course, the optimized struct compare relies on structs really being
> dword aligned (I think this is always the case), and additionally it
> relies on the struct size being a whole multiple of dwords - likely
> struct needs padding to ensure that (at least I don't think this is
> always guaranteed for all structs).
> But since memcmp is used extensively (cso for one) maybe some
> optimization along these lines might be worth it (though of course for
> small structs the win isn't going to be as big - and can't beat the repz
> cmps in code size...).
>
> Roland
>
> (*) I actually found some references some implementations might be
> better they don't just use repz cmpsb but they split this up in parts
> which do dword (or qword even - well for really large structs could use
> sse2) comparisons for the parts where it's possible and only byte
> comparisons for the remaining bytes (and if the compiler does that it
> probably would know the size at compile time here hence could leave out
> much of the code). Of course memcmp requires that the return value isn't
> just a true or false value, hence there's more code needed once an
> unequal dword is found, though the compiler could optimize that away too
> in case it's not needed. Much the same as memcpy is optimized usually
> really, so blame gcc :-).
>
>
>
> Am 29.06.2011 20:33, schrieb Roland Scheidegger:
>> Ohh that's interesting, you'd think the comparison shouldn't be that
>> expensive (though I guess in ipers case the comparison is never true).
>> memcmp is quite extensively used everywhere. Maybe we could replace that
>> with something faster (since we only ever care if the blocks are the
>> same but not care about the lexographic ordering and always compare
>> whole structs, should compare dwords instead of bytes for a 4 time
>> speedup)? Or isn't that the reason cmpsb instead of cmpsd is used?
>> Also I guess it would help if the values which are more likely to be
>> unequal are first in the struct (if we can tell that).
>> Of course though if it's unlikely to be the same as the compared value
>> anyway not comparing at all still might be a win (here).
>>
>> Roland
>>
>> Am 29.06.2011 19:19, schrieb Adam Jackson:
>>> Perversely, do this by eliminating the comparison between stored and
>>> current fs state. On ipers, a perf trace showed try_update_scene_state
>>> using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp.
>>> Taking that out takes try_update_scene_state down to 6.5% of the
>>> profile; more importantly, ipers goes from 10 to 14fps and gears goes
>>> from 790 to 830fps.
>>>
>>> Signed-off-by: Adam Jackson <ajax at redhat.com>
>>> ---
>>> src/gallium/drivers/llvmpipe/lp_setup.c | 61 ++++++++++++++-----------------
>>> 1 files changed, 27 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> index cbe06e5..9118db5 100644
>>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
>>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup )
>>> setup->dirty |= LP_SETUP_NEW_FS;
>>> }
>>>
>>> -
>>> if (setup->dirty & LP_SETUP_NEW_FS) {
>>> - if (!setup->fs.stored ||
>>> - memcmp(setup->fs.stored,
>>> - &setup->fs.current,
>>> - sizeof setup->fs.current) != 0)
>>> - {
>>> - struct lp_rast_state *stored;
>>> - uint i;
>>> -
>>> - /* The fs state that's been stored in the scene is different from
>>> - * the new, current state. So allocate a new lp_rast_state object
>>> - * and append it to the bin's setup data buffer.
>>> - */
>>> - stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>>> - if (!stored) {
>>> - assert(!new_scene);
>>> - return FALSE;
>>> - }
>>> + struct lp_rast_state *stored;
>>> + uint i;
>>> +
>>> + /* The fs state that's been stored in the scene is different from
>>> + * the new, current state. So allocate a new lp_rast_state object
>>> + * and append it to the bin's setup data buffer.
>>> + */
>>> + stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>>> + if (!stored) {
>>> + assert(!new_scene);
>>> + return FALSE;
>>> + }
>>>
>>> - memcpy(stored,
>>> - &setup->fs.current,
>>> - sizeof setup->fs.current);
>>> - setup->fs.stored = stored;
>>> -
>>> - /* The scene now references the textures in the rasterization
>>> - * state record. Note that now.
>>> - */
>>> - for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>>> - if (setup->fs.current_tex[i]) {
>>> - if (!lp_scene_add_resource_reference(scene,
>>> - setup->fs.current_tex[i],
>>> - new_scene)) {
>>> - assert(!new_scene);
>>> - return FALSE;
>>> - }
>>> + memcpy(stored,
>>> + &setup->fs.current,
>>> + sizeof setup->fs.current);
>>> + setup->fs.stored = stored;
>>> +
>>> + /* The scene now references the textures in the rasterization
>>> + * state record. Note that now.
>>> + */
>>> + for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>>> + if (setup->fs.current_tex[i]) {
>>> + if (!lp_scene_add_resource_reference(scene,
>>> + setup->fs.current_tex[i],
>>> + new_scene)) {
>>> + assert(!new_scene);
>>> + return FALSE;
>>> }
>>> }
>>> }
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
--
When the facts change, I change my mind. What do you do, sir? ~ Keynes
Corbin Simpson
<MostAwesomeDude at gmail.com>
More information about the mesa-dev
mailing list