[Mesa-dev] [PATCH] mesa: Remove the ralloc canary on release builds.

Sat Nov 23 15:40:37 PST 2013

On 11/22/2013 10:30 AM, Eric Anholt wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
>
>> On 11/22/2013 12:21 AM, Eric Anholt wrote:
>>> The canary is basically just to give a better debugging message when you
>>> ralloc_free() something that wasn't rallocated.  Reduces maximum memory
>>> usage of apitrace replay of the dota2 demo by 60MB on my 64-bit system (so
>>> half that on a real 32-bit dota2 environment).
>>
>> Really, half?  It's an unsigned...that's 4 bytes regardless of 64-bit
>> vs. 32-bit.  I think this should be 60MB of savings, end of story.
>
> Scalar types get aligned to their size, so since it's followed by a
> pointer, there are 4 bytes of padding in between.
>
> For anyone that hasn't seen this tool before, check out pahole from the
> dwarves package.  Run it on a .o file you think might be sucking up a
> bunch of memory, and see your structs like:
>
> class fs_inst : public backend_instruction {
> public:
>
>          /* class backend_instruction <ancestor>; */      /*     0    32 */
>
>          /* XXX last struct has 7 bytes of padding */
>
>          class fs_reg              dst;                   /*    32    48 */
>          /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
>          class fs_reg              src[3];                /*    80   144 */
>          /* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
>          bool                       saturate;             /*   224     1 */
>
>          /* XXX 3 bytes hole, try to pack */
>
>          int                        conditional_mod;      /*   228     4 */
>          uint8_t                    flag_subreg;          /*   232     1 */
>
>          /* XXX 3 bytes hole, try to pack */
>
>          int                        mlen;                 /*   236     4 */
>          int                        regs_written;         /*   240     4 */
>          int                        base_mrf;             /*   244     4 */
>          uint32_t                   texture_offset;       /*   248     4 */
>          int                        sampler;              /*   252     4 */
>          /* --- cacheline 4 boundary (256 bytes) --- */
>          int                        target;               /*   256     4 */
>          bool                       eot;                  /*   260     1 */
>          bool                       header_present;       /*   261     1 */
>          bool                       shadow_compare;       /*   262     1 */
>          bool                       force_uncompressed;   /*   263     1 */
>          bool                       force_sechalf;        /*   264     1 */
>          bool                       force_writemask_all;  /*   265     1 */
>
> ...
>
>          /* size: 288, cachelines: 5, members: 21 */
>          /* sum members: 280, holes: 3, sum holes: 8 */
>          /* paddings: 1, sum paddings: 7 */
>          /* last cacheline: 32 bytes */
> };
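To make the padding point concrete, here is a minimal sketch that assumes a simplified allocator header: a 4-byte canary followed by pointers, shown with and without the canary. The struct and function names are hypothetical, and the layout is modeled on the description above rather than copied from Mesa's ralloc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator header with a debug canary in front of the
 * links (modeled on the description above, not Mesa's ralloc_header). */
struct header_with_canary {
   unsigned canary;                    /* 4 bytes */
   /* on LP64, a 4-byte hole sits here so 'parent' is 8-byte aligned */
   struct header_with_canary *parent;
   struct header_with_canary *child;
   struct header_with_canary *prev;
   struct header_with_canary *next;
};

/* The same header with the canary compiled out, as in a release build. */
struct header_without_canary {
   struct header_without_canary *parent;
   struct header_without_canary *child;
   struct header_without_canary *prev;
   struct header_without_canary *next;
};

/* The per-allocation cost is exactly the canary field plus its hole. */
static_assert(sizeof(struct header_with_canary) -
              sizeof(struct header_without_canary) ==
              offsetof(struct header_with_canary, parent),
              "canary cost is the field plus the padding after it");

/* Padding bytes between 'canary' and 'parent' (what pahole calls a hole). */
static size_t canary_hole(void)
{
   return offsetof(struct header_with_canary, parent) - sizeof(unsigned);
}

/* Extra bytes every allocation pays for keeping the canary. */
static size_t canary_cost(void)
{
   return sizeof(struct header_with_canary) -
          sizeof(struct header_without_canary);
}
```

On x86-64 Linux, `canary_cost()` comes out to 8 bytes (4 for the field, 4 of hole), while on 32-bit x86 it is 4 with no hole, which is where the "half that on 32-bit" figure comes from.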

Getting a bit OT, but I'm sure some Mesa structs could be compacted
quite a bit.  In gl_texture_image, for example, a number of the fields
could be reduced to GLubyte (like Face, Level, Border, NumSamples, etc.)
and rearranged to reduce the memory used for such objects.
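A minimal sketch of why both steps matter, using a few of the field names mentioned above in a hypothetical record. This is not the real gl_texture_image, whose layout is not reproduced here. `GLuint` and `GLubyte` are defined locally with their usual gl.h meanings so the example needs no GL headers.

```c
#include <assert.h>

/* Stand-ins matching the usual gl.h typedefs (no GL headers needed). */
typedef unsigned int  GLuint;
typedef unsigned char GLubyte;

/* Hypothetical image record borrowing field names from the text;
 * NOT the real gl_texture_image. Every small field is a GLuint. */
struct image_wide {
   GLuint Face;
   void  *TexObject;
   GLuint Level;
   GLuint Width;
   GLuint Border;
   GLuint NumSamples;
};

/* Same order, small fields narrowed to GLubyte (assumes their values
 * always fit in 0..255). The holes around the wider members eat the savings. */
struct image_narrow {
   GLubyte Face;        /* followed by a hole up to pointer alignment */
   void   *TexObject;
   GLubyte Level;       /* followed by a hole up to GLuint alignment */
   GLuint  Width;
   GLubyte Border;
   GLubyte NumSamples;  /* followed by tail padding */
};

/* Narrowed and rearranged: widest alignment first, bytes grouped last. */
struct image_packed {
   void   *TexObject;
   GLuint  Width;
   GLubyte Face;
   GLubyte Level;
   GLubyte Border;
   GLubyte NumSamples;
};

static_assert(sizeof(struct image_packed) < sizeof(struct image_narrow),
              "rearranging should recover the bytes freed by narrowing");
```

On x86-64 these come out at 32, 32 and 16 bytes: narrowing the fields alone leaves the size unchanged because the freed bytes turn into holes, and only grouping the small fields together recovers the space.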

We could potentially reduce gl_texture_image from 80 bytes to 44 bytes,
which would save 324 bytes (36 bytes for each of the 9 mipmap levels) for
a 256x256 mipmapped texture.  That would start to add up with a thousand
textures or so.
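The 324-byte figure follows from one image record per mipmap level: a 256x256 chain has 9 levels (256 down to 1), and 9 × 36 = 324. Here is a small sketch of that arithmetic, assuming a single-face texture with every level allocated; the helper names are hypothetical.

```c
#include <assert.h>

/* Levels in a full mipmap chain for a w x h texture: halve each
 * dimension (never below 1) until both are 1, counting the base level. */
static unsigned mip_levels(unsigned w, unsigned h)
{
   unsigned levels = 1;
   while (w > 1 || h > 1) {
      w = w > 1 ? w / 2 : 1;
      h = h > 1 ? h / 2 : 1;
      levels++;
   }
   return levels;
}

/* Bytes saved for one single-face texture if every per-level image
 * record shrinks from old_size to new_size bytes. */
static unsigned long bytes_saved(unsigned w, unsigned h,
                                 unsigned old_size, unsigned new_size)
{
   assert(new_size <= old_size);   /* avoid unsigned wrap-around */
   return (unsigned long)mip_levels(w, h) * (old_size - new_size);
}
```

`bytes_saved(256, 256, 80, 44)` returns 324; across a thousand such textures that is 324,000 bytes, roughly 316 KiB.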

There might be some debate about how worthwhile that is.  I'm not too 
concerned right now.

However, pahole says gl_debug_state is fairly huge: 292712 bytes!
sizeof(gl_context) = 384208, so that's a big piece (about 76%).  At the
very least, maybe gl_debug_state could be pulled out and allocated on
first use...
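A minimal sketch of the allocate-on-first-use idea, with hypothetical names (`struct context`, `struct debug_state`, `get_debug_state()`), not Mesa's actual code. The placeholder array only reproduces the size pahole reported, and the accessor assumes a single thread uses the context at a time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the debug-output state; the array only
 * reproduces the 292712-byte size pahole reported for gl_debug_state. */
struct debug_state {
   unsigned char payload[292712];
};

/* Hypothetical context that stores a pointer instead of the struct. */
struct context {
   struct debug_state *Debug;   /* NULL until first used */
};

static_assert(sizeof(struct context) == sizeof(struct debug_state *),
              "the context now pays only for a pointer");

/* Return the debug state, allocating it (zeroed) on first use.
 * Returns NULL on allocation failure, so callers must check.
 * Not thread-safe: assumes one thread uses the context at a time. */
static struct debug_state *get_debug_state(struct context *ctx)
{
   if (ctx->Debug == NULL)
      ctx->Debug = calloc(1, sizeof(*ctx->Debug));
   return ctx->Debug;
}

/* Free the debug state (if it was ever allocated) at context teardown. */
static void destroy_debug_state(struct context *ctx)
{
   free(ctx->Debug);
   ctx->Debug = NULL;
}
```

The trade-off is a NULL check on each access plus an allocation-failure path, in exchange for contexts that never use debug output not paying for it.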

-Brian