[Nouveau] [RFC] drm/nouveau: optimize code emission of inline functions

Younes Manton younes.m at gmail.com
Mon Aug 10 17:14:16 PDT 2009


On Mon, Aug 10, 2009 at 1:40 PM, Pekka Paalanen <pq at iki.fi> wrote:
> Before this patch:
>
> $ objdump -t nouveau.ko --section=.text | cut -f2 | sort -k2 | uniq -d -c
>      4
>      9 0000000000000010 BEGIN_RING
>      5 0000000000000051 FIRE_RING
>      2 00000000000000b3 NVLockVgaCrtcs
>      4 000000000000008b NVReadVgaCrtc
>      2 000000000000008c NVReadVgaCrtc
>      2 0000000000000011 NVVgaSeqReset
>      2 000000000000006b NVWriteCRTC
>      2 0000000000000066 NVWriteRAMDAC
>      4 0000000000000081 NVWriteVgaCrtc
>      3 0000000000000082 NVWriteVgaCrtc
>     11 000000000000001a OUT_RING
>      9 0000000000000028 RING_SPACE
>      2 0000000000000019 crtc_wr_cio_state
>      3 0000000000000012 drm_gem_object_unreference
>      2 0000000000000005 kmalloc
>      3 000000000000000b kzalloc
>      4 0000000000000051 nouveau_bo_ref
>      2 0000000000000050 nvReadMC
>      2 0000000000000052 nvWriteMC
>      3 0000000000000029 nv_gf4_disp_arch
>      4 000000000000001b nv_rd08
>      3 000000000000001c nv_rd08
>     29 0000000000000012 nv_rd32
>      2 0000000000000012 nv_ri32
>      5 000000000000001c nv_ro32
>      4 000000000000008b nv_two_heads
>     11 0000000000000022 nv_wo32
>      8 0000000000000015 nv_wr08
>     29 0000000000000014 nv_wr32
>      2 0000000000000013 pci_read_config_dword
>
> After this patch:
>
> $ objdump -t nouveau.ko --section=.text | cut -f2 | sort -k2 | uniq -d -c
>      4
>      9 0000000000000010 BEGIN_RING
>      5 0000000000000051 FIRE_RING
>      2 00000000000000b3 NVLockVgaCrtcs
>      5 00000000000000a7 NVReadVgaCrtc
>      2 0000000000000011 NVVgaSeqReset
>      2 0000000000000073 NVWriteCRTC
>      3 0000000000000072 NVWriteRAMDAC
>      4 0000000000000091 NVWriteVgaCrtc
>      3 0000000000000092 NVWriteVgaCrtc
>     11 000000000000001a OUT_RING
>      9 0000000000000028 RING_SPACE
>      2 0000000000000019 crtc_wr_cio_state
>      3 0000000000000012 drm_gem_object_unreference
>      2 0000000000000005 kmalloc
>      3 000000000000000b kzalloc
>      3 0000000000000051 nouveau_bo_ref
>      2 0000000000000052 nouveau_bo_ref
>      3 000000000000005d nvReadMC
>      2 000000000000005c nvWriteMC
>      3 0000000000000029 nv_gf4_disp_arch
>      4 000000000000008b nv_two_heads
>      2 0000000000000013 pci_read_config_dword
>
> As you can see, the static inline functions that were changed to extern
> inline functions no longer appear multiple times in the final kernel
> module. But at the same time, the nouveau.ko file size grew:
> before: 583683 B (.text size 0x000312c8)
> after:  681075 B (.text size 0x00039474)
> That's a .text size increase of about 32 kB.
>
> So something is definitely inlined a lot more. This was tested on
> x86_64, gcc 4.1.2, CONFIG_OPTIMIZE_INLINING=y,
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
>
> Now, I'm not sure if this patch would be a good thing or not.
> Comments?
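
The size figures quoted above can be checked with a few lines of C. This is only an arithmetic sketch: the four constants are copied from the quoted message, and the function names are made up for illustration. It shows that .text grew by 33196 bytes (about 32.4 KiB) while the file as a whole grew by 97392 bytes, so roughly two thirds of the file growth lies outside .text. The message does not say which sections account for the rest.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the quoted message. */
#define TEXT_BEFORE 0x000312c8L
#define TEXT_AFTER  0x00039474L
#define FILE_BEFORE 583683L
#define FILE_AFTER  681075L

/* Growth of the .text section in bytes. */
long text_growth(void)
{
	return TEXT_AFTER - TEXT_BEFORE;
}

/* Growth of the whole nouveau.ko file in bytes. */
long file_growth(void)
{
	return FILE_AFTER - FILE_BEFORE;
}

/* Print both deltas and the part of the file growth outside .text. */
void report_growth(void)
{
	long text = text_growth();
	long file = file_growth();

	assert(text > 0 && file >= text);
	printf(".text grew by %ld bytes (%.1f KiB)\n", text, text / 1024.0);
	printf("file grew by %ld bytes (%.1f KiB)\n", file, file / 1024.0);
	printf("growth outside .text: %ld bytes\n", file - text);
}
```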

Well, if the goal is a small module, then I guess it's not a good idea,
but then we should also be disabling the other optimizations that
excessively bloat the module. I don't think it's a bad idea, but I'd
be curious where all the extra text comes from. I'm guessing more
inlining and/or loop unrolling.
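
For readers unfamiliar with the distinction under discussion, here is a minimal sketch of the two inline flavours. It is not the patch itself: the demo_* names and signatures are hypothetical and not taken from the nouveau sources, and the gnu_inline attribute is an assumption used here to request the GNU89 "extern inline" behaviour on compilers that default to C99 rules (gcc 4.1.2, as used in the test above, predates that attribute and applies GNU89 rules by default).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register accessors, for illustration only. */

/* Variant 1: static inline. When the compiler declines to inline a call,
 * it emits a private out-of-line copy into that object file, so several
 * object files can each end up with their own copy of the function. */
static inline uint32_t demo_rd32_static(const volatile uint32_t *regs,
					unsigned int offset)
{
	return regs[offset / 4];
}

/* Variant 2: extern inline with GNU89 semantics. This body is used only
 * for inlining; the compiler never emits an out-of-line copy from it. */
extern inline __attribute__((gnu_inline))
uint32_t demo_rd32_extern(const volatile uint32_t *regs, unsigned int offset)
{
	return regs[offset / 4];
}

/* A single out-of-line definition, normally placed in exactly one .c file,
 * serves every call site that was not inlined. */
uint32_t demo_rd32_extern(const volatile uint32_t *regs, unsigned int offset)
{
	return regs[offset / 4];
}

/* Exercise both variants on a fake register block. */
uint32_t demo_sum(void)
{
	static const uint32_t fake_regs[4] = { 10, 20, 30, 40 };

	/* Both accessors must agree when reading the same offset. */
	assert(demo_rd32_static(fake_regs, 4) == demo_rd32_extern(fake_regs, 4));
	return demo_rd32_static(fake_regs, 4) + demo_rd32_extern(fake_regs, 8);
}
```

With the second variant, duplicate out-of-line symbols disappear from the symbol table, which matches the objdump listings quoted above. Whether the total code size shrinks or grows depends on how often the compiler then chooses to inline the body.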

