[Intel-gfx] [PATCH v4] string-choice: add yesno(), onoff(), enableddisabled(), plural() helpers

Rasmus Villemoes linux at rasmusvillemoes.dk
Thu Oct 24 07:40:45 UTC 2019


On 24/10/2019 00.56, Andrew Morton wrote:
> On Wed, 23 Oct 2019 16:13:08 +0300 Jani Nikula <jani.nikula at intel.com> wrote:
> 
>> +
>> +static inline const char *yesno(bool v)
>> +{
>> +	return v ? "yes" : "no";
>> +}
>> +
>> +static inline const char *onoff(bool v)
>> +{
>> +	return v ? "on" : "off";
>> +}
>> +
>> +static inline const char *enableddisabled(bool v)
>> +{
>> +	return v ? "enabled" : "disabled";
>> +}
>> +
>> +static inline const char *plural(long v)
>> +{
>> +	return v == 1 ? "" : "s";
>> +}
>> +
>> +#endif /* __STRING_CHOICE_H__ */
> 
> These aren't very good function names.  Better to create a kernel-style
> namespace such as "choice_" and then add the expected underscores:
> 
> choice_yes_no()
> choice_enabled_disabled()
> choice_plural()

I think I prefer the short names (no choice_ prefix), it's rather
obvious what they do. I also asked for underscores, especially for the
enableddisabled case, but Jani didn't want to change existing callers.
But I'll keep out of the naming discussion from now on.
> Also, I worry that making these functions inline means that each .o
> file will contain its own copy of the strings 

They will, in .rodata.str1.1

("yes", "no", "enabled",
> etc) if the .c file calls the relevant helper.  I'm not sure if the
> linker is smart enough (yet) to fix this up. 

AFAIK, that's an optimization the linker has done forever - the whole
reason the SHF_MERGE | SHF_STRINGS (the MS in readelf -S output) flags
exist (and AFAICT they have been part of the ELF spec since forever) is
so the linker can do that trick. So no, do not make them ool.

> And doing this will cause additional savings: calling a single-arg
> out-of-line function generates less .text than calling yesno().  When I
> did this: 
> 
> --- a/include/linux/string-choice.h~string-choice-add-yesno-onoff-enableddisabled-plural-helpers-fix
> +++ a/include/linux/string-choice.h
> @@ -8,10 +8,7 @@
>  
>  #include <linux/types.h>
>  
> -static inline const char *yesno(bool v)
> -{
> -	return v ? "yes" : "no";
> -}
> +const char *yesno(bool v);
>  
>  static inline const char *onoff(bool v)
>  {
> 
> The text segment of drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.o
> (78 callsites) shrunk by 118 bytes.
> 

Interesting, and not at all what I see. Mind dumping rss_config_show
before/after? Even better, cxgb4_debugfs.s before/after. Here's what I get:

static inline yesno:

     cbb:       49 c7 c4 00 00 00 00    mov    $0x0,%r12        cbe:
R_X86_64_32S       .rodata.str1.1+0x197
     cc2:       44 89 ea                mov    %r13d,%edx
     cc5:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi        cc8:
R_X86_64_32S       .rodata.str1.1+0x1b7
     ccc:       48 c7 c5 00 00 00 00    mov    $0x0,%rbp        ccf:
R_X86_64_32S       .rodata.str1.1+0x19b

Load "yes" into %12 and "no" into %rbp (or vice versa).

     cd3:       4d 89 e7                mov    %r12,%r15
     cd6:       e8 00 00 00 00          callq  cdb
<rss_config_show+0x3b>       cd7: R_X86_64_PC32      seq_printf-0x4
     cdb:       45 85 ed                test   %r13d,%r13d
     cde:       4c 89 e2                mov    %r12,%rdx
     ce1:       48 89 df                mov    %rbx,%rdi
     ce4:       48 0f 49 d5             cmovns %rbp,%rdx
     ce8:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi        ceb:
R_X86_64_32S       .rodata.str1.1+0x1cb
     cef:       e8 00 00 00 00          callq  cf4
<rss_config_show+0x54>       cf0: R_X86_64_PC32      seq_printf-0x4
     cf4:       41 f7 c5 00 00 00 40    test   $0x40000000,%r13d
     cfb:       4c 89 e2                mov    %r12,%rdx
     cfe:       48 89 df                mov    %rbx,%rdi
     d01:       48 0f 44 d5             cmove  %rbp,%rdx
     d05:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi        d08:
R_X86_64_32S       .rodata.str1.1+0x1e1
     d0c:       e8 00 00 00 00          callq  d11
<rss_config_show+0x71>       d0d: R_X86_64_PC32      seq_printf-0x4

Test a bit, move "yes" into %rdx, conditionally move "no" into %rdx
instead, call seq_printf.

     d11:       41 f7 c5 00 00 00 20    test   $0x20000000,%r13d
     d18:       4c 89 e2                mov    %r12,%rdx
     d1b:       48 89 df                mov    %rbx,%rdi
     d1e:       48 0f 44 d5             cmove  %rbp,%rdx
     d22:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi        d25:
R_X86_64_32S       .rodata.str1.1+0x1f7
     d29:       e8 00 00 00 00          callq  d2e
<rss_config_show+0x8e>       d2a: R_X86_64_PC32      seq_printf-0x4

etc. That's a marginal (i.e., after the preamble storing "yes" and "no"
in callee-saved registers) cost of six instructions/29 bytes per
seq_printf, three of which are to implement the yesno() call.

extern const char *yesno():

   64e7:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi        64ea:
R_X86_64_32S      .rodata.str1.1+0x8e4
    64ee:       89 ea                   mov    %ebp,%edx
    64f0:       41 89 ed                mov    %ebp,%r13d
    64f3:       e8 00 00 00 00          callq  64f8
<rss_config_show+0x28>      64f4: R_X86_64_PC32     seq_printf-0x4
    64f8:       89 ef                   mov    %ebp,%edi
    64fa:       c1 ef 1f                shr    $0x1f,%edi
    64fd:       e8 00 00 00 00          callq  6502
<rss_config_show+0x32>      64fe: R_X86_64_PC32     yesno-0x4

Three instructions to prepare the argument to yesno and call it.

    6502:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi        6505:
R_X86_64_32S      .rodata.str1.1+0x8f8
    6509:       48 89 c2                mov    %rax,%rdx

One more to put the return from yesno in the right register.

    650c:       48 89 df                mov    %rbx,%rdi
    650f:       e8 00 00 00 00          callq  6514
<rss_config_show+0x44>      6510: R_X86_64_PC32     seq_printf-0x4

So not a lot, but still one more instruction, for a total of 31 bytes.
bloat-o-meter says

$ scripts/bloat-o-meter
drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.o.{1,2}
add/remove: 0/0 grow/shrink: 3/0 up/down: 301/0 (301)
Function                                     old     new   delta
rss_config_show                             2343    2482    +139
rss_vf_config_show                           263     363    +100
rss_pf_config_show                           342     404     +62

which is more then 2*78, but I haven't looked at the code patterns in
the other functions.

Did you use size(1) to compare when you say "text segment"? That would
include .rodata (and, more importantly, .rodata.strX.Y) in its text
column. Maybe your compiler doesn't do string literal merging (since the
linker does it anyway), so your .rodata.str1.1 might contain several
copies of "yes" and "no", but they shouldn't really be counted.

Rasmus


More information about the Intel-gfx mailing list