[PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline

Tiezhu Yang yangtiezhu at loongson.cn
Fri Dec 20 05:02:18 UTC 2024


On 12/19/2024 03:22 AM, Alex Deucher wrote:
> On Wed, Dec 18, 2024 at 2:18 PM Josh Poimboeuf <jpoimboe at kernel.org> wrote:
>>
>> On Wed, Dec 18, 2024 at 10:36:00PM +0800, Huacai Chen wrote:
>>> Hi, Tiezhu,
>>>
>>> On Tue, Dec 17, 2024 at 9:50 AM Tiezhu Yang <yangtiezhu at loongson.cn> wrote:
>>>>
>>>> When compiling with Clang on LoongArch, there exists the following objtool
>>>> warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o:
>>>>
>>>>   dc_fixpt_recip() falls through to next function dc_fixpt_sinc()
>>>>
>>>> This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip()
>>>> by Clang, given dc_fixpt_from_fraction() is not a simple function, just
>>>> mark it noinline to avoid the above issue.
>>> I don't know whether drm maintainers can accept this, because it looks
>>> like a workaround. Yes, uninline this function "solve" a problem and
>>> seems reasonable in this case because the function is "not simple",
>>> but from another point of view, you may hide a type of bug.
>>
>> Agreed, it sounds like there's definitely a bug which this patch is
>> papering over.
>
> Yes, agreed.

Additional Info:

In order to avoid the effect of this series, I tested with kernel
6.13-rc3 without this series again.

--------------------------------------------------------------------------
Here are the test result.

1. For LoongArch

"dc_fixpt_recip() falls through to next function dc_fixpt_sinc()"
objtool warning only appears compiled with the latest mainline LLVM,
there is no this issue compiled with the latest release version such
as LLVM 19.1.6.

(1) objdump info with LLVM release version 19.1.6:

$ clang --version | head -1
clang version 19.1.6

There is an jump instruction "b" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.

0000000000000200 <dc_fixpt_recip>:
      200:       40009480        beqz            $a0, 148        # 294 
<dc_fixpt_recip+0x94>
      204:       0049fc85        srai.d          $a1, $a0, 0x3f
      208:       00159486        xor             $a2, $a0, $a1
      20c:       001194c5        sub.d           $a1, $a2, $a1
      210:       03800007        ori             $a3, $zero, 0x0
      214:       16000027        lu32i.d         $a3, 1
      218:       002314e6        div.du          $a2, $a3, $a1
      21c:       001d94c8        mul.d           $a4, $a2, $a1
      220:       0011a0e9        sub.d           $a5, $a3, $a4
      224:       02bf8008        addi.w          $a4, $zero, -32
      228:       03400000        andi            $zero, $zero, 0x0
      22c:       03400000        andi            $zero, $zero, 0x0
      230:       00410529        slli.d          $a5, $a5, 0x1
      234:       004104c6        slli.d          $a2, $a2, 0x1
      238:       0012952a        sltu            $a6, $a5, $a1
      23c:       03c0054a        xori            $a6, $a6, 0x1
      240:       001328ab        maskeqz         $a7, $a1, $a6
      244:       0011ad29        sub.d           $a5, $a5, $a7
      248:       00df0108        bstrpick.d      $a4, $a4, 0x1f, 0x0
      24c:       02c00508        addi.d          $a4, $a4, 1
      250:       00149d0b        and             $a7, $a4, $a3
      254:       001528c6        or              $a2, $a2, $a6
      258:       43ffd97f        beqz            $a7, -40        # 230 
<dc_fixpt_recip+0x30>
      25c:       00410527        slli.d          $a3, $a5, 0x1
      260:       001294e5        sltu            $a1, $a3, $a1
      264:       03c004a5        xori            $a1, $a1, 0x1
      268:       02bffc07        addi.w          $a3, $zero, -1
      26c:       031ffce7        lu52i.d         $a3, $a3, 2047
      270:       00159ca7        xor             $a3, $a1, $a3
      274:       680030e6        bltu            $a3, $a2, 48    # 2a4 
<dc_fixpt_recip+0xa4>
      278:       001094c5        add.d           $a1, $a2, $a1
      27c:       00119406        sub.d           $a2, $zero, $a1
      280:       02000084        slti            $a0, $a0, 0
      284:       001390a5        masknez         $a1, $a1, $a0
      288:       001310c4        maskeqz         $a0, $a2, $a0
      28c:       00151484        or              $a0, $a0, $a1
      290:       4c000020        jirl            $zero, $ra, 0
      294:       00150005        or              $a1, $zero, $zero
      298:       002a0001        break           0x1
      29c:       002a0001        break           0x1
      2a0:       53ff73ff        b               -144    # 210 
<dc_fixpt_recip+0x10>
      2a4:       002a0001        break           0x1
      2a8:       53ffd3ff        b               -48     # 278 
<dc_fixpt_recip+0x78>
      2ac:       03400000        andi            $zero, $zero, 0x0
      2b0:       03400000        andi            $zero, $zero, 0x0
      2b4:       03400000        andi            $zero, $zero, 0x0
      2b8:       03400000        andi            $zero, $zero, 0x0
      2bc:       03400000        andi            $zero, $zero, 0x0

00000000000002c0 <dc_fixpt_sinc>:
      2c0:       0049fc85        srai.d          $a1, $a0, 0x3f
      2c4:       00159486        xor             $a2, $a0, $a1
      2c8:       1490fda7        lu12i.w         $a3, 296941
      2cc:       039444e7        ori             $a3, $a3, 0x511
      2d0:       001194c6        sub.d           $a2, $a2, $a1
      2d4:       001500e5        or              $a1, $a3, $zero
      2d8:       160000c5        lu32i.d         $a1, 6
      2dc:       001500c9        or              $a5, $a2, $zero
      2e0:       00150088        or              $a4, $a0, $zero
      2e4:       6000a8c5        blt             $a2, $a1, 168   # 38c 
<dc_fixpt_sinc+0xcc>

(2) objdump info with LLVM mainline version 20.0.0git:

$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 
8daf4f16fa08b5d876e98108721dd1743a360326)

There is "break" instruction at the end of dc_fixpt_recip(), its offset
is "2a0", this "break" instruction is not dead end due to the offset
"2a4" is in the relocation section '.rela.discard.reachable', that is
to say, dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.

0000000000000200 <dc_fixpt_recip>:
      200:       40009c80        beqz            $a0, 156        # 29c 
<dc_fixpt_recip+0x9c>
      204:       0049fc85        srai.d          $a1, $a0, 0x3f
      208:       00159486        xor             $a2, $a0, $a1
      20c:       001194c5        sub.d           $a1, $a2, $a1
      210:       03800007        ori             $a3, $zero, 0x0
      214:       16000027        lu32i.d         $a3, 1
      218:       002314e6        div.du          $a2, $a3, $a1
      21c:       001d94c8        mul.d           $a4, $a2, $a1
      220:       0011a0e9        sub.d           $a5, $a3, $a4
      224:       02bf8008        addi.w          $a4, $zero, -32
      228:       03400000        andi            $zero, $zero, 0x0
      22c:       03400000        andi            $zero, $zero, 0x0
      230:       00410529        slli.d          $a5, $a5, 0x1
      234:       004104c6        slli.d          $a2, $a2, 0x1
      238:       0012952a        sltu            $a6, $a5, $a1
      23c:       03c0054a        xori            $a6, $a6, 0x1
      240:       001328ab        maskeqz         $a7, $a1, $a6
      244:       0011ad29        sub.d           $a5, $a5, $a7
      248:       00df0108        bstrpick.d      $a4, $a4, 0x1f, 0x0
      24c:       02c00508        addi.d          $a4, $a4, 1
      250:       00149d0b        and             $a7, $a4, $a3
      254:       001528c6        or              $a2, $a2, $a6
      258:       43ffd97f        beqz            $a7, -40        # 230 
<dc_fixpt_recip+0x30>
      25c:       00410527        slli.d          $a3, $a5, 0x1
      260:       001294e5        sltu            $a1, $a3, $a1
      264:       03c004a5        xori            $a1, $a1, 0x1
      268:       02bffc07        addi.w          $a3, $zero, -1
      26c:       031ffce7        lu52i.d         $a3, $a3, 2047
      270:       00159ca7        xor             $a3, $a1, $a3
      274:       680020e6        bltu            $a3, $a2, 32    # 294 
<dc_fixpt_recip+0x94>
      278:       001094c5        add.d           $a1, $a2, $a1
      27c:       00119406        sub.d           $a2, $zero, $a1
      280:       02000084        slti            $a0, $a0, 0
      284:       001390a5        masknez         $a1, $a1, $a0
      288:       001310c4        maskeqz         $a0, $a2, $a0
      28c:       00151484        or              $a0, $a0, $a1
      290:       4c000020        jirl            $zero, $ra, 0
      294:       002a0001        break           0x1
      298:       53ffe3ff        b               -32     # 278 
<dc_fixpt_recip+0x78>
      29c:       002a0001        break           0x1
      2a0:       002a0001        break           0x1
      2a4:       03400000        andi            $zero, $zero, 0x0
      2a8:       03400000        andi            $zero, $zero, 0x0
      2ac:       03400000        andi            $zero, $zero, 0x0
      2b0:       03400000        andi            $zero, $zero, 0x0
      2b4:       03400000        andi            $zero, $zero, 0x0
      2b8:       03400000        andi            $zero, $zero, 0x0
      2bc:       03400000        andi            $zero, $zero, 0x0

00000000000002c0 <dc_fixpt_sinc>:
      2c0:       0049fc85        srai.d          $a1, $a0, 0x3f
      2c4:       00159486        xor             $a2, $a0, $a1
      2c8:       001194c6        sub.d           $a2, $a2, $a1
      2cc:       1490fda5        lu12i.w         $a1, 296941
      2d0:       039444a5        ori             $a1, $a1, 0x511
      2d4:       160000c5        lu32i.d         $a1, 6
      2d8:       001500c7        or              $a3, $a2, $zero
      2dc:       00150088        or              $a4, $a0, $zero
      2e0:       600090c5        blt             $a2, $a1, 144   # 370 
<dc_fixpt_sinc+0xb0>

2. For x86

I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result
is same with LoongArch.

(1) objdump info with LLVM release version 19.1.6:

$ clang --version | head -1
clang version 19.1.6

There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it
maybe jumps to a position and then steps to the return instruction, so
there is no "falls through" objtool warning.

0000000000000290 <dc_fixpt_recip>:
      290:       f3 0f 1e fa             endbr64
      294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
      299:       48 85 ff                test   %rdi,%rdi
      29c:       0f 84 a8 00 00 00       je     34a <dc_fixpt_recip+0xba>
      2a2:       48 89 f9                mov    %rdi,%rcx
      2a5:       48 f7 d9                neg    %rcx
      2a8:       48 0f 48 cf             cmovs  %rdi,%rcx
      2ac:       53                      push   %rbx
      2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
      2b4:       00 00 00
      2b7:       31 d2                   xor    %edx,%edx
      2b9:       48 f7 f1                div    %rcx
      2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
      2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
      2c3:       45 88 c8                mov    %r9b,%r8b
      2c6:       4d 01 c0                add    %r8,%r8
      2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
      2cd:       48 29 da                sub    %rbx,%rdx
      2d0:       45 88 da                mov    %r11b,%r10b
      2d3:       4c 89 d0                mov    %r10,%rax
      2d6:       4c 09 c0                or     %r8,%rax
      2d9:       83 c6 02                add    $0x2,%esi
      2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
      2de:       48 01 d2                add    %rdx,%rdx
      2e1:       45 31 c0                xor    %r8d,%r8d
      2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
      2ea:       48 39 ca                cmp    %rcx,%rdx
      2ed:       41 0f 93 c1             setae  %r9b
      2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
      2f3:       49 89 ca                mov    %rcx,%r10
      2f6:       4c 29 d2                sub    %r10,%rdx
      2f9:       48 01 d2                add    %rdx,%rdx
      2fc:       45 31 d2                xor    %r10d,%r10d
      2ff:       48 89 cb                mov    %rcx,%rbx
      302:       48 39 ca                cmp    %rcx,%rdx
      305:       41 0f 93 c3             setae  %r11b
      309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
      30b:       31 db                   xor    %ebx,%ebx
      30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
      30f:       48 01 d2                add    %rdx,%rdx
      312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
      319:       ff ff 7f
      31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
      320:       48 39 ca                cmp    %rcx,%rdx
      323:       4c 0f 43 c6             cmovae %rsi,%r8
      327:       4c 39 c0                cmp    %r8,%rax
      32a:       77 29                   ja     355 <dc_fixpt_recip+0xc5>
      32c:       48 39 ca                cmp    %rcx,%rdx
      32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
      333:       48 89 c1                mov    %rax,%rcx
      336:       48 f7 d9                neg    %rcx
      339:       48 85 ff                test   %rdi,%rdi
      33c:       48 0f 49 c8             cmovns %rax,%rcx
      340:       48 89 c8                mov    %rcx,%rax
      343:       5b                      pop    %rbx
      344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
      34a:       0f 0b                   ud2
      34c:       0f 0b                   ud2
      34e:       31 c9                   xor    %ecx,%ecx
      350:       e9 57 ff ff ff          jmp    2ac <dc_fixpt_recip+0x1c>
      355:       0f 0b                   ud2
      357:       eb d3                   jmp    32c <dc_fixpt_recip+0x9c>
      359:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000000360 <.Ltmp40>:
      360:       90                      nop
      361:       90                      nop
      362:       90                      nop
      363:       90                      nop
      364:       90                      nop
      365:       90                      nop
      366:       90                      nop
      367:       90                      nop
      368:       90                      nop
      369:       90                      nop
      36a:       90                      nop
      36b:       90                      nop
      36c:       90                      nop
      36d:       90                      nop
      36e:       90                      nop
      36f:       90                      nop

0000000000000370 <dc_fixpt_sinc>:
      370:       f3 0f 1e fa             endbr64
      374:       e8 00 00 00 00          call   379 <dc_fixpt_sinc+0x9>

(2) objdump info with LLVM mainline version 20.0.0git:

$ clang --version | head -1
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 
8daf4f16fa08b5d876e98108721dd1743a360326)

There is "ud2" instruction at the end of dc_fixpt_recip(), its offset
is "350", this "ud2" instruction is not dead end due to the offset "352"
is in the relocation section '.rela.discard.reachable', that is to say,
dc_fixpt_recip() doesn't end with a return instruction or an
unconditional jump, so objtool determined that the function can fall
through into the next function, thus there is "falls through" objtool
warning.

0000000000000290 <dc_fixpt_recip>:
      290:       f3 0f 1e fa             endbr64
      294:       e8 00 00 00 00          call   299 <dc_fixpt_recip+0x9>
      299:       48 85 ff                test   %rdi,%rdi
      29c:       0f 84 ac 00 00 00       je     34e <dc_fixpt_recip+0xbe>
      2a2:       53                      push   %rbx
      2a3:       48 89 f9                mov    %rdi,%rcx
      2a6:       48 f7 d9                neg    %rcx
      2a9:       48 0f 48 cf             cmovs  %rdi,%rcx
      2ad:       48 b8 00 00 00 00 01    movabs $0x100000000,%rax
      2b4:       00 00 00
      2b7:       31 d2                   xor    %edx,%edx
      2b9:       48 f7 f1                div    %rcx
      2bc:       be e0 ff ff ff          mov    $0xffffffe0,%esi
      2c1:       eb 1b                   jmp    2de <dc_fixpt_recip+0x4e>
      2c3:       45 88 c8                mov    %r9b,%r8b
      2c6:       4d 01 c0                add    %r8,%r8
      2c9:       4d 8d 04 80             lea    (%r8,%rax,4),%r8
      2cd:       48 29 da                sub    %rbx,%rdx
      2d0:       45 88 da                mov    %r11b,%r10b
      2d3:       4c 89 d0                mov    %r10,%rax
      2d6:       4c 09 c0                or     %r8,%rax
      2d9:       83 c6 02                add    $0x2,%esi
      2dc:       74 31                   je     30f <dc_fixpt_recip+0x7f>
      2de:       48 01 d2                add    %rdx,%rdx
      2e1:       45 31 c0                xor    %r8d,%r8d
      2e4:       41 ba 00 00 00 00       mov    $0x0,%r10d
      2ea:       48 39 ca                cmp    %rcx,%rdx
      2ed:       41 0f 93 c1             setae  %r9b
      2f1:       72 03                   jb     2f6 <dc_fixpt_recip+0x66>
      2f3:       49 89 ca                mov    %rcx,%r10
      2f6:       4c 29 d2                sub    %r10,%rdx
      2f9:       48 01 d2                add    %rdx,%rdx
      2fc:       45 31 d2                xor    %r10d,%r10d
      2ff:       48 89 cb                mov    %rcx,%rbx
      302:       48 39 ca                cmp    %rcx,%rdx
      305:       41 0f 93 c3             setae  %r11b
      309:       73 b8                   jae    2c3 <dc_fixpt_recip+0x33>
      30b:       31 db                   xor    %ebx,%ebx
      30d:       eb b4                   jmp    2c3 <dc_fixpt_recip+0x33>
      30f:       48 01 d2                add    %rdx,%rdx
      312:       48 be fe ff ff ff ff    movabs $0x7ffffffffffffffe,%rsi
      319:       ff ff 7f
      31c:       4c 8d 46 01             lea    0x1(%rsi),%r8
      320:       48 39 ca                cmp    %rcx,%rdx
      323:       4c 0f 43 c6             cmovae %rsi,%r8
      327:       4c 39 c0                cmp    %r8,%rax
      32a:       77 1e                   ja     34a <dc_fixpt_recip+0xba>
      32c:       48 39 ca                cmp    %rcx,%rdx
      32f:       48 83 d8 ff             sbb    $0xffffffffffffffff,%rax
      333:       48 89 c1                mov    %rax,%rcx
      336:       48 f7 d9                neg    %rcx
      339:       48 85 ff                test   %rdi,%rdi
      33c:       48 0f 49 c8             cmovns %rax,%rcx
      340:       48 89 c8                mov    %rcx,%rax
      343:       5b                      pop    %rbx
      344:       2e e9 00 00 00 00       cs jmp 34a <dc_fixpt_recip+0xba>
      34a:       0f 0b                   ud2
      34c:       eb de                   jmp    32c <dc_fixpt_recip+0x9c>
      34e:       0f 0b                   ud2
      350:       0f 0b                   ud2
      352:       66 66 66 66 66 2e 0f    data16 data16 data16 data16 cs 
nopw 0x0(%rax,%rax,1)
      359:       1f 84 00 00 00 00 00

0000000000000360 <.Ltmp40>:
      360:       90                      nop
      361:       90                      nop
      362:       90                      nop
      363:       90                      nop
      364:       90                      nop
      365:       90                      nop
      366:       90                      nop
      367:       90                      nop
      368:       90                      nop
      369:       90                      nop
      36a:       90                      nop
      36b:       90                      nop
      36c:       90                      nop
      36d:       90                      nop
      36e:       90                      nop
      36f:       90                      nop

0000000000000370 <dc_fixpt_sinc>:
      370:       f3 0f 1e fa             endbr64
      374:       e8 00 00 00 00          call   379 <dc_fixpt_sinc+0x9>
--------------------------------------------------------------------------

In my opinion, if there is a bug, then it is a generic rather than
arch-specific bug.

The root cause is related with LLVM or Linux kernel? I am not sure
because objtool only checks the object info and reports the warning
"falls through" if the check conditions are true.

tools/objtool/check.c
static int validate_branch(struct objtool_file *file, struct symbol *func,
                            struct instruction *insn, struct insn_state 
state)
{
         struct alternative *alt;
         struct instruction *next_insn, *prev_insn = NULL;
         struct section *sec;
         u8 visited;
         int ret;

         sec = insn->sec;

         while (1) {
                 next_insn = next_insn_to_validate(file, insn);

                 if (func && insn_func(insn) && func != 
insn_func(insn)->pfunc) {
                         /* Ignore KCFI type preambles, which always 
fall through */
                         if (!strncmp(func->name, "__cfi_", 6) ||
                             !strncmp(func->name, "__pfx_", 6))
                                 return 0;

                         WARN("%s() falls through to next function %s()",
                              func->name, insn_func(insn)->name);
                         return 1;
                 }
         ...
}

How to silence the warning when compiling with the latest mainline
LLVM with a proper way? Modify LLVM code or tools/objtool/check.c
or drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.c?

Anyway, I think this patch can be independent of this series and can
be sent separately after the "real bug" is fixed, please ignore it now.

Thanks,
Tiezhu



More information about the amd-gfx mailing list