[PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline
Peter Zijlstra
peterz at infradead.org
Fri Dec 20 10:31:00 UTC 2024
On Fri, Dec 20, 2024 at 01:02:18PM +0800, Tiezhu Yang wrote:
> 2. For x86
>
> I tested with LLVM 19.1.6 and the latest mainline LLVM, the test result
> is same with LoongArch.
Debian's clang-19 is 19.1.5.
> (1) objdump info with LLVM release version 19.1.6:
Please always use -r, that's ever so much more readable.
> $ clang --version | head -1
> clang version 19.1.6
>
> There is an jump instruction "jmp" at the end of dc_fixpt_recip(), it
> maybe jumps to a position and then steps to the return instruction, so
> there is no "falls through" objtool warning.
>
> 0000000000000290 <dc_fixpt_recip>:
> 290: f3 0f 1e fa endbr64
> 294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9>
> 299: 48 85 ff test %rdi,%rdi
> 29c: 0f 84 a8 00 00 00 je 34a <dc_fixpt_recip+0xba>
> 2a2: 48 89 f9 mov %rdi,%rcx
> 2a5: 48 f7 d9 neg %rcx
> 2a8: 48 0f 48 cf cmovs %rdi,%rcx
> 2ac: 53 push %rbx
> 2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax
> 2b4: 00 00 00
> 2b7: 31 d2 xor %edx,%edx
> 2b9: 48 f7 f1 div %rcx
> 2bc: be e0 ff ff ff mov $0xffffffe0,%esi
> 2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e>
> 2c3: 45 88 c8 mov %r9b,%r8b
> 2c6: 4d 01 c0 add %r8,%r8
> 2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8
> 2cd: 48 29 da sub %rbx,%rdx
> 2d0: 45 88 da mov %r11b,%r10b
> 2d3: 4c 89 d0 mov %r10,%rax
> 2d6: 4c 09 c0 or %r8,%rax
> 2d9: 83 c6 02 add $0x2,%esi
> 2dc: 74 31 je 30f <dc_fixpt_recip+0x7f>
> 2de: 48 01 d2 add %rdx,%rdx
> 2e1: 45 31 c0 xor %r8d,%r8d
> 2e4: 41 ba 00 00 00 00 mov $0x0,%r10d
> 2ea: 48 39 ca cmp %rcx,%rdx
> 2ed: 41 0f 93 c1 setae %r9b
> 2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66>
> 2f3: 49 89 ca mov %rcx,%r10
> 2f6: 4c 29 d2 sub %r10,%rdx
> 2f9: 48 01 d2 add %rdx,%rdx
> 2fc: 45 31 d2 xor %r10d,%r10d
> 2ff: 48 89 cb mov %rcx,%rbx
> 302: 48 39 ca cmp %rcx,%rdx
> 305: 41 0f 93 c3 setae %r11b
> 309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33>
> 30b: 31 db xor %ebx,%ebx
> 30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33>
> 30f: 48 01 d2 add %rdx,%rdx
> 312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi
> 319: ff ff 7f
> 31c: 4c 8d 46 01 lea 0x1(%rsi),%r8
> 320: 48 39 ca cmp %rcx,%rdx
> 323: 4c 0f 43 c6 cmovae %rsi,%r8
> 327: 4c 39 c0 cmp %r8,%rax
> 32a: 77 29 ja 355 <dc_fixpt_recip+0xc5>
> 32c: 48 39 ca cmp %rcx,%rdx
> 32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax
> 333: 48 89 c1 mov %rax,%rcx
> 336: 48 f7 d9 neg %rcx
> 339: 48 85 ff test %rdi,%rdi
> 33c: 48 0f 49 c8 cmovns %rax,%rcx
> 340: 48 89 c8 mov %rcx,%rax
> 343: 5b pop %rbx
> 344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba>
> 34a: 0f 0b ud2
> 34c: 0f 0b ud2
> 34e: 31 c9 xor %ecx,%ecx
> 350: e9 57 ff ff ff jmp 2ac <dc_fixpt_recip+0x1c>
> 355: 0f 0b ud2
> 357: eb d3 jmp 32c <dc_fixpt_recip+0x9c>
> 359: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
I had to puzzle a bit to get that double ud2 -- not all configs do that.
Also, curse the DRM Makefiles, you can't do:
make drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.s
:-(
They are the two ASSERT()s on line 217 and 54 respectively. They end up
asserting the same value twice, so that makes sense.
> $ clang --version | head -1
> clang version 20.0.0git (https://github.com/llvm/llvm-project.git
> 8daf4f16fa08b5d876e98108721dd1743a360326)
So I didn't have a recent build at hand.. so I've not validated the
below.
> There is "ud2" instruction at the end of dc_fixpt_recip(), its offset
> is "350", this "ud2" instruction is not dead end due to the offset "352"
> is in the relocation section '.rela.discard.reachable', that is to say,
> dc_fixpt_recip() doesn't end with a return instruction or an
> unconditional jump, so objtool determined that the function can fall
> through into the next function, thus there is "falls through" objtool
> warning.
>
> 0000000000000290 <dc_fixpt_recip>:
> 290: f3 0f 1e fa endbr64
> 294: e8 00 00 00 00 call 299 <dc_fixpt_recip+0x9>
> 299: 48 85 ff test %rdi,%rdi
> 29c: 0f 84 ac 00 00 00 je 34e <dc_fixpt_recip+0xbe>
> 2a2: 53 push %rbx
> 2a3: 48 89 f9 mov %rdi,%rcx
> 2a6: 48 f7 d9 neg %rcx
> 2a9: 48 0f 48 cf cmovs %rdi,%rcx
> 2ad: 48 b8 00 00 00 00 01 movabs $0x100000000,%rax
> 2b4: 00 00 00
> 2b7: 31 d2 xor %edx,%edx
> 2b9: 48 f7 f1 div %rcx
> 2bc: be e0 ff ff ff mov $0xffffffe0,%esi
> 2c1: eb 1b jmp 2de <dc_fixpt_recip+0x4e>
> 2c3: 45 88 c8 mov %r9b,%r8b
> 2c6: 4d 01 c0 add %r8,%r8
> 2c9: 4d 8d 04 80 lea (%r8,%rax,4),%r8
> 2cd: 48 29 da sub %rbx,%rdx
> 2d0: 45 88 da mov %r11b,%r10b
> 2d3: 4c 89 d0 mov %r10,%rax
> 2d6: 4c 09 c0 or %r8,%rax
> 2d9: 83 c6 02 add $0x2,%esi
> 2dc: 74 31 je 30f <dc_fixpt_recip+0x7f>
> 2de: 48 01 d2 add %rdx,%rdx
> 2e1: 45 31 c0 xor %r8d,%r8d
> 2e4: 41 ba 00 00 00 00 mov $0x0,%r10d
> 2ea: 48 39 ca cmp %rcx,%rdx
> 2ed: 41 0f 93 c1 setae %r9b
> 2f1: 72 03 jb 2f6 <dc_fixpt_recip+0x66>
> 2f3: 49 89 ca mov %rcx,%r10
> 2f6: 4c 29 d2 sub %r10,%rdx
> 2f9: 48 01 d2 add %rdx,%rdx
> 2fc: 45 31 d2 xor %r10d,%r10d
> 2ff: 48 89 cb mov %rcx,%rbx
> 302: 48 39 ca cmp %rcx,%rdx
> 305: 41 0f 93 c3 setae %r11b
> 309: 73 b8 jae 2c3 <dc_fixpt_recip+0x33>
> 30b: 31 db xor %ebx,%ebx
> 30d: eb b4 jmp 2c3 <dc_fixpt_recip+0x33>
> 30f: 48 01 d2 add %rdx,%rdx
> 312: 48 be fe ff ff ff ff movabs $0x7ffffffffffffffe,%rsi
> 319: ff ff 7f
> 31c: 4c 8d 46 01 lea 0x1(%rsi),%r8
> 320: 48 39 ca cmp %rcx,%rdx
> 323: 4c 0f 43 c6 cmovae %rsi,%r8
> 327: 4c 39 c0 cmp %r8,%rax
> 32a: 77 1e ja 34a <dc_fixpt_recip+0xba>
> 32c: 48 39 ca cmp %rcx,%rdx
> 32f: 48 83 d8 ff sbb $0xffffffffffffffff,%rax
> 333: 48 89 c1 mov %rax,%rcx
> 336: 48 f7 d9 neg %rcx
> 339: 48 85 ff test %rdi,%rdi
> 33c: 48 0f 49 c8 cmovns %rax,%rcx
> 340: 48 89 c8 mov %rcx,%rax
> 343: 5b pop %rbx
> 344: 2e e9 00 00 00 00 cs jmp 34a <dc_fixpt_recip+0xba>
> 34a: 0f 0b ud2
> 34c: eb de jmp 32c <dc_fixpt_recip+0x9c>
> 34e: 0f 0b ud2
> 350: 0f 0b ud2
> 352: 66 66 66 66 66 2e 0f data16 data16 data16 data16 cs nopw
> 0x0(%rax,%rax,1)
> 359: 1f 84 00 00 00 00 00
If you put them size-by-side, you'll see it's more or less the same
code-gen (trivial differences), but now it just stops code-gen, where
previously it would continue.
So this really is a compiler problem, this needs no annotation, it's
straight up broken.
Now, the thing is, these ASSERT()s are checking for divide-by-zero, I
suspect clang figured that out and invokes UB on us and just stops
code-gen.
Nathan, Nick, don't we have a compiler flag that forces __builtin_trap()
whenever clang pulls something like this? I think UBSAN does this, but
we really shouldn't pull in the whole of that for sanity.
More information about the amd-gfx
mailing list