[PATCH] drm: omapdrm: reduce clang stack usage
Arnd Bergmann
arnd at arndb.de
Thu Jun 12 12:40:45 UTC 2025
On Thu, Jun 12, 2025, at 09:58, Tomi Valkeinen wrote:
> On 10/06/2025 12:27, Arnd Bergmann wrote:
>>
>> -static void dispc_restore_context(struct dispc_device *dispc)
>> +static noinline_for_stack void dispc_restore_context(struct dispc_device *dispc)
>> {
>> int i, j;
>>
>
> While I don't think this causes any harm, but... What's going on here?
> If I compile with gcc (x86 or arm), I see stack usage in few hundreds of
> bytes. If I compile with LLVM=1, the stack usage jumps to over a thousand.
>
> Is clang just broken? I don't see anything special with
> dispc_restore_context() or dispc_runtime_resume(), so is this same thing
> happening all around the kernel, and we need to sprinkle noinlines
> everywhere?
>
> Or do we get some extra debugging feature enabled only on clang with
> allmodconfig, and that is eating the stack?
There is no single answer here. Several effects come into play at the
same time throughout the kernel, and together they lead to clang
reporting excessive stack usage in some files where gcc does not:
- both compilers have a number of corner cases where they run off
and do something crazy for unusual input (usually crypto code),
but since gcc has more users, most files that trigger this only
with gcc already have workarounds in place, while the ones that
trigger it with clang are still missing them
- The inlining algorithm works the opposite way on clang vs gcc:
gcc always starts by inlining leaf functions into their callers
and does this recursively, while clang starts with global functions
and inlines their direct callees first. If you have deeply nested
static functions that could all be inlined, both stop at some
point, but the resulting object code looks completely different,
and the stack usage is a symptom of this. I've added 'noinline'
for some of the cases like this where I know both result in
the same (harmless) stack usage through the call chain, but
only clang warns about it.
- clang has previously had bugs where it tracks the lifetime of
stack variables incorrectly, so multiple variables that
should share the same stack slot don't. Some of these are
fixed now, others are a result of the different inlining, and
some others are likely still bugs we should fix in clang.
- CONFIG_KMSAN disables some optimizations that are required
for reducing stack usage, and at the moment KMSAN is only
implemented in clang but not gcc.
- CONFIG_KASAN has similar issues to KMSAN but is not
quite as bad here.
- CONFIG_KASAN_STACK tends to use more stack with clang than gcc
because of implementation choices around how hard it should
try to detect array overflows. This could be changed by having
clang make similar decisions to gcc here, but for now we just
require using CONFIG_EXPERT=y to enable KASAN_STACK on clang.
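
To illustrate the 'noinline' workaround from the inlining point above,
here is a minimal userspace sketch. It is not taken from the omapdrm
driver: restore_bank() and restore_all() are hypothetical names, and
the macro is only a stand-in for the kernel's noinline_for_stack,
assuming it expands to the compiler's noinline attribute.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in (assumption): mirrors the kernel annotation by
 * mapping it to the noinline function attribute. */
#define noinline_for_stack __attribute__((__noinline__))

/*
 * Hypothetical helper with a sizable local buffer. Without the
 * annotation the compiler is free to inline it, and the buffer then
 * counts against the caller's stack frame.
 */
static noinline_for_stack unsigned int
restore_bank(const unsigned char *src, size_t len)
{
	unsigned char scratch[512];
	unsigned int sum = 0;
	size_t i;

	if (len > sizeof(scratch))
		len = sizeof(scratch);
	memcpy(scratch, src, len);
	for (i = 0; i < len; i++)
		sum += scratch[i];

	return sum;
}

/*
 * Hypothetical caller. Because restore_bank() stays out of line, its
 * scratch buffer lives in its own frame, which is released again
 * after every call.
 */
unsigned int restore_all(const unsigned char *regs, size_t len, int banks)
{
	unsigned int total = 0;
	int i;

	for (i = 0; i < banks; i++)
		total += restore_bank(regs, len);

	return total;
}
```

The total stack depth through the call chain is the same either way.
The annotation only changes which function's frame the buffer is
accounted to, which is what a per-function frame size warning looks at.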
I have managed to produce a testcase for this file that shows
how clang produces huge stack usage when gcc does not. In this
case it seems to be triggered by -fsanitize=kernel-address:
https://godbolt.org/z/TT88zPYf6
Arnd