[PATCH] drm: omapdrm: reduce clang stack usage

Arnd Bergmann arnd at arndb.de
Thu Jun 12 12:40:45 UTC 2025


On Thu, Jun 12, 2025, at 09:58, Tomi Valkeinen wrote:
> On 10/06/2025 12:27, Arnd Bergmann wrote:
>>  
>> -static void dispc_restore_context(struct dispc_device *dispc)
>> +static noinline_for_stack void dispc_restore_context(struct dispc_device *dispc)
>>  {
>>  	int i, j;
>>  
>
> While I don't think this causes any harm, but... What's going on here?
> If I compile with gcc (x86 or arm), I see stack usage in few hundreds of
> bytes. If I compile with LLVM=1, the stack usage jumps to over a thousand.
>
> Is clang just broken? I don't see anything special with
> dispc_restore_context() or dispc_runtime_resume(), so is this same thing
> happening all around the kernel, and we need to sprinkle noinlines
> everywhere?
>
> Or do we get some extra debugging feature enabled only on clang with
> allmodconfig, and that is eating the stack?

There is no general answer here. Rather, a combination of several
effects is going on at the same time throughout the kernel, which leads
clang to report excessive stack usage in some files when gcc
does not:

- Both compilers have a number of corner cases where they run off
  and do something crazy for unusual input (usually crypto code),
  but since gcc has more users, most files that trigger this only
  with gcc already have workarounds in place, while the ones that
  trigger it with clang are still missing them.

- The inlining algorithm works the opposite way on clang vs gcc:
  gcc always starts by inlining leaf functions into their callers
  and does this recursively, while clang starts with global functions
  and inlines their direct callees first. If you have deeply nested
  static functions that could all be inlined, both stop at some
  point, but the resulting object code looks completely different,
  and the stack usage is a symptom of this. I've added 'noinline'
  for some of the cases like this where I know both result in
  the same (harmless) stack usage through the call chain, but
  only clang warns about it.

- clang has previously had bugs where it tracks the lifetime of
  stack variables incorrectly, so multiple variables that
  should share the same stack slot won't. Some of these are
  fixed now, others are a result of the different inlining, and
  some others are likely still bugs we should fix in clang

- CONFIG_KMSAN disables some optimizations that are required
  for reducing stack usage, and at the moment this is only
  implemented in clang but not gcc.

- CONFIG_KASAN has issues similar to KMSAN's, but is not
  quite as bad here.

- CONFIG_KASAN_STACK tends to use more stack with clang than gcc
  because of implementation choices around how hard it should
  try to detect array overflows. This could be changed by having
  clang make similar decisions to gcc here, but for now we just
  require using CONFIG_EXPERT=y to enable KASAN_STACK on clang.
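
To illustrate the inlining point above, here is a minimal userspace
sketch, not code from the driver. All names in it (fake_dev,
restore_bank, resume, demo_resume, NUM_REGS) are hypothetical, and the
macro only mimics the kernel's noinline_for_stack, which is an alias
for 'noinline'. The helper has a sizeable local buffer; marking it
noinline keeps that buffer in the helper's own frame instead of letting
it be merged into the caller's frame. Whether a given compiler would
inline it without the annotation, and what frame sizes result, depends
on the compiler version and flags, so no numbers are claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel macro of the same name. In the
 * kernel it expands to plain 'noinline'; the separate name only
 * documents why the annotation is there.
 */
#define noinline_for_stack __attribute__((__noinline__))

#define NUM_REGS 64			/* hypothetical register count */

struct fake_dev {
	uint32_t regs[NUM_REGS];	/* pretend hardware registers */
	uint32_t saved[NUM_REGS];	/* saved context */
};

/*
 * Helper with a large local array. Because it is never inlined,
 * 'scratch' is only on the stack while this function runs and is not
 * accounted to the caller's frame.
 */
static noinline_for_stack uint32_t
restore_bank(struct fake_dev *dev, size_t first, size_t count)
{
	uint32_t scratch[NUM_REGS];
	uint32_t sum = 0;
	size_t i;

	assert(first + count <= NUM_REGS);

	memcpy(scratch, &dev->saved[first], count * sizeof(scratch[0]));
	for (i = 0; i < count; i++) {
		dev->regs[first + i] = scratch[i];
		sum += scratch[i];
	}

	return sum;
}

/* Caller: its own frame stays small whatever the inliner prefers. */
uint32_t resume(struct fake_dev *dev)
{
	uint32_t sum = 0;

	sum += restore_bank(dev, 0, NUM_REGS / 2);
	sum += restore_bank(dev, NUM_REGS / 2, NUM_REGS / 2);

	return sum;
}

/* Fills the saved context with 0..NUM_REGS-1 and restores it. */
uint32_t demo_resume(void)
{
	struct fake_dev dev;
	size_t i;

	memset(&dev, 0, sizeof(dev));
	for (i = 0; i < NUM_REGS; i++)
		dev.saved[i] = (uint32_t)i;

	return resume(&dev);
}
```

Building this with both compilers and -Wframe-larger-than=<bytes>
(the option the kernel sets through CONFIG_FRAME_WARN) is one way to
compare per-function frame sizes with and without the annotation.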

I have managed to produce a testcase for this file that shows
how clang produces huge stack usage when gcc does not. In this
case it seems to be triggered by -fsanitize=kernel-address:

https://godbolt.org/z/TT88zPYf6


      Arnd

