AMDGPU and 16B stack alignment
Arnd Bergmann
arnd at arndb.de
Tue Oct 15 07:19:04 UTC 2019
On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar at amd.com> wrote:
> On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> My gcc build fails with the errors below:
>
> dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
>
> dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
>
> While the GPF observed on clang builds seems to be fixed.
Ok, so it seems that gcc insists on having at least 2^4 bytes stack alignment
when SSE is enabled on x86-64, but does not actually rely on that for correct
operation unless it's using sse2. So with gcc, -msse always has to be paired
with at least -mpreferred-stack-boundary=4.
For clang, it sounds like the opposite is true: when passing 16 byte stack
alignment and having sse/sse2 enabled, it requires the incoming stack to be
16 byte aligned, but passing 8 byte alignment makes it do the right thing.
So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
to get the desired outcome on both?
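
Roughly what I have in mind, as an untested sketch of a Kbuild Makefile
fragment. The variable name calcs_ccflags is only a placeholder here, and
the per-object lines assume the two files named in the gcc errors above;
the real Makefile may spell both differently. It also assumes that clang
rejects -mpreferred-stack-boundary, so that cc-option drops it there:

```make
# Sketch only: meant to live inside a Kbuild Makefile, where the
# cc-option helper is available. Not a standalone Makefile.
# "calcs_ccflags" is a placeholder name used for illustration.
calcs_ccflags := -msse

# cc-option expands to the flag if $(CC) accepts it and to nothing
# otherwise. gcc then gets a 2^4 = 16 byte stack boundary, while a
# compiler that rejects the option keeps whatever it used before.
calcs_ccflags += $(call cc-option, -mpreferred-stack-boundary=4)

# Per-object flags for the two files from the error messages
# (object paths are assumed, adjust to the actual directory layout).
CFLAGS_dcn_calcs.o := $(calcs_ccflags)
CFLAGS_dcn_calc_math.o := $(calcs_ccflags)
```

The point of going through cc-option rather than hardcoding the flag is
that the same line can be used for both compilers without an ifdef.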
Arnd