[Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

Scott D Phillips scott.d.phillips at intel.com
Wed Mar 21 15:18:04 UTC 2018


Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Scott D Phillips (2018-03-20 20:39:25)
>> When building intel_tiled_memcpy for i686, the stack will only be
>> 4-byte aligned. This isn't sufficient for SSE temporaries which
>> require 16-byte alignment.  Use the force_align_arg_pointer
>> function attribute in that case to ensure sufficient alignment.
>> ---
>>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> index 69306828d72..bd8bafbd2d7 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> @@ -42,6 +42,12 @@
>>  #include <emmintrin.h>
>>  #endif
>>  
>> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || defined(__SSE2__))
>> +#define REALIGN __attribute__((force_align_arg_pointer))
>> +#else
>> +#define REALIGN
>> +#endif
>
> It would be a harmless no-op on x86-64 (or essential ;)
>
>> +
>>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>>  
>>  #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b)
>> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src)
>>  /**
>>   * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned.
>>   */
>> -static inline void *
>> +static REALIGN inline void *
>>  rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes)
>>  {
>>     assert(bytes == 0 || !(((uintptr_t)dst) & 0xf));
>
> Hmm, if aligned_dst is spilling to stack, so would be aligned_src.  As
> these are supposed to be inlined (and constant folded), do you not
> want to realign the callers instead?

Playing around with this function attribute more, it seems not to work
as I had hoped. Specifically I was expecting that having the attribute
on an inline function would cause the attribute to be applied to any
inliners as well, but that seems to not happen. Instead it seems to work
more like "if you emit a prologue here, then use this alternative one".

I guess the right solution is to just apply -mstackrealign to the whole
file and let the stack fix get applied to all entry points (whatever
that winds up being after optimization) like we have for
main/streaming-load-memcpy.c. And then relying on inlining to keep any
stack munging out of our loops.

> Perhaps with an explicit FLATTEN.

Ya, it looks like flatten would be good there.

> -Chris


More information about the mesa-dev mailing list