[Mesa-dev] [PATCH] mesa: replace GLenum with GLenum16 in common structures

Tue Nov 14 16:49:26 UTC 2017

2017-11-14 15:04 GMT+01:00 Marek Olšák <maraeo at gmail.com>:
> On Mon, Nov 13, 2017 at 10:19 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 11/08/2017 07:16 PM, Marek Olšák wrote:
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> For lower CPU cache usage. All enums fit within 2 bytes.
>>
>> Have you benchmarked this on anything?  My recollection is that for many
>> things loads and stores of 16-bit values is more expensive than 8- or
>> 32-bit values on 64-bit architectures.  More recent CPUs may have
>> changed in this respect... I think that was back in the Core2 kind of
>> time frame, but I thought it also applied to AMD CPUs.

According to Agner [1] (looking at the instruction tables), there was
a penalty for 8- and 16-bit reads (+1 cycle of latency) at least on AMD
K8, K10 and Jaguar processors. The MOVZX and MOVSX instructions (often
used for sub-32-bit loads) are also slower than direct loads on those
CPUs (the same +1 cycle of latency). On Bulldozer-based CPUs the
penalty remains only for MOVSX, the sign-extending form (i.e.
zero-extending and loading to a smaller register are as fast as a
32-bit load). Zen-based CPUs have removed even this, so all modern AMD
CPUs shouldn't be slower (other than a potential increase in code size
due to the 0x66 prefixes used with 16-bit quantities; 8-bit
loads/stores have no such penalty).

I don't see any problem on any modern Intel CPUs since at least Core2.

[1] http://www.agner.org/optimize/