[Mesa-dev] use of likely() / unlikely() macros

Patrick Baggett baggett.patrick at gmail.com
Thu Jan 17 09:25:34 PST 2013

On Thu, Jan 17, 2013 at 10:37 AM, Brian Paul <brianp at vmware.com> wrote:

> In compiler.h we define the likely(), unlikely() macros which wrap GCC's
> __builtin_expect().  But we only use them in a handful of places.
> It seems to me that an obvious place to possibly use these would be for GL
> error testing.  For example, in glDrawArrays():
>    if (unlikely(count <= 0)) {
>       _mesa_error();
>    }
> Plus, in some of the glBegin/End per-vertex calls such as
> glVertexAttrib3fARB() where we error test the index parameter.
> I guess the key question is how much might we gain from this.  I don't
> really have a good feel for the value at this level.  In a tight inner
> loop, sure, but the GL error checking is pretty high-level code.
This is basically a micro-optimization, to be honest. Not that
micro-optimization is "bad", but while it should "improve" performance, it
would take a lot for that to show up on profiles. In the case of error
checking at the start of a function, you might be lucky to save a few
cycles -- virtually unnoticeable.

> I haven't found much on the web about performance gains from
> __builtin_expect().  Anyone?
I read a few hearsay posts, but this one comes with actual numbers:


Long story short: if you're wrong, it's slower; if you're right, the gain is marginal.

Its use is to change the ordering of branches from gcc's default assumption of
linear execution. For example, code like this:
if (A == NULL)  /* not likely */
    return ERR_NULL;

if (B >= MAX)   /* not likely */
    return ERR_MAX;

if (C < MIN)    /* not likely */
    return ERR_MIN;


generates jumps around the return statements, so in the normal case you're
taking a branch, which can mean a delay and possibly refetching instructions.
If you didn't jump, the CPU already has the fall-through path loaded in the
icache. The "optimal" ordering is then:

if (A != NULL) {
    if (B < MAX) {
        if (C >= MIN) {
            /* normal path */
        } else {
            return ERR_MIN;
        }
    } else {
        return ERR_MAX;
    }
} else {
    return ERR_NULL;
}

In the common case, then, the code does not branch but executes a linear
stream of instructions. On modern x86 CPUs this matters very little,
except for maybe a few in-order CPUs (maybe Intel Atom?). You're probably a
lot more likely to get some improvements from non-x86 where branch
prediction is weaker or unavailable and/or the CPU is in-order. ARM and
older SPARC CPUs come to mind. Also, some architectures allow you to encode
a branch prediction hint inside of the branch itself, e.g. IA64's
"br.call.sptk.many" Branch / Call / Static Predict Taken / Many Times,
which gcc can take advantage of. Still, overall, this is well within the
realm of micro-optimization.

