[Mesa-dev] [PATCH v3 2/6] anv: Add a helper for doing mass allocations
Chris Wilson
chris at chris-wilson.co.uk
Fri Apr 7 23:57:02 UTC 2017
On Fri, Apr 07, 2017 at 04:30:49PM -0700, Jason Ekstrand wrote:
> On Fri, Apr 7, 2017 at 3:19 PM, Chris Wilson <chris at chris-wilson.co.uk>
> wrote:
>
> On Fri, Apr 07, 2017 at 02:41:13PM -0700, Jason Ekstrand wrote:
> > On Fri, Apr 7, 2017 at 1:26 PM, Chris Wilson <chris at chris-wilson.co.uk>
> > wrote:
> >
> > On Fri, Apr 07, 2017 at 12:55:53PM -0700, Jason Ekstrand wrote:
> > > +#define _ANV_MULTIALLOC_UPDATE_POINTER(_i) \
> > > + if ((_i) < ma->ptr_count) \
> > > + *ma->ptrs[_i] = ptr + (uintptr_t)*ma->ptrs[_i]
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(0);
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(1);
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(2);
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(3);
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(4);
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(5);
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(6);
> > > + _ANV_MULTIALLOC_UPDATE_POINTER(7);
> > > +#undef _ANV_MULTIALLOC_UPDATE_POINTER
> >
> > #define _ANV_MULTIALLOC_UPDATE_POINTER(_i) case _i + 1: \
> >    *ma->ptrs[_i] = ptr + (uintptr_t)*ma->ptrs[_i]
> >
> > switch (ma->ptr_count) {
> > _ANV_MULTIALLOC_UPDATE_POINTER(7);
> > _ANV_MULTIALLOC_UPDATE_POINTER(6);
> > _ANV_MULTIALLOC_UPDATE_POINTER(5);
> > _ANV_MULTIALLOC_UPDATE_POINTER(4);
> > _ANV_MULTIALLOC_UPDATE_POINTER(3);
> > _ANV_MULTIALLOC_UPDATE_POINTER(2);
> > _ANV_MULTIALLOC_UPDATE_POINTER(1);
> > _ANV_MULTIALLOC_UPDATE_POINTER(0);
> > }
> >
> > #undef _ANV_MULTIALLOC_UPDATE_POINTER
> >
> > If ma->ptr_count is constant, they generate exactly the same code.
> > If it isn't (i.e. if one of the multialloc_adds is predicated), then
> > they still generate basically the same code with the code for the if
> > version being slightly more straightforward.
>
> Took a look at this with https://godbolt.org/g/UwrMk1
>
> Weird... That's not at all what I'm seeing with my demo file. In fact,
> when I try to compile your demo file with GCC on my local machine, it
> reduces the entire thing down to less than a dozen instructions.
Yes, if I force-inline the add, gcc and clang both realise that the
function doesn't use any of the values and discard everything. In the
end, gcc actually generates very smart code.
consume_pointer:
movl $0, (%rdi)
ret
main:
subq $8, %rsp
movl $200, %edi
call malloc
testq %rax, %rax
je .L5
movq %rax, %rdi
call consume_pointer
leaq 4(%rax), %rdi
call consume_pointer
leaq 72(%rax), %rdi
call consume_pointer
xorl %eax, %eax
.L3:
addq $8, %rsp
ret
.L5:
orl $-1, %eax
jmp .L3
It's generated a single allocation, and yet still passed around the
various offsets within that block without having to store the offsets.
anv_multialloc_add() definitely needs __attribute__((always_inline)).
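
To make that concrete, here is a minimal sketch of the pattern, not the
anv code and not the godbolt demo file. The names multialloc,
multialloc_add, multialloc_finish and multialloc_demo are hypothetical, as
are the sizes, which were picked only so that the offsets come out as the
0, 4 and 72 seen in the assembly above. It assumes power-of-two alignments
no larger than what malloc guarantees, and uses a plain loop in place of
the unrolled macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the helper under review. */
#define MULTIALLOC_MAX_PTRS 8

struct multialloc {
   size_t size;                       /* running total in bytes */
   unsigned ptr_count;                /* number of registered pointers */
   void **ptrs[MULTIALLOC_MAX_PTRS];  /* caller pointers to patch up */
};

/* Forced inline so that, with constant sizes, the compiler can fold the
 * offsets instead of storing and reloading them. 'align' must be a power
 * of two (assumption of this sketch). */
static inline __attribute__((always_inline)) void
multialloc_add(struct multialloc *ma, void **ptr, size_t size, size_t align)
{
   size_t offset = (ma->size + align - 1) & ~(align - 1);

   assert(ma->ptr_count < MULTIALLOC_MAX_PTRS);
   ma->size = offset + size;
   /* Stash the offset in the caller's pointer until the block exists. */
   *ptr = (void *)(uintptr_t)offset;
   ma->ptrs[ma->ptr_count++] = ptr;
}

/* One malloc for everything, then turn each stashed offset into a real
 * pointer. Returns the base of the block, or NULL on failure. */
static inline __attribute__((always_inline)) void *
multialloc_finish(struct multialloc *ma)
{
   char *base = malloc(ma->size);

   if (base == NULL)
      return NULL;
   for (unsigned i = 0; i < ma->ptr_count; i++)
      *ma->ptrs[i] = base + (uintptr_t)*ma->ptrs[i];
   return base;
}

/* Three sub-allocations out of one block. Reports each pointer's offset
 * from the base and the total size; returns 0 on success, -1 on failure. */
static int
multialloc_demo(ptrdiff_t offsets[3], size_t *total)
{
   struct multialloc ma = {0};
   void *a, *b, *c;

   multialloc_add(&ma, &a, 4, 4);
   multialloc_add(&ma, &b, 68, 4);
   multialloc_add(&ma, &c, 128, 8);

   char *base = multialloc_finish(&ma);
   if (base == NULL)
      return -1;

   offsets[0] = (char *)a - base;
   offsets[1] = (char *)b - base;
   offsets[2] = (char *)c - base;
   *total = ma.size;
   free(base);
   return 0;
}
```

Whether a given compiler folds this down to fixed offsets from a single
malloc result depends on the optimisation level and on every add being
visible at the call site, which is what the always_inline is for.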
-Chris
--
Chris Wilson, Intel Open Source Technology Centre