[Beignet] [PATCH] GBE: Optimize scratch memory usage using register interval
Zhigang Gong
zhigang.gong at linux.intel.com
Thu Feb 20 23:56:22 PST 2014
On Fri, Feb 21, 2014 at 03:23:49PM +0800, Ruiling Song wrote:
> As scratch memory is a limited resource in HW. And different
> register have the opptunity to share same scratch memory. So I
> implement a special designed allocator to reduce scratch memory usage,
> it is assumed that scatch memory are always allocated in multiple of 32 bytes.
>
> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> ---
> backend/src/backend/context.cpp | 149 +++++++++++++++++++++++++++-
> backend/src/backend/context.hpp | 28 +++++-
> backend/src/backend/gen_reg_allocation.cpp | 50 ++++++++--
> 3 files changed, 212 insertions(+), 15 deletions(-)
>
> diff --git a/backend/src/backend/context.cpp b/backend/src/backend/context.cpp
> index 2125bd1..24f4f6f 100644
> --- a/backend/src/backend/context.cpp
> +++ b/backend/src/backend/context.cpp
> @@ -301,6 +301,145 @@ namespace gbe
>
> return i;
> }
> + /*
> + * As scrath memory are allocated by multiple of fixedSize(32), we design the simple allocator
> + * to make allocation and free efficient. A ScratchBlock is multiple of fixedSize with one head,
> + * and possibly one tail and internal elements, each element are defined as [size, used, prev, next],
> + * The block header's size record the real block size, other elements in the block are with size 0.
> + * When an element is block header, its [prev/next] are used for free block list.
> + * The tail block's prev points to block header. When a block is freed, it will check
> + * left and right blocks for possible merge.
> + */
> + // XXX TODO as we optimize scratch memory usage using the register interval.
Let us use FIXME instead of XXX.
> + // we need to add some dependency in post_reg_alloc scheduler, to keep scratch
> + // memory that are reused still keep the order
According to current usage model, which is allocate/deallocate one register each time.
IMO, a simple std::set should be good enough to be used as the allocator.
The maximum scratch size is known, then we can initialize the set to contains all elements
in [0 - (max_scratch_size/32 - 1))]. Then each time to allocate, we just call set.begin()/erase
to extract one from the set. And at deallocation, we just call set.insert() to insert a virtual
scratch register to the set.
What do you think?
> + {
> + const uint32_t regNum = spilledRegs.size();
> + this->starting.resize(regNum);
> + this->ending.resize(regNum);
> + uint32_t regID = 0;
> + for(auto it = spilledRegs.begin(); it != spilledRegs.end(); ++it) {
> + this->starting[regID] = this->ending[regID] = &intervals[it->first];
> + regID++;
> + }
> + std::sort(this->starting.begin(), this->starting.end(), cmp<true>);
> + std::sort(this->ending.begin(), this->ending.end(), cmp<false>);
> + int toExpire = 0;
> + for(uint32_t i = 0; i < regNum; i++) {
> + const GenRegInterval * cur = starting[i];
> + const GenRegInterval * exp = ending[toExpire];
> + if(exp->maxID <= cur->minID) {
It should be exp->maxID < cur->minID. If the exp->maxID == cur->minID, then it is still alive
and we should not expire it.
Thanks.
More information about the Beignet
mailing list