[Beignet] [PATCH] GBE: Optimize scratch memory usage using register interval

Zhigang Gong zhigang.gong at linux.intel.com
Thu Feb 20 23:56:22 PST 2014


On Fri, Feb 21, 2014 at 03:23:49PM +0800, Ruiling Song wrote:
> As scratch memory is a limited resource in HW. And different
> register have the opptunity to share same scratch memory. So I
> implement a special designed allocator to reduce scratch memory usage,
> it is assumed that scatch memory are always allocated in multiple of 32 bytes.
> 
> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> ---
>  backend/src/backend/context.cpp            |  149 +++++++++++++++++++++++++++-
>  backend/src/backend/context.hpp            |   28 +++++-
>  backend/src/backend/gen_reg_allocation.cpp |   50 ++++++++--
>  3 files changed, 212 insertions(+), 15 deletions(-)
> 
> diff --git a/backend/src/backend/context.cpp b/backend/src/backend/context.cpp
> index 2125bd1..24f4f6f 100644
> --- a/backend/src/backend/context.cpp
> +++ b/backend/src/backend/context.cpp
> @@ -301,6 +301,145 @@ namespace gbe
>  
>      return i;
>    }
> + /*
> +  * As scrath memory are allocated by multiple of fixedSize(32), we design the simple allocator
> +  * to make allocation and free efficient. A ScratchBlock is multiple of fixedSize with one head,
> +  * and possibly one tail and internal elements, each element are defined as [size, used, prev, next],
> +  * The block header's size record the real block size, other elements in the block are with size 0.
> +  * When an element is block header, its [prev/next] are used for free block list.
> +  * The tail block's prev points to block header. When a block is freed, it will check
> +  * left and right blocks for possible merge.
> +  */
> +  // XXX TODO as we optimize scratch memory usage using the register interval.
Let us use FIXME instead of XXX.
> +  // we need to add some dependency in post_reg_alloc scheduler, to keep scratch
> +  // memory that are reused still keep the order

According to current usage model, which is allocate/deallocate one register each time.
IMO, a simple std::set should be good enough to be used as the allocator.

The maximum scratch size is known, then we can initialize the set to contains all elements
in [0 - (max_scratch_size/32 - 1))]. Then each time to allocate, we just call set.begin()/erase
to extract one from the set. And at deallocation, we just call set.insert() to insert a virtual
scratch register to the set. 

What do you think?

> +  {
> +    const uint32_t regNum = spilledRegs.size();
> +    this->starting.resize(regNum);
> +    this->ending.resize(regNum);
> +    uint32_t regID = 0;
> +    for(auto it = spilledRegs.begin(); it != spilledRegs.end(); ++it) {
> +      this->starting[regID] = this->ending[regID] = &intervals[it->first];
> +      regID++;
> +    }
> +    std::sort(this->starting.begin(), this->starting.end(), cmp<true>);
> +    std::sort(this->ending.begin(), this->ending.end(), cmp<false>);
> +    int toExpire = 0;
> +    for(uint32_t i = 0; i < regNum; i++) {
> +      const GenRegInterval * cur = starting[i];
> +      const GenRegInterval * exp = ending[toExpire];
> +      if(exp->maxID <= cur->minID) {
It should be exp->maxID < cur->minID. If the exp->maxID == cur->minID, then it is still alive
and we should not expire it.


Thanks.


More information about the Beignet mailing list