[Mesa-dev] [PATCH] i965 : optimized bucket index calculation

Ian Romanick idr at freedesktop.org
Tue Sep 12 17:19:49 UTC 2017


On 09/12/2017 02:40 AM, Marathe, Yogesh wrote:
> Hi Jason,
> 
>  
> 
> On the asserts you’ve mentioned below, I assume we need to add them
> after ‘bufmgr->num_buckets++’ in add_bucket() as num_buckets could be 0
> initially. Another clarification on ~1%, we meant approx. 1% there,
> that’s an improvement we saw in 3Dmark total not a degradation, we’ll
> correct it in commit msg.

I think the problem is that there is insufficient information about your
data.  What we want to see in a commit message is something like:

commit 5ae2de81c8350272c122ea38e6bb4c0a41d58921
Author: Kenneth Graunke <kenneth at whitecape.org>
Date:   Mon Aug 28 16:08:32 2017 -0700

    i965: Use BLORP for buffer object stall avoidance blits instead of BLT.
    
    Improves performance of GFXBench4 tests at 1024x768 on a Kabylake GT2:
    - Manhattan 3.1 by 1.32134% +/- 0.322734% (n=8).
    - Car Chase by 1.25607% +/- 0.291262% (n=5).
    
    Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

The important bits are:

- average improvement
- statistical deviation
- number of runs
- platform
- name of specific benchmark

> Rest all review comments from you, Tapani and Emil are noted &
> implemented, we are working on running it through mesa CI/CTS and we
> should see a v2 for review after that.
> 
>  
> 
> Regards,
> 
> Yogesh.
> 
>  
> 
> *From:*mesa-dev [mailto:mesa-dev-bounces at lists.freedesktop.org] *On
> Behalf Of *Jason Ekstrand
> *Sent:* Friday, September 8, 2017 9:09 PM
> *To:* Muthukumar, Aravindan <aravindan.muthukumar at intel.com>
> *Cc:* mesa-dev at lists.freedesktop.org; J Karanje, Kedar
> <kedar.j.karanje at intel.com>
> *Subject:* Re: [Mesa-dev] [PATCH] i965 : optimized bucket index calculation
> 
>  
> 
> In general, I'm very concerned about how this handles rounding
> behavior.  Almost everywhere, you round down when what you want to do is
> round up.  Also, as I said on IRC, I'd like to see some asserts in
> add_bucket so that we are sure this calculation is correct.  In
> particular, I'd like to see
> 
>  
> 
> assert(bucket_for_size(size) == &bufmgr->cache_bucket[i]);
> 
> assert(bucket_for_size(size - 2048) == &bufmgr->cache_bucket[i]);
> 
> assert(bucket_for_size(size + 1) != &bufmgr->cache_bucket[i]);
> 
>  
> 
> We need to check on both sides of size to be 100% sure we're doing our
> rounding correctly.
> 
>  
> 
> On Fri, Sep 8, 2017 at 1:11 AM, <aravindan.muthukumar at intel.com
> <mailto:aravindan.muthukumar at intel.com>> wrote:
> 
>     From: Aravindan Muthukumar <aravindan.muthukumar at intel.com
>     <mailto:aravindan.muthukumar at intel.com>>
> 
>     Avoiding the loop which was running with O(n) complexity.
>     Now the complexity has been reduced to O(1)
> 
>     Tested with piglit.
>     Slight performance improvement (~1%) in 3d mark.
> 
>  
> 
> Which 3dmark test?  Also, what's the error in that 1%?
> 
>  
> 
>     Change-Id: Id099f1cd24ad5b691a69070eda79b8f4e9be39a6
>     Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar at intel.com
>     <mailto:aravindan.muthukumar at intel.com>>
>     Signed-off-by: Kedar Karanje <kedar.j.karanje at intel.com
>     <mailto:kedar.j.karanje at intel.com>>
>     Reviewed-by: Yogesh Marathe <yogesh.marathe at intel.com
>     <mailto:yogesh.marathe at intel.com>>
>     ---
>      src/mesa/drivers/dri/i965/brw_bufmgr.c | 48
>     +++++++++++++++++++++++++++++-----
>      1 file changed, 41 insertions(+), 7 deletions(-)
> 
>     diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c
>     b/src/mesa/drivers/dri/i965/brw_bufmgr.c
>     index 5b4e784..18cb166 100644
>     --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
>     +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
>     @@ -87,6 +87,11 @@
> 
>      #define memclear(s) memset(&s, 0, sizeof(s))
> 
>     +/* Macros for BO cache size */
>     +#define CACHE_PAGE_SIZE    4096
> 
>  
> 
> Just call this PAGE_SIZE
> 
>  
> 
>     +#define PAGE_SIZE_SHIFT    12
>     +#define BO_CACHE_PAGE_SIZE (4 * CACHE_PAGE_SIZE)
> 
>  
> 
> I think I'd rather we just use 4 * PAGE_SIZE explicitly than have this
> extra #define.  I think it's making things harder to read and not easier.
> 
>  
> 
>     +
>      #define FILE_DEBUG_FLAG DEBUG_BUFMGR
> 
>      static inline int
>     @@ -181,19 +186,48 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr,
>     uint32_t pitch, uint32_t tiling)
>         return ALIGN(pitch, tile_width);
>      }
> 
>     +/*
>     + * This functions is to find the correct bucket fit for the input size.
>     + * This function works with O(1) complexity when the requested size
>     + * was queried instead of iterating the size through all the buckets.
>     + */
>      static struct bo_cache_bucket *
>      bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)
>      {
>     -   int i;
>     +   struct bo_cache_bucket *bucket = NULL;
>     +   int x=0,index = -1;
>     +   int row, col=0;
> 
>     -   for (i = 0; i < bufmgr->num_buckets; i++) {
>     -      struct bo_cache_bucket *bucket = &bufmgr->cache_bucket[i];
>     -      if (bucket->size >= size) {
>     -         return bucket;
>     -      }
>     +   /* condition for size less  than 4*4096 (4KB) page size */
>     +   if(size < BO_CACHE_PAGE_SIZE){
> 
>  
> 
> This should be "<="
> 
>  
> 
>     +      index =
>     (size>>PAGE_SIZE_SHIFT)+((size%(1<<PAGE_SIZE_SHIFT)?1:0))-1;
> 
>  
> 
> I agree with tapani, that this can easily be an early return.
> 
>  
> 
> I think we can also make this calculation a lot more clear:
> 
>  
> 
> index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;
> 
>  
> 
>         }
>     +   else{
>     +      /* When the size is more than 4*4096, the logic follows a
>     matrix method
>     +       * where the index will be searched using
>     Arithmetico-Geometric progression.
>     +       * So the given size will be divided by 4096 & the index will
>     be traced out.
>     +       */
>     +      x = size>>PAGE_SIZE_SHIFT;
> 
>  
> 
> This rounds down not up like you want.
> 
>  
> 
>     -   return NULL;
>     +      /* Find the row using Geometric Progression. The highest bit
>     set will give
>     +       * the row number. num = a * r^(n-1) where num = size a = 4 r = 2
>     +       */
>     +      row = 31 - __builtin_clz(x>>1);
>     +
>     +     /* Find the column using AP but using the row value
>     +      * calculated using GP.
>     +      */
>     +      col =((x-(1<<(row+1)))/(1<<(row-1)))+1;
>     +      col += (size%(1<<PAGE_SIZE_SHIFT<<(row-1)))?1:0;
>     +
>     +      /* Finding the index value using calculated row and col number */
>     +      index = ((row-1)<<2) + col + 2;
> 
>  
> 
> I think this can probably also be a lot simpler.  How about something
> like this:
> 
>  
> 
> pages = DIV_ROUND_UP(size, PAGE_SIZE);
> 
> /* Steal this from anv_allocator.c */
> 
> pages_log2 = ilog2_round_up(pages);
> 
> row = pages_log2 - 1;
> 
> col = DIV_ROUND_UP(pages, (1 << (pages_log2 - 2)));
> 
> index = row * 4 + col;
> 
>  
> 
>     +   }
>     +
>     +   /* Checking the error condition */
>     +   bucket = (index >= 0 && index <
>     bufmgr->num_buckets)?(&bufmgr->cache_bucket[index]):NULL;
>     +   return bucket;
>      }
> 
>      int
>     --
>     2.7.4
> 
>     _______________________________________________
>     mesa-dev mailing list
>     mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
>     https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
>  
> 
> 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



More information about the mesa-dev mailing list