[Mesa-dev] [PATCH] i965 : optimized bucket index calculation

Tue Sep 12 09:40:22 UTC 2017

Hi Jason,

On the asserts you’ve mentioned below, I assume we need to add them after ‘bufmgr->num_buckets++’ in add_bucket() as num_buckets could be 0 initially. Another clarification on ~1%, we meant approx. 1% there, that’s an improvement we saw in 3Dmark total not a degradation, we’ll correct it in commit msg.

Rest all review comments from you, Tapani and Emil are noted & implemented, we are working on running it through mesa CI/CTS and we should see a v2 for review after that.

Regards,
Yogesh.

From: mesa-dev [mailto:mesa-dev-bounces at lists.freedesktop.org] On Behalf Of Jason Ekstrand
Sent: Friday, September 8, 2017 9:09 PM
To: Muthukumar, Aravindan <aravindan.muthukumar at intel.com>
Cc: mesa-dev at lists.freedesktop.org; J Karanje, Kedar <kedar.j.karanje at intel.com>
Subject: Re: [Mesa-dev] [PATCH] i965 : optimized bucket index calculation

In general, I'm very concerned about how this handles rounding behavior.  Almost everywhere, you round down when what you want to do is round up.  Also, as I said on IRC, I'd like to see some asserts in add_bucket so that we are sure this calculation is correct.  In particular, I'd like to see

assert(bucket_for_size(size) == &bufmgr->cache_bucket[i]);
assert(bucket_for_size(size - 2048) == &bufmgr->cache_bucket[i]);
assert(bucket_for_size(size + 1) != &bufmgr->cache_bucket[i]);

We need to check on both sides of size to be 100% sure we're doing our rounding correctly.

On Fri, Sep 8, 2017 at 1:11 AM, <aravindan.muthukumar at intel.com<mailto:aravindan.muthukumar at intel.com>> wrote:
From: Aravindan Muthukumar <aravindan.muthukumar at intel.com<mailto:aravindan.muthukumar at intel.com>>

Avoiding the loop which was running with O(n) complexity.
Now the complexity has been reduced to O(1)

Tested with piglit.
Slight performance improvement (~1%) in 3d mark.

Which 3dmark test?  Also, what's the error in that 1%?

Change-Id: Id099f1cd24ad5b691a69070eda79b8f4e9be39a6
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar at intel.com<mailto:aravindan.muthukumar at intel.com>>
Signed-off-by: Kedar Karanje <kedar.j.karanje at intel.com<mailto:kedar.j.karanje at intel.com>>
Reviewed-by: Yogesh Marathe <yogesh.marathe at intel.com<mailto:yogesh.marathe at intel.com>>
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c | 48 +++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index 5b4e784..18cb166 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -87,6 +87,11 @@

 #define memclear(s) memset(&s, 0, sizeof(s))

+/* Macros for BO cache size */
+#define CACHE_PAGE_SIZE    4096

Just call this PAGE_SIZE

+#define PAGE_SIZE_SHIFT    12
+#define BO_CACHE_PAGE_SIZE (4 * CACHE_PAGE_SIZE)

I think I'd rather we just use 4 * PAGE_SIZE explicitly than have this extra #define.  I think it's making things harder to read and not easier.

+
 #define FILE_DEBUG_FLAG DEBUG_BUFMGR

 static inline int
@@ -181,19 +186,48 @@ bo_tile_pitch(struct brw_bufmgr *bufmgr, uint32_t pitch, uint32_t tiling)
    return ALIGN(pitch, tile_width);
 }

+/*
+ * This functions is to find the correct bucket fit for the input size.
+ * This function works with O(1) complexity when the requested size
+ * was queried instead of iterating the size through all the buckets.
+ */
 static struct bo_cache_bucket *
 bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)
 {
-   int i;
+   struct bo_cache_bucket *bucket = NULL;
+   int x=0,index = -1;
+   int row, col=0;

-   for (i = 0; i < bufmgr->num_buckets; i++) {
-      struct bo_cache_bucket *bucket = &bufmgr->cache_bucket[i];
-      if (bucket->size >= size) {
-         return bucket;
-      }
+   /* condition for size less  than 4*4096 (4KB) page size */
+   if(size < BO_CACHE_PAGE_SIZE){

This should be "<="

+      index = (size>>PAGE_SIZE_SHIFT)+((size%(1<<PAGE_SIZE_SHIFT)?1:0))-1;

I agree with tapani, that this can easily be an early return.

I think we can also make this calculation a lot more clear:

index = DIV_ROUND_UP(size, PAGE_SIZE) - 1;

    }
+   else{
+      /* When the size is more than 4*4096, the logic follows a matrix method
+       * where the index will be searched using Arithmetico-Geometric progression.
+       * So the given size will be divided by 4096 & the index will be traced out.
+       */
+      x = size>>PAGE_SIZE_SHIFT;

This rounds down not up like you want.

-   return NULL;
+      /* Find the row using Geometric Progression. The highest bit set will give
+       * the row number. num = a * r^(n-1) where num = size a = 4 r = 2
+       */
+      row = 31 - __builtin_clz(x>>1);
+
+     /* Find the column using AP but using the row value
+      * calculated using GP.
+      */
+      col =((x-(1<<(row+1)))/(1<<(row-1)))+1;
+      col += (size%(1<<PAGE_SIZE_SHIFT<<(row-1)))?1:0;
+
+      /* Finding the index value using calculated row and col number */
+      index = ((row-1)<<2) + col + 2;

I think this can probably also be a lot simpler.  How about something like this:

pages = DIV_ROUND_UP(size, PAGE_SIZE);
/* Steal this from anv_allocator.c */
pages_log2 = ilog2_round_up(pages);
row = pages_log2 - 1;
col = DIV_ROUND_UP(pages, (1 << (pages_log2 - 2)));
index = row * 4 + col;

+   }
+
+   /* Checking the error condition */
+   bucket = (index >= 0 && index < bufmgr->num_buckets)?(&bufmgr->cache_bucket[index]):NULL;
+   return bucket;
 }

 int
--
2.7.4

_______________________________________________
mesa-dev mailing list
mesa-dev at lists.freedesktop.org<mailto:mesa-dev at lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170912/67e6f2d0/attachment-0001.html>