[Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

Fri Mar 10 02:56:02 UTC 2017

There were some typos; sorry about that.
I have fixed them.

yan.wang

From: yan.wang
Date: 2017-03-10 10:52
To: ruiling.song; beignet
Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.
This comes from darktable performance tuning.
For float, maxVecSize is 4, so maxLimit = 4 * 8 = 32.
I am not sure of the reasoning behind maxLimit = maxVecSize * 8.
32 is too small a search range: the pass cannot find further mergeable loads after the leading load.
The patch improves the eaw_decompose kernel of darktable from 2.1876 s to 1.8855 s, because it reduces the number of send instructions from 3 (2 floats, 2 floats, 1 float) to 2 (4 floats, 1 float).
There is another issue when compiling the eaw_decompose kernel; I will submit a separate patch for it.
At a minimum, we need a lower bound for maxLimit, such as 150, so that the search range is not too small.

yan.wang

From: Song, Ruiling
Date: 2017-03-10 10:39
To: yan.wang at linux.intel.com; beignet at lists.freedesktop.org
Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of
> yan.wang at linux.intel.com
> Sent: Thursday, March 9, 2017 5:41 PM
> To: beignet at lists.freedesktop.org
> Cc: Yan Wang <yan.wang at linux.intel.com>
> Subject: [Beignet] [PATCH v2] Provide more possible candidate of load/store as
> possible.
> 
> From: Yan Wang <yan.wang at linux.intel.com>
> 
> Avoid a search range that is too small in some cases, such as vectors of float.
> This allows more loads/stores to be merged, improving performance.
> 
> Signed-off-by: Yan Wang <yan.wang at linux.intel.com>
> ---
>  backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> b/backend/src/llvm/llvm_loadstore_optimization.cpp
> index e797e98..e569a8e 100644
> --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> @@ -180,7 +180,7 @@ namespace gbe {
>      BasicBlock::iterator J = start;
>      ++J;
> 
> -    unsigned maxLimit = maxVecSize * 8;
> +    unsigned maxLimit = std::max(maxVecSize * 8, 150u);

Could you give some performance numbers on known benchmarks?
Please pick a sufficiently complex OpenCL kernel, maybe from LuxMark or darktable.
How much would it benefit runtime performance, and how much would it hurt compile-time performance?
Then we can judge whether the change is reasonable.

Thanks!
Ruiling
>      bool reordered = false;
> 
>      for(unsigned ss = 0; J != E && ss <= maxLimit; ++ss, ++J) {
> --
> 2.7.4
> 
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet