<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED FIXED - clpeak OpenCL benchmark hangs during compilation on Clover RadeonSI"
href="https://bugs.freedesktop.org/show_bug.cgi?id=96897#c14">Comment # 14</a>
on <a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED FIXED - clpeak OpenCL benchmark hangs during compilation on Clover RadeonSI"
href="https://bugs.freedesktop.org/show_bug.cgi?id=96897">bug 96897</a>
from <span class="vcard"><a class="email" href="mailto:Dieter@nuetzel-hh.de" title="Dieter Nützel <Dieter@nuetzel-hh.de>"> <span class="fn">Dieter Nützel</span></a>
</span></b>
<pre>(In reply to Jan Vesely from <a href="show_bug.cgi?id=96897#c13">comment #13</a>)
<span class="quote">> Initial support for cl_khr_fp16 builtins has been added to libclc in r332677.
> It should be enough to run clpeak.
> clpeak still takes few mins to compile the kernels (~7mins on my carrizo
> laptop)</span>
Great work, Jan!
After 3 min and ~12 sec the float tests started crunching on my Xeon X3470
(only one core is used for the kernel compile => 3.6 GHz turbo mode).
My desktop was frozen during the float 'Global memory bandwidth (GBPS)' test
and partly frozen during 'Double-precision compute (GFLOPS)'.
The whole benchmark finished after 6 min and 17 sec.
/home/dieter> time clpeak
Platform: Clover
Device: Radeon RX 580 Series (POLARIS10, DRM 3.23.0,
4.16.9-1.g4f45b1e-default, LLVM 7.0.0)
Driver version : 18.2.0-devel (Linux x64)
Compute units : 36
Clock frequency : 1411 MHz
Global memory bandwidth (GBPS)
float : 2.64
float2 : 2.64
float4 : 2.64
float8 : 2.54
float16 : 1.45
Single-precision compute (GFLOPS)
float : 6341.87
float2 : 6131.34
float4 : 6105.61
float8 : 5933.91
float16 : 5939.44
half-precision compute (GFLOPS)
half : 6307.47
half2 : 6193.25
half4 : 6114.34
half8 : 5729.57
half16 : 6047.90
Double-precision compute (GFLOPS)
double : 404.52
double2 : 404.41
double4 : 404.06
double8 : 403.08
double16 : 401.53
Integer compute (GIOPS)
int : 1222.75
int2 : 1213.90
int4 : 1210.72
int8 : 1208.57
int16 : 1213.99
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 8.78
enqueueReadBuffer : 4.86
enqueueMapBuffer(for read) : 4871.79
memcpy from mapped ptr : 4.94
enqueueUnmap(after write) : 3528.56
memcpy to mapped ptr : 4.94
Kernel launch latency : 293.57 us
206.285u 3.765s 6:17.14 55.6% 0+0k 0+0io 0pf+0w
For reference, AMD 17.40:
/home/dieter> time clpeak
Platform: AMD Accelerated Parallel Processing
Device: Ellesmere
Driver version : 2482.3 (Linux x64)
Compute units : 36
Clock frequency : 1411 MHz
Global memory bandwidth (GBPS)
float : 202.59
float2 : 209.30
float4 : 209.63
float8 : 162.15
float16 : 138.41
Single-precision compute (GFLOPS)
float : 6342.71
float2 : 6374.96
float4 : 6178.29
float8 : 5973.53
float16 : 6018.79
half-precision compute (GFLOPS)
half : 6306.97
half2 : 6366.06
half4 : 6350.41
half8 : 6154.31
half16 : 6280.47
Double-precision compute (GFLOPS)
double : 404.64
double2 : 404.38
double4 : 398.54
double8 : 403.25
double16 : 401.53
Integer compute (GIOPS)
int : 1206.77
int2 : 1221.26
int4 : 1225.83
int8 : 1225.88
int16 : 1227.35
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 9.03
enqueueReadBuffer : 5.08
enqueueMapBuffer(for read) : 149130.81
memcpy from mapped ptr : 5.09
enqueueUnmap(after write) : 75882.81
memcpy to mapped ptr : 5.08
Kernel launch latency : 93.33 us
23.056u 1.592s 1:08.29 36.0% 0+0k 0+0io 0pf+0w</pre>
</div>
</p>
</body>
</html>