[Beignet] [PATCH 00/18] Enable profiling by line number.
junyan.he at inbox.com
junyan.he at inbox.com
Thu Dec 24 03:01:52 PST 2015
From: Junyan He <junyan.he at linux.intel.com>
This patch set will let the user to specify the line numbers in the source
code to insert the profiling watch points.
As the first step, we just use the env var OCL_PROFILING_LINES to control
the kernel name and line numbers. The format is:
KERNEL_NAME:PROFILING_MODE:LINE_NUMBER0,LINE_NUMBER1,LINE_NUMBER2,...
for example:
export OCL_PROFILING_LINES="builtin_atanpi_float8:2:2,6,7,8,15"
will insert watch points at 2 6 7 8 15 lines in the kernel named builtin_atanpi_float8.
We have 3 PROFILING_MODE,
level 1: just brief timestamp with line number.
Total log number is 6
Line 2: Timestamp: 190 Thread Exec:6
Line 6: Timestamp: 1174 Thread Exec:6
Line 7: Timestamp: 3092 Thread Exec:6
Line 8: Timestamp: 3105 Thread Exec:6
Line 15: Timestamp: 3241 Thread Exec:6
level 2: timestamp with source, plus:
Format: Average Timestamp Exec number Source
| __kernel void builtin_atanpi_float8(__global float *dst, __global float *src1, __global int *vector) {
TS: 190 Num: 6 ----> | int i = get_global_id(0);
| float8 x1 = (float8) (src1[i * (*vector) + 0],src1[i * (*vector) + 1],src1[i * (*vector) + 2],src1[i * (*vector) + 3],src1[i * (*vector) + 4],src1[i * (*vector) + 5],src1[i * (*vector) + 6],src1[i * (*vector) + 7]);
|
| float8 ret;
TS: 1174 Num: 6 ----> | ret = atanpi(x1);
TS: 3092 Num: 6 ----> | dst[i * (*vector) + 0] = ret[0];
TS: 3105 Num: 6 ----> | dst[i * (*vector) + 1] = ret[1];
| dst[i * (*vector) + 2] = ret[2];
| dst[i * (*vector) + 3] = ret[3];
| dst[i * (*vector) + 4] = ret[4];
| dst[i * (*vector) + 5] = ret[5];
| dst[i * (*vector) + 6] = ret[6];
| dst[i * (*vector) + 7] = ret[7];
TS: 3241 Num: 6 ----> | };
level 3: output the detail logs, add all logs as:
------------------------ Log 0 -----------------------
| fix functions id: 7 simd: 16 kernel id: 0 |
| thread id: 0 EU id: 8 sub slice id: 0 slice id 0 |
| dispatch Mask: 1 prolog: 6860 epilog: 19548 |
| globalX: 3~ 3 globalY: 0~ 0 globalZ: 0~ 0 |
| ts0 : 201 | ts1 : 1180 | ts2 : 12417 |
| ts3 : 12430 | ts4 : 12637 | ts5 : 0 |
| ts6 : 0 | ts7 : 0 | ts8 : 0 |
| ts9 : 0 | ts10: 0 | ts11: 0 |
| ts12: 0 | ts13: 0 | ts14: 0 |
| ts15: 0 | ts16: 0 | ts17: 0 |
| ts18: 0 | ts19: 0 | |
------------------------ Log 1 -----------------------
| fix functions id: 7 simd: 16 kernel id: 0 |
| thread id: 0 EU id: 8 sub slice id: 1 slice id 0 |
| dispatch Mask: 1 prolog: 6877 epilog: 19569 |
| globalX: 4~ 4 globalY: 0~ 0 globalZ: 0~ 0 |
| ts0 : 209 | ts1 : 1190 | ts2 : 12423 |
| ts3 : 12436 | ts4 : 12643 | ts5 : 0 |
| ts6 : 0 | ts7 : 0 | ts8 : 0 |
| ts9 : 0 | ts10: 0 | ts11: 0 |
| ts12: 0 | ts13: 0 | ts14: 0 |
| ts15: 0 | ts16: 0 | ts17: 0 |
| ts18: 0 | ts19: 0 | |
.....
.....
Some problems:
1. On BDW, the timestamp sometimes gives invalid huge value.
It may be a HW issue or feature, we need to check it further.
2. Sometimes the line number of instruction is different from the
source code. This is caused by optimization and we can notice
and analyse it by Gen IR or ASM. I will send a patch to set
optimization level later.
3. Some line numbers are missing when there are lots of inline
function call and macro. I will investigate llvm and clang to
find a better solution.
I will write a detail doc about the profiling later.
With this parch set, please just ignore the previous two patchsets
about debug info.
Signed-off-by: Junyan He <junyan.he at linux.intel.com>
---
backend/src/backend/gen_context.cpp | 20 +-
backend/src/backend/gen_encoder.cpp | 9 +-
backend/src/backend/gen_insn_selection.cpp | 10 +-
backend/src/backend/gen_insn_selection_output.cpp | 3 +
backend/src/backend/gen_program.cpp | 11 +-
backend/src/backend/gen_register.hpp | 4 +-
backend/src/backend/program.cpp | 34 ++--
backend/src/backend/program.h | 3 +-
backend/src/backend/program.hpp | 2 +-
backend/src/ir/context.cpp | 4 +-
backend/src/ir/function.cpp | 6 +-
backend/src/ir/function.hpp | 10 +
backend/src/ir/instruction.cpp | 11 +-
backend/src/ir/instruction.hpp | 3 +
backend/src/ir/lowering.cpp | 15 +-
backend/src/ir/profiling.cpp | 72 +++++++-
backend/src/ir/profiling.hpp | 35 +++-
backend/src/ir/structurizer.cpp | 55 ++++--
backend/src/ir/structurizer.hpp | 6 +-
backend/src/ir/unit.cpp | 9 +-
backend/src/ir/unit.hpp | 12 +-
backend/src/llvm/ExpandConstantExpr.cpp | 6 +-
backend/src/llvm/ExpandLargeIntegers.cpp | 173 +++++++++--------
backend/src/llvm/llvm_gen_backend.cpp | 11 +-
backend/src/llvm/llvm_gen_backend.hpp | 11 +-
backend/src/llvm/llvm_intrinsic_lowering.cpp | 8 +-
backend/src/llvm/llvm_passes.cpp | 13 +-
backend/src/llvm/llvm_profiling.cpp | 215 +++++++++++++++++-----
backend/src/llvm/llvm_sampler_fix.cpp | 7 +
backend/src/llvm/llvm_scalarize.cpp | 5 +-
backend/src/llvm/llvm_to_gen.cpp | 4 +-
backend/src/llvm/llvm_to_gen.hpp | 2 +-
src/cl_program.c | 3 +-
33 files changed, 535 insertions(+), 257 deletions(-)
More information about the Beignet
mailing list