[Beignet] [PATCH 00/18] Enable profiling by line number.

Thu Dec 24 03:01:52 PST 2015

From: Junyan He <junyan.he at linux.intel.com>

This patch set lets the user specify the line numbers in the source
code at which to insert profiling watch points.
As a first step, we just use the env var OCL_PROFILING_LINES to select
the kernel name, profiling mode and line numbers. The format is:
KERNEL_NAME:PROFILING_MODE:LINE_NUMBER0,LINE_NUMBER1,LINE_NUMBER2,...
For example:
export OCL_PROFILING_LINES="builtin_atanpi_float8:2:2,6,7,8,15"
will insert watch points at lines 2, 6, 7, 8 and 15 of the kernel named builtin_atanpi_float8.
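
To make the format concrete, here is a minimal Python sketch that splits such a value into its three fields. It is only an illustration of the format described above, not the parser used in the patches; the function name parse_profiling_lines is hypothetical, and it assumes the kernel name contains no ':' and that the mode is one of 1, 2 or 3.

```python
def parse_profiling_lines(spec):
    """Split KERNEL_NAME:PROFILING_MODE:LINE0,LINE1,... into its fields.

    Hypothetical helper for illustration; assumes the kernel name
    contains no ':' character.
    """
    parts = spec.strip().split(":")
    if len(parts) != 3:
        raise ValueError("expected KERNEL_NAME:PROFILING_MODE:LINES")
    kernel, mode_text, lines_text = parts
    mode = int(mode_text)
    if mode not in (1, 2, 3):
        raise ValueError("PROFILING_MODE must be 1, 2 or 3")
    # Ignore empty tokens so a trailing comma does not raise.
    lines = [int(tok) for tok in lines_text.split(",") if tok.strip()]
    if not kernel or not lines:
        raise ValueError("kernel name and at least one line are required")
    return kernel, mode, lines


# The example value from this mail.
kernel, mode, lines = parse_profiling_lines("builtin_atanpi_float8:2:2,6,7,8,15")
print(kernel, mode, lines)
```
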
There are 3 PROFILING_MODE levels:
  level 1: just a brief timestamp for each line number.
               Total log number is 6
	       Line        2:    Timestamp:       190  Thread Exec:6
	       Line        6:    Timestamp:      1174  Thread Exec:6
	       Line        7:    Timestamp:      3092  Thread Exec:6
	       Line        8:    Timestamp:      3105  Thread Exec:6
	       Line       15:    Timestamp:      3241  Thread Exec:6

   level 2: timestamps shown alongside the source:
   Format: Average Timestamp  Exec number Source
                                |  __kernel void builtin_atanpi_float8(__global float *dst,  __global float *src1, __global int *vector) {
   TS:      190  Num:   6 ----> |    int i = get_global_id(0);
                                |    float8 x1 = (float8) (src1[i * (*vector) + 0],src1[i * (*vector) + 1],src1[i * (*vector) + 2],src1[i * (*vector) + 3],src1[i * (*vector) + 4],src1[i * (*vector) + 5],src1[i * (*vector) + 6],src1[i * (*vector) + 7]);
                                |  
                                |    float8 ret;
   TS:     1174  Num:   6 ----> |    ret = atanpi(x1);
   TS:     3092  Num:   6 ----> |    dst[i * (*vector) + 0] = ret[0];
   TS:     3105  Num:   6 ----> |    dst[i * (*vector) + 1] = ret[1];
                                |    dst[i * (*vector) + 2] = ret[2];
                                |    dst[i * (*vector) + 3] = ret[3];
                                |    dst[i * (*vector) + 4] = ret[4];
                                |    dst[i * (*vector) + 5] = ret[5];
                                |    dst[i * (*vector) + 6] = ret[6];
                                |    dst[i * (*vector) + 7] = ret[7];
   TS:     3241  Num:   6 ----> |  };

   level 3: output the detailed logs, listing every log as:
   ------------------------ Log 0      -----------------------
   | fix functions id:   7     simd:   16   kernel id:    0  |
   | thread id:    0  EU id:   8  sub slice id: 0 slice id 0 |
   | dispatch Mask:   1 prolog:      6860  epilog:     19548 |
   | globalX:   3~   3  globalY:   0~   0  globalZ:   0~   0 |
   |  ts0 :       201  | ts1 :      1180  | ts2 :     12417  |
   |  ts3 :     12430  | ts4 :     12637  | ts5 :         0  |
   |  ts6 :         0  | ts7 :         0  | ts8 :         0  |
   |  ts9 :         0  | ts10:         0  | ts11:         0  |
   |  ts12:         0  | ts13:         0  | ts14:         0  |
   |  ts15:         0  | ts16:         0  | ts17:         0  |
   |  ts18:         0  | ts19:         0  |                  |
   ------------------------ Log 1      -----------------------
   | fix functions id:   7     simd:   16   kernel id:    0  |
   | thread id:    0  EU id:   8  sub slice id: 1 slice id 0 |
   | dispatch Mask:   1 prolog:      6877  epilog:     19569 |
   | globalX:   4~   4  globalY:   0~   0  globalZ:   0~   0 |
   |  ts0 :       209  | ts1 :      1190  | ts2 :     12423  |
   |  ts3 :     12436  | ts4 :     12643  | ts5 :         0  |
   |  ts6 :         0  | ts7 :         0  | ts8 :         0  |
   |  ts9 :         0  | ts10:         0  | ts11:         0  |
   |  ts12:         0  | ts13:         0  | ts14:         0  |
   |  ts15:         0  | ts16:         0  | ts17:         0  |
   |  ts18:         0  | ts19:         0  |                  |
   .....
   .....
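
The level 1 output above reports one timestamp per watch point, so the cost between two watch points is the difference of consecutive timestamps. A minimal Python sketch of that arithmetic follows. It assumes the log lines keep exactly the "Line N: Timestamp: T Thread Exec:E" layout of the sample above; the function name watchpoint_deltas is hypothetical, and the unit of the timestamps is not stated in this mail, so the deltas are in whatever unit the log uses.

```python
import re

# Matches one level 1 line, e.g.
# "Line        6:    Timestamp:      1174  Thread Exec:6".
# The layout is taken from the sample output; its stability is an assumption.
LINE_RE = re.compile(r"Line\s+(\d+):\s+Timestamp:\s+(\d+)\s+Thread Exec:\s*(\d+)")


def watchpoint_deltas(log_text):
    """Return (from_line, to_line, delta) for each pair of consecutive watch points."""
    points = [(int(m.group(1)), int(m.group(2))) for m in LINE_RE.finditer(log_text)]
    return [(a_line, b_line, b_ts - a_ts)
            for (a_line, a_ts), (b_line, b_ts) in zip(points, points[1:])]


# The level 1 sample from this mail.
sample = """
Total log number is 6
Line        2:    Timestamp:       190  Thread Exec:6
Line        6:    Timestamp:      1174  Thread Exec:6
Line        7:    Timestamp:      3092  Thread Exec:6
Line        8:    Timestamp:      3105  Thread Exec:6
Line       15:    Timestamp:      3241  Thread Exec:6
"""

for src, dst, delta in watchpoint_deltas(sample):
    print(f"line {src} -> line {dst}: {delta}")
```

On the sample, the largest gap is between lines 6 and 7, which brackets the atanpi call.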

Some problems:
1. On BDW, the timestamp sometimes has an invalid, huge value.
   It may be a HW issue or a feature; we need to check it further.
2. Sometimes the line number attached to an instruction differs from the
   source code. This is caused by optimization, and we can spot
   and analyse it in the Gen IR or ASM. I will send a patch to set the
   optimization level later.
3. Some line numbers are missing when there are lots of inline
   function calls and macros. I will investigate llvm and clang to
   find a better solution.

I will write a detailed doc about the profiling later.

With this patch set, please ignore the previous two patch sets
about debug info.

Signed-off-by: Junyan He <junyan.he at linux.intel.com>
---
 backend/src/backend/gen_context.cpp               |  20 +-
 backend/src/backend/gen_encoder.cpp               |   9 +-
 backend/src/backend/gen_insn_selection.cpp        |  10 +-
 backend/src/backend/gen_insn_selection_output.cpp |   3 +
 backend/src/backend/gen_program.cpp               |  11 +-
 backend/src/backend/gen_register.hpp              |   4 +-
 backend/src/backend/program.cpp                   |  34 ++--
 backend/src/backend/program.h                     |   3 +-
 backend/src/backend/program.hpp                   |   2 +-
 backend/src/ir/context.cpp                        |   4 +-
 backend/src/ir/function.cpp                       |   6 +-
 backend/src/ir/function.hpp                       |  10 +
 backend/src/ir/instruction.cpp                    |  11 +-
 backend/src/ir/instruction.hpp                    |   3 +
 backend/src/ir/lowering.cpp                       |  15 +-
 backend/src/ir/profiling.cpp                      |  72 +++++++-
 backend/src/ir/profiling.hpp                      |  35 +++-
 backend/src/ir/structurizer.cpp                   |  55 ++++--
 backend/src/ir/structurizer.hpp                   |   6 +-
 backend/src/ir/unit.cpp                           |   9 +-
 backend/src/ir/unit.hpp                           |  12 +-
 backend/src/llvm/ExpandConstantExpr.cpp           |   6 +-
 backend/src/llvm/ExpandLargeIntegers.cpp          | 173 +++++++++--------
 backend/src/llvm/llvm_gen_backend.cpp             |  11 +-
 backend/src/llvm/llvm_gen_backend.hpp             |  11 +-
 backend/src/llvm/llvm_intrinsic_lowering.cpp      |   8 +-
 backend/src/llvm/llvm_passes.cpp                  |  13 +-
 backend/src/llvm/llvm_profiling.cpp               | 215 +++++++++++++++++-----
 backend/src/llvm/llvm_sampler_fix.cpp             |   7 +
 backend/src/llvm/llvm_scalarize.cpp               |   5 +-
 backend/src/llvm/llvm_to_gen.cpp                  |   4 +-
 backend/src/llvm/llvm_to_gen.hpp                  |   2 +-
 src/cl_program.c                                  |   3 +-
 33 files changed, 535 insertions(+), 257 deletions(-)