[Beignet] [PATCH 00/18] Enable profiling by line number.

He Junyan junyan.he at inbox.com
Wed Jan 27 21:18:56 PST 2016


Ping for review

On Thu, Dec 24, 2015 at 07:01:52PM +0800, junyan.he at inbox.com wrote:
> Date: Thu, 24 Dec 2015 19:01:52 +0800
> From: junyan.he at inbox.com
> To: beignet at lists.freedesktop.org
> Subject: [PATCH 00/18] Enable profiling by line number.
> X-Mailer: git-send-email 1.7.9.5
> 
> From: Junyan He <junyan.he at linux.intel.com>
> 
> This patch set will let the user to specify the line numbers in the source
> code to insert the profiling watch points.
> As the first step, we just use the env var OCL_PROFILING_LINES to control
> the kernel name and line numbers. The format is:
> KERNEL_NAME:PROFILING_MODE:LINE_NUMBER0,LINE_NUMBER1,LINE_NUMBER2,...
> for example:
> export OCL_PROFILING_LINES="builtin_atanpi_float8:2:2,6,7,8,15"
> will insert watch points at 2 6 7 8 15 lines in the kernel named builtin_atanpi_float8.
> We have 3 PROFILING_MODE,
>   level 1: just brief timestamp with line number.
>                Total log number is 6
> 	       Line        2:    Timestamp:       190  Thread Exec:6
> 	       Line        6:    Timestamp:      1174  Thread Exec:6
> 	       Line        7:    Timestamp:      3092  Thread Exec:6
> 	       Line        8:    Timestamp:      3105  Thread Exec:6
> 	       Line       15:    Timestamp:      3241  Thread Exec:6
> 
>    level 2: timestamp with source, plus:
>    Format: Average Timestamp  Exec number Source
>                                 |  __kernel void builtin_atanpi_float8(__global float *dst,  __global float *src1, __global int *vector) {
>    TS:      190  Num:   6 ----> |    int i = get_global_id(0);
>                                 |    float8 x1 = (float8) (src1[i * (*vector) + 0],src1[i * (*vector) + 1],src1[i * (*vector) + 2],src1[i * (*vector) + 3],src1[i * (*vector) + 4],src1[i * (*vector) + 5],src1[i * (*vector) + 6],src1[i * (*vector) + 7]);
>                                 |  
>                                 |    float8 ret;
>    TS:     1174  Num:   6 ----> |    ret = atanpi(x1);
>    TS:     3092  Num:   6 ----> |    dst[i * (*vector) + 0] = ret[0];
>    TS:     3105  Num:   6 ----> |    dst[i * (*vector) + 1] = ret[1];
>                                 |    dst[i * (*vector) + 2] = ret[2];
>                                 |    dst[i * (*vector) + 3] = ret[3];
>                                 |    dst[i * (*vector) + 4] = ret[4];
>                                 |    dst[i * (*vector) + 5] = ret[5];
>                                 |    dst[i * (*vector) + 6] = ret[6];
>                                 |    dst[i * (*vector) + 7] = ret[7];
>    TS:     3241  Num:   6 ----> |  };
> 
> 
>    level 3: output the detail logs, add all logs as:
>    ------------------------ Log 0      -----------------------
>    | fix functions id:   7     simd:   16   kernel id:    0  |
>    | thread id:    0  EU id:   8  sub slice id: 0 slice id 0 |
>    | dispatch Mask:   1 prolog:      6860  epilog:     19548 |
>    | globalX:   3~   3  globalY:   0~   0  globalZ:   0~   0 |
>    |  ts0 :       201  | ts1 :      1180  | ts2 :     12417  |
>    |  ts3 :     12430  | ts4 :     12637  | ts5 :         0  |
>    |  ts6 :         0  | ts7 :         0  | ts8 :         0  |
>    |  ts9 :         0  | ts10:         0  | ts11:         0  |
>    |  ts12:         0  | ts13:         0  | ts14:         0  |
>    |  ts15:         0  | ts16:         0  | ts17:         0  |
>    |  ts18:         0  | ts19:         0  |                  |
>    ------------------------ Log 1      -----------------------
>    | fix functions id:   7     simd:   16   kernel id:    0  |
>    | thread id:    0  EU id:   8  sub slice id: 1 slice id 0 |
>    | dispatch Mask:   1 prolog:      6877  epilog:     19569 |
>    | globalX:   4~   4  globalY:   0~   0  globalZ:   0~   0 |
>    |  ts0 :       209  | ts1 :      1190  | ts2 :     12423  |
>    |  ts3 :     12436  | ts4 :     12643  | ts5 :         0  |
>    |  ts6 :         0  | ts7 :         0  | ts8 :         0  |
>    |  ts9 :         0  | ts10:         0  | ts11:         0  |
>    |  ts12:         0  | ts13:         0  | ts14:         0  |
>    |  ts15:         0  | ts16:         0  | ts17:         0  |
>    |  ts18:         0  | ts19:         0  |                  |
>    .....
>    .....
> 
> 
> 
> Some problems:
> 1. On BDW, the timestamp sometimes gives invalid huge value.
>    It may be a HW issue or feature, we need to check it further.
> 2. Sometimes the line number of instruction is different from the
>    source code. This is caused by optimization and we can notice
>    and analyse it by Gen IR or ASM. I will send a patch to set
>    optimization level later.
> 3. Some line numbers are missing when there are lots of inline
>    function call and macro. I will investigate llvm and clang to
>    find a better solution.
> 
> I will write a detail doc about the profiling later.
> 
> With this parch set, please just ignore the previous two patchsets
> about debug info.
> 
> 
> Signed-off-by: Junyan He <junyan.he at linux.intel.com>
> ---
>  backend/src/backend/gen_context.cpp               |  20 +-
>  backend/src/backend/gen_encoder.cpp               |   9 +-
>  backend/src/backend/gen_insn_selection.cpp        |  10 +-
>  backend/src/backend/gen_insn_selection_output.cpp |   3 +
>  backend/src/backend/gen_program.cpp               |  11 +-
>  backend/src/backend/gen_register.hpp              |   4 +-
>  backend/src/backend/program.cpp                   |  34 ++--
>  backend/src/backend/program.h                     |   3 +-
>  backend/src/backend/program.hpp                   |   2 +-
>  backend/src/ir/context.cpp                        |   4 +-
>  backend/src/ir/function.cpp                       |   6 +-
>  backend/src/ir/function.hpp                       |  10 +
>  backend/src/ir/instruction.cpp                    |  11 +-
>  backend/src/ir/instruction.hpp                    |   3 +
>  backend/src/ir/lowering.cpp                       |  15 +-
>  backend/src/ir/profiling.cpp                      |  72 +++++++-
>  backend/src/ir/profiling.hpp                      |  35 +++-
>  backend/src/ir/structurizer.cpp                   |  55 ++++--
>  backend/src/ir/structurizer.hpp                   |   6 +-
>  backend/src/ir/unit.cpp                           |   9 +-
>  backend/src/ir/unit.hpp                           |  12 +-
>  backend/src/llvm/ExpandConstantExpr.cpp           |   6 +-
>  backend/src/llvm/ExpandLargeIntegers.cpp          | 173 +++++++++--------
>  backend/src/llvm/llvm_gen_backend.cpp             |  11 +-
>  backend/src/llvm/llvm_gen_backend.hpp             |  11 +-
>  backend/src/llvm/llvm_intrinsic_lowering.cpp      |   8 +-
>  backend/src/llvm/llvm_passes.cpp                  |  13 +-
>  backend/src/llvm/llvm_profiling.cpp               | 215 +++++++++++++++++-----
>  backend/src/llvm/llvm_sampler_fix.cpp             |   7 +
>  backend/src/llvm/llvm_scalarize.cpp               |   5 +-
>  backend/src/llvm/llvm_to_gen.cpp                  |   4 +-
>  backend/src/llvm/llvm_to_gen.hpp                  |   2 +-
>  src/cl_program.c                                  |   3 +-
>  33 files changed, 535 insertions(+), 257 deletions(-)




More information about the Beignet mailing list