[Beignet] [PATCH 00/21 V3] Add Profiling support in beignet.

junyan.he at inbox.com junyan.he at inbox.com
Mon Nov 16 15:40:02 PST 2015


From: Junyan He <junyan.he at linux.intel.com>

The profiling support is enabled by this patch set.
The profiling information looks like the following:
-------------------------- Log 0 --------------------------
| fix functions id:   7     simd:   16   kernel id:    0  |
| thread id:          0     EU id:   1   half slice id: 0 |
| dispatch Mask:   1 prolog:       197  epilog:      6699 |
| globalX:   4~   4  globalY:   0~   0  globalZ:   0~   0 |
|  ts0 :        64  | ts1 :         0  | ts2 :       930  |
|  ts3 :         0  | ts4 :      1046  | ts5 :      1170  |
|  ts6 :         0  | ts7 :         0  | ts8 :         0  |
|  ts9 :      1624  | ts10:      1838  | ts11:         0  |
|  ts12:      2032  | ts13:         0  | ts14:      2312  |
|  ts15:      2560  | ts16:         0  | ts17:         0  |
|  ts18:         0  | ts19:      2972  |                  |

Each HW thread creates one such log item.
Prolog is the timestamp taken when we enter the kernel, while
epilog is the timestamp taken when we finish and leave it.
ts0~ts19 record the time offsets from the prolog, i.e. their
base is 0.
For now we only record timestamps for the first 20 blocks. Later,
once SourceToBinary is fully supported, we can set profiling points
at any location.
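
As a reading aid for the log above, here is a minimal C sketch of one
log item and two values that can be derived from it. The struct name,
its field layout and the helper names are hypothetical and are not
taken from the patches; only prolog, epilog and the 20 ts slots come
from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_SLOTS 20 /* the patch set records the first 20 blocks */

/* Hypothetical view of one log item; NOT Beignet's real memory layout. */
struct profiling_log {
    uint32_t prolog;       /* timestamp taken on kernel entry */
    uint32_t epilog;       /* timestamp taken on kernel exit */
    uint32_t ts[TS_SLOTS]; /* time offsets from the prolog */
};

/* Timestamp ticks between entering and leaving the kernel. */
static uint32_t kernel_duration(const struct profiling_log *log)
{
    assert(log != NULL);
    return log->epilog - log->prolog;
}

/* Number of ts slots that hold a non-zero offset. */
static unsigned recorded_slots(const struct profiling_log *log)
{
    unsigned count = 0;
    assert(log != NULL);
    for (size_t i = 0; i < TS_SLOTS; i++)
        if (log->ts[i] != 0)
            count++;
    return count;
}
```

Filled with the values of Log 0 (prolog 197, epilog 6699),
kernel_duration() gives 6502 and recorded_slots() gives 10.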

V2:
1. Fix GLOBAL XYZ wrong value.
Some curbe registers, such as lid0 and lid1, may have already expired
by the time we reach the bottom block, which causes wrong global values.
2. Fix the problem of wrong device id in profiling info.
3. Fix the pointer size problems on BDW.
The pointers are 8-byte values and dri_bo_emit_reloc writes 8 bytes.
The buffer pointers for printf and profiling were declared as 4 bytes,
so the value next to the pointer in the curbe was overwritten, which
caused wrong results.
4. Place the prolog and epilog logic in the head and tail blocks.
The old version placed the prolog at the beginning of the first block
and the epilog in the second-to-last block, just before the return
block. This put the prolog and epilog under predication, but they
should be executed unconditionally.
5. Improve the sub and add functions for timestamp calculation.
From BDW onwards, the native long type is supported; use it to make
the calculation more efficient.
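
Item 3 above can be illustrated with a small, self-contained C sketch.
It does not use the real driver code: emit_reloc64() is a hypothetical
stand-in for a relocation that always writes 8 bytes, and the 16-byte
buffer and its offsets are an assumed curbe fragment, chosen only to
show the overwrite.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a relocation that writes an 8-byte address. */
static void emit_reloc64(uint8_t *curbe, size_t offset, uint64_t address)
{
    assert(curbe != NULL);
    memcpy(curbe + offset, &address, sizeof address);
}

/*
 * Place a 32-bit value at neighbour_offset, write an 8-byte pointer at
 * offset 0, and return what the neighbouring value reads back as.
 */
static uint32_t neighbour_after_reloc(size_t neighbour_offset)
{
    uint8_t curbe[16] = {0};
    const uint32_t neighbour = 0xCAFEF00Du;
    uint32_t out;

    memcpy(curbe + neighbour_offset, &neighbour, sizeof neighbour);
    emit_reloc64(curbe, 0, 0x0000000112345678ull);
    memcpy(&out, curbe + neighbour_offset, sizeof out);
    return out;
}
```

With the pointer slot sized as 4 bytes (neighbour at offset 4) the
neighbouring value no longer reads back as 0xCAFEF00D; with an 8-byte
slot (neighbour at offset 8) it survives.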

V3:
1. Fix the wrong MOD -1 calculation.
2. Add tm0 register helper function.
3. The curbe allocation scheme has changed, so we need to set the live
   intervals of all curbe registers correctly before they can be
   allocated correctly.

Some known issues:
On BDW, some logs look like this:
------------------------ Log 5      -----------------------
| fix functions id:   7     simd:   16   kernel id:    0  |
| thread id:    0  EU id:   8  sub slice id: 1 slice id 0 |
| dispatch Mask:   1 prolog:     28578  epilog:     15445 |
| globalX:   4~   4  globalY:   0~   0  globalZ:   0~   0 |
|  ts0 :       186  | ts1 :         0  | ts2 :      1504  |
|  ts3 :         0  | ts4 :4294946425  | ts5 :4294946637  |
|  ts6 :         0  | ts7 :         0  | ts8 :         0  |
|  ts9 :4294947235  | ts10:4294947491  | ts11:         0  |
|  ts12:4294947645  | ts13:         0  | ts14:4294947819  |
|  ts15:4294947999  | ts16:         0  | ts17:         0  |
|  ts18:         0  | ts19:4294948561  |                  |

The huge timestamps are really strange and invalid.
They only show up when many cases are run together; when we
switch to running a single case, we can never reproduce them.
This may be related to the HW and does not cause any
regressions, so I chose to fix it later.
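
One observation about the numbers in Log 5, offered as a sketch and not
as a diagnosis: the huge values are what a small negative difference
looks like when it is stored as an unsigned 32-bit value. For example
4294946425 equals 2^32 - 20871. That the offsets are computed in 32-bit
unsigned arithmetic is an assumption here, and both helper names below
are hypothetical.

```c
#include <assert.h> /* used by the checks mentioned below */
#include <stdint.h>

/* Offset of a raw timestamp from the prolog, modulo 2^32. */
static uint32_t offset_from_prolog(uint32_t now, uint32_t prolog)
{
    return now - prolog; /* unsigned subtraction wraps around */
}

/* Reinterpret an unsigned 32-bit offset as a signed difference. */
static int64_t signed_offset(uint32_t ts)
{
    /* Values of 2^31 and above are read as ts - 2^32. */
    return ts >= 0x80000000u ? (int64_t)ts - 4294967296LL : (int64_t)ts;
}
```

Read this way, ts4 (4294946425) becomes -20871 and ts19 (4294948561)
becomes -18735, while a small value such as ts0 (186) is unchanged.
The same log also shows an epilog (15445) smaller than its prolog
(28578), which would fit a raw timestamp that went backwards during
the run.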


Signed-off-by: Junyan He <junyan.he at linux.intel.com>
---
backend/src/CMakeLists.txt                         |    3 +
backend/src/backend/gen8_context.cpp               |   21 +
backend/src/backend/gen8_context.hpp               |    2 +
backend/src/backend/gen_context.cpp                |  451 ++++++++++++++++++++
backend/src/backend/gen_context.hpp                |    9 +
.../src/backend/gen_insn_gen7_schedule_info.hxx    |    2 +
backend/src/backend/gen_insn_scheduling.cpp        |    4 +-
backend/src/backend/gen_insn_selection.cpp         |  140 ++++++
backend/src/backend/gen_insn_selection.hpp         |    8 +
backend/src/backend/gen_insn_selection.hxx         |    2 +
backend/src/backend/gen_program.cpp                |    9 +-
backend/src/backend/gen_program.hpp                |    2 +-
backend/src/backend/gen_reg_allocation.cpp         |   47 ++
backend/src/backend/gen_register.hpp               |   19 +
backend/src/backend/program.cpp                    |   35 +-
backend/src/backend/program.h                      |   17 +
backend/src/backend/program.hpp                    |   25 +-
backend/src/gbe_bin_interpreter.cpp                |    4 +
backend/src/ir/instruction.cpp                     |   96 ++++-
backend/src/ir/instruction.hpp                     |   27 +-
backend/src/ir/instruction.hxx                     |    2 +
backend/src/ir/lowering.cpp                        |    7 +
backend/src/ir/profile.cpp                         |   16 +-
backend/src/ir/profile.hpp                         |    8 +-
backend/src/ir/profiling.cpp                       |   74 ++++
backend/src/ir/profiling.hpp                       |  132 ++++++
backend/src/ir/unit.cpp                            |    6 +-
backend/src/ir/unit.hpp                            |   10 +
backend/src/llvm/llvm_gen_backend.cpp              |   48 ++-
backend/src/llvm/llvm_gen_backend.hpp              |    3 +
backend/src/llvm/llvm_gen_ocl_function.hxx         |    5 +
backend/src/llvm/llvm_profiling.cpp                |  211 +++++++++
backend/src/llvm/llvm_to_gen.cpp                   |    6 +-
backend/src/llvm/llvm_to_gen.hpp                   |    3 +-
src/CMakeLists.txt                                 |    1 +
src/cl_command_queue.c                             |    8 +
src/cl_command_queue_gen7.c                        |   37 ++
src/cl_driver.h                                    |   16 +
src/cl_driver_defs.c                               |    5 +
src/cl_gbe_loader.cpp                              |   15 +
src/cl_gbe_loader.h                                |    3 +
src/intel/intel_gpgpu.c                            |   58 +++
src/intel/intel_gpgpu.h                            |    3 +-
43 files changed, 1579 insertions(+), 21 deletions(-)



