[Beignet] [PATCH 4/5] HSW: enable the surface's cache in HSW.

Thu May 29 18:22:53 PDT 2014

On Fri, May 30, 2014 at 12:37:33AM +0800, Yang Rong wrote:
> HSW's surface cache control is changed, correct it. And also disable
> exec flag for slm. When kernel parse cmd finish, need remove it totally
> 
> Signed-off-by: Yang Rong <rong.r.yang at intel.com>
> ---
>  src/cl_command_queue.c      |  4 +--
>  src/cl_command_queue_gen7.c |  4 +--
>  src/cl_device_id.c          |  2 +-
>  src/cl_driver.h             | 19 +++++++++++++-
>  src/cl_driver_defs.c        |  1 +
>  src/intel/intel_gpgpu.c     | 61 ++++++++++++++++++++++++++++-----------------
>  6 files changed, 62 insertions(+), 29 deletions(-)
> 
>  LOCAL cl_int
> diff --git a/src/cl_device_id.c b/src/cl_device_id.c
> index 018da95..538c88a 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -86,7 +86,7 @@ static struct _cl_device_id intel_hsw_gt2_device = {
>    .max_compute_unit = 140,
>    .max_thread_per_unit = 7,
>    .max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 1024,
Why change max work group size in this patch?

>  static void
>  intel_gpgpu_set_base_address(intel_gpgpu_t *gpgpu)
>  {
> -  const uint32_t def_cc = cc_llc_l3; /* default Cache Control value */
> +  const uint32_t def_cc = cl_gpgpu_get_cache_ctrl(); /* default Cache Control value */
>    BEGIN_BATCH(gpgpu->batch, 10);
>    OUT_BATCH(gpgpu->batch, CMD_STATE_BASE_ADDRESS | 8);
>    /* 0, Gen State Mem Obj CC, Stateless Mem Obj CC, Stateless Access Write Back */
> @@ -233,12 +245,12 @@ intel_gpgpu_set_base_address(intel_gpgpu_t *gpgpu)
>    ADVANCE_BATCH(gpgpu->batch);
>  }
>  
> -uint32_t get_scratch_index_gen7(uint32_t size) {
> +uint32_t intel_gpgpu_get_scratch_index_gen7(uint32_t size) {
>    return size / 1024 - 1;
>  }
>  
> -uint32_t get_scratch_index_gen75(uint32_t size) {
> -    size = size >> 12;
> +uint32_t intel_gpgpu_get_scratch_index_gen75(uint32_t size) {
> +    size = size >> 11;
So this patch also fix the scratch configuration? right? If it is expected, I think you
may need to add related info into the commit log.

> @@ -411,25 +421,29 @@ static void
>  intel_gpgpu_set_L3_gen75(intel_gpgpu_t *gpgpu, uint32_t use_slm)
>  {
>    /* still set L3 in batch buffer for fulsim. */
> -  BEGIN_BATCH(gpgpu->batch, 6);
> +  BEGIN_BATCH(gpgpu->batch, 9);
> +  OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */
> +  OUT_BATCH(gpgpu->batch, GEN7_L3_SQC_REG1_ADDRESS_OFFSET);
> +  OUT_BATCH(gpgpu->batch, 0x00610000);
> +
>    OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */
>    OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG2_ADDRESS_OFFSET);
> +
>    if (use_slm)
> -    OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[8]);
> +    OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[12]);
can we change to use a specific value here rather than to pick a value from magic array?
Just as baytrail, if the register definition is published on 01.org, a meaningful comment
is also nice to have.

Other part LGTM.