[Beignet] [PATCH 4/5] HSW: enable the surface's cache in HSW.

Yang, Rong R rong.r.yang at intel.com
Wed Jun 4 01:55:18 PDT 2014



-----Original Message-----
From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com] 
Sent: Wednesday, June 04, 2014 10:53 AM
To: Yang, Rong R
Cc: beignet at lists.freedesktop.org
Subject: Re: [Beignet] [PATCH 4/5] HSW: enable the surface's cache in HSW.

On Wed, Jun 04, 2014 at 02:45:04AM +0000, Yang, Rong R wrote:
> 
> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Friday, May 30, 2014 9:23 AM
> To: Yang, Rong R
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 4/5] HSW: enable the surface's cache in HSW.
> 
> 
> On Fri, May 30, 2014 at 12:37:33AM +0800, Yang Rong wrote:
> > HSW's surface cache control is changed, correct it. And also disable 
> > exec flag for slm. When kernel parse cmd finish, need remove it 
> > totally
> > 
> > Signed-off-by: Yang Rong <rong.r.yang at intel.com>
> > ---
> >  src/cl_command_queue.c      |  4 +--
> >  src/cl_command_queue_gen7.c |  4 +--
> >  src/cl_device_id.c          |  2 +-
> >  src/cl_driver.h             | 19 +++++++++++++-
> >  src/cl_driver_defs.c        |  1 +
> >  src/intel/intel_gpgpu.c     | 61 ++++++++++++++++++++++++++++-----------------
> >  6 files changed, 62 insertions(+), 29 deletions(-)
> > 
> >  LOCAL cl_int
> > diff --git a/src/cl_device_id.c b/src/cl_device_id.c index 
> > 018da95..538c88a 100644
> > --- a/src/cl_device_id.c
> > +++ b/src/cl_device_id.c
> > @@ -86,7 +86,7 @@ static struct _cl_device_id intel_hsw_gt2_device = {
> >    .max_compute_unit = 140,
> >    .max_thread_per_unit = 7,
> >    .max_work_item_sizes = {512, 512, 512},
> > -  .max_work_group_size = 512,
> > +  .max_work_group_size = 1024,
> Why change max work group size in this patch?
no response for this comment. may be missed.

>>>>For the max_work_group_size of HSW GT2, need more discuss, will send a new version when have conclusion.

> 
> >  static void
> >  intel_gpgpu_set_base_address(intel_gpgpu_t *gpgpu)  {
> > -  const uint32_t def_cc = cc_llc_l3; /* default Cache Control value 
> > */
> > +  const uint32_t def_cc = cl_gpgpu_get_cache_ctrl(); /* default 
> > + Cache Control value */
> >    BEGIN_BATCH(gpgpu->batch, 10);
> >    OUT_BATCH(gpgpu->batch, CMD_STATE_BASE_ADDRESS | 8);
> >    /* 0, Gen State Mem Obj CC, Stateless Mem Obj CC, Stateless 
> > Access Write Back */ @@ -233,12 +245,12 @@ intel_gpgpu_set_base_address(intel_gpgpu_t *gpgpu)
> >    ADVANCE_BATCH(gpgpu->batch);
> >  }
> >  
> > -uint32_t get_scratch_index_gen7(uint32_t size) {
> > +uint32_t intel_gpgpu_get_scratch_index_gen7(uint32_t size) {
> >    return size / 1024 - 1;
> >  }
> >  
> > -uint32_t get_scratch_index_gen75(uint32_t size) {
> > -    size = size >> 12;
> > +uint32_t intel_gpgpu_get_scratch_index_gen75(uint32_t size) {
> > +    size = size >> 11;
> So this patch also fix the scratch configuration? right? If it is expected, I think you may need to add related info into the commit log.
> 
> >>> Yes, it is a bug when calc scratch configuration. Forget to add the commit log, I will send a new version.
ok.

> 
> 
> > @@ -411,25 +421,29 @@ static void
> >  intel_gpgpu_set_L3_gen75(intel_gpgpu_t *gpgpu, uint32_t use_slm)  {
> >    /* still set L3 in batch buffer for fulsim. */
> > -  BEGIN_BATCH(gpgpu->batch, 6);
> > +  BEGIN_BATCH(gpgpu->batch, 9);
> > +  OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 
> > + */  OUT_BATCH(gpgpu->batch, GEN7_L3_SQC_REG1_ADDRESS_OFFSET); 
> > + OUT_BATCH(gpgpu->batch, 0x00610000);
> > +
> >    OUT_BATCH(gpgpu->batch, CMD_LOAD_REGISTER_IMM | 1); /* length - 2 */
> >    OUT_BATCH(gpgpu->batch, GEN7_L3_CNTL_REG2_ADDRESS_OFFSET);
> > +
> >    if (use_slm)
> > -    OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[8]);
> > +    OUT_BATCH(gpgpu->batch, gpgpu_l3_config_reg1[12]);
> can we change to use a specific value here rather than to pick a value from magic array?
> Just as baytrail, if the register definition is published on 01.org, a meaningful comment is also nice to have.
> 
> >>>>> There are several recommend L3 allocation in PRM, so there is a magic array, and IVB and HSW's configure should be same. Do you mean only keep one config value for IVB and HSW? Difference configure may affect performance.
The point here is that the magic array is very hard to understand, let's use a meaningful way if possible.
Don't need to change any configuration value, just make it more readable.

> 
> Other part LGTM.
> 


More information about the Beignet mailing list