[Mesa-dev] [Mesa-stable] [PATCH 2/3] i965: Set subslice_total on Haswell.
Jordan Justen
jordan.l.justen at intel.com
Fri Jun 10 18:55:33 UTC 2016
On 2016-06-10 11:44:01, Kenneth Graunke wrote:
> On Thursday, June 9, 2016 1:34:15 PM PDT Francisco Jerez wrote:
> > Kenneth Graunke <kenneth at whitecape.org> writes:
> >
> > > We'll use this for compute shader thread counts shortly.
> > >
> > > Cc: "12.0" <mesa-stable at lists.freedesktop.org>
> > > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > > ---
> > > src/mesa/drivers/dri/i965/intel_screen.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > I'm not sure whether I want to commit this or not...there still seem to
> > > be some issues on Haswell. I think this is right, but maybe there are
> > > just other bugs.
> > >
> > Yeah, I believe the formula below should work for the time being until
> > the kernel is fixed to support the right get-params on Gen7. I wonder,
> > though, what we should do on IVB. AFAIK IVB GT2 had two subslices rather
> > than one, but if you simply multiply the current max_cs_threads value by
> > the number of subslices you'll go over the total thread count of the
> > GPU. The current max_cs_threads value for IVB GT2 seems bogus AFAICT,
> > it's higher than the thread count per subslice (48?) but lower than the
> > total thread count (96). I wonder if barriers are broken on IVB right
> > now for large enough workgroup size.
>
> I think the Configurations[IVB] > Device Attributes[IVB] page has
> incorrect information about Ivybridge GT2. It claims there are 12 EUs
> and 8 Threads/EU. However, check the "Configurations Overview" page.
> It claims that IVB GT2 has 16 total EUs. The simulator also indicates
> that IVB GT2 has 2 subslices (half slices), 8 EUs per half slice, and
> 8 Threads/EU. This gives us 8 * 8 = 64, which is the value we use now.
>
> Ugh. This wouldn't be the first time the documentation's been wrong
> in this area. I've come to trust the simulator more.

I did test on an IVB GT2 extensively during development. The barrier test
with a local size of 1024 has always worked well with SIMD16 on it, but
failed hard on BDW (which has only 56 threads per subslice). (As expected,
Broadwell worked well up to a local size of 896 with SIMD16, i.e. 56 * 16.)
-Jordan