[Mesa-dev] [PATCH 1/2] gallium: add PIPE_COMPUTE_CAP_SUBGROUP_SIZE

Fri Jun 5 05:22:37 PDT 2015

Giuseppe Bilotta <giuseppe.bilotta at gmail.com> writes:

>> On Thu, May 28, 2015 at 1:04 PM, Grigori Goronzy <greg at chown.ath.cx> wrote:
>>> @@ -286,6 +287,13 @@ ilo_get_compute_param(struct pipe_screen *screen,
>>>        ptr = &val.images_supported;
>>>        size = sizeof(val.images_supported);
>>>        break;
>>> +   case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
>>> +      /* best case is SIMD32 */
>>> +      val.subgroup_size = 32;
>>> +
>>> +      ptr = &val.subgroup_size;
>>> +      size = sizeof(val.subgroup_size);
>>> +      break;
>>>     default:
>>>        ptr = NULL;
>>>        size = 0;
>>
>> Everything else seems fine to me, but IIRC Intel's IGPs have a SIMD
>> width of 16, not 32. (Or if it depends on generation, we should
>> probably have a lookup function like for r600).
>
> Ok, scratch that. I was confused by the fact that Beignet reports a
> preferred work-group size multiple of 16. Intel IGPs support _logical_
> SIMD width of up to 32, but the _hardware_ SIMD width is just 4. So
> the question is if here we should report the _hardware_ width, or the
> maximum _logical_ width.
>
As far as I'm aware, the physical SIMD width of every Intel GPU that ILO
supports is 8.  However, the hardware can execute 16- and in some cases
32-wide instructions by splitting them internally into instructions of
the native SIMD width.  There is an actual performance benefit from
this, mainly because it saves some overhead and hides part of the
execution latency when several interdependent instructions are
encountered in sequence (e.g. with SIMD16 you typically have the
guarantee that there are no mutual data dependencies between any pair
of native-width instructions arriving in the pipeline one after the
other, so you may avoid stalls).

As this cap is just a performance hint, I think it makes sense to assume
the best-case scenario as Grigori has done.  If the driver later on
decides it doesn't pay off to use the maximum SIMD width it can always
use less, but using more may be difficult if the application didn't keep
it in mind while choosing the workgroup layout.
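
To illustrate that asymmetry, here is a rough sketch of what an
application might do with the hint.  The helper names are hypothetical
and not part of Gallium or ILO; the point is only that a work-group size
rounded to the best-case width still divides evenly if the driver falls
back to a narrower width, while the reverse is not guaranteed.

```c
#include <stdbool.h>

/* Hypothetical helper: round a desired work-group size up to the next
 * multiple of the reported subgroup-size hint. */
static unsigned
round_up_to_hint(unsigned wanted, unsigned hint)
{
   return (wanted + hint - 1) / hint * hint;
}

/* Hypothetical helper: true if a work group of `size` items splits into
 * whole subgroups of `width` items, with no partially filled subgroup. */
static bool
fills_whole_subgroups(unsigned size, unsigned width)
{
   return size % width == 0;
}
```

With a hint of 32, a request for 40 work items becomes 64, which fills
whole subgroups at widths 8, 16 and 32.  With a hint of 8 the same
request stays at 40, which no longer divides evenly at 16 or 32.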

That said, it doesn't look like ILO supports SIMD32 at this point, and
the first Intel GPU with any hardware support for it was IVB (Gen7).  I
suggest you just return 16 unconditionally for now but keep the comment
saying that the best case is SIMD32 (on Gen7+).
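
Roughly, the suggestion amounts to something like the standalone sketch
below.  It only mimics the shape of the quoted hunk: the enum, the
function name and the uint32_t type of the value are assumptions made so
that it compiles on its own, not the actual ILO code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real cap enum; only the cap discussed
 * in this thread is modelled. */
enum sketch_compute_cap {
   SKETCH_COMPUTE_CAP_SUBGROUP_SIZE,
   SKETCH_COMPUTE_CAP_OTHER
};

/* Copies the value for `cap` into `ret` (if non-NULL) and returns its
 * size in bytes, or 0 for caps this sketch does not know about. */
static int
sketch_get_compute_param(enum sketch_compute_cap cap, void *ret)
{
   uint32_t subgroup_size = 0;
   const void *ptr;
   int size;

   switch (cap) {
   case SKETCH_COMPUTE_CAP_SUBGROUP_SIZE:
      /* best case is SIMD32 (on Gen7+), report SIMD16 for now */
      subgroup_size = 16;

      ptr = &subgroup_size;
      size = sizeof(subgroup_size);
      break;
   default:
      ptr = NULL;
      size = 0;
      break;
   }

   if (ret && ptr)
      memcpy(ret, ptr, size);

   return size;
}
```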

Thanks.

> For OpenCL, the _logical_ aspect is the only relevant one, but I think
> this should be handled on the OpenCL side of things (since it also
> depends on things such as the vectorization of each specific kernel
> and, for future OpenCL 2.0 support, even on the individual launch
> grid). Here, I think the _hardware_ property should be reported
> instead.
>
> -- 
> Giuseppe "Oblomov" Bilotta