[Mesa-dev] [PATCH v2 13/14] radeonsi: Process multiple patches per threadgroup.

Marek Olšák maraeo at gmail.com
Tue May 17 10:42:42 UTC 2016


On Tue, May 17, 2016 at 1:52 AM, Bas Nieuwenhuizen
<bas at basnieuwenhuizen.nl> wrote:
> On Mon, May 16, 2016 at 10:15 PM, Marek Olšák <maraeo at gmail.com> wrote:
>> On Fri, May 13, 2016 at 3:37 AM, Bas Nieuwenhuizen
>> <bas at basnieuwenhuizen.nl> wrote:
>>> Using more than 1 wave per threadgroup generally increases
>>> performance. Not using too many patches per threadgroup also
>>> increases performance. Both Catalyst and amdgpu-pro seem to
>>> use 40 patches as their maximum, but I haven't really seen
>>> any performance increase from limiting the number of patches
>>> to 40 instead of 64.
>>
>> 40 may be optimal for existing OpenGL apps on some chips.
>>
>> Vulkan doesn't set more than 16.
>>
>> Let's set either 40 or 16 with a comment where the value comes from.
>
> IIRC Heaven was more performant with multiple waves per threadgroup,
> which means >16 patches, as it uses 3 CPs per patch. I'm not sure about
> 40, and I'm away from my dev machine at the moment.
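
For reference, the relation between patches per threadgroup and wave count can be sketched as below. This is only an illustration, assuming one thread per control point and the 64-thread wave size of GCN hardware; the function name is hypothetical and not taken from the patch.

```python
WAVE_SIZE = 64  # threads per wave on GCN hardware


def waves_per_threadgroup(num_patches, control_points_per_patch):
    """Number of waves needed for a threadgroup.

    Assumption (not from the patch): one thread per control point.
    """
    threads = num_patches * control_points_per_patch
    # Ceiling division: a partially filled wave still counts as a wave.
    return -(-threads // WAVE_SIZE)


# With 3 control points per patch, as mentioned for Heaven:
for patches in (16, 40, 64):
    print(patches, "patches ->", waves_per_threadgroup(patches, 3), "wave(s)")
```

Under these assumptions, 16 patches of 3 control points fit in a single wave, while 40 and 64 patches need more than one.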

OK. Maybe Vulkan sets more than 16 using external settings not
specified by its code.

>
>>
>>>
>>> Note that the trick where we overlap the input and output LDS
>>> does not work anymore as the insertion of the tess factors
>>> changes the patch stride.
>>
>> I don't understand this. Can you explain it more?
>
> When we didn't have a TCS, we would just use TCS input as TCS output
> and let the fixed function TCS add the per patch outputs (tess
> factors) at the end.
>
> This works fine when you have a single patch, but not with multiple.
> To see why, we have to look at the input/output layout in LDS, which is:
>
> Attributes for patch 0 vertex 0.
> Attributes for patch 0 vertex 1.
> ...
> Per patch attributes for patch 0.
> Attributes for patch 1 vertex 0.
> ...
>
> So the number of per-patch attributes changes the stride between
> patches. As the LS output has 0 per-patch attributes, and the TCS output
> has at least the tess factors, the two strides differ. Therefore the
> second and later patches start at different offsets in TCS input and
> output, so we need to copy or move them.
>
> I hope this makes things a bit more clear.
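
The stride argument above can be checked with a small sketch. It counts in attribute slots rather than bytes, and the function names and example sizes (3 vertices, 2 per-vertex attributes, 2 per-patch slots for the tess factors) are made up for illustration, not taken from radeonsi.

```python
def patch_stride(num_vertices, attrs_per_vertex, attrs_per_patch):
    """Size of one patch in LDS, in attribute slots.

    Layout per patch: all per-vertex attributes, then the per-patch ones.
    """
    return num_vertices * attrs_per_vertex + attrs_per_patch


def patch_offset(patch_index, num_vertices, attrs_per_vertex, attrs_per_patch):
    """Start of a given patch in LDS, in attribute slots."""
    return patch_index * patch_stride(num_vertices, attrs_per_vertex,
                                      attrs_per_patch)


# Hypothetical sizes: 3 vertices per patch, 2 attributes per vertex.
# LS output has 0 per-patch attributes; the TCS output is assumed to
# have 2 per-patch slots holding the tess factors.
for patch in range(3):
    ls_out = patch_offset(patch, 3, 2, 0)
    tcs_out = patch_offset(patch, 3, 2, 2)
    print(f"patch {patch}: LS output at {ls_out}, TCS output at {tcs_out}")
```

Patch 0 starts at offset 0 in both layouts, which is why overlapping input and output works for a single patch. From patch 1 onward the offsets diverge, so the data has to be copied or moved.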

Thanks for the explanation.

Marek

