[Mesa-dev] [PATCH 6/9] i965/fs: Fetch one cacheline of pull constants at a time.

Francisco Jerez currojerez at riseup.net
Tue Dec 13 21:17:14 UTC 2016


Kenneth Graunke <kenneth at whitecape.org> writes:

> On Friday, December 9, 2016 11:03:29 AM PST Francisco Jerez wrote:
>> Asking the DC for less than one cacheline (4 owords) of data for
>> uniform pull constants is suboptimal because the DC cannot request
>> less than that from L3, resulting in wasted bandwidth and unnecessary
>> message dispatch overhead, and exacerbating the IVB L3 serialization
>> bug.  The following table summarizes the overall framerate improvement
>> (with statistical significance of 5% and sample size ~10) from the
>> whole series up to this patch for several benchmarks and hardware
>> generations:
>> 
>>                          | SKL           | BDW          | HSW
>> SynMark2 OglShMapPcf     | 24.63% ±0.45% | 4.01% ±0.70% | 10.31% ±0.38%
>> GfxBench4 gl_manhattan31 |  5.93% ±0.35% | 3.92% ±0.31% |  6.62% ±0.22%
>> GfxBench4 gl_4           |  2.52% ±0.44% | 1.23% ±0.10% |      N/A
>> Unigine Valley           |  0.83% ±0.17% | 0.23% ±0.05% |  0.74% ±0.45%
>
> I suspect OglShMapPcf gained SIMD16 on Skylake due to reduced register
> pressure, from the lower message lengths on pull loads.  (At least, it
> did when I had a series to fix that.)  That's probably a large portion
> of the performance improvement here, and why it's so much larger for
> that workload on Skylake specifically.  It might be worth mentioning it
> in your commit message here.
>

Yeah, that matches my understanding too.  I'll add some shader-db stats
to the commit message to illustrate the effect of this on register
pressure, as you asked me to do in your previous reply.
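
For reference, the cacheline arithmetic from the commit message quoted
above can be sketched roughly as follows.  This is only an illustration
under stated assumptions, not the code from the patch: the helper names
are hypothetical, and it assumes 16-byte owords, a 64-byte (4 oword)
cacheline and dword-aligned constant offsets.

```c
#include <assert.h>
#include <stdint.h>

#define OWORD_BYTES     16u                 /* one oword = 16 bytes */
#define CACHELINE_BYTES (4u * OWORD_BYTES)  /* 4 owords = 64 bytes  */

/* Hypothetical helper: start of the 64-byte block that contains
 * byte_offset, i.e. the offset a whole-cacheline fetch would use. */
static uint32_t
pull_block_start(uint32_t byte_offset)
{
   return byte_offset & ~(CACHELINE_BYTES - 1u);
}

/* Hypothetical helper: index of the 32-bit component inside that
 * block (0..15).  Assumes the constant is dword-aligned. */
static uint32_t
pull_block_component(uint32_t byte_offset)
{
   assert(byte_offset % 4u == 0);
   return (byte_offset % CACHELINE_BYTES) / 4u;
}
```

With these assumptions a constant at byte offset 100 lives in the block
starting at byte 64, as component 9 of that block, so any other constant
in bytes 64..127 could be served by the same fetch.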

> Thanks for all your work on this.  I was originally concerned about the
> Ivybridge bug, but given that we're loading one cacheline at a time, it
> seems very unlikely that we'd ever load the same cacheline twice within
> 16 cycles.  We could if we have IF (non-uniform) load[foo] ELSE load[bar]
> where foo and bar are indirect expressions that happen to be equal.  But
> that seems quite uncommon.
>
> Series is:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

Thanks!
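
As a small illustration of the IF/ELSE case you describe, two pull loads
touch the same cacheline exactly when their byte offsets fall in the
same 64-byte block.  The sketch below assumes a 64-byte line; the
function and constant names are made up for this example and do not
exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed line size for this sketch: 4 owords of 16 bytes each. */
#define PULL_LINE_BYTES 64u

/* Hypothetical check: do two constant byte offsets land in the same
 * 64-byte block, so that both loads would request the same line? */
static bool
same_pull_cacheline(uint32_t offset_a, uint32_t offset_b)
{
   return offset_a / PULL_LINE_BYTES == offset_b / PULL_LINE_BYTES;
}
```

For indirect offsets foo and bar this only returns true when both
happen to resolve into the same block at run time, which is the
uncommon situation you mention.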