[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

Wed Oct 21 11:23:16 PDT 2015

On Tue, Oct 20, 2015 at 02:48:41PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 2:41 PM, Ben Widawsky <ben at bwidawsk.net> wrote:
> > On Tue, Oct 20, 2015 at 11:56:15AM +0200, Neil Roberts wrote:
> >> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> >> message types because I found by experimentation that it doesn't work.
> >> I wrote in the comment that I couldn't find any documentation for this
> >> problem. However I've now found the documentation and it has
> >> additional restrictions on further message types so this patch updates
> >> the comment and adds the others.
> >> ---
> >>
> >> That paragraph in the spec also mentions further restrictions that we
> >> should probably worry about, such as that the shader shouldn't combine
> >> this optimisation with any other render target data port reads/writes.
> >>
> >> It also has a fairly pessimistic note saying the optimisation is only
> >> really good for large polygons in a GUI-like workload. I wonder
> >> whether we should be doing some more benchmarking to decide whether
> >> it's really a good idea to enable this as a general optimisation even
> >> for games.
> >
> > I remember seeing this before, but I cannot find it now. All I am seeing
> > regarding performance implications are the bits about requiring a header, and
> > writing to the same pixel from multiple threads. The latter one I assume is only
> > going to happen with MSAA?
> 
> No, I don't think so. As I understand it, the EUs can be executing
> fragment shaders for multiple primitives at the same time, and those
> primitives might overlap. The c in sendc means that it does some extra
> tracking to ensure that the render target writes land in the correct
> order.
> 
> Presumably by using sendc to texture directly to the render target, it
> adds some extra synchronization (before the texturing is done... or
> something?) that especially hurts when there are a lot of overlapping
> primitives (as in the case of lots of small primitives).

Ah, Neil pointed me to the blurb. Putting this here to remind myself... I think
a cheap way to measure things is to turn the sendc into a send. Things will
probably render wrong, but it should eliminate the bottleneck. If we see a
measurable perf difference with send, that would certainly indicate we need to
spend time optimizing the optimization.
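
For the record, here is roughly the kind of throwaway debug knob I have in
mind. This is only a sketch and not the real backend code: SendOp,
eot_force_send(), final_message_op() and the EOT_FORCE_SEND environment
variable are all names made up for illustration. The idea is just to make the
sendc/send choice switchable at run time so the same workload can be
benchmarked both ways.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the two opcodes being compared; not the
// real i965 backend types.
enum class SendOp { SEND, SENDC };

// Hypothetical debug knob: true when the made-up environment variable
// EOT_FORCE_SEND is set to exactly "1".
static bool eot_force_send()
{
   const char *v = std::getenv("EOT_FORCE_SEND");
   return v != nullptr && std::strcmp(v, "1") == 0;
}

// Pick the opcode for the message that writes the render target.
// Knob off: sendc as usual. Knob on: plain send, which drops the
// ordering check, so rendering may be wrong but any stall caused by
// that check should disappear from the measurement.
static SendOp final_message_op(bool force_send)
{
   return force_send ? SendOp::SEND : SendOp::SENDC;
}
```

The caller would do something like final_message_op(eot_force_send()) at the
point where the message is emitted, then compare frame times with the variable
set and unset. Results with the knob on are only useful as timings, since the
output is expected to be broken.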