[Piglit] [PATCH 6/7] ARB_pipeline_statistics_query (frag): basic test

Mon Feb 16 11:59:02 PST 2015

On Mon, Feb 16, 2015 at 02:14:41PM -0500, Alex Deucher wrote:
> On Sat, Feb 14, 2015 at 2:21 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> > On Sat, Feb 14, 2015 at 2:11 AM, Ben Widawsky
> > <benjamin.widawsky at intel.com> wrote:
> >> On Sat, Feb 14, 2015 at 02:07:32AM -0500, Ilia Mirkin wrote:
> >>> On Sat, Feb 14, 2015 at 1:54 AM, Ben Widawsky
> >>> <benjamin.widawsky at intel.com> wrote:
> >>> > +static struct query queries[] = {
> >>> > +       {
> >>> > +        .query = GL_FRAGMENT_SHADER_INVOCATIONS_ARB,
> >>> > +        .name = "GL_FRAGMENT_SHADER_INVOCATIONS_ARB",
> >>> > +        .min = TEST_WIDTH * TEST_HEIGHT / 2,
> >>> > +        .max = 0xffffffff},
> >>> > +       /* XXX:
> >>> > +        * Intel hardware has some very unpredictable results for fragment
> >>> > +        * shader invocations. After a day of head scratching, I've given up.
> >>> > +        * Generating a real min, or max is not possible. The spec allows this.
> >>> > +        * This will also help variance across vendors.
> >>> > +        */
> >>>
> >>> Is there a working theory as to how this could be less than width *
> >>> height? Does it count 1 per quad? (Or how it could be much more than
> >>> width * height... I can see edges getting processed unnecessarily,
> >>> but... max_int seems high.)
> >>
> >> No working theory on min, but I figured if we're going to fudge the max, we may
> >> as well fudge the min. What would you like as a max? I can show you hardware
> >> which generates way more invocations than anything I can contrive. 1440
> >> invocations for an 8x8.
> >
> > Impressive :)
> >
> > Best I can do is suggest that I don't think you're counting what you
> > think you're counting. This has probably occurred to you, but you
> > really should triple-check that you're reading (and writing) from the
> > right place for this counter.
> >
> 
> I echo this sentiment.  You might also check if there are any
> additional state bits related to counts.  For example, IIRC, on radeon
> hw there is some additional state you need to set to get accurate
> counts for occlusion queries.
> 
> Alex

Triple check? I'm way past triple. I've had some other people look into it as
well, so I am not the only one confused.

Alex, FYI, there was some follow-up discussion on IRC that probably should have
been in the commit message in the first place. Haswell works exactly as
expected. For example, a 4x4 rectangle drawn as 2 triangles generates 3 2x2
subspans per triangle, for a total of 6 subspans, or 24 fragment shader
invocations (4 per subspan). Every power-of-two square tested on Haswell gives
the expected count.

Perhaps not coincidentally, the HW that counts this changed for IVB, and then
again for HSW. So for some time the working theory was that we just don't know
how to count correctly pre-HSW (in particular, on IVB). No big deal. However, it
turns out Gen8 seems to behave exactly like IVB. I've yet to try Gen9, but I can
find no errata that I haven't already implemented.

One more source of confusion I didn't mention: very large triangles generate a
PS invocation count of roughly 1/4 the total number of pixels. I forget the
exact count, but a 256x256 square (65,536 pixels) gave something like 10,000
invocations.

I think we all agree there is no point in holding up the series for this, right?

-- 
Ben Widawsky, Intel Open Source Technology Center