[Piglit] [PATCH 6/7] ARB_pipeline_statistics_query (frag): basic test

Mon Feb 16 12:39:40 PST 2015

On Mon, Feb 16, 2015 at 2:59 PM, Ben Widawsky
<benjamin.widawsky at intel.com> wrote:
> On Mon, Feb 16, 2015 at 02:14:41PM -0500, Alex Deucher wrote:
>> On Sat, Feb 14, 2015 at 2:21 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> > On Sat, Feb 14, 2015 at 2:11 AM, Ben Widawsky
>> > <benjamin.widawsky at intel.com> wrote:
>> >> On Sat, Feb 14, 2015 at 02:07:32AM -0500, Ilia Mirkin wrote:
>> >>> On Sat, Feb 14, 2015 at 1:54 AM, Ben Widawsky
>> >>> <benjamin.widawsky at intel.com> wrote:
>> >>> > +static struct query queries[] = {
>> >>> > +       {
>> >>> > +        .query = GL_FRAGMENT_SHADER_INVOCATIONS_ARB,
>> >>> > +        .name = "GL_FRAGMENT_SHADER_INVOCATIONS_ARB",
>> >>> > +        .min = TEST_WIDTH * TEST_HEIGHT / 2,
>> >>> > +        .max = 0xffffffff},
>> >>> > +       /* XXX:
>> >>> > +        * Intel hardware has some very unpredictable results for fragment
>> >>> > +        * shader invocations. After a day of head scratching, I've given up.
>> >>> > +        * Generating a real min or max is not possible. The spec allows this.
>> >>> > +        * This will also help variance across vendors.
>> >>> > +        */
>> >>>
>> >>> Is there a working theory as to how this could be less than width *
>> >>> height? Does it count 1 per quad? (Or how it could be much more than
>> >>> width * height... I can see edges getting processed unnecessarily,
>> >>> but... max_int seems high.)
>> >>
>> >> No working theory on min, but I figured if we're going to fudge the max, we may
>> >> as well fudge the min. What would you like as a max? I can show you hardware
>> >> which generates way more invocations than anything I can contrive. 1440
>> >> invocations for an 8x8.
>> >
>> > Impressive :)
>> >
>> > Best I can do is suggest that I don't think you're counting what you
>> > think you're counting. This has probably occurred to you, but you
>> > really should triple-check that you're reading (and writing) from the
>> > right place for this counter.
>> >
>>
>> I echo this sentiment.  You might also check if there are any
>> additional state bits related to counts.  For example, IIRC, on radeon
>> hw there is some additional state you need to set to get accurate
>> counts for occlusion queries.
>>
>> Alex
>
> Triple check? I'm way past triple. I've had some other people look into it as
> well, so I am not the only one confused.
>
> Alex, FYI, there was some follow-up discussion on IRC which probably should have
> been in the commit message in the first place. Haswell works exactly as
> expected. For example, a 4x4 rectangle made of 2 triangles generates three 2x2
> subspans per triangle, for a total of 6 subspans, or 24 pixels (6 x 4). For all
> power-of-two squares tested, Haswell works exactly as expected.
>
> Perhaps not a coincidence, but the HW that counts this stuff changed for IVB,
> and then again for HSW. So for some time, the working theory was that we just
> don't know how to count pre-HSW (in particular, IVB). No big deal. However, it
> turns out Gen8 seems to behave in exactly the same manner as IVB. I've yet to
> try Gen9, but there are definitely no errata I can find that I haven't already
> implemented.
>
> Further confusion which I didn't mention: very large triangles generate a PS
> invocation count that is about 1/4 the total number of pixels. I forget the
> exact count, but a 256x256 square was something like 10,000 invocations.
>
> I think we all agree there is no point in holding up the series for this, right?

No objections from me.  I was just throwing out possible ideas to
explain the behavior you were seeing, but it sounds like you've pretty
well trodden that path at this point.

Alex
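
For reference, the Haswell arithmetic quoted above (3 subspans per triangle,
6 subspans, 24 pixels for a 4x4 rectangle) can be reproduced with a small
model. This is only a sketch under stated assumptions, not the hardware's
documented behavior: the square is n x n with n even, it is aligned to the 2x2
subspan grid, it is split into two triangles along one diagonal, and every
subspan a triangle touches costs 4 invocations. The function names are
hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the piglit patch): number of 2x2
 * subspans dispatched for an n x n square drawn as two triangles that
 * share one diagonal.
 *
 * Assumptions: n is even and the square is aligned to the subspan grid,
 * so the square covers q x q subspans with q = n / 2. Each triangle
 * touches the q subspans on the diagonal plus the q * (q - 1) / 2
 * subspans strictly on its own side, i.e. q * (q + 1) / 2 in total.
 */
static uint64_t
model_subspans(uint32_t n)
{
	assert(n % 2 == 0);

	uint64_t q = n / 2;	/* subspans per side of the square */

	/* Two triangles, q * (q + 1) / 2 subspans each. */
	return q * (q + 1);
}

/*
 * Hypothetical helper: invocation count if every dispatched subspan
 * runs all 4 of its pixels, including those outside the triangle.
 */
static uint64_t
model_invocations(uint32_t n)
{
	return 4 * model_subspans(n);
}
```

With these assumptions model_invocations(4) is 24, matching the Haswell figure
quoted above. The same model gives 80 for an 8x8 square, well below the 1440
reported earlier in the thread for other hardware, so the model does not
explain that result.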