[Piglit] [PATCH 6/7] ARB_pipeline_statistics_query (frag): basic test
Alex Deucher
alexdeucher at gmail.com
Mon Feb 16 12:39:40 PST 2015
On Mon, Feb 16, 2015 at 2:59 PM, Ben Widawsky
<benjamin.widawsky at intel.com> wrote:
> On Mon, Feb 16, 2015 at 02:14:41PM -0500, Alex Deucher wrote:
>> On Sat, Feb 14, 2015 at 2:21 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> > On Sat, Feb 14, 2015 at 2:11 AM, Ben Widawsky
>> > <benjamin.widawsky at intel.com> wrote:
>> >> On Sat, Feb 14, 2015 at 02:07:32AM -0500, Ilia Mirkin wrote:
>> >>> On Sat, Feb 14, 2015 at 1:54 AM, Ben Widawsky
>> >>> <benjamin.widawsky at intel.com> wrote:
>> >>> > +static struct query queries[] = {
>> >>> > + {
>> >>> > + .query = GL_FRAGMENT_SHADER_INVOCATIONS_ARB,
>> >>> > + .name = "GL_FRAGMENT_SHADER_INVOCATIONS_ARB",
>> >>> > + .min = TEST_WIDTH * TEST_HEIGHT / 2,
>> >>> > + .max = 0xffffffff},
>> >>> > + /* XXX:
>> >>> > + * Intel hardware has some very unpredictable results for fragment
>> >>> > + * shader invocations. After a day of head scratching, I've given up.
>> >>> > + * Generating a real min, or max is not possible. The spec allows this.
>> >>> > + * This will also help variance across vendors.
>> >>> > + */
>> >>>
>> >>> Is there a working theory as to how this could be less than width *
>> >>> height? Does it count 1 per quad? (Or how it could be much more than
>> >>> width * height... I can see edges getting processed unnecessarily,
>> >>> but... max_int seems high.)
>> >>
>> >> No working theory on min, but I figured if we're going to fudge the max, we may
>> >> as well fudge the min. What would you like as a max? I can show you hardware
>> >> which generates way more invocations than anything I can contrive. 1440
>> >> invocations for an 8x8.
>> >
>> > Impressive :)
>> >
>> > Best I can do is suggest that I don't think you're counting what you
>> > think you're counting. This has probably occurred to you, but you
>> > really should triple-check that you're reading (and writing) from the
>> > right place for this counter.
>> >
>>
>> I echo this sentiment. You might also check if there are any
>> additional state bits related to counts. For example, IIRC, on radeon
>> hw there is some additional state you need to set to get accurate
>> counts for occlusion queries.
>>
>> Alex
>
> Triple check? I'm way past triple. I've had some other people look into it as
> well, so I am not the only one confused.
>
> Alex, FYI, there was some follow-up discussion on IRC which probably should have
> been in the commit message in the first place. Haswell works exactly as
> expected. For example, a 4x4 rectangle of 2 triangles generates 3 2x2 subspans
> per triangle, for a total of 6 subspans, or 24 pixels. For all powers of two
> squares tested, Haswell works exactly as expected.
>
> Perhaps not a coincidence, but the HW that counts this stuff changed for IVB,
> and then again for HSW. So for some time, the working theory was, we just don't
> know how to count pre-HSW (in particular, IVB). No big deal. However, it turns
> out Gen8 seems to behave in exactly the same manner as IVB. I've yet to try
> Gen9, but there are definitely no errata I can find that I haven't already
> implemented.
>
> Further confusion which I didn't mention - very large triangles generate a PS
> invocation count that is about 1/4 the total number of pixels. I forget the
> exact count, but a 256x256 square was something like 10,000 pixels.
>
> I think we all agree there is no point in holding up the series for this, right?
No objections from me. I was just throwing out possible ideas to
explain the behavior you were seeing, but it sounds like you've pretty
well trodden that path at this point.
Alex
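
For reference, the Haswell arithmetic quoted above (a 4x4 rectangle of 2 triangles giving 3 2x2 subspans per triangle, 6 subspans, 24 pixels) can be reproduced with a small model. This is only a sketch under stated assumptions, not driver or hardware code: it assumes the square is split along one diagonal, that coverage is decided at pixel centres, that centres lying exactly on the diagonal belong to the lower-left triangle, and that every touched 2x2 subspan is counted as 4 invocations. The function names are made up for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical model: an n x n pixel square (n even) split along the
 * anti-diagonal into a lower-left and an upper-right triangle.
 * Pixel (px, py) has its centre on the diagonal when px + py == n - 1.
 * Giving those centres to the lower-left triangle is an assumption; for
 * even n it does not change the number of subspans touched.
 */
bool pixel_covered(int n, int px, int py, bool upper)
{
	int sum = px + py;
	return upper ? (sum > n - 1) : (sum <= n - 1);
}

/* Count the 2x2 subspans containing at least one covered pixel. */
int subspans_touched(int n, bool upper)
{
	int count = 0;

	for (int qy = 0; qy < n; qy += 2) {
		for (int qx = 0; qx < n; qx += 2) {
			bool hit = false;

			for (int dy = 0; dy < 2; dy++)
				for (int dx = 0; dx < 2; dx++)
					hit = hit || pixel_covered(n, qx + dx, qy + dy, upper);
			if (hit)
				count++;
		}
	}
	return count;
}

/* Assumption: each touched subspan is counted as 4 invocations. */
int modelled_invocations(int n)
{
	return 4 * (subspans_touched(n, false) + subspans_touched(n, true));
}
```

Under these assumptions modelled_invocations(4) evaluates to 24, against the 16 pixels actually covered, which matches the Haswell figure in the thread. The model says nothing about the IVB/Gen8 counts or the large-triangle behaviour discussed above.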