[Intel-gfx] [PATCH] tests/kms_ccs: Fix the color/ccs surface generation

Fri Aug 4 14:56:11 UTC 2017

On August 4, 2017 2:59:56 AM Daniel Stone <daniel at fooishbar.org> wrote:

> Hi Jason,
>
> On 4 August 2017 at 01:52, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> Previously, the test used the old 64x64 convention that Ville introduced
>> for CCS tiles and not the current 128x32 Y-tile convention.  Also, the
>> original scheme for generating the CCS data was over-complicated and
>> didn't work correctly because it assumed you could cut the main surface
>> at an arbitrary Y coordinate.  While you clearly *can* do this (the
>> hardware does), it's not a good idea for a generator in a test.  The new
>> scheme, introduced here, is entirely based on the relationship between
>> cache-lines in the main surface and the CCS that's documented in the
>> PRM.  By keeping everything CCS cache-line aligned, our chances of
>> generating correct data for an arbitrary-size surface are much higher.
>
> Thanks for fixing this!
>
>> +                * We need to cut the surface on a CCS cache-line boundary,
>> +                * otherwise, we're going to be in trouble when we try to
>> +                * generate CCS data for the surface.  A cache line in the
>> +                * CCS is 16x16 cache-line-pairs in the main surface.  16
>> +                * cache lines is 64 rows high.
>> +                */
>> +               half_height = ALIGN(height, 128) / 2;
>> +               half_size = half_height * stride;
>> +               for (i = 0; i < fb->size / 4; i++) {
>> +                       if (i < half_size / 4)
>> +                               ptr[i] = RED;
>> +                       else
>> +                               ptr[i] = COMPRESSED_RED;
>
> I toyed with:
> else if (!(i & 0xe))
>         ptr[i] = COMPRESSED_RED;
> else
>         ptr[i] = BLACK;
>
> ... to make the failure mode even more obvious. But that still writes
> in (far) more compressed-red pixels than we strictly need for CCS,
> right? Anyway, follow-up patch.

Yeah, we can probably do quite a bit of patterning so long as we keep the 
compressed/uncompressed split simple and make sure our pattern works in 
whole cache lines.

>> +static unsigned int
>> +y_tile_y_pos(unsigned int offset, unsigned int stride)
>>  {
>> -       return ptr +
>> -               ((y & ~0x3f) * stride) +
>> -               ((x & ~0x7) * 64) +
>> -               ((y & 0x3f) * 8) +
>> -               (x & 7);
>> +       unsigned int y_tiles, y;
>> +       y_tiles = (offset / 4096) / (stride / 128);
>> +       y = y_tiles * 32 + ((offset & 0x1f0) >> 4);
>> +       return y;
>>  }
>>
>> @@ -143,12 +168,26 @@ static void display_fb(data_t *data, int compressed)
>>         size[0] = f.pitches[0] * ALIGN(height, 32);
>>
>>         if (compressed) {
>> -               width = ALIGN(f.width, 16) / 16;
>> -               height = ALIGN(f.height, 8) / 8;
>> -               f.pitches[1] = ALIGN(width * 1, 64);
>> +               /* From the Sky Lake PRM, Vol 12, "Color Control Surface":
>> +                *
>> +                *    "The compression state of the cache-line pair is
>> +                *    specified by 2 bits in the CCS.  Each CCS cache-line
>> +                *    represents an area on the main surface of 16x16 sets
>> +                *    of 128 byte Y-tiled cache-line-pairs. CCS is always Y
>> +                *    tiled."
>> +                *
>> +                * A "cache-line-pair" for a Y-tiled surface is two
>> +                * horizontally adjacent cache lines.  When operating in
>> +                * bytes and rows, this gives us a scale-down factor of
>> +                * 32x16.  Since the main surface has a 32-bit format, we
>> +                * need to multiply width by 4 to get bytes.
>> +                */
>
> Yeah, this code and comment match better (& helped clarify) my
> understanding of how it works. But given my understanding was mostly
> gleaned from other, borderline incomprehensible, comments, as well as
> manually poking bits and observing the result, maybe that's not a
> ringing endorsement.
>
>> +               width = ALIGN(f.width * 4, 32) / 32;
>> +               height = ALIGN(f.height, 16) / 16;
>> +               f.pitches[1] = ALIGN(width * 1, 128);
>>                 f.modifier[1] = modifier;
>>                 f.offsets[1] = size[0];
>> -               size[1] = f.pitches[1] * ALIGN(height, 64);
>> +               size[1] = f.pitches[1] * ALIGN(height, 32);
>
> I changed this to f.height rather than height, because otherwise the
> kernel was rejecting the aux buffer for being too small.

Congratulations, you found a bug in the kernel branch you're running.  The 
downsized height is definitely what we want and it works fine with my 
kernel branch.

> With that, it now passes on SKL + APL for me, so I've pushed with my
> review. Thanks!
>
> Cheers,
> Daniel