[Intel-gfx] [PATCH] tests/kms_ccs: Fix the color/ccs surface generation

Fri Aug 4 09:59:55 UTC 2017

Hi Jason,

On 4 August 2017 at 01:52, Jason Ekstrand <jason at jlekstrand.net> wrote:
> Previously, the test used the old 64x64 convention that Ville introduced
> for CCS tiles and not the current 128x32 Y-tile convention.  Also, the
> original scheme for generating the CCS data was over-complicated and
> didn't work correctly because it assumed you could cut the main surface
> at an arbitrary Y coordinate.  While you clearly *can* do this (the
> hardware does), it's not a good idea for a generator in a test.  The new
> scheme, introduced here, is entirely based on the relationship between
> cache-lines in the main surface and the CCS that's documented in the
> PRM.  By keeping everything CCS cache-line aligned, our chances of
> generating correct data for an arbitrary-size surface are much higher.

Thanks for fixing this!

> +                * We need to cut the surface on a CCS cache-line boundary,
> +                * otherwise, we're going to be in trouble when we try to
> +                * generate CCS data for the surface.  A cache line in the
> +                * CCS is 16x16 cache-line-pairs in the main surface.  16
> +                * cache lines is 64 rows high.
> +                */
> +               half_height = ALIGN(height, 128) / 2;
> +               half_size = half_height * stride;
> +               for (i = 0; i < fb->size / 4; i++) {
> +                       if (i < half_size / 4)
> +                               ptr[i] = RED;
> +                       else
> +                               ptr[i] = COMPRESSED_RED;

I toyed with:
else if (!(i & 0xe))
        ptr[i] = COMPRESSED_RED;
else
        ptr[i] = BLACK;

... to make the failure mode even more obvious. But that still writes
in (far) more compressed-red pixels than we strictly need for CCS,
right? Anyway, follow-up patch.

> +static unsigned int
> +y_tile_y_pos(unsigned int offset, unsigned int stride)
>  {
> -       return ptr +
> -               ((y & ~0x3f) * stride) +
> -               ((x & ~0x7) * 64) +
> -               ((y & 0x3f) * 8) +
> -               (x & 7);
> +       unsigned int y_tiles, y;
> +       y_tiles = (offset / 4096) / (stride / 128);
> +       y = y_tiles * 32 + ((offset & 0x1f0) >> 4);
> +       return y;
>  }
>
> @@ -143,12 +168,26 @@ static void display_fb(data_t *data, int compressed)
>         size[0] = f.pitches[0] * ALIGN(height, 32);
>
>         if (compressed) {
> -               width = ALIGN(f.width, 16) / 16;
> -               height = ALIGN(f.height, 8) / 8;
> -               f.pitches[1] = ALIGN(width * 1, 64);
> +               /* From the Sky Lake PRM, Vol 12, "Color Control Surface":
> +                *
> +                *    "The compression state of the cache-line pair is
> +                *    specified by 2 bits in the CCS.  Each CCS cache-line
> +                *    represents an area on the main surface of 16x16 sets
> +                *    of 128 byte Y-tiled cache-line-pairs. CCS is always Y
> +                *    tiled."
> +                *
> +                * A "cache-line-pair" for a Y-tiled surface is two
> +                * horizontally adjacent cache lines.  When operating in
> +                * bytes and rows, this gives us a scale-down factor of
> +                * 32x16.  Since the main surface has a 32-bit format, we
> +                * need to multiply width by 4 to get bytes.
> +                */

Yeah, this code and comment match better (& helped clarify) my
understanding of how it works. But given my understanding was mostly
gleaned from other, borderline incomprehensible, comments, as well as
manually poking bits and observing the result, maybe that's not a
ringing endorsement.

> +               width = ALIGN(f.width * 4, 32) / 32;
> +               height = ALIGN(f.height, 16) / 16;
> +               f.pitches[1] = ALIGN(width * 1, 128);
>                 f.modifier[1] = modifier;
>                 f.offsets[1] = size[0];
> -               size[1] = f.pitches[1] * ALIGN(height, 64);
> +               size[1] = f.pitches[1] * ALIGN(height, 32);

I changed this to f.height rather than height, because otherwise the
kernel was rejecting the aux buffer for being too small.

With that, it now passes on SKL + APL for me, so I've pushed with my
review. Thanks!

Cheers,
Daniel