[Intel-gfx] [PATCH v2 9/9] drm/i915: Add render decompression support

Jason Ekstrand jason at jlekstrand.net
Wed Jan 11 22:29:37 UTC 2017


On Wed, Jan 11, 2017 at 1:49 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:

> On Tue, Jan 10, 2017 at 9:04 AM, Ville Syrjälä <
> ville.syrjala at linux.intel.com> wrote:
>
>> On Mon, Jan 09, 2017 at 11:20:57AM -0800, Jason Ekstrand wrote:
>> > On Thu, Jan 5, 2017 at 7:14 AM, <ville.syrjala at linux.intel.com> wrote:
>> >
>> > > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
>> > >
>> > > SKL+ display engine can scan out certain kinds of compressed surfaces
>> > > produced by the render engine. This involved telling the display
>> engine
>> > > the location of the color control surfae (CCS) which describes
>> > > which parts of the main surface are compressed and which are not. The
>> > > location of CCS is provided by userspace as just another plane with
>> its
>> > > own offset.
>> > >
>> > > Add the required stuff to validate the user provided AUX plane
>> metadata
>> > > and convert the user provided linear offset into something the
>> hardware
>> > > can consume.
>> > >
>> > > Due to hardware limitations we require that the main surface and
>> > > the AUX surface (CCS) be part of the same bo. The hardware also
>> > > makes life hard by not allowing you to provide separate x/y offsets
>> > > for the main and AUX surfaces (excpet with NV12), so finding suitable
>> > > offsets for both requires a bit of work. Assuming we still want keep
>> > > playing tricks with the offsets. I've just gone with a dumb "search
>> > > backward for suitable offsets" approach, which is far from optimal,
>> > > but it works.
>> > >
>> > > Also not all planes will be capable of scanning out compressed
>> surfaces,
>> > > and eg. 90/270 degree rotation is not supported in combination with
>> > > decompression either.
>> > >
>> > > This patch may contain work from at least the following people:
>> > > * Vandana Kannan <vandana.kannan at intel.com>
>> > > * Daniel Vetter <daniel at ffwll.ch>
>> > > * Ben Widawsky <ben at bwidawsk.net>
>> > >
>> > > v2: Deal with display workarounds 0390, 0531, 1125 (Paulo)
>> > >
>> > > Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
>> > > Cc: Vandana Kannan <vandana.kannan at intel.com>
>> > > Cc: Daniel Vetter <daniel at ffwll.ch>
>> > > Cc: Ben Widawsky <ben at bwidawsk.net>
>> > > Cc: Jason Ekstrand <jason at jlekstrand.net>
>> > > Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
>> > > ---
>> > >  drivers/gpu/drm/i915/i915_reg.h      |  23 ++++
>> > >  drivers/gpu/drm/i915/intel_display.c | 234
>> ++++++++++++++++++++++++++++++
>> > > ++---
>> > >  drivers/gpu/drm/i915/intel_pm.c      |  29 ++++-
>> > >  drivers/gpu/drm/i915/intel_sprite.c  |   5 +
>> > >  4 files changed, 274 insertions(+), 17 deletions(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/i915/i915_reg.h
>> b/drivers/gpu/drm/i915/i915_
>> > > reg.h
>> > > index 00970aa77afa..6849ba93f1d9 100644
>> > > --- a/drivers/gpu/drm/i915/i915_reg.h
>> > > +++ b/drivers/gpu/drm/i915/i915_reg.h
>> > > @@ -6209,6 +6209,28 @@ enum {
>> > >                         _ID(id, _PS_ECC_STAT_1A, _PS_ECC_STAT_2A),   \
>> > >                         _ID(id, _PS_ECC_STAT_1B, _PS_ECC_STAT_2B))
>> > >
>> > > +#define PLANE_AUX_DIST_1_A             0x701c0
>> > > +#define PLANE_AUX_DIST_2_A             0x702c0
>> > > +#define PLANE_AUX_DIST_1_B             0x711c0
>> > > +#define PLANE_AUX_DIST_2_B             0x712c0
>> > > +#define _PLANE_AUX_DIST_1(pipe) \
>> > > +                       _PIPE(pipe, PLANE_AUX_DIST_1_A,
>> PLANE_AUX_DIST_1_B)
>> > > +#define _PLANE_AUX_DIST_2(pipe) \
>> > > +                       _PIPE(pipe, PLANE_AUX_DIST_2_A,
>> PLANE_AUX_DIST_2_B)
>> > > +#define PLANE_AUX_DIST(pipe, plane)     \
>> > > +       _MMIO_PLANE(plane, _PLANE_AUX_DIST_1(pipe),
>> > > _PLANE_AUX_DIST_2(pipe))
>> > > +
>> > > +#define PLANE_AUX_OFFSET_1_A           0x701c4
>> > > +#define PLANE_AUX_OFFSET_2_A           0x702c4
>> > > +#define PLANE_AUX_OFFSET_1_B           0x711c4
>> > > +#define PLANE_AUX_OFFSET_2_B           0x712c4
>> > > +#define _PLANE_AUX_OFFSET_1(pipe)       \
>> > > +               _PIPE(pipe, PLANE_AUX_OFFSET_1_A,
>> PLANE_AUX_OFFSET_1_B)
>> > > +#define _PLANE_AUX_OFFSET_2(pipe)       \
>> > > +               _PIPE(pipe, PLANE_AUX_OFFSET_2_A,
>> PLANE_AUX_OFFSET_2_B)
>> > > +#define PLANE_AUX_OFFSET(pipe, plane)   \
>> > > +       _MMIO_PLANE(plane, _PLANE_AUX_OFFSET_1(pipe),
>> > > _PLANE_AUX_OFFSET_2(pipe))
>> > > +
>> > >  /* legacy palette */
>> > >  #define _LGC_PALETTE_A           0x4a000
>> > >  #define _LGC_PALETTE_B           0x4a800
>> > > @@ -6433,6 +6455,7 @@ enum {
>> > >  # define CHICKEN3_DGMG_DONE_FIX_DISABLE                (1 << 2)
>> > >
>> > >  #define CHICKEN_PAR1_1         _MMIO(0x42080)
>> > > +#define  SKL_RC_HASH_OUTSIDE   (1 << 15)
>> > >  #define  DPA_MASK_VBLANK_SRD   (1 << 15)
>> > >  #define  FORCE_ARB_IDLE_PLANES (1 << 14)
>> > >  #define  SKL_EDP_PSR_FIX_RDWRAP        (1 << 3)
>> > > diff --git a/drivers/gpu/drm/i915/intel_display.c
>> > > b/drivers/gpu/drm/i915/intel_display.c
>> > > index 38de9df0ec60..2236abebd8bc 100644
>> > > --- a/drivers/gpu/drm/i915/intel_display.c
>> > > +++ b/drivers/gpu/drm/i915/intel_display.c
>> > > @@ -2064,11 +2064,19 @@ intel_tile_width_bytes(const struct
>> > > drm_framebuffer *fb, int plane)
>> > >                         return 128;
>> > >                 else
>> > >                         return 512;
>> > > +       case I915_FORMAT_MOD_Y_TILED_CCS:
>> > > +               if (plane == 1)
>> > > +                       return 64;
>> > > +               /* fall through */
>> > >         case I915_FORMAT_MOD_Y_TILED:
>> > >                 if (IS_GEN2(dev_priv) ||
>> HAS_128_BYTE_Y_TILING(dev_priv))
>> > >                         return 128;
>> > >                 else
>> > >                         return 512;
>> > > +       case I915_FORMAT_MOD_Yf_TILED_CCS:
>> > > +               if (plane == 1)
>> > > +                       return 64;
>> > >
>> >
>> > I still think a CCS tile is 128B wide. :-)
>>
>> The spec kinda suggests the same. But I still couldn't figure out where
>> that notion really came from, so I just went with the value that gave
>> me the expected result on my screen. That is writing 64 bytes into the
>> tile is exactly what's required to fill a single row/column, writing
>> more would wrap.
>>
>
> Yeah.  Ultimately it doesn't matter.  It's an arbitrary number userspace
> multiplies by and the kernel divides by.  We just need to settle on
> something sensible.  In userspace, we've more-or-less settled on 128 to
> match the docs (and what the hardware does for W-tiling and some other edge
> cases) but it's not a huge deal.
>
>
>> >
>> >
>> > > +               /* fall through */
>> > >         case I915_FORMAT_MOD_Yf_TILED:
>> > >                 /*
>> > >                  * Bspec seems to suggest that the Yf tile width would
>> > > @@ -2156,7 +2164,7 @@ static unsigned int intel_surf_alignment(const
>> > > struct drm_framebuffer *fb,
>> > >         struct drm_i915_private *dev_priv = to_i915(fb->dev);
>> > >
>> > >         /* AUX_DIST needs only 4K alignment */
>> > > -       if (fb->format->format == DRM_FORMAT_NV12 && plane == 1)
>> > > +       if (plane == 1)
>> > >                 return 4096;
>> > >
>> > >         switch (fb->modifier) {
>> > > @@ -2166,6 +2174,8 @@ static unsigned int intel_surf_alignment(const
>> > > struct drm_framebuffer *fb,
>> > >                 if (INTEL_GEN(dev_priv) >= 9)
>> > >                         return 256 * 1024;
>> > >                 return 0;
>> > > +       case I915_FORMAT_MOD_Y_TILED_CCS:
>> > > +       case I915_FORMAT_MOD_Yf_TILED_CCS:
>> > >         case I915_FORMAT_MOD_Y_TILED:
>> > >         case I915_FORMAT_MOD_Yf_TILED:
>> > >                 return 1 * 1024 * 1024;
>> > > @@ -2472,6 +2482,7 @@ static unsigned int
>> intel_fb_modifier_to_tiling(uint64_t
>> > > fb_modifier)
>> > >         case I915_FORMAT_MOD_X_TILED:
>> > >                 return I915_TILING_X;
>> > >         case I915_FORMAT_MOD_Y_TILED:
>> > > +       case I915_FORMAT_MOD_Y_TILED_CCS:
>> > >                 return I915_TILING_Y;
>> > >         default:
>> > >                 return I915_TILING_NONE;
>> > > @@ -2536,6 +2547,35 @@ intel_fill_fb_info(struct drm_i915_private
>> > > *dev_priv,
>> > >
>> > >                 intel_fb_offset_to_xy(&x, &y, fb, i);
>> > >
>> > > +               if ((fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
>> > > +                    fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS) &&
>> i ==
>> > > 1) {
>> > > +                       int main_x, main_y;
>> > > +                       int ccs_x, ccs_y;
>> > > +
>> > > +                       /*
>> > > +                        * Each byte of CCS corresponds to a 16x8
>> area of
>> > > the main surface, and
>> > > +                        * each CCS tile is 64x64 bytes.
>> > > +                        */
>> > > +                       ccs_x = (x * 16) % (64 * 16);
>> > > +                       ccs_y = (y * 8) % (64 * 8);
>> > >
>> >
>> > This makes me nervous.  Why are we multiplying CCS coordinates by
>> something
>> > before we do the modulus?  Why aren't coordinates in both surfaces in
>> > pixels?
>>
>> For converting the linear offset (which is in bytes) into x/y we just
>> consider each pixel to be 1 byte. Hence to get the corresponding pixel
>> coordinates we multiply the byte based coordinates by 16x8. We can't
>> really deal with <1 byte pixels in most places.
>>
>
> So are the x/y offsets provided by userspace in bytes or pixels for
> regular color surfaces?  I had kind-of assumed pixels, but I guess I could
> also see bytes.
>
>
>> > So long as you keep things in pixesl and know that a CCS tile is
>> > 1024x512px and a color tile is 32x32 pixels, you can safely do tile
>> > offsetting and it all makes sense.  Having different units looks like a
>> > recipe for some very confusing bugs.  Am I just completely
>> misunderstanding
>> > what's going on here?
>>
>> Doing things in pixels would involve totally custom code for the CCS.
>> By thinking of CCS as having 1 byte pixels we can share the code already
>> used for everything else (apart from this one special check which is
>> really only necessary because the HW ignores the AUX x/y offsets for CCS.
>>
>> I suppose it would be possible to rewrite a bunch of other things to
>> allow <1 byte pixels but I couldn't be bothered to go there.
>>
>
> I think I need a better mental model of what the X/Y offset API looks like
> before I can reply to that properly.
>

I had a very useful chat with Kristian on IRC today and he explained that
the offset in drm_mode_fb_cmd2 is just to get you to the upper-left pixel
of the image and the actual X/Y offset used for scanout is provided in
pixels when you do SetCrtc.  This seems completely sane and, at that point,
I really don't care how you do things internally so long as the results are
correct.  Consider my comments here dropped.  I didn't realize I was
arguing over an implementation detail.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20170111/f6e68367/attachment-0001.html>


More information about the Intel-gfx mailing list