[Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

Ben Widawsky ben at bwidawsk.net
Sat Dec 31 21:05:25 UTC 2016

On 16-12-29 17:34:19, Ben Widawsky wrote:
>On 16-12-06 13:34:02, Paulo Zanoni wrote:
>>2016-12-01 20:09 GMT-02:00 Ben Widawsky <benjamin.widawsky at intel.com>:
>>>From: Ben Widawsky <ben at bwidawsk.net>
>>>This patch series ultimately adds support within the i965 driver for
>>>Renderbuffer Decompression with GBM. In short, this feature reduces memory
>>>bandwidth by allowing the GPU to work with losslessly compressed data and having
>>>that compression scheme understood by the display engine for decompression. The
>>>display engine will decompress on the fly and scanout the image.
>>>Quoting from the final patch, the bandwidth savings on a SKL GT4 with a 19x10
>>>display running kmscube:
>>>Without compression:
>>>    Read bandwidth: 603.91 MiB/s
>>>    Write bandwidth: 615.28 MiB/s
>>>With compression:
>>>    Read bandwidth: 259.34 MiB/s
>>>    Write bandwidth: 337.83 MiB/s
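Here are those numbers in relative terms. This is my own arithmetic on the figures above, not an additional measurement. Compression cuts read bandwidth by roughly 57%, write bandwidth by roughly 45%, and combined traffic by about 51%. The helper name is illustrative only.

```c
#include <assert.h>

/* Fraction of bandwidth saved going from `before` to `after`
 * (0.0 = no saving, 1.0 = all traffic eliminated). */
static double bw_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before;
}

/* kmscube figures quoted above, in MiB/s (SKL GT4, 19x10 display). */
static const double read_plain  = 603.91, read_ccs  = 259.34;
static const double write_plain = 615.28, write_ccs = 337.83;
```

For example, bw_reduction(read_plain + write_plain, read_ccs + write_ccs) gives the combined saving of about 0.51.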
>>>The hardware achieves these savings by maintaining an auxiliary buffer
>>>containing "opaque" compression information. It's opaque in the sense that the
>>>low-level compression scheme need not be known; however, knowledge of the overall
>>>layout of the compressed data is required. The auxiliary buffer is created by
>>>the driver on behalf of the client when requested. That buffer needs to be
>>>passed along wherever the main image's buffer goes.
>>>The overall strategy is that the buffer/surface is created with a list of
>>>modifiers. The list of modifiers the hardware is capable of using will come from
>>>a new kernel API that is aware of the hardware and general constraints. A client
>>>will request the list of modifiers and pass it directly back in during buffer
>>>creation (potentially the client can prune the list, but as of now there is no
>>>reason to.) This new API is being developed by Kristian. I did not get far
>>>enough to play with that.
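The negotiation described above amounts to intersecting two lists. This is a minimal sketch. It assumes the driver keeps its own best-first preference order and picks the first entry the client also offered. The function and EX_MOD_* names are hypothetical. The numeric values mirror the kernel's drm_fourcc.h encoding, and the CCS value follows the fourcc_mod_code(INTEL, 4) assumption made later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror drm_fourcc.h: Intel vendor id (0x01) in the top 8 bits. */
#define EX_MOD_LINEAR      0ULL
#define EX_MOD_X_TILED     ((1ULL << 56) | 1)
#define EX_MOD_Y_TILED     ((1ULL << 56) | 2)
#define EX_MOD_Y_TILED_CCS ((1ULL << 56) | 4)  /* assumed, per this thread */

/* Hypothetical driver preference, best (least bandwidth) first. */
static const uint64_t ex_driver_pref[] = {
    EX_MOD_Y_TILED_CCS, EX_MOD_Y_TILED, EX_MOD_X_TILED, EX_MOD_LINEAR,
};

/* Pick the most preferred modifier that the client also listed.
 * Returns 0 and stores it in *out, or -1 if the lists do not overlap.
 * (LINEAR is 0, so 0 cannot double as a "not found" value.) */
static int ex_choose_modifier(const uint64_t *client, size_t count, uint64_t *out)
{
    for (size_t i = 0; i < sizeof(ex_driver_pref) / sizeof(ex_driver_pref[0]); i++) {
        for (size_t j = 0; j < count; j++) {
            if (client[j] == ex_driver_pref[i]) {
                *out = ex_driver_pref[i];
                return 0;
            }
        }
    }
    return -1;
}
```

Because the driver's order wins, a client that passes the full kernel-provided list unpruned automatically gets the most bandwidth-friendly layout both sides support.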
>>>For EGL, a similar mechanism would exist whereby, when importing a buffer into
>>>EGL, one would provide a modifier and probably a pointer to the auxiliary data.
>>>Import therefore might require multiple dma-buf fds, but for i965 and Intel this
>>>wouldn't be necessary.
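Suppose an importer describes each plane as an (fd, offset, stride) triple. A layout like i965's, where the aux data shares the main surface's BO, then needs only one fd. A driver that keeps aux data in a separate BO would need two. The structures below are a hypothetical sketch, not an existing EGL interface.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_PLANES 4

/* Hypothetical per-plane import description. */
struct ex_import_plane {
    int      fd;      /* dma-buf fd backing this plane */
    uint32_t offset;  /* byte offset of the plane within that dma-buf */
    uint32_t stride;  /* bytes per row */
};

struct ex_import_desc {
    uint64_t modifier;
    unsigned num_planes;
    struct ex_import_plane plane[EX_MAX_PLANES];
};

/* Count how many distinct dma-buf fds the import actually needs. */
static unsigned ex_distinct_fds(const struct ex_import_desc *d)
{
    unsigned n = 0;
    for (unsigned i = 0; i < d->num_planes; i++) {
        int seen = 0;
        for (unsigned j = 0; j < i; j++)
            if (d->plane[j].fd == d->plane[i].fd)
                seen = 1;
        if (!seen)
            n++;
    }
    return n;
}
```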
>>>Here is a brief description of the series:
>>>1-6 Adds support in GBM for per plane functions where necessary. This is
>>>required because the kernel expects the auxiliary buffer to be passed along as a
>>>plane. It has its own offset and stride, and the client shouldn't need to
>>>calculate those.
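This is a minimal sketch of per-plane getters from the client's side. It assumes the BO records an offset and stride for every plane, with the CCS aux data as plane 1. All names here are hypothetical stand-ins, not the signatures added by patches 1-6.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_PLANES 4

/* Hypothetical BO bookkeeping: the driver, not the client, fills in where
 * each plane (including an aux/CCS plane) lives and how wide its rows are. */
struct ex_bo {
    unsigned num_planes;
    uint32_t offset[EX_MAX_PLANES];
    uint32_t stride[EX_MAX_PLANES];
};

static unsigned ex_bo_get_plane_count(const struct ex_bo *bo)
{
    return bo->num_planes;
}

/* Both getters return 0 for a plane index the BO does not have. */
static uint32_t ex_bo_get_stride_for_plane(const struct ex_bo *bo, unsigned plane)
{
    return plane < bo->num_planes ? bo->stride[plane] : 0;
}

static uint32_t ex_bo_get_offset_for_plane(const struct ex_bo *bo, unsigned plane)
{
    return plane < bo->num_planes ? bo->offset[plane] : 0;
}
```

A KMS client would then copy these values into the framebuffer's per-plane pitches and offsets (for example, the arrays passed to libdrm's drmModeAddFB2) instead of computing them itself.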
>>>7-9 Adds support in GBM to understand modifiers. When creating a buffer or
>>>surface, the client is expected to pass in a list of modifiers that the driver
>>>will optimally choose from. As a result of this, the GBM APIs need to support
>>>10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the
>>>kernel. With the previous patches in place, it's easy to support this too.
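Handling the Y-tiled modifier is essentially a translation from modifier to the driver's internal tiling mode. This is a sketch under that assumption. The enum and function names are hypothetical. The modifier values mirror the kernel's drm_fourcc.h encoding: 0 for linear, plus I915_FORMAT_MOD_X_TILED and I915_FORMAT_MOD_Y_TILED.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MOD_LINEAR  0ULL
#define EX_MOD_X_TILED ((1ULL << 56) | 1)
#define EX_MOD_Y_TILED ((1ULL << 56) | 2)

enum ex_tiling { EX_TILING_NONE, EX_TILING_X, EX_TILING_Y };

/* Translate a modifier into an internal tiling mode.
 * Returns 0 on success, -1 for modifiers this sketch does not handle
 * (including CCS, which needs the aux plumbing of patches 13-27). */
static int ex_tiling_from_modifier(uint64_t modifier, enum ex_tiling *out)
{
    switch (modifier) {
    case EX_MOD_LINEAR:  *out = EX_TILING_NONE; return 0;
    case EX_MOD_X_TILED: *out = EX_TILING_X;    return 0;
    case EX_MOD_Y_TILED: *out = EX_TILING_Y;    return 0;
    default:             return -1;
    }
}
```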
>>>13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the
>>>existing code for MCS buffers, these patches create an MCS for the scanout
>>>buffer. The trickery here is that a single BO contains both the main surface and
>>>the auxiliary data. Previously, auxiliary data always lived in its own BO.
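This sketches the single-BO arrangement. The main surface starts at offset 0 and the aux data follows it, so the aux plane is described by a non-zero offset into the same BO (compare the "i965: Allow aux buffers to have an offset" patch). The 4096-byte alignment and the caller-supplied aux dimensions are assumptions for illustration. The real CCS sizing rules live in the i965 miptree code and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u  /* assumed alignment for the aux data */

struct ex_layout {
    uint64_t main_offset;  /* always 0: main surface comes first */
    uint64_t aux_offset;   /* where the aux (CCS) data starts */
    uint64_t total_size;   /* size of the one BO holding both */
};

/* Round v up to a multiple of a (a must be non-zero). */
static uint64_t ex_align(uint64_t v, uint64_t a)
{
    return (v + a - 1) / a * a;
}

static struct ex_layout ex_single_bo_layout(uint32_t main_stride, uint32_t main_rows,
                                            uint32_t aux_stride, uint32_t aux_rows)
{
    struct ex_layout l;
    uint64_t main_size = (uint64_t)main_stride * main_rows;

    l.main_offset = 0;
    l.aux_offset  = ex_align(main_size, EX_PAGE_SIZE);
    l.total_size  = l.aux_offset + (uint64_t)aux_stride * aux_rows;
    return l;
}
```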
>>>27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and
>>>realize the bandwidth savings that come with it.
>>>This was tested using kmscube
>>>(https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation
>>>is missing support for GET_PLANE2 - which is currently being worked on by
>>>Upstream plan:
>>First of all, I'd like to point out that I haven't really been following
>>this feature closely, so maybe my questions are irrelevant to this
>>series. But still, I feel I have to point these things out since they
>>may be relevant. Please tell me if I'm not talking about the same
>>thing as you are.
>>The main question is: where's the matching i915.ko series? Shouldn't
>>that be step 0 in your upstream plan?
>Ville is working on it. All patches except the last can be merged without kernel
>support. That is assuming that we agree upon the general solution, using the
>modifiers and having both buffers be part of the same BO. There is also a
>requisite series from Kristian which will allow the client to query per plane

I guess this is a lie, actually. I depend on fourcc_mod_code(INTEL, 4) being the
Y-tiled CCS modifier. I can figure out a way to defer this until the last patch.
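For reference, the kernel's drm_fourcc.h builds modifiers by placing a vendor id (Intel is 0x01) in the top 8 bits and a vendor-specific value in the low 56 bits. So fourcc_mod_code(INTEL, 4) is 0x0100000000000004. The sketch below re-creates that encoding under EX_-prefixed names so it compiles without kernel headers. Treating value 4 as Y-tiled CCS is the assumption stated above; as of this thread, the UAPI header does not define it.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of drm_fourcc.h's modifier encoding. */
#define EX_MOD_VENDOR_INTEL 0x01ULL
#define ex_fourcc_mod_code(vendor, val) \
    ((EX_MOD_VENDOR_##vendor << 56) | ((uint64_t)(val) & 0x00ffffffffffffffULL))

#define EX_MOD_Y_TILED     ex_fourcc_mod_code(INTEL, 2)
#define EX_MOD_Y_TILED_CCS ex_fourcc_mod_code(INTEL, 4)  /* assumption from this thread */

/* Split a modifier back into its vendor and vendor-specific parts. */
static uint64_t ex_mod_vendor(uint64_t mod) { return mod >> 56; }
static uint64_t ex_mod_value(uint64_t mod)  { return mod & 0x00ffffffffffffffULL; }
```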

>>I do recall seeing BSpec text containing "do this thing if render
>>decompression is enabled" and, at that time, our code wasn't
>>implementing those instructions. AFAIU, the Kernel didn't really have
>>support for render decompression, so its specific bits were just
>>ignored. I was assuming that whoever implemented the feature would add
>>all the necessary bits, especially since we didn't seem to have any
>>sort of "if (has_render_decompression(dev_priv))" to call. I am 100%
>>sure there's such an example in the Gen 9 Watermarks instructions, but
>>I'm sure I saw more somewhere else (Display WA page?). And remember:
>>missing watermarks workarounds equals flickering screens.
>>Is this relevant to your series? How will Mesa be able to detect that
>>the Kernel it's running on contains the necessary Render Decompression
>>checks/WAs/code it needs? How can the Kernel detect that Render
>>Decompression is in use and start doing the things it should do?
>Mesa doesn't need to detect that the kernel is doing it. The kernel needs to do
>it if Mesa requests it. The assumption is that the kernel advertises
>this via the new modifier flags and no getparam is necessary. If the modifier
>flags exist in the UAPI, the kernel supports it (with workarounds implemented).
>Did I answer all of the questions?
>>>1. All of the patches up through 26 should be mergeable today after review.
>>>2. After 1-12 land, client support of Y-tiling should be achievable. Modesetting
>>>driver can probably be updated, as can things like Weston. Clients assuming a new
>>>enough kernel should be able to blindly set the Y-tiled modifier.
>>>3. Once kernel and libdrm support for CCS modifiers lands, patch 27 can land;
>>>however, CCS isn't usable yet, as it is only available as a prototype.
>>>4. Kristian's GET_PLANE2 interface needs to be solidified and land.
>>>5. Clients will utilize #3 and #4 to use CCS.
>>>6. Protocol work, EGL, Wayland, DRIX - etc
>>>When Kristian's interface is ready, kmscube can be modified to make use of it.
>>>Rob: are you interested in a PR for kmscube?
>>>Definition of terms:
>>>Renderbuffer Decompression - In the ARM world, this is AFBC. Having the graphics
>>>driver utilize lossless surface compression for the scanout buffer and sending
>>>those surfaces, compressed, to the kernel (via KMS) for the display engine to
>>>directly consume.
>>>Renderbuffer Compression - Utilizing compressed surfaces for many buffer types
>>>(scanout, textures, whatever), and decompressing (i.e., resolving) those surfaces
>>>before passing them along.
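The difference between the two definitions shows up in how much resolving happens before a surface leaves the driver. This is a hedged sketch of that decision, based only on the patch titles "Add new resolve hints full and partial" and "Use partial resolves for CCS buffers being scanned out". A consumer that understands the compressed layout gets a partial resolve; my understanding is that this handles fast-cleared blocks while keeping the rest compressed. Any other consumer gets a full resolve. The enum and function are hypothetical, not the i965 code.

```c
#include <assert.h>
#include <stdbool.h>

enum ex_resolve_hint {
    EX_RESOLVE_NONE,     /* nothing to do: surface is not compressed */
    EX_RESOLVE_PARTIAL,  /* keep compression; consumer decompresses itself */
    EX_RESOLVE_FULL,     /* write out fully uncompressed data */
};

/* compressed:         surface currently uses lossless compression
 * consumer_reads_ccs: the consumer (e.g. display fed a CCS modifier)
 *                     understands the compressed layout */
static enum ex_resolve_hint ex_pick_resolve(bool compressed, bool consumer_reads_ccs)
{
    if (!compressed)
        return EX_RESOLVE_NONE;
    if (consumer_reads_ccs)
        return EX_RESOLVE_PARTIAL;  /* renderbuffer decompression */
    return EX_RESOLVE_FULL;         /* renderbuffer compression */
}
```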
>>>Ben Widawsky (27):
>>>  gbm: Move getters to match order in header file (trivial)
>>>  gbm: Fix width height getters return type (trivial)
>>>  gbm: Export a plane getter function
>>>  gbm: Create a gbm_device getter for stride
>>>  gbm: Export a per plane getter for stride
>>>  gbm: Export a per plane getter for offset
>>>  i965/dri: Store the screen associated with the image
>>>  dri: Add an image creation with modifiers
>>>  gbm: Introduce modifiers into surface/bo creation
>>>  i965: Handle Y-tile modifier
>>>  gbm: Get modifiers from DRI
>>>  i965: Bring back always Y-tiled on SKL+
>>>  i965: Separate image allocation with modifiers
>>>  i965: Allow aux buffers to have an offset
>>>  i965/miptree: Add a helper functions for image creation
>>>  i965/miptree: Allocate mcs_buf for an image's CCS_E
>>>  i965: Create correctly sized mcs for an image
>>>  i965/miptree: Add a return for updating of winsys
>>>  i965/miptree: Allocate mt earlier in update winsys
>>>  i965: Pretend that CCS modified images are two planes
>>>  i965: Make CCS stride match kernel's expectations
>>>  i965: Change resolve flags to enum
>>>  i965: Plumb resolve hints from miptrees to blorp
>>>  i965: Add new resolve hints full and partial
>>>  i965: Use partial resolves for CCS buffers being scanned out
>>>  i965: Remove scanout restriction from lossless compression
>>>  i965: Handle compression modifier
>>> include/GL/internal/dri_interface.h              |  28 ++-
>>> src/egl/drivers/dri2/platform_drm.c              |   7 +-
>>> src/gallium/state_trackers/dri/dri2.c            |   1 +
>>> src/gbm/backends/dri/gbm_dri.c                   | 132 ++++++++++++++-
>>> src/gbm/gbm-symbols-check                        |   6 +
>>> src/gbm/main/gbm.c                               | 112 ++++++++++--
>>> src/gbm/main/gbm.h                               |  28 ++-
>>> src/gbm/main/gbmint.h                            |  16 +-
>>> src/mesa/drivers/dri/i965/brw_blorp.c            |  12 +-
>>> src/mesa/drivers/dri/i965/brw_blorp.h            |   3 +-
>>> src/mesa/drivers/dri/i965/brw_context.c          |  53 ++++--
>>> src/mesa/drivers/dri/i965/brw_wm_surface_state.c |   3 +-
>>> src/mesa/drivers/dri/i965/intel_fbo.c            |  17 +-
>>> src/mesa/drivers/dri/i965/intel_image.h          |   5 +
>>> src/mesa/drivers/dri/i965/intel_mipmap_tree.c    | 139 +++++++++++----
>>> src/mesa/drivers/dri/i965/intel_mipmap_tree.h    |  29 +++-
>>> src/mesa/drivers/dri/i965/intel_screen.c         | 207 +++++++++++++++++++++--
>>> src/mesa/drivers/dri/i965/intel_tex_image.c      |  17 +-
>>> 18 files changed, 688 insertions(+), 127 deletions(-)
>>>Cc: Kristian H. Kristensen <hoegsberg at gmail.com>
>>>Cc: Daniel Stone <daniels at collabora.com>
>>>Cc: Rob Clark <robdclark at gmail.com>
>>>mesa-dev mailing list
>>>mesa-dev at lists.freedesktop.org
>>Paulo Zanoni
