[Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)

Ben Widawsky ben at bwidawsk.net
Fri Dec 30 01:34:19 UTC 2016

On 16-12-06 13:34:02, Paulo Zanoni wrote:
>2016-12-01 20:09 GMT-02:00 Ben Widawsky <benjamin.widawsky at intel.com>:
>> From: Ben Widawsky <ben at bwidawsk.net>
>> This patch series ultimately adds support within the i965 driver for
>> Renderbuffer Decompression with GBM. In short, this feature reduces memory
>> bandwidth by allowing the GPU to work with losslessly compressed data and having
>> that compression scheme understood by the display engine for decompression. The
>> display engine will decompress on the fly and scanout the image.
>> Quoting from the final patch, the bandwidth savings on a SKL GT4 with a 19x10
>> display running kmscube:
>> Without compression:
>>     Read bandwidth: 603.91 MiB/s
>>     Write bandwidth: 615.28 MiB/s
>> With compression:
>>     Read bandwidth: 259.34 MiB/s
>>     Write bandwidth: 337.83 MiB/s
>> The hardware achieves these savings by maintaining an auxiliary buffer
>> containing "opaque" compression information. It's opaque in the sense that
>> knowledge of the low-level compression scheme is not needed, but knowledge of
>> the overall layout of the compressed data is required. The auxiliary buffer is
>> created by the driver on behalf of the client when requested. That buffer
>> needs to be passed along wherever the main image's buffer goes.
>> The overall strategy is that the buffer/surface is created with a list of
>> modifiers. The list of modifiers the hardware is capable of using will come from
>> a new kernel API that is aware of the hardware and general constraints. A client
>> will request the list of modifiers and pass it directly back in during buffer
>> creation (potentially the client can prune the list, but as of now there is no
>> reason to.) This new API is being developed by Kristian. I did not get far
>> enough to play with that.
>> For EGL, a similar mechanism would exist whereby when importing a buffer into
>> EGL, one would provide a modifier and probably a pointer to the auxiliary data
>> upon import. Import might therefore require multiple dma-buf fds, but for i965
>> and Intel, this wouldn't be necessary.
>> Here is a brief description of the series:
>> 1-6 Adds support in GBM for per plane functions where necessary. This is
>> required because the kernel expects the auxiliary buffer to be passed along as a
>> plane. It has its own offset, and stride, and the client shouldn't need to
>> calculate those.
>> 7-9 Adds support in GBM to understand modifiers. When creating a buffer or
>> surface, the client is expected to pass in a list of modifiers that the driver
>> will optimally choose from. As a result of this, the GBM APIs need to support
>> modifiers.
>> 10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the
>> kernel. With the previous patches in place, it's easy to support this too.
>> 13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the
>> existing code for MCS buffers, these patches create an MCS for the scanout
>> buffer. The trickery here is that a single BO contains both the main surface and
>> the auxiliary data. Previously, auxiliary data always lived in its own BO.
>> 27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and
>> realize the bandwidth savings that come with it.
>> This was tested using kmscube
>> (https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation
>> is missing support for GET_PLANE2 - which is currently being worked on by
>> Kristian.
>> Upstream plan:
>First of all, I'd like to point out that I haven't really been following
>this feature closely, so maybe my questions are irrelevant to this
>series. But still, I feel I have to raise these things since maybe
>they are relevant. Please tell me if I'm not talking about the same
>thing as you are.
>The main question is: where's the matching i915.ko series? Shouldn't
>that be step 0 in your upstream plan?

Ville is working on it. All patches except the last can be merged without kernel
support. That is assuming that we agree upon the general solution, using the
modifiers and having both buffers be part of the same BO. There is also a
requisite series from Kristian which will allow the client to query per plane

>I do recall seeing BSpec text containing "do this thing if render
>decompression is enabled" and, at that time, our code wasn't
>implementing those instructions. AFAIU, the Kernel didn't really have
>support for render decompression, so its specific bits were just
>ignored. I was assuming that whoever implemented the feature would add
>all the necessary bits, especially since we didn't seem to have any
>sort of "if (has_render_decompression(dev_priv))" to call. I am 100%
>sure there's such an example in the Gen 9 Watermarks instructions, but
>I'm sure I saw more somewhere else (Display WA page?). And remember:
>missing watermarks workarounds equals flickering screens.
>Is this relevant to your series? How will Mesa be able to detect that
>the Kernel it's running on contains the necessary Render Decompression
>checks/WAs/code it needs? How can the Kernel detect that Render
>Decompression is in use and start doing the things it should do?

Mesa doesn't need to detect that the kernel is doing it. The kernel needs to do
it if Mesa requests it to be done. The assumption is that the kernel advertises
this via the new modifier flags and no getparam is necessary. If the modifier
flags exist in the UAPI, the kernel supports it (with workarounds implemented).
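
To make that contract concrete, here is a minimal sketch of the selection step,
assuming the client hands back the kernel's advertised modifier list unchanged
and the driver picks its most preferred entry from it. Every name below
(EX_MOD_*, ex_driver_preference, ex_pick_modifier) is hypothetical and the
token values are placeholders, not the ones in drm_fourcc.h; this is not the
code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modifier tokens, for illustration only. The real tokens are
 * defined by the kernel UAPI and have different values. */
#define EX_MOD_LINEAR      UINT64_C(0)
#define EX_MOD_Y_TILED     UINT64_C(1)
#define EX_MOD_Y_TILED_CCS UINT64_C(2)
#define EX_MOD_INVALID     UINT64_MAX

/* Assumed driver preference, best first: compressed, then Y-tiled, then
 * linear. */
static const uint64_t ex_driver_preference[] = {
    EX_MOD_Y_TILED_CCS,
    EX_MOD_Y_TILED,
    EX_MOD_LINEAR,
};

/* Return the most preferred modifier that also appears in the advertised
 * list, or EX_MOD_INVALID if there is no overlap. There is no getparam-style
 * check: a modifier the kernel never advertised can never be returned. */
uint64_t ex_pick_modifier(const uint64_t *advertised, size_t count)
{
    const size_t n_pref =
        sizeof(ex_driver_preference) / sizeof(ex_driver_preference[0]);

    assert(advertised != NULL || count == 0);

    for (size_t p = 0; p < n_pref; p++) {
        for (size_t i = 0; i < count; i++) {
            if (advertised[i] == ex_driver_preference[p])
                return ex_driver_preference[p];
        }
    }
    return EX_MOD_INVALID;
}
```

With this shape, a kernel that lacks the CCS modifier simply never lists it, so
the same client code falls back to Y-tiled or linear without needing to know
anything about the kernel version.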


Did I answer all of the questions?

>> 1. All of the patches up through 26 should be mergeable today after review.
>> 2. After 1-12 land, client support of Y-tiling should be achievable. Modesetting
>> driver can probably be updated as can things like Weston. Clients assuming a new
>> enough kernel should be able to blindly set the y tiled modifier.
>> 3. Once kernel and libdrm support CCS modifiers, patch 27 can land. However,
>> CCS isn't yet usable; it is only available as a prototype.
>> 4. Kristian's GET_PLANE2 interface needs to be solidified and land.
>> 5. Clients will utilize #3 and #4 to use CCS.
>> 6. Protocol work, EGL, Wayland, DRIX - etc
>> When Kristian's interface is ready, kmscube can be modified to make use of it.
>> Rob: are you interested in a PR for kmscube?
>> Definition of terms:
>> Renderbuffer Decompression - In the ARM world, this is AFBC. Having the graphics
>> driver utilize lossless surface compression for the scanout buffer and sending
>> those surfaces, compressed, to the kernel (via KMS) for the display engine to
>> directly consume.
>> Renderbuffer Compression - Utilizing compressed surfaces for many buffer types
>> (scanout, textures, whatever), and decompressing (ie. resolving) those surfaces
>> before passing them along.
>> Ben Widawsky (27):
>>   gbm: Move getters to match order in header file (trivial)
>>   gbm: Fix width height getters return type (trivial)
>>   gbm: Export a plane getter function
>>   gbm: Create a gbm_device getter for stride
>>   gbm: Export a per plane getter for stride
>>   gbm: Export a per plane getter for offset
>>   i965/dri: Store the screen associated with the image
>>   dri: Add an image creation with modifiers
>>   gbm: Introduce modifiers into surface/bo creation
>>   i965: Handle Y-tile modifier
>>   gbm: Get modifiers from DRI
>>   i965: Bring back always Y-tiled on SKL+
>>   i965: Separate image allocation with modifiers
>>   i965: Allow aux buffers to have an offset
>>   i965/miptree: Add a helper functions for image creation
>>   i965/miptree: Allocate mcs_buf for an image's CCS_E
>>   i965: Create correctly sized mcs for an image
>>   i965/miptree: Add a return for updating of winsys
>>   i965/miptree: Allocate mt earlier in update winsys
>>   i965: Pretend that CCS modified images are two planes
>>   i965: Make CCS stride match kernel's expectations
>>   i965: Change resolve flags to enum
>>   i965: Plumb resolve hints from miptrees to blorp
>>   i965: Add new resolve hints full and partial
>>   i965: Use partial resolves for CCS buffers being scanned out
>>   i965: Remove scanout restriction from lossless compression
>>   i965: Handle compression modifier
>>  include/GL/internal/dri_interface.h              |  28 ++-
>>  src/egl/drivers/dri2/platform_drm.c              |   7 +-
>>  src/gallium/state_trackers/dri/dri2.c            |   1 +
>>  src/gbm/backends/dri/gbm_dri.c                   | 132 ++++++++++++++-
>>  src/gbm/gbm-symbols-check                        |   6 +
>>  src/gbm/main/gbm.c                               | 112 ++++++++++--
>>  src/gbm/main/gbm.h                               |  28 ++-
>>  src/gbm/main/gbmint.h                            |  16 +-
>>  src/mesa/drivers/dri/i965/brw_blorp.c            |  12 +-
>>  src/mesa/drivers/dri/i965/brw_blorp.h            |   3 +-
>>  src/mesa/drivers/dri/i965/brw_context.c          |  53 ++++--
>>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c |   3 +-
>>  src/mesa/drivers/dri/i965/intel_fbo.c            |  17 +-
>>  src/mesa/drivers/dri/i965/intel_image.h          |   5 +
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c    | 139 +++++++++++----
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h    |  29 +++-
>>  src/mesa/drivers/dri/i965/intel_screen.c         | 207 +++++++++++++++++++++--
>>  src/mesa/drivers/dri/i965/intel_tex_image.c      |  17 +-
>>  18 files changed, 688 insertions(+), 127 deletions(-)
>> Cc: Kristian H. Kristensen <hoegsberg at gmail.com>
>> Cc: Daniel Stone <daniels at collabora.com>
>> Cc: Rob Clark <robdclark at gmail.com>
>> --
>> 2.10.2
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>Paulo Zanoni
