[Intel-gfx] [RFC] drm/i915: Render decompression support for Gen9 and above

Tue Jan 19 02:28:48 PST 2016

Hi,

On 10 September 2015 at 16:02, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Wed, Sep 09, 2015 at 10:04:23AM -0700, Jesse Barnes wrote:
>> On 09/09/2015 09:36 AM, Smith, Gary K wrote:
>> > I don't understand why this is an issue. Surely the fb is to describe
>> > static state about the buffer, not dynamic state. The fb should be
>> > created with the compressed modifier. The compressed property is just
>> > a hint to the kernel that the buffer has been completely resolved,
>> > hence currently it can be treated as an uncompressed fb (and the aux
>> > buffer can be ignored). This is dynamic state that may well change
>> > very regularly over the lifetime of the buffer.
>
> There's fb modifier at all in this patch set, not just a static modifier
> to be able to check the compression data fits plus a "ignore compression"
> runtime knob.
>
>> > It's still a compressed fb, it contains a aux buffer and had to be
>> > created with the compressed fb modifier. However, once the userspace
>> > has fully resolved the buffer, the aux buffer can be ignored and the
>> > compressed fb can be used in any situation where an uncompressed fb
>> > would normally be required. This is dynamic state that may well change
>> > very regularly over the lifetime of the buffer.
>> >
>> > I could allocate two fbs always, and use the appropriate one. We
>> > already do this in order to indicate whether a RGBA buffer currently
>> > needs to be considered as opaque (RGBX) or blended. Experience has
>> > shown that it makes it very complex to debug when the fb keeps on
>> > changing its value. However, because we now have 4 different states
>> > (Blended/Opaque and Compressed or Resolved), we will now end up with
>> > up to 4 fbs per buffer.
>> >
>> > We aren't just talking about a few fbs here, we already see more than
>> > 100 fbs active during complex situations. Potentially doubling this
>> > number is surely a significant increase in memory usage, both from the
>> > management side in userspace and the kernel side.
>
> 8kb kernel memory for the additional 2 copies of drm_framebuffer structs
> for 100 buffers. That's about as much as the minimal overhead for just 1
> underlying gem object (counting the sg table, vma, gtt pte tracking, gem
> object and shmem backing node and pagecache entries). 2 integers in userspace.
>
> Do you have some data to show that overhead?

I agree with this view as well, and it does seem to be the way chosen
for generic userspace on other drivers.

For context, the way ChromeOS and Wayland compositors (Weston, Mutter,
Enlightenment) work is that a userspace library called GBM is
distributed as part of EGL, which is the native EGL platform/winsys
for rendering on KMS. The major difference with GBM, however, is that
it does _not_ do presentation: presentation is explicitly controlled
by the compositor itself.

In order to use this new property, we would have to add API to EGL/GBM
to extract a list of property names to set, which wouldn't really make
for great API. It'd be much cleaner for these users to stick with FB
modifiers, especially as they destroy and recreate the FB objects
(something we've not seen have any performance impact) for every flip
anyway. From my side, I'd be much happier using generically-applicable
FB modifiers, than continuing along the property explosion.

The other sticking point is that if I go from flipping GPU buffers
with render compression enabled to software buffers, from userspace
that means I then need to explicitly go unset the render decompression
flag before I can display software buffers, else the flips just get
rejected; something which isn't the case with FB modifiers. One more
thing to go wrong ...

Cheers,
Daniel