[Intel-gfx] [PATCH] drm/i915: Wait and retry if there is no space in the aperture mappable area

Fri Nov 1 17:56:57 CET 2013

Dragging this discussion back onto the mailing list.

The Intel GenX kernel team is massively distributed over mutliple
continents and lots fo different business groups within Intel, and a
few interested parties outside of Intel. We need to work hard and
actively to make sure everyone is in the loop, that interested parties
can jump into a discussion or at least learn a bit by following it.

Which means _please_ _always_ cc mailing list. Either the public
intel-gfx list or when discussing confidential topics (and they can't
be redacted out without causing confusion) the internal
gfx-internal-devel mailing list.

/end of sermon, thanks for your attention

Short recap for newcomers: Jon's group is hitting reliability issues
in stress testing where gtt mmappings fail with SIGBUS. The frequency
sharply increased when they started to support more outputs/sprites
and so squarely points at mappable gtt fragmentation due to pinned
buffers.

On Fri, Nov 1, 2013 at 1:47 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> On Fri, Nov 01, 2013 at 12:02:07PM +0000, Bloomfield, Jon wrote:
>> > > > That statement is false.
>> > > Fair enough, but why is it false ?
>> >
>> > You often have buffers larger than the mappable aperture. At present, such
>> > mappings fail (they should pruned out earlier, but in the fault they would
>> > give a SIGBUS). That limit is oft requested to be raised by magic.
>> But a mapping like this should fail when the user asks to map the buffer, not when they subsequently access it. It can never succeed, so tell the user immediately.
>>
>> > > Userspace fails when accessing a 90MB mapping.
>> > > Don't understand how a trampoline would simplify this - trampoline as in
>> > vectored jump ?
>> > > Don't understand the 'educate' comment - they are simply mapping a
>> > 90MB buffer and trying to use it. Why should that be slow in the general case
>> > ?
>> >
>> > Failing to map a 90MiB buffer is a kernel bug, pure and simple. Failing to map
>> > a 900MiB buffer is a user education issue. (It could be done, but there are at
>> > least a dozen ways the user could do the operation differently that will be
>> > faster.)
>> Why a kernel bug ? If there are too many surfaces pinned for display (or sufficiently fragmented), is it guaranteed that a 90MB map should still succeed ?
>>
>> > The borderline is whether the user expects a 128MiB mapping to
>> > always work. Worst case scenario (without a
>> > trampoline) is that requires the outputs to be disabled first i.e.
>> > process deadlock which leads to a system deadlock with anything like X.
>> Again, how would a trampoline help ?
>
> All I am stating here is that your idea can^Wwill lead to a system
> deadlock. NAK.

I agree with Chris, implicit waiting for a random pin count to drop is
a deadlock nightmare. We'd need at least an explicti unpin_lru where
we track all soon-to-be-unpinned objects. And then we need to walk
that list in evict_something as a last resort.

Still resolving the locking inversion this causes between gem and kms
code won't be fun at all, and we still haven't gained any solid
guarantees for userspace.The mappable gtt can still be easily
fragmented, it's just a notch less likely to cause problems.

We need a better solution, and we need one with actual guarantees for
userspace to depend upon.

So here's my idea to solve this for real:
- split long-term pinned objects into two classes: Those that might
need to be access through the mappable gtt mmio window (like scanout
buffers) and those that don't (like hw contexts).
- Pin objects that are never accessed through the mappable gtt by the
cpu to the unmappable part of the gtt. This will get rid of all the hw
contexts causing havoc by sitting in the middle of the mappable gtt
due to bad luck.
- Restrict all other pinned objects to [0, mappable_size / 2]. This
will ensure that the range [mappable_size / 2, mappable] can always be
evicted. Add some vicious checks to the code to make sure we never
stumble over a pinned object in there. Also, expose this value to
userspace as the guaranteed upper limit for gtt mmaps.

Unfortunately we can't enforce this when creating the mapping since
old userspace didn't bother with this, and most often (if you never do
a modeset at least) it works _really_ well.

Of course we need solid testcases to make sure we don't break this
again. Luckily the gtt allocater is bottom-up and fully predictable if
nothing else is running (which is the default assumption for i-g-t
tests). So we need the following:
- two objects A1/A2 of size gtt_mappable / 3
- some means to trick the kernel into pinning an object of the class
we want to test. We need 2 such objects B1, B2.

1. fault in object A1 through cpu gtt access.
2. trick kernel into pinning B1.
3. fault in object A2 through cpu gtt access.
4. trick kernel into pinning B2.
5. Try to fault in a new object with gtt cpu access of size mappable/2.

Now we have perfectly fragemented the gtt and there's no room for an
object of size mappable/2. But if we restrict pinned objects it will
work.

Repeat the above test for all classes of pinned objects (hw context,
scanout buffers, cursors, ...).

I think this is conceptually the simplest change that actually gives
us real guarantees. And it doesn't change anything fundamentally with
how gem works (as opposed to the pinned buffer eviction logic) and so
should be much less tricky to implement.

Please poke holes into the plan.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch