[Intel-gfx] [PATCH 7/7] drm/i915/gem: Acquire all vma/objects under reservation_ww_class

Thu Jun 25 15:10:23 UTC 2020

Quoting Christian König (2020-06-25 15:02:41)
> Am 25.06.20 um 15:23 schrieb Chris Wilson:
> > Quoting Christian König (2020-06-25 13:59:16)
> >> Am 25.06.20 um 14:48 schrieb Chris Wilson:
> >>> Quoting Christian König (2020-06-25 09:11:35)
> >>>> Am 24.06.20 um 22:18 schrieb Chris Wilson:
> >>>>> Quoting Dave Airlie (2020-06-24 20:04:02)
> >>>>>> On Wed, 24 Jun 2020 at 07:19, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >>>>>>> Quoting Dave Airlie (2020-06-23 22:01:24)
> >>>>>>>> On Tue, 23 Jun 2020 at 20:03, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >>>>>>>>> Quoting Thomas Hellström (Intel) (2020-06-23 10:33:20)
> >>>>>>>>>> Hi, Chris!
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 11:59 AM, Chris Wilson wrote:
> >>>>>>>>>>> In order to actually handle eviction and what not, we need to process
> >>>>>>>>>>> all the objects together under a common lock, reservation_ww_class. As
> >>>>>>>>>>> such, do a memory reservation pass after looking up the object/vma,
> >>>>>>>>>>> which then feeds into the rest of execbuf [relocation, cmdparsing,
> >>>>>>>>>>> flushing and ofc execution].
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>>>>>>>> ---
> >>>>>>>>>>>      .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 91 ++++++++++++++-----
> >>>>>>>>>>>      1 file changed, 70 insertions(+), 21 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>> Which tree is this against? The series doesn't apply cleanly against
> >>>>>>>>>> drm-tip?
> >>>>>>>>> It's continuing on from the scheduler patches, the bug fixes and the
> >>>>>>>>> iris-deferred-fence work. I thought throwing all of those old patches
> >>>>>>>>> into the pile would have been distracting.
> >>>>>>>>>
> >>>>>>>>>> ...
> >>>>>>>>>>
> >>>>>>>>>>> +static int eb_reserve_mm(struct i915_execbuffer *eb)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +     const u64 idx = eb->context->timeline->fence_context;
> >>>>>>>>>>> +     struct ww_acquire_ctx acquire;
> >>>>>>>>>>> +     struct eb_vma *ev;
> >>>>>>>>>>> +     int err;
> >>>>>>>>>>> +
> >>>>>>>>>>> +     eb->mm_fence = __dma_fence_create_proxy(0, 0);
> >>>>>>>>>>> +     if (!eb->mm_fence)
> >>>>>>>>>>> +             return -ENOMEM;
> >>>>>>>>>> Where are the proxy fence functions defined?
> >>>>>>>>> In dma-fence-proxy.c ;)
> >>>>>>>> The dma-fence-proxy that Christian NAKed before?
> >>>>>>> I do not have an email from Christian about dma-fence-proxy in the last
> >>>>>>> 3 years it has been on the list.
> >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fdri-devel%2Faeb0373d-0583-d922-3b73-93668c27d177%40amd.com%2F&data=02%7C01%7Cchristian.koenig%40amd.com%7Ccb060e358d844784815708d819061868%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637286861292346372&sdata=DlHistmqPi%2BtwdcT%2FycrtRpoLGZ6xcBD%2FkPvVZcQ2YQ%3D&reserved=0
> >>>>> Darn, I skimmed the thread title and thought it was just about the
> >>>>> timelines.
> >>>>>
> >>>>>> I'm assuming this was about patch 8 there which to me looks like proxy
> >>>>>> fences but maybe by threading is off reading that.
> >>>>> The deadlocks are easy to resolve. The fence is either signaled normally
> >>>>> by userspace, they create a deadlock that is rejected by checking the dag
> >>>>> and the fence signaled with an error (and work cancelled, error
> >>>>> propagated back to userspace if they kept the output fence around), or
> >>>>> userspace forgets entirely about the fence they were waiting on in which
> >>>>> case it is signaled by closing the syncobjs [sadly not in error though,
> >>>>> I hoping to report EPIPE] on process termination.
> >>>> And exactly that concept is still a big NAK.
> >>>>
> >>>> The kernel memory management depends on dma_fences to be signaling as
> >>>> soon as they are existing.
> >>>>
> >>>> Just imagine what Daniel's dependency patches would splat out when you
> >>>> do something like this and correctly annotate the signaling code path.
> >>> Nothing at all. Forward progress of the waiter does not solely depend on
> >>> the signaler, just as in bc9c80fe01a2570a2fd78abbc492b377b5fda068.
> >>>    
> >>>> Proxy fences, especially when they depend on userspace for signaling are
> >>>> an absolutely NO-GO.
> >>> We are in full control of the signaling and are able to cancel the pending
> >>> userspace operation, move it off to one side and shutdown the HW,
> >>> whatever. We can and do do dependency analysis of the fence contexts to
> >>> avoid deadlocks, just as easily as detecting recursion.
> >>>
> >>> To claim that userspace is not already able to control signaling, is a
> >>> false dichotomy. Userspace is fully able to lock the HW resources
> >>> indefinitely (even if you cap every job, one can always build a chain of
> >>> jobs to circumvent any imposed timeout, a couple of seconds timeout
> >>> becomes several months of jobs before the GPU runs out of memory and is
> >>> unable to accept any more jobs). Any ioctl that blocks while holding a HW
> >>> resource renders itself liable to a user controllable livelock, you know
> >>> this, because it is blocking the signaling of those earlier jobs.
> >>> Worrying about things that are entirely within our control and hence
> >>> avoidable, misses the point.
> >> You are completely missing the problem here.
> >>
> >> As you correctly pointed out that an userspace thread blocks on
> >> something is perfectly acceptable. And that's how
> >> bc9c80fe01a2570a2fd78abbc492b377b5fda068 works as well.
> >>
> >> And bc9c80fe01a2570a2fd78abbc492b377b5fda068 only implements waiting so
> >> that during CS or WAIT IOCTL we can block for the fence to appear.
> >>
> >>
> >> What happens in your approach is that the kernel starts to wait for
> >> userspace in its memory reclaim path. That is exactly the kind of
> >> problem Daniels patches now point out immediately.
> > No we don't.

To be clear, adding a wait to direct reclaim incurs latency across the
whole system, and attracts the ire of users and core developers alike.

Having fielded the bug reports for that, we try to avoid any case where
we would wait inside direct reclaim. We still do cause kswapd to wait if
there's nothing else left to clean up. We also try to apply backpressure
to client memory allocators directly; I would like to improve that path
to have memory prioritisation.

So I still consider direct reclaim latency to be a serious enough issue
that a blanket recommendation should be: don't wait.

> Well then Daniels patches are still missing that case :)

We have the DAG of fences, we can use that information to avoid adding
an implicit coupling between execution contexts. Borrowing lockdep for
its heavily aliased chains, when we have the finegrained information
available for scheduling to solve what is essentially a scheduling issue
seems shallow.

> See when signaling a fence depends userspace doing something, we 
> obviously insert circle dependencies between whatever userspace might do 
> in a kernel system call and the kernel reclaim path.
> 
> That this can't work correctly is actually completely obvious if you see 
> it from this side.

The waits are unbounded, indefinite or even just a matter of milliseconds;
ergo you are not allowed to wait inside direct reclaim. Userspace
dictates forward progress of signaling chains, worse if one client can
indirectly manipulate another's progress; the entire kernel is inside
that execution context. Totally agree on that. [But inside the kernel, we
do have the information to track even implicit execution coupling inside 
the drivers, and where we don't have that information we have to assume
it is outside of our control.]
-Chris