[PATCH 2/2] dma-buf: cleanup shared fence removal

Tue Jul 2 13:50:42 UTC 2019

On Fri, Jun 28, 2019 at 7:32 PM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Am 28.06.19 um 18:40 schrieb Daniel Vetter:
> > On Fri, Jun 28, 2019 at 5:21 PM Koenig, Christian
> > <Christian.Koenig at amd.com> wrote:
> >> Am 28.06.19 um 16:38 schrieb Daniel Vetter:
> >> [SNIP]
> >>>>>>> - when you submit command buffers, you _dont_ attach fences to all
> >>>>>>> involved buffers
> >>>>>> That's not going to work because then the memory management then thinks
> >>>>>> that the buffer is immediately movable, which it isn't,
> >>>>> I guess we need to fix that then. I pretty much assumed that
> >>>>> ->notify_move could add whatever fences you might want to add. Which
> >>>>> would very neatly allow us to solve this problem here, instead of
> >>>>> coming up with fake fences and fun stuff like that.
> >>>> Adding the fence later on is not a solution because we need something
> >>>> which beforehand can check if a buffer is movable or not.
> >>>>
> >>>> In the case of a move_notify the decision to move it is already done and
> >>>> you can't say oh sorry I have to evict my process and reprogram the
> >>>> hardware or whatever.
> >>>>
> >>>> Especially when you do this in an OOM situation.
> >>> Why? I mean when the fence for a CS is there already, it might also
> >>> still hang out in the scheduler, or be blocked on a fence from another
> >>> driver, or anything like that. I don't see a conceptual difference.
> >>> Plus with dynamic dma-buf the entire point is that an attached fences
> >>> does _not_ mean the buffer is permanently pinned, but can be moved if
> >>> you sync correctly. Might need a bit of tuning or a flag to indicate
> >>> that some buffers should alwasy considered to be busy, and that you
> >>> shouldn't start evicting those. But that's kinda a detail.
> >>>
> >>>   From a very high level there's really no difference betwen
> >>> ->notify_move and the eviction_fence. Both give you a callback when
> >>> someone else needs to move the buffer, that's all. The only difference
> >>> is that the eviction_fence thing jumbles the callback and the fence
> >>> into one, by preattaching a fence just in case. But again from a
> >>> conceptual pov it doesn't matter whether the fence is always hanging
> >>> around, or whether you just attach it when ->notify_move is called.
> >> Sure there is a difference. See when you attach the fence beforehand the
> >> memory management can know that the buffer is busy.
> >>
> >> Just imagine the following: We are in an OOM situation and need to swap
> >> things out to disk!
> >>
> >> When the fence is attached beforehand the handling can be as following:
> >> 1. MM picks a BO from the LRU and starts to evict it.
> >> 2. The eviction fence is enabled and we stop the process using this BO.
> >> 3. As soon as the process is stopped the fence is set into the signaled
> >> state.
> >> 4. MM needs to evict more BOs and since the fence for this process is
> >> now in the signaled state it can intentionally pick the ones up which
> >> are now idle.
> >>
> >> When we attach the fence only on eviction that can't happen and the MM
> >> would just pick the next random BO and potentially stop another process.
> >>
> >> So I think we can summarize that the memory management definitely needs
> >> to know beforehand how costly it is to evict a BO.
> >>
> >> And of course implement this with flags or use counters or whatever, but
> >> we already have the fence infrastructure and I don't see a reason not to
> >> use it.
> > Ok, for the sake of the argument let's buy this.
> >
> > Why do we need a ->notify_move callback then? We have it already, with
> > these special fences.
>
> Yeah, that same thought came to my mind as well.
>
> One difference is that notify_move is only called when the BO is
> actually moved, while the fence can be called for other synchronization
> reasons as well.

Yeah I think that's the only crux. You can either use the resv_obj for
the magic eviction_fence, or for implicit sync. But if you want to do
both at the same time, it's going to be trouble. Or at least I'm not
seeing how that could work ...

atm not a problem with only amdkfd using that submission model, but I
expect this model of having a semi-static working set + userspace just
submitting batches to spread. And it'll probably spread faster than we
can retire implicit fencing ...

> > Other side: If all you want to know is whether you can unmap a buffer
> > immediately, for some short enough value of immediately (I guess a
> > bunch of pagetable writes should be ok), then why not add that? The "I
> > don't want to touch all buffers for every CS, but just have a pinned
> > working set" command submission model is quite different after all,
> > having dedicated infrastructure that fits well sounds like a good idea
> > to me.
>
> Ok, well I think I can agree on that.
>
> One of the main purposes using the fence was essentially to avoid
> creating duplicated infrastructure. And I still think that this is a
> good idea. So I think we should essentially aim for something which
> works for both use cases.
>
> Maybe we see this from the wrong side? Fences started of as event system
> to note about completion of operations.
>
> But what we essentially need for memory management is a) know if the BO
> is used, b) some form of event/callback to stop using it c) an
> event/callback back to let the MM know that it is not used any more.
>
> So taking a step back what should the ideal solution look like?

Did some pondering over the w/e. But brain was all mushed up and in
non-working condition because everything is too hot here :-/ It's
getting better at least.

Another thing I looked a bit into is the amdgpu userptr stuff, and
wondered whether we could/should make dynamic dma-buf also work for
that case. In a ways userptr is the most over-the-top dynamic kind of
memory ... But I don't think that's possible due to all the locks
involved in hmm_mirror and get_user_pages/hmm_fault.

One thing that's become clear at least is that everyone expects to be
able to wait on fences in their hmm/shrinker/whatever core -mm
callbacks. Which means the comment around the amdgpu userptr mn lock
is actually not paranoid enough:

    /* No memory allocation is allowed while holding the mn lock.
     * p->mn is hold until amdgpu_cs_submit is finished and fence is added
     * to BOs.
     */
    amdgpu_mn_lock(p->mn);

The rule is actually a lot stricter I think: Anytime you publish a
fence you're not allowed to allocate memory until you've signalled
that fence, in _any_ path leading up to that dma_fence_signal.
amdgpu_mn_lock() is just another way to publish fences. Ofc there's
also the locking inversion problem on the mn_lock itself, but the "you
can't allocate any memory, anywhere, in any driver to signal a fence"
is scary. I'll try to bake this into documentation patch somehow.

Aside: The cross-release lockdep stuff would help us validate these
more indirect lock dependencies against the fs_reclaim context. But it
looks like cross-release lockdep is stalled indefinitely :-(

Aside from that sidetrack, no real progress on pondering this topic here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch