[PATCH] dma-buf: fix and rework dma_buf_poll v3

Daniel Vetter daniel.vetter at ffwll.ch
Wed Jun 23 13:56:41 UTC 2021


On Wed, Jun 23, 2021 at 2:42 PM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Am 23.06.21 um 13:30 schrieb Daniel Vetter:
> > On Wed, Jun 23, 2021 at 1:17 PM Christian König
> > <ckoenig.leichtzumerken at gmail.com> wrote:
> >> Am 22.06.21 um 19:02 schrieb Daniel Vetter:
> >>> On Tue, Jun 22, 2021 at 3:07 PM Christian König
> >>> <ckoenig.leichtzumerken at gmail.com> wrote:
> >>>> Crap, hit enter too early before adding a cover letter.
> >>>>
> >>>> This is the same patch as before, but as requested I'm keeping the
> >>>> exclusive fence handling as it is for now.
> >>>>
> >>>> Daniel can you double check this and/or make sure that it is tested?
> >>>>
> >>>> I only smoke tested it, and the code is so complicated that I'm not sure
> >>>> I caught all the side effects.
> >>> So I've thought about this some more, and we actually have docs for
> >>> how this is supposed to work:
> >>>
> >>> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#implicit-fence-poll-support
> >>>
> >>> The docs are pretty clear that for EPOLLOUT we want both reads and
> >>> writes, i.e. both the exclusive and the shared fences. So I guess back
> >>> to your actual patch, but maybe we should get some acks from userspace
> >>> people for it (Michel, Pekka and Simon are probably the usual suspects).
> >> Ok, sounds good to me. Going to send out a patch rebased to
> >> drm-misc-fixes today.
> >>
> >>> The other thing I brought up and I haven't seen you reply to (maybe
> >>> missed it) is whether we shouldn't just use dma_resv_get_fences(). We
> >>> need to do the refcounting anyway, and this avoids us having to
> >>> open-code this very nasty code. Finally, and imo most important, this
> >>> is what's also used in drm_gem_fence_array_add_implicit(), which many
> >>> drivers use to handle their implicit in-fences. So it would be nice to
> >>> have exactly matching code between that and what dma-buf poll does for
> >>> "can I read" and "can I write".
> >>>
> >>> Thoughts?
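
For illustration, a rough sketch of how dma_buf_poll() could answer the
"can I write" question on top of dma_resv_get_fences(); the helper name
is invented here and this is not the posted patch:

static bool dma_buf_poll_writes_pending(struct dma_resv *resv)
{
	struct dma_fence *excl, **shared;
	unsigned int count, i;
	bool busy = false;

	/*
	 * Snapshots all fences with references taken, hiding the
	 * RCU/seqcount retry dance behind a single call. The downside
	 * Christian points out below: it has to allocate memory.
	 */
	if (dma_resv_get_fences(resv, &excl, &count, &shared))
		return true; /* treat allocation failure as still busy */

	for (i = 0; i < count; ++i) {
		busy |= !dma_fence_is_signaled(shared[i]);
		dma_fence_put(shared[i]);
	}
	kfree(shared);

	if (excl) {
		busy |= !dma_fence_is_signaled(excl);
		dma_fence_put(excl);
	}
	return busy;
}
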
> >> Yeah, I've seen that. Just didn't have time to answer.
> >>
> >> That goes in the same direction as my thinking that we need to
> >> centralize the RCU and synchronization handling in general.
> >>
> >> What I don't like about the approach is that dma_resv_get_fences() needs
> >> to allocate memory. For a lot of use cases, including dma_buf_poll, that
> >> is rather overkill.
> >>
> >> To unify the handling I think we should use the iterator I've created a
> >> prototype of. This way we only have a "for_write" parameter and in
> >> return the iterator gives you all the fences you need.
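
To make that concrete, a hypothetical shape of such an iterator (the
dma_resv_iter_* names are invented here, this is not the actual
prototype from the list):

static bool dma_resv_is_busy(struct dma_resv *resv, bool for_write)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;
	bool busy = false;

	/*
	 * The iterator hands back referenced fences one by one and
	 * restarts internally on concurrent modification, so no
	 * allocation and no open-coded seqcount retry in the caller.
	 */
	dma_resv_iter_begin(&cursor, resv, for_write);
	dma_resv_for_each_fence_unlocked(&cursor, fence) {
		if (!dma_fence_is_signaled(fence)) {
			busy = true;
			break;
		}
	}
	dma_resv_iter_end(&cursor);

	return busy;
}
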
> > Yeah, I think in general I agree; especially in the CS code a bunch of
> > temporary allocations aren't great. But I don't think dma_buf_poll is a
> > place where anyone cares. That's why I think we can just use that here
> > and forget about all the trickiness. The gem helper otoh wants an
> > iterator (without retry even, since it's holding the dma-resv lock).
>
> Well, then I would rather say we do this properly and grab the
> reservation lock in dma_buf_poll.
>
> That has at least less overhead than allocating memory, because
> essentially what dma_buf_poll needs is a
> dma_resv_get_next_unsignaled_or_null_fence().

I'm all ok with that plan, avoids even more complexity.
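
As a sketch of that locked variant (the helper name is invented, this
is not the posted patch): with the reservation lock held there is no
seqcount retry or RCU dance at all, we can simply pick the first
unsignaled fence and take a reference before dropping the lock again.

static struct dma_fence *
dma_buf_poll_first_unsignaled(struct dma_resv *resv, bool for_write)
{
	struct dma_resv_list *list;
	struct dma_fence *fence;
	unsigned int i;

	if (dma_resv_lock(resv, NULL))
		return NULL;

	/* The exclusive fence gates both reads and writes. */
	fence = dma_resv_get_excl(resv);
	if (fence && !dma_fence_is_signaled(fence))
		goto out_get;

	fence = NULL;
	if (for_write) {
		/* Writers additionally have to wait for all readers. */
		list = dma_resv_get_list(resv);
		for (i = 0; list && i < list->shared_count; ++i) {
			struct dma_fence *f;

			f = rcu_dereference_protected(list->shared[i],
						      dma_resv_held(resv));
			if (!dma_fence_is_signaled(f)) {
				fence = f;
				break;
			}
		}
	}

out_get:
	if (fence)
		dma_fence_get(fence); /* caller drops this reference */
	dma_resv_unlock(resv);
	return fence;
}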

> >> When we then extend that approach, we could have an enum describing
> >> the use case. Something like:
> >> 1. For explicit sync, just give me all the must sync fences from memory
> >> management.
> >> 2. For read, give me all the writers and memory management fences.
> >> 3. For write, give me all the readers and writers and memory management
> >> fences.
> >> 4. For memory management, give me everything including things like PTE
> >> updates/TLB flushes.
> >>
> >> So instead of asking for some specific type of fence(s), the driver
> >> tells the dma_resv object about its use case and in return gets the
> >> fences it needs to wait for (roughly sketched below).
> >>
> >> This essentially means that we move the decision about what to wait for
> >> from the drivers into the dma_resv object, which I think would be a
> >> massive improvement.
> >>
> >> Functions like dma_resv_get_list(), dma_resv_get_excl(),
> >> dma_resv_get_excl_rcu() etc would then essentially be deprecated and
> >> instead you use dma_resv_get_fences(), dma_resv_for_each_fences(),
> >> dma_resv_wait_timeout(), dma_resv_test_signaled() with a proper use case.
> >>
> >> What do you think?
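
As a rough sketch of that idea, with invented names (none of this is an
existing API, it only mirrors the four cases listed above):

enum dma_resv_usage {
	DMA_RESV_USAGE_SYNC,	/* 1. only must-sync memory management fences */
	DMA_RESV_USAGE_READ,	/* 2. writers plus memory management fences */
	DMA_RESV_USAGE_WRITE,	/* 3. readers, writers and memory management fences */
	DMA_RESV_USAGE_MM,	/* 4. everything, incl. PTE updates/TLB flushes */
};

/*
 * A caller would then say e.g.
 *
 *	dma_resv_wait_timeout(resv, DMA_RESV_USAGE_WRITE, intr, timeout);
 *
 * and the dma_resv object decides which fences that implies waiting on.
 */
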
> > Yeah, I think in general the direction makes sense, at least long term.
> > I think for now it's better to go with the simplest solution first until
> > we have everyone aligned on one set of rules, and have these rules
> > properly documented.
>
> I think that looks rather good now, doesn't it?

Well, we have 2 out of 3 pieces:
- ttm drivers need to wait in their pin: fixed & documented
- drivers need to follow the rules for setting dma_resv fences: amdgpu
fixed, patch set for the other drivers + docs from me on the list
- drivers must not break the fence DAG in dma-resv: tbd, both
auditing/fixing drivers and documenting it.

So getting there, but not yet fully arrived.
-Daniel

>
> Christian.
>
> > -Daniel
> >
> >> Christian.
> >>
> >>> -Daniel
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>> Am 22.06.21 um 15:04 schrieb Christian König:
> >>>>> Daniel pointed me towards this function and there are multiple obvious problems
> >>>>> in the implementation.
> >>>>>
> >>>>> First of all, the retry loop is not working as intended. In general, the
> >>>>> retry only makes sense if you grab the reference first and then check the
> >>>>> sequence values.
> >>>>>
> >>>>> It's also good practice to keep the reference around when installing callbacks
> >>>>> to fences you don't own.
> >>>>>
> >>>>> And last, the whole implementation was unnecessarily complex and rather
> >>>>> hard to understand, which could lead to unexpected behavior of the IOCTL.
> >>>>>
> >>>>> Fix all this by reworking the implementation from scratch.
> >>>>>
> >>>>> Only mildly tested and needs a thoughtful review of the code.
> >>>>>
> >>>>> v2: fix the reference counting as well
> >>>>> v3: keep the excl fence handling as is for stable
> >>>>>
> >>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
> >>>>> CC: stable at vger.kernel.org
> >>>>> ---
> >>>>>     drivers/dma-buf/dma-buf.c | 133 ++++++++++++++++----------------------
> >>>>>     include/linux/dma-buf.h   |   2 +-
> >>>>>     2 files changed, 55 insertions(+), 80 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>>>> index eadd1eaa2fb5..e97c3a9d98d5 100644
> >>>>> --- a/drivers/dma-buf/dma-buf.c
> >>>>> +++ b/drivers/dma-buf/dma-buf.c
> >>>>> @@ -72,7 +72,7 @@ static void dma_buf_release(struct dentry *dentry)
> >>>>>          * If you hit this BUG() it means someone dropped their ref to the
> >>>>>          * dma-buf while still having pending operation to the buffer.
> >>>>>          */
> >>>>> -     BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
> >>>>> +     BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> >>>>>
> >>>>>         dmabuf->ops->release(dmabuf);
> >>>>>
> >>>>> @@ -202,16 +202,20 @@ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
> >>>>>         wake_up_locked_poll(dcb->poll, dcb->active);
> >>>>>         dcb->active = 0;
> >>>>>         spin_unlock_irqrestore(&dcb->poll->lock, flags);
> >>>>> +     dma_fence_put(fence);
> >>>>>     }
> >>>>>
> >>>>>     static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
> >>>>>     {
> >>>>> +     struct dma_buf_poll_cb_t *dcb;
> >>>>>         struct dma_buf *dmabuf;
> >>>>>         struct dma_resv *resv;
> >>>>>         struct dma_resv_list *fobj;
> >>>>>         struct dma_fence *fence_excl;
> >>>>> -     __poll_t events;
> >>>>>         unsigned shared_count, seq;
> >>>>> +     struct dma_fence *fence;
> >>>>> +     __poll_t events;
> >>>>> +     int r, i;
> >>>>>
> >>>>>         dmabuf = file->private_data;
> >>>>>         if (!dmabuf || !dmabuf->resv)
> >>>>> @@ -225,99 +229,70 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
> >>>>>         if (!events)
> >>>>>                 return 0;
> >>>>>
> >>>>> +     dcb = events & EPOLLOUT ? &dmabuf->cb_out : &dmabuf->cb_in;
> >>>>> +
> >>>>> +     /* Only queue a new one if we are not still waiting for the old one */
> >>>>> +     spin_lock_irq(&dmabuf->poll.lock);
> >>>>> +     if (dcb->active)
> >>>>> +             events = 0;
> >>>>> +     else
> >>>>> +             dcb->active = events;
> >>>>> +     spin_unlock_irq(&dmabuf->poll.lock);
> >>>>> +     if (!events)
> >>>>> +             return 0;
> >>>>> +
> >>>>>     retry:
> >>>>>         seq = read_seqcount_begin(&resv->seq);
> >>>>>         rcu_read_lock();
> >>>>>
> >>>>>         fobj = rcu_dereference(resv->fence);
> >>>>> -     if (fobj)
> >>>>> +     if (fobj && events & EPOLLOUT)
> >>>>>                 shared_count = fobj->shared_count;
> >>>>>         else
> >>>>>                 shared_count = 0;
> >>>>> -     fence_excl = rcu_dereference(resv->fence_excl);
> >>>>> -     if (read_seqcount_retry(&resv->seq, seq)) {
> >>>>> -             rcu_read_unlock();
> >>>>> -             goto retry;
> >>>>> -     }
> >>>>>
> >>>>> -     if (fence_excl && (!(events & EPOLLOUT) || shared_count == 0)) {
> >>>>> -             struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
> >>>>> -             __poll_t pevents = EPOLLIN;
> >>>>> -
> >>>>> -             if (shared_count == 0)
> >>>>> -                     pevents |= EPOLLOUT;
> >>>>> -
> >>>>> -             spin_lock_irq(&dmabuf->poll.lock);
> >>>>> -             if (dcb->active) {
> >>>>> -                     dcb->active |= pevents;
> >>>>> -                     events &= ~pevents;
> >>>>> -             } else
> >>>>> -                     dcb->active = pevents;
> >>>>> -             spin_unlock_irq(&dmabuf->poll.lock);
> >>>>> -
> >>>>> -             if (events & pevents) {
> >>>>> -                     if (!dma_fence_get_rcu(fence_excl)) {
> >>>>> -                             /* force a recheck */
> >>>>> -                             events &= ~pevents;
> >>>>> -                             dma_buf_poll_cb(NULL, &dcb->cb);
> >>>>> -                     } else if (!dma_fence_add_callback(fence_excl, &dcb->cb,
> >>>>> -                                                        dma_buf_poll_cb)) {
> >>>>> -                             events &= ~pevents;
> >>>>> -                             dma_fence_put(fence_excl);
> >>>>> -                     } else {
> >>>>> -                             /*
> >>>>> -                              * No callback queued, wake up any additional
> >>>>> -                              * waiters.
> >>>>> -                              */
> >>>>> -                             dma_fence_put(fence_excl);
> >>>>> -                             dma_buf_poll_cb(NULL, &dcb->cb);
> >>>>> -                     }
> >>>>> +     for (i = 0; i < shared_count; ++i) {
> >>>>> +             fence = rcu_dereference(fobj->shared[i]);
> >>>>> +             fence = dma_fence_get_rcu(fence);
> >>>>> +             if (!fence || read_seqcount_retry(&resv->seq, seq)) {
> >>>>> +                     /* Concurrent modify detected, force re-check */
> >>>>> +                     dma_fence_put(fence);
> >>>>> +                     rcu_read_unlock();
> >>>>> +                     goto retry;
> >>>>>                 }
> >>>>> -     }
> >>>>>
> >>>>> -     if ((events & EPOLLOUT) && shared_count > 0) {
> >>>>> -             struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
> >>>>> -             int i;
> >>>>> -
> >>>>> -             /* Only queue a new callback if no event has fired yet */
> >>>>> -             spin_lock_irq(&dmabuf->poll.lock);
> >>>>> -             if (dcb->active)
> >>>>> -                     events &= ~EPOLLOUT;
> >>>>> -             else
> >>>>> -                     dcb->active = EPOLLOUT;
> >>>>> -             spin_unlock_irq(&dmabuf->poll.lock);
> >>>>> -
> >>>>> -             if (!(events & EPOLLOUT))
> >>>>> +             r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
> >>>>> +             if (!r) {
> >>>>> +                     /* Callback queued */
> >>>>> +                     events = 0;
> >>>>>                         goto out;
> >>>>> +             }
> >>>>> +             dma_fence_put(fence);
> >>>>> +     }
> >>>>>
> >>>>> -             for (i = 0; i < shared_count; ++i) {
> >>>>> -                     struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
> >>>>> -
> >>>>> -                     if (!dma_fence_get_rcu(fence)) {
> >>>>> -                             /*
> >>>>> -                              * fence refcount dropped to zero, this means
> >>>>> -                              * that fobj has been freed
> >>>>> -                              *
> >>>>> -                              * call dma_buf_poll_cb and force a recheck!
> >>>>> -                              */
> >>>>> -                             events &= ~EPOLLOUT;
> >>>>> -                             dma_buf_poll_cb(NULL, &dcb->cb);
> >>>>> -                             break;
> >>>>> -                     }
> >>>>> -                     if (!dma_fence_add_callback(fence, &dcb->cb,
> >>>>> -                                                 dma_buf_poll_cb)) {
> >>>>> -                             dma_fence_put(fence);
> >>>>> -                             events &= ~EPOLLOUT;
> >>>>> -                             break;
> >>>>> -                     }
> >>>>> +     fence = dma_resv_get_excl(resv);
> >>>>> +     if (fence && shared_count == 0) {
> >>>>> +             fence = dma_fence_get_rcu(fence);
> >>>>> +             if (!fence || read_seqcount_retry(&resv->seq, seq)) {
> >>>>> +                     /* Concurrent modify detected, force re-check */
> >>>>>                         dma_fence_put(fence);
> >>>>> +                     rcu_read_unlock();
> >>>>> +                     goto retry;
> >>>>> +
> >>>>>                 }
> >>>>>
> >>>>> -             /* No callback queued, wake up any additional waiters. */
> >>>>> -             if (i == shared_count)
> >>>>> -                     dma_buf_poll_cb(NULL, &dcb->cb);
> >>>>> +             r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
> >>>>> +             if (!r) {
> >>>>> +                     /* Callback queued */
> >>>>> +                     events = 0;
> >>>>> +                     goto out;
> >>>>> +             }
> >>>>> +             dma_fence_put(fence);
> >>>>>         }
> >>>>>
> >>>>> +     /* No callback queued, wake up any additional waiters. */
> >>>>> +     dma_buf_poll_cb(NULL, &dcb->cb);
> >>>>> +
> >>>>>     out:
> >>>>>         rcu_read_unlock();
> >>>>>         return events;
> >>>>> @@ -562,8 +537,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >>>>>         dmabuf->owner = exp_info->owner;
> >>>>>         spin_lock_init(&dmabuf->name_lock);
> >>>>>         init_waitqueue_head(&dmabuf->poll);
> >>>>> -     dmabuf->cb_excl.poll = dmabuf->cb_shared.poll = &dmabuf->poll;
> >>>>> -     dmabuf->cb_excl.active = dmabuf->cb_shared.active = 0;
> >>>>> +     dmabuf->cb_in.poll = dmabuf->cb_out.poll = &dmabuf->poll;
> >>>>> +     dmabuf->cb_in.active = dmabuf->cb_out.active = 0;
> >>>>>
> >>>>>         if (!resv) {
> >>>>>                 resv = (struct dma_resv *)&dmabuf[1];
> >>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>>>> index efdc56b9d95f..7e747ad54c81 100644
> >>>>> --- a/include/linux/dma-buf.h
> >>>>> +++ b/include/linux/dma-buf.h
> >>>>> @@ -329,7 +329,7 @@ struct dma_buf {
> >>>>>                 wait_queue_head_t *poll;
> >>>>>
> >>>>>                 __poll_t active;
> >>>>> -     } cb_excl, cb_shared;
> >>>>> +     } cb_in, cb_out;
> >>>>>     };
> >>>>>
> >>>>>     /**
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

