[PATCH 1/9] dma-buf: start caching of sg_table objects

Wed May 8 12:44:11 UTC 2019

On Wed, May 8, 2019 at 2:23 PM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Am 08.05.19 um 14:15 schrieb Daniel Vetter:
> > [CAUTION: External Email]
> >
> > On Wed, May 8, 2019 at 2:00 PM Christian König
> > <ckoenig.leichtzumerken at gmail.com> wrote:
> >> Am 08.05.19 um 12:11 schrieb Daniel Vetter:
> >>> On Wed, May 08, 2019 at 11:15:33AM +0200, Daniel Vetter wrote:
> >>>> On Tue, May 07, 2019 at 10:13:30AM +0200, Christian König wrote:
> >>>>> To allow a smooth transition from pinning buffer objects to dynamic
> >>>>> invalidation we first start to cache the sg_table for an attachment.
> >>>>>
> >>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
> >>>>> ---
> >>>>>    drivers/dma-buf/dma-buf.c | 24 ++++++++++++++++++++++++
> >>>>>    include/linux/dma-buf.h   | 14 ++++++++++++++
> >>>>>    2 files changed, 38 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>>>> index 7c858020d14b..775e13f54083 100644
> >>>>> --- a/drivers/dma-buf/dma-buf.c
> >>>>> +++ b/drivers/dma-buf/dma-buf.c
> >>>>> @@ -573,6 +573,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>      list_add(&attach->node, &dmabuf->attachments);
> >>>>>
> >>>>>      mutex_unlock(&dmabuf->lock);
> >>>>> +
> >>>>> +   if (!dma_buf_is_dynamic(dmabuf)) {
> >>>>> +           struct sg_table *sgt;
> >>>>> +
> >>>>> +           sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
> >>>>> +           if (!sgt)
> >>>>> +                   sgt = ERR_PTR(-ENOMEM);
> >>>>> +           if (IS_ERR(sgt)) {
> >>>>> +                   dma_buf_detach(dmabuf, attach);
> >>>>> +                   return ERR_CAST(sgt);
> >>>>> +           }
> >>>>> +           attach->sgt = sgt;
> >>>>> +   }
> >>>>> +
> >>>>>      return attach;
> >>>>>
> >>>>>    err_attach:
> >>>>> @@ -595,6 +609,10 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> >>>>>      if (WARN_ON(!dmabuf || !attach))
> >>>>>              return;
> >>>>>
> >>>>> +   if (attach->sgt)
> >>>>> +           dmabuf->ops->unmap_dma_buf(attach, attach->sgt,
> >>>>> +                                      DMA_BIDIRECTIONAL);
> >>>>> +
> >>>>>      mutex_lock(&dmabuf->lock);
> >>>>>      list_del(&attach->node);
> >>>>>      if (dmabuf->ops->detach)
> >>>>> @@ -630,6 +648,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
> >>>>>      if (WARN_ON(!attach || !attach->dmabuf))
> >>>>>              return ERR_PTR(-EINVAL);
> >>>>>
> >>>>> +   if (attach->sgt)
> >>>>> +           return attach->sgt;
> >>>>> +
> >>>>>      sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
> >>>>>      if (!sg_table)
> >>>>>              sg_table = ERR_PTR(-ENOMEM);
> >>>>> @@ -657,6 +678,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
> >>>>>      if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
> >>>>>              return;
> >>>>>
> >>>>> +   if (attach->sgt == sg_table)
> >>>>> +           return;
> >>>>> +
> >>>>>      attach->dmabuf->ops->unmap_dma_buf(attach, sg_table,
> >>>>>                                              direction);
> >>>>>    }
> >>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> >>>>> index 58725f890b5b..52031fdc75bb 100644
> >>>>> --- a/include/linux/dma-buf.h
> >>>>> +++ b/include/linux/dma-buf.h
> >>>>> @@ -322,6 +322,7 @@ struct dma_buf_attachment {
> >>>>>      struct dma_buf *dmabuf;
> >>>>>      struct device *dev;
> >>>>>      struct list_head node;
> >>>>> +   struct sg_table *sgt;
> >>>>>      void *priv;
> >>>>>    };
> >>>>>
> >>>>> @@ -373,6 +374,19 @@ static inline void get_dma_buf(struct dma_buf *dmabuf)
> >>>>>      get_file(dmabuf->file);
> >>>>>    }
> >>>>>
> >>>>> +/**
> >>>>> + * dma_buf_is_dynamic - check if a DMA-buf uses dynamic mappings.
> >>>>> + * @dmabuf: the DMA-buf to check
> >>>>> + *
> >>>>> + * Returns true if a DMA-buf exporter wants to create dynamic sg table mappings
> >>>>> + * for each attachment. False if only a single static sg table should be used.
> >>>>> + */
> >>>>> +static inline bool dma_buf_is_dynamic(struct dma_buf *dmabuf)
> >>>>> +{
> >>>>> +   /* Always use a static mapping for now */
> >>>>> +   return false;
> >>>> Hm I still expect that later on we'll want this to be decided by the
> >>>> attachment: It's only dynamic if both the exporter and the importer
> >>>> support dynamic dma-buf management, otherwise we need to pin.
> >>>>
> >>>> But anyway, I feel like we need to go over the entire thing anyway once
> >>>> more when p2p has landed, on this:
> >>>>
> >>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> >>> Correction that I only just realized, but need to retract that r-b on this
> >>> and the drm sgt cache removal patch: You now hardcode the direction to
> >>> DMA_BIDIRECTIONAL, drm_prime did only keep the cache for a given
> >>> direction.
> >>>
> >>> Now x86 drivers always set DMA_BIDIRECTIONAL, but arm soc drivers (and
> >>> also v4l videobuf layer) try to guess whether it should be DMA_TO_DEVICE
> >>> or DMA_FROM_DEVICE. I have no idea what the implications are, also all the
> >>> cache coherency on dma-bufs is kinda ill-defined.
> >>>
> >>> And we can't throw the sgt cache away at map time if it doesn't fit like
> >>> drm_prime does, because that reintroduces the reservation object lock,
> >>> defeating the entire purpose of this. Also we can't just assume drm_prime
> >>> works for everyone, since the special cases all roll their own dma-buf
> >>> import. I have also not checked what exactly exporters do. No idea what to
> >>> do here now.
> >>>
> >>> /me cries
> >> Well it is kind of abusing to use the DMA direction here in the first
> >> place and I actually considered to completely remove it.
> >>
> >> The key problem is that the DMA direction defines things from the view
> >> point of the CPU, e.g. DMA_TO_DEVICE means the data has been accessed by
> >> the CPU and we are now pushing it to a device.
> >>
> >> But DMA-buf is essentially about device to device transfers, so
> >> DMA_TO_DEVICE as well as DMA_FROM_DEVICE is completely ambiguous because
> >> you don't know the point of view to start with.
> >>
> >> Additional to that the caching implementation in drm_prime.c didn't
> >> handled it full either:
> >>> -     /*
> >>> -      * two mappings with different directions for the same attachment are
> >>> -      * not allowed
> >>> -      */
> >>> -     if (WARN_ON(prime_attach->dir != DMA_NONE))
> >>> -             return ERR_PTR(-EBUSY);
> >> Do the ARM guys and V4L guys actually do anything with the direction
> >> except for passing it on dma_map_sg_attrs() ?
> > I don't think so, but that alone is a big thing because on arm, that
> > actually changes what happens wrt flushing and stuff. On x86 the dma
> > subsystem assumes everything is always coherent, and we have a pile of
> > hacks in drivers to make sure the dma access is coherent enough (since
> > on a gpu this is controlled on a per transaction basis usually, fairly
> > often even under userspace's control).
> >
> > I have tried to better understand what a better dma api interface
> > would look like for gpus and dma-buf, but every time that comes up the
> > dma maintainers tell me I'm a fool and that this _must_ be abstracted
> > away by the dma api and I give up again. Hence the crying and no idea
> > what to do here :-/
>
> Well count me to the people who think that the DMA-API on Linux is not a
> good fit for GPUs.
>
> E.g. I've tried to explain multiple times now that we rather want a
> failed mapping instead of using bounce buffers, but that doesn't seem to
> sink in.
>
> > In reality it probably doesn't matter, as long as no one tries to run
> > a gpu driver with dynamic dma-buf export on arm, but I've seen some
> > people trying to get amdgpu at least going in that combo. So not a
> > long-term solution either. And we can't just not cache, because that
> > breaks the world wrt locking ordering I think, or at least a bunch of
> > drm drivers (and not all drm drivers use the drm_prime helpers, so we
> > can't just do this for us and not cache for anyone else and hope that
> > works out).
>
> Well the amdgpu user space stack absolutely needs coherent mappings or
> otherwise won't work. So I'm not worried about amdgpu.
>
> But this doesn't mean that we can ignore it for ARM.
>
> > One option I haven't looked into much, but might work, is that we call
> > the dma api coherency functions if the requested attachment_map/unmap
> > direction doesn't match. But I honestly don't know whether that's
> > allowed and would result in the same thing in the dma-api backends as
> > doing a mapping with the desired direction right away. Also, it would
> > terribly break the layering, so we'd need to add a map_sync callback
> > (there's been patches, but they never landed) and roll them out to all
> > exporters.
>
> Well no, not really. I think I will just go down the route of making sgt
> caching as optional as possible.
>
> E.g. only create the cache mapping during attach if we actually have the
> locking problem.

Well that's the "review and update everyone, and oh welp is there a
lot of them" approach. Which we tried to avoid ...

> So back to coding for now,

So if you want to go this way, what about resurrecting the
dma_buf_attachment_info rework (I really think that's cleaner than
shitloads of arguments, maybe keep the current dma_buf_attach as-is
and do the info argument thing only for dma_buf_attach_ext() or
similar), and have a ->need_caching flag in there? Then we can set
that for drm_prime and i915 at least. Just an idea to avoid changing
the world ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch