[PATCH 1/9] dma-buf: start caching of sg_table objects

Koenig, Christian Christian.Koenig at amd.com
Wed May 8 12:23:06 UTC 2019


On 08.05.19 at 14:15, Daniel Vetter wrote:
>
> On Wed, May 8, 2019 at 2:00 PM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> On 08.05.19 at 12:11, Daniel Vetter wrote:
>>> On Wed, May 08, 2019 at 11:15:33AM +0200, Daniel Vetter wrote:
>>>> On Tue, May 07, 2019 at 10:13:30AM +0200, Christian König wrote:
>>>>> To allow a smooth transition from pinning buffer objects to dynamic
>>>>> invalidation we first start to cache the sg_table for an attachment.
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>> ---
>>>>>    drivers/dma-buf/dma-buf.c | 24 ++++++++++++++++++++++++
>>>>>    include/linux/dma-buf.h   | 14 ++++++++++++++
>>>>>    2 files changed, 38 insertions(+)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>> index 7c858020d14b..775e13f54083 100644
>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>> @@ -573,6 +573,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>>>>>      list_add(&attach->node, &dmabuf->attachments);
>>>>>
>>>>>      mutex_unlock(&dmabuf->lock);
>>>>> +
>>>>> +   if (!dma_buf_is_dynamic(dmabuf)) {
>>>>> +           struct sg_table *sgt;
>>>>> +
>>>>> +           sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
>>>>> +           if (!sgt)
>>>>> +                   sgt = ERR_PTR(-ENOMEM);
>>>>> +           if (IS_ERR(sgt)) {
>>>>> +                   dma_buf_detach(dmabuf, attach);
>>>>> +                   return ERR_CAST(sgt);
>>>>> +           }
>>>>> +           attach->sgt = sgt;
>>>>> +   }
>>>>> +
>>>>>      return attach;
>>>>>
>>>>>    err_attach:
>>>>> @@ -595,6 +609,10 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
>>>>>      if (WARN_ON(!dmabuf || !attach))
>>>>>              return;
>>>>>
>>>>> +   if (attach->sgt)
>>>>> +           dmabuf->ops->unmap_dma_buf(attach, attach->sgt,
>>>>> +                                      DMA_BIDIRECTIONAL);
>>>>> +
>>>>>      mutex_lock(&dmabuf->lock);
>>>>>      list_del(&attach->node);
>>>>>      if (dmabuf->ops->detach)
>>>>> @@ -630,6 +648,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
>>>>>      if (WARN_ON(!attach || !attach->dmabuf))
>>>>>              return ERR_PTR(-EINVAL);
>>>>>
>>>>> +   if (attach->sgt)
>>>>> +           return attach->sgt;
>>>>> +
>>>>>      sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
>>>>>      if (!sg_table)
>>>>>              sg_table = ERR_PTR(-ENOMEM);
>>>>> @@ -657,6 +678,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
>>>>>      if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
>>>>>              return;
>>>>>
>>>>> +   if (attach->sgt == sg_table)
>>>>> +           return;
>>>>> +
>>>>>      attach->dmabuf->ops->unmap_dma_buf(attach, sg_table,
>>>>>                                              direction);
>>>>>    }
>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>>> index 58725f890b5b..52031fdc75bb 100644
>>>>> --- a/include/linux/dma-buf.h
>>>>> +++ b/include/linux/dma-buf.h
>>>>> @@ -322,6 +322,7 @@ struct dma_buf_attachment {
>>>>>      struct dma_buf *dmabuf;
>>>>>      struct device *dev;
>>>>>      struct list_head node;
>>>>> +   struct sg_table *sgt;
>>>>>      void *priv;
>>>>>    };
>>>>>
>>>>> @@ -373,6 +374,19 @@ static inline void get_dma_buf(struct dma_buf *dmabuf)
>>>>>      get_file(dmabuf->file);
>>>>>    }
>>>>>
>>>>> +/**
>>>>> + * dma_buf_is_dynamic - check if a DMA-buf uses dynamic mappings.
>>>>> + * @dmabuf: the DMA-buf to check
>>>>> + *
>>>>> + * Returns true if a DMA-buf exporter wants to create dynamic sg table mappings
>>>>> + * for each attachment. False if only a single static sg table should be used.
>>>>> + */
>>>>> +static inline bool dma_buf_is_dynamic(struct dma_buf *dmabuf)
>>>>> +{
>>>>> +   /* Always use a static mapping for now */
>>>>> +   return false;
>>>> Hm I still expect that later on we'll want this to be decided by the
>>>> attachment: It's only dynamic if both the exporter and the importer
>>>> support dynamic dma-buf management, otherwise we need to pin.
>>>>
>>>> But anyway, I feel like we need to go over the entire thing once more
>>>> when p2p has landed; on this:
>>>>
>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>>> Correction that I only just realized, but I need to retract that r-b on this
>>> and the drm sgt cache removal patch: you now hardcode the direction to
>>> DMA_BIDIRECTIONAL, while drm_prime only kept the cache for a given
>>> direction.
>>>
>>> Now x86 drivers always set DMA_BIDIRECTIONAL, but arm soc drivers (and
>>> also the v4l videobuf layer) try to guess whether it should be DMA_TO_DEVICE
>>> or DMA_FROM_DEVICE. I have no idea what the implications are, and all the
>>> cache coherency on dma-bufs is kinda ill-defined.
>>>
>>> And we can't throw the sgt cache away at map time if it doesn't fit like
>>> drm_prime does, because that reintroduces the reservation object lock,
>>> defeating the entire purpose of this. Also we can't just assume drm_prime
>>> works for everyone, since the special cases all roll their own dma-buf
>>> import. I have also not checked what exactly exporters do. No idea what to
>>> do here now.
>>>
>>> /me cries
>> Well, it is kind of an abuse to use the DMA direction here in the first
>> place, and I actually considered removing it completely.
>>
>> The key problem is that the DMA direction defines things from the
>> viewpoint of the CPU, e.g. DMA_TO_DEVICE means the data has been accessed
>> by the CPU and we are now pushing it to a device.
>>
>> But DMA-buf is essentially about device-to-device transfers, so
>> DMA_TO_DEVICE as well as DMA_FROM_DEVICE is completely ambiguous because
>> you don't know which point of view to start with.
>>
>> In addition to that, the caching implementation in drm_prime.c didn't
>> handle it fully either:
>>> -     /*
>>> -      * two mappings with different directions for the same attachment are
>>> -      * not allowed
>>> -      */
>>> -     if (WARN_ON(prime_attach->dir != DMA_NONE))
>>> -             return ERR_PTR(-EBUSY);
>> Do the ARM guys and V4L guys actually do anything with the direction
>> except pass it on to dma_map_sg_attrs()?
> I don't think so, but that alone is a big thing because on arm, that
> actually changes what happens wrt flushing and stuff. On x86 the dma
> subsystem assumes everything is always coherent, and we have a pile of
> hacks in drivers to make sure the dma access is coherent enough (since
> on a gpu this is controlled on a per transaction basis usually, fairly
> often even under userspace's control).
>
> I have tried to better understand what a better dma api interface
> would look like for gpus and dma-buf, but every time that comes up the
> dma maintainers tell me I'm a fool and that this _must_ be abstracted
> away by the dma api and I give up again. Hence the crying and no idea
> what to do here :-/

Well, count me among the people who think that the DMA-API on Linux is not a
good fit for GPUs.

E.g. I've tried to explain multiple times now that we would rather have a
failed mapping than bounce buffers, but that doesn't seem to sink in.
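
To make that concrete, here is a hedged sketch of the kind of behaviour I
mean (nothing from this series; the helper name is made up, and it
deliberately ignores the IOMMU case, where remapping instead of failing is
perfectly fine): reject pages the device cannot reach directly up front,
instead of letting dma_map_sg() silently route them through bounce buffers:

static int my_check_reachable(struct device *dev, struct sg_table *sgt)
{
        struct scatterlist *sg;
        int i;

        for_each_sg(sgt->sgl, sg, sgt->orig_nents, i) {
                /* Without bouncing (and without an IOMMU) the device
                 * simply cannot reach this page, so fail instead of
                 * letting SWIOTLB jump in behind our back. */
                if (sg_phys(sg) + sg->length - 1 > dma_get_mask(dev))
                        return -EXDEV;
        }

        return 0;
}

An exporter's map_dma_buf callback could run something like this and return
the error instead of mapping.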

> In reality it probably doesn't matter, as long as no one tries to run
> a gpu driver with dynamic dma-buf export on arm, but I've seen some
> people trying to get amdgpu at least going in that combo. So not a
> long-term solution either. And we can't just not cache, because that
> breaks the world wrt locking ordering I think, or at least a bunch of
> drm drivers (and not all drm drivers use the drm_prime helpers, so we
> can't just do this for us and not cache for anyone else and hope that
> works out).

Well, the amdgpu user space stack absolutely needs coherent mappings,
otherwise it won't work. So I'm not worried about amdgpu.

But this doesn't mean that we can ignore it for ARM.
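
Just as a reminder of where the direction actually bites (a hedged sketch,
not taken from any real driver, all names made up): a typical non-coherent
SoC exporter does nothing with the direction except forward it to
dma_map_sg_attrs(), but on arm that forward is exactly what decides whether
the streaming DMA code cleans or invalidates the CPU caches:

struct my_soc_buffer {
        struct sg_table sgt;    /* pre-built table of the backing pages */
};

static struct sg_table *my_soc_map_dma_buf(struct dma_buf_attachment *attach,
                                           enum dma_data_direction dir)
{
        /* hypothetical exporter private data */
        struct my_soc_buffer *buf = attach->dmabuf->priv;
        struct sg_table *sgt = &buf->sgt;
        int nents;

        /* the direction is only forwarded, but on a non-coherent arch
         * this is what picks cache clean vs. cache invalidate */
        nents = dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->orig_nents,
                                 dir, 0);
        if (!nents)
                return ERR_PTR(-ENOMEM);

        sgt->nents = nents;
        return sgt;
}

With the cached mapping always created as DMA_BIDIRECTIONAL that
per-direction choice goes away, which is the part Daniel is worried about
below.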

> One option I haven't looked into much, but might work, is that we call
> the dma api coherency functions if the requested attachment_map/unmap
> direction doesn't match. But I honestly don't know whether that's
> allowed and would result in the same thing in the dma-api backends as
> doing a mapping with the desired direction right away. Also, it would
> terribly break the layering, so we'd need to add a map_sync callback
> (there have been patches, but they never landed) and roll it out to all
> exporters.

Well no, not really. I think I will just go down the route of making sgt 
caching as optional as possible.

E.g. only create the cached mapping during attach if we actually have the
locking problem.
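
Very roughly something like the following (a sketch only, nothing final;
dma_buf_attach_cached() and the need_sgt_cache parameter are made up to
illustrate the idea): only pre-map at attach time when the importer says it
cannot take the reservation lock in its mapping path:

struct dma_buf_attachment *
dma_buf_attach_cached(struct dma_buf *dmabuf, struct device *dev,
                      bool need_sgt_cache)
{
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        attach = dma_buf_attach(dmabuf, dev);
        if (IS_ERR(attach))
                return attach;

        /* only importers with the locking problem ask for the cached sgt */
        if (dma_buf_is_dynamic(dmabuf) || !need_sgt_cache)
                return attach;

        sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
        if (!sgt)
                sgt = ERR_PTR(-ENOMEM);
        if (IS_ERR(sgt)) {
                dma_buf_detach(dmabuf, attach);
                return ERR_CAST(sgt);
        }
        attach->sgt = sgt;

        return attach;
}

Everybody else keeps the plain dma_buf_attach() and gets a fresh mapping
from dma_buf_map_attachment() as before.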

So back to coding for now,
Christian.

>
> Or maybe just back to crying :-)
> -Daniel


