[PATCH 1/4] dma-buf: change DMA-buf locking convention

Tue Sep 17 14:56:53 UTC 2019

On Tue, Sep 17, 2019 at 4:47 PM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Am 17.09.19 um 15:45 schrieb Daniel Vetter:
> > On Tue, Sep 17, 2019 at 01:24:10PM +0000, Koenig, Christian wrote:
> >> Am 17.09.19 um 15:13 schrieb Daniel Vetter:
> >>> On Tue, Sep 17, 2019 at 12:40:51PM +0000, Koenig, Christian wrote:
> >>>> Am 17.09.19 um 14:31 schrieb Daniel Vetter:
> >>>>> On Mon, Sep 16, 2019 at 02:23:13PM +0200, Christian König wrote:
> >>>>>> Ping? Any further comment on this or can't we merge at least the locking
> >>>>>> change?
> >>>>> I was at plumbers ...
> >>>>>> Christian.
> >>>>>>
> >>>>>> Am 11.09.19 um 12:53 schrieb Christian König:
> >>>>>>> Am 03.09.19 um 10:05 schrieb Daniel Vetter:
> >>>>>>>> On Thu, Aug 29, 2019 at 04:29:14PM +0200, Christian König wrote:
> >>>>>>>>> This patch is a stripped down version of the locking changes
> >>>>>>>>> necessary to support dynamic DMA-buf handling.
> >>>>>>>>>
> >>>>>>>>> For compatibility we cache the DMA-buf mapping as soon as
> >>>>>>>>> exporter/importer disagree on the dynamic handling.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
> >>>>>>>>> ---
> >>>>>>>>>      drivers/dma-buf/dma-buf.c | 90
> >>>>>>>>> ++++++++++++++++++++++++++++++++++++---
> >>>>>>>>>      include/linux/dma-buf.h   | 51 +++++++++++++++++++++-
> >>>>>>>>>      2 files changed, 133 insertions(+), 8 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>>>>>>>> index 433d91d710e4..65052d52602b 100644
> >>>>>>>>> --- a/drivers/dma-buf/dma-buf.c
> >>>>>>>>> +++ b/drivers/dma-buf/dma-buf.c
> >>>>>>>>> @@ -525,6 +525,10 @@ struct dma_buf *dma_buf_export(const struct
> >>>>>>>>> dma_buf_export_info *exp_info)
> >>>>>>>>>              return ERR_PTR(-EINVAL);
> >>>>>>>>>          }
> >>>>>>>>>      +    if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
> >>>>>>>>> +            exp_info->ops->dynamic_mapping))
> >>>>>>>>> +        return ERR_PTR(-EINVAL);
> >>>>>>>>> +
> >>>>>>>>>          if (!try_module_get(exp_info->owner))
> >>>>>>>>>              return ERR_PTR(-ENOENT);
> >>>>>>>>>      @@ -645,10 +649,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
> >>>>>>>>>      EXPORT_SYMBOL_GPL(dma_buf_put);
> >>>>>>>>>        /**
> >>>>>>>>> - * dma_buf_attach - Add the device to dma_buf's attachments
> >>>>>>>>> list; optionally,
> >>>>>>>>> + * dma_buf_dynamic_attach - Add the device to dma_buf's
> >>>>>>>>> attachments list; optionally,
> >>>>>>>>>       * calls attach() of dma_buf_ops to allow device-specific
> >>>>>>>>> attach functionality
> >>>>>>>>> - * @dmabuf:    [in]    buffer to attach device to.
> >>>>>>>>> - * @dev:    [in]    device to be attached.
> >>>>>>>>> + * @dmabuf:        [in]    buffer to attach device to.
> >>>>>>>>> + * @dev:        [in]    device to be attached.
> >>>>>>>>> + * @dynamic_mapping:    [in]    calling convention for map/unmap
> >>>>>>>>>       *
> >>>>>>>>>       * Returns struct dma_buf_attachment pointer for this
> >>>>>>>>> attachment. Attachments
> >>>>>>>>>       * must be cleaned up by calling dma_buf_detach().
> >>>>>>>>> @@ -662,8 +667,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
> >>>>>>>>>       * accessible to @dev, and cannot be moved to a more suitable
> >>>>>>>>> place. This is
> >>>>>>>>>       * indicated with the error code -EBUSY.
> >>>>>>>>>       */
> >>>>>>>>> -struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>>>>> -                      struct device *dev)
> >>>>>>>>> +struct dma_buf_attachment *
> >>>>>>>>> +dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> >>>>>>>>> +               bool dynamic_mapping)
> >>>>>>>>>      {
> >>>>>>>>>          struct dma_buf_attachment *attach;
> >>>>>>>>>          int ret;
> >>>>>>>>> @@ -677,6 +683,7 @@ struct dma_buf_attachment
> >>>>>>>>> *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>>>>>            attach->dev = dev;
> >>>>>>>>>          attach->dmabuf = dmabuf;
> >>>>>>>>> +    attach->dynamic_mapping = dynamic_mapping;
> >>>>>>>>>            mutex_lock(&dmabuf->lock);
> >>>>>>>>>      @@ -685,16 +692,64 @@ struct dma_buf_attachment
> >>>>>>>>> *dma_buf_attach(struct dma_buf *dmabuf,
> >>>>>>>>>              if (ret)
> >>>>>>>>>                  goto err_attach;
> >>>>>>>>>          }
> >>>>>>>>> +    dma_resv_lock(dmabuf->resv, NULL);
> >>>>>>>>>          list_add(&attach->node, &dmabuf->attachments);
> >>>>>>>>> +    dma_resv_unlock(dmabuf->resv);
> >>>>>>>>>            mutex_unlock(&dmabuf->lock);
> >>>>>>>>>      +    /* When either the importer or the exporter can't handle dynamic
> >>>>>>>>> +     * mappings we cache the mapping here to avoid issues with the
> >>>>>>>>> +     * reservation object lock.
> >>>>>>>>> +     */
> >>>>>>>>> +    if (dma_buf_attachment_is_dynamic(attach) !=
> >>>>>>>>> +        dma_buf_is_dynamic(dmabuf)) {
> >>>>>>>>> +        struct sg_table *sgt;
> >>>>>>>>> +
> >>>>>>>>> +        if (dma_buf_is_dynamic(attach->dmabuf))
> >>>>>>>>> +            dma_resv_lock(attach->dmabuf->resv, NULL);
> >>>>>>>>> +
> >>>>>>>>> +        sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
> >>>>>>>> Now we're back to enforcing DMA_BIDI, which works nicely around the
> >>>>>>>> locking pain, but apparently upsets the arm-soc folks who want to
> >>>>>>>> control
> >>>>>>>> this better.
> >>>>>>> Take another look at dma_buf_map_attachment(), we still try to get the
> >>>>>>> caching there for ARM.
> >>>>>>>
> >>>>>>> What we do here is to bidirectionally map the buffer to avoid the
> >>>>>>> locking hydra when importer and exporter disagree on locking.
> >>>>>>>
> >>>>>>> So the ARM folks can easily avoid that by switching to dynamic locking
> >>>>>>> for both.
> >>>>> So you still break the contract between importer and exporter, except not
> >>>>> for anything that's run in intel-gfx-ci so all is good?
> >>>> No, the contract between importer and exporter stays exactly the same it
> >>>> is currently as long as you don't switch to dynamic dma-buf handling.
> >>>>
> >>>> There is no functional change for the ARM folks here. The only change
> >>>> which takes effect is between i915 and amdgpu and that is perfectly
> >>>> covered by intel-gfx-ci.
> >>> There's people who want to run amdgpu on ARM?
> >> Sure there are, we even recently fixed some bugs for this.
> >>
> >> But as far as I know there is no one currently which is affect by this
> >> change on ARM with amdgpu.
> > But don't you break them with this now?
>
> No, we see the bidirectional attachment as compatible with the other ones.
>
> > amdgpu will soon set the dynamic flag on exports, which forces the caching
> > at create time (to avoid the locking fun), which will then result in a
> > EBUSY at map_attachment time because we have a cached mapping, but it's
> > the wrong type.
>
> See the check in dma_buf_map_attachment():
>
>      if (attach->dir != direction && attach->dir != DMA_BIDIRECTIONAL)
>          return ERR_PTR(-EBUSY);

Hm, I misread this. So yeah should work, +/- the issue that we might
not flush enough. But I guess that can be fixed whenever, it's not
like dma-api semantics are a great fit for us. Maybe a fixme comment
would be useful here ... I'll look at this tomorrow or so because atm
brain is slow, I'm down with the usual post-conference cold it seems
:-/
-Daniel

>
> Regards,
> Christian.
>
> >
> >>> Also, x86 doesn't have cache
> >>> flushing in the dma-api, so naturally this isn't any issue for us (we
> >>> still have cache flushing in actual hw, but that's a different topic). So
> >>> "works on x86" isn't really a great way to justify what we do here I
> >>> think.
> >> Well it is the exact same caching we previously had as well, so there is
> >> absolutely no functional change here except that we now explicitly note
> >> that amdgpu always needs bidirectional mappings.
> >>
> >> I agree that we should get rid of this caching as soon as possible, but
> >> we should not fix things which where broken before.
> >>
> >> On the other hand adding dma_sg_sync_for_cpu/device sounds like
> >> something we could easily add separately to the caching if you think
> >> that this will help.
> > The current code maybe lacks some cache flushes, but we already require a
> > fixed direction per attachment. So I guess not a real problem, probably.
> >
> > But with your patches I think we now fail with EBUSY. Not exactly nice ...
> > -Daniel
> >
> >> Christian.
> >>
> >>> -Daniel
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> The other issue with "we solve this with caching the mapping": Currently
> >>>>> map/unmap flush (at least on arm, at least on cases where it matters). If
> >>>>> you just return the cached sg, then we don't do the flushing anymore,
> >>>>> which might break importers/exporters in exactly the same way as just
> >>>>> giving them the wrong mapping. There's zero differences between a BIDI,
> >>>>> TO_CPU, or TO_DEVICE mapping, the only places where this matters is for
> >>>>> cache flushing.
> >>>>>
> >>>>> So here's something that could actually work:
> >>>>> - We cache the mapping.
> >>>>> - We cache a bidirectional mapping.
> >>>>> - We put the right dma_sg_sync_for_cpu/device calls in place for map/unmap
> >>>>>      to give current importers/exporters the same behaviour they're used to
> >>>>>      now.
> >>>>>
> >>>>> And yes the caching we've lifted might have broken something somewhere
> >>>>> already. But generally you only hear about that long time after because
> >>>>> arm vendors roll forward once every few years. Or something like that.
> >>>>> -Daniel
> >>>>>
> >>>>>>> Regards,
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>>> That's why your previous version moved the caching into
> >>>>>>>> map/unmap_sg, which resurrected the locking hydra.
> >>>>>>>>
> >>>>>>>> I think we're going a bit in circles here, and I don't have a good idea
> >>>>>>>> either :-/
> >>>>>>>> -Daniel
> >>>>>>>>
> >>>>>>>>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch