[Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

Tue Sep 17 14:47:15 UTC 2019

Am 17.09.19 um 15:45 schrieb Daniel Vetter:
> On Tue, Sep 17, 2019 at 01:24:10PM +0000, Koenig, Christian wrote:
>> Am 17.09.19 um 15:13 schrieb Daniel Vetter:
>>> On Tue, Sep 17, 2019 at 12:40:51PM +0000, Koenig, Christian wrote:
>>>> Am 17.09.19 um 14:31 schrieb Daniel Vetter:
>>>>> On Mon, Sep 16, 2019 at 02:23:13PM +0200, Christian König wrote:
>>>>>> Ping? Any further comment on this or can't we merge at least the locking
>>>>>> change?
>>>>> I was at plumbers ...
>>>>>> Christian.
>>>>>>
>>>>>> Am 11.09.19 um 12:53 schrieb Christian König:
>>>>>>> Am 03.09.19 um 10:05 schrieb Daniel Vetter:
>>>>>>>> On Thu, Aug 29, 2019 at 04:29:14PM +0200, Christian König wrote:
>>>>>>>>> This patch is a stripped down version of the locking changes
>>>>>>>>> necessary to support dynamic DMA-buf handling.
>>>>>>>>>
>>>>>>>>> For compatibility we cache the DMA-buf mapping as soon as
>>>>>>>>> exporter/importer disagree on the dynamic handling.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/dma-buf/dma-buf.c | 90
>>>>>>>>> ++++++++++++++++++++++++++++++++++++---
>>>>>>>>>      include/linux/dma-buf.h   | 51 +++++++++++++++++++++-
>>>>>>>>>      2 files changed, 133 insertions(+), 8 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>>>>>> index 433d91d710e4..65052d52602b 100644
>>>>>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>>>>>> @@ -525,6 +525,10 @@ struct dma_buf *dma_buf_export(const struct
>>>>>>>>> dma_buf_export_info *exp_info)
>>>>>>>>>              return ERR_PTR(-EINVAL);
>>>>>>>>>          }
>>>>>>>>>      +    if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
>>>>>>>>> +            exp_info->ops->dynamic_mapping))
>>>>>>>>> +        return ERR_PTR(-EINVAL);
>>>>>>>>> +
>>>>>>>>>          if (!try_module_get(exp_info->owner))
>>>>>>>>>              return ERR_PTR(-ENOENT);
>>>>>>>>>      @@ -645,10 +649,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
>>>>>>>>>      EXPORT_SYMBOL_GPL(dma_buf_put);
>>>>>>>>>        /**
>>>>>>>>> - * dma_buf_attach - Add the device to dma_buf's attachments
>>>>>>>>> list; optionally,
>>>>>>>>> + * dma_buf_dynamic_attach - Add the device to dma_buf's
>>>>>>>>> attachments list; optionally,
>>>>>>>>>       * calls attach() of dma_buf_ops to allow device-specific
>>>>>>>>> attach functionality
>>>>>>>>> - * @dmabuf:    [in]    buffer to attach device to.
>>>>>>>>> - * @dev:    [in]    device to be attached.
>>>>>>>>> + * @dmabuf:        [in]    buffer to attach device to.
>>>>>>>>> + * @dev:        [in]    device to be attached.
>>>>>>>>> + * @dynamic_mapping:    [in]    calling convention for map/unmap
>>>>>>>>>       *
>>>>>>>>>       * Returns struct dma_buf_attachment pointer for this
>>>>>>>>> attachment. Attachments
>>>>>>>>>       * must be cleaned up by calling dma_buf_detach().
>>>>>>>>> @@ -662,8 +667,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
>>>>>>>>>       * accessible to @dev, and cannot be moved to a more suitable
>>>>>>>>> place. This is
>>>>>>>>>       * indicated with the error code -EBUSY.
>>>>>>>>>       */
>>>>>>>>> -struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>>>>>>>>> -                      struct device *dev)
>>>>>>>>> +struct dma_buf_attachment *
>>>>>>>>> +dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
>>>>>>>>> +               bool dynamic_mapping)
>>>>>>>>>      {
>>>>>>>>>          struct dma_buf_attachment *attach;
>>>>>>>>>          int ret;
>>>>>>>>> @@ -677,6 +683,7 @@ struct dma_buf_attachment
>>>>>>>>> *dma_buf_attach(struct dma_buf *dmabuf,
>>>>>>>>>            attach->dev = dev;
>>>>>>>>>          attach->dmabuf = dmabuf;
>>>>>>>>> +    attach->dynamic_mapping = dynamic_mapping;
>>>>>>>>>            mutex_lock(&dmabuf->lock);
>>>>>>>>>      @@ -685,16 +692,64 @@ struct dma_buf_attachment
>>>>>>>>> *dma_buf_attach(struct dma_buf *dmabuf,
>>>>>>>>>              if (ret)
>>>>>>>>>                  goto err_attach;
>>>>>>>>>          }
>>>>>>>>> +    dma_resv_lock(dmabuf->resv, NULL);
>>>>>>>>>          list_add(&attach->node, &dmabuf->attachments);
>>>>>>>>> +    dma_resv_unlock(dmabuf->resv);
>>>>>>>>>            mutex_unlock(&dmabuf->lock);
>>>>>>>>>      +    /* When either the importer or the exporter can't handle dynamic
>>>>>>>>> +     * mappings we cache the mapping here to avoid issues with the
>>>>>>>>> +     * reservation object lock.
>>>>>>>>> +     */
>>>>>>>>> +    if (dma_buf_attachment_is_dynamic(attach) !=
>>>>>>>>> +        dma_buf_is_dynamic(dmabuf)) {
>>>>>>>>> +        struct sg_table *sgt;
>>>>>>>>> +
>>>>>>>>> +        if (dma_buf_is_dynamic(attach->dmabuf))
>>>>>>>>> +            dma_resv_lock(attach->dmabuf->resv, NULL);
>>>>>>>>> +
>>>>>>>>> +        sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
>>>>>>>> Now we're back to enforcing DMA_BIDI, which works nicely around the
>>>>>>>> locking pain, but apparently upsets the arm-soc folks who want to
>>>>>>>> control
>>>>>>>> this better.
>>>>>>> Take another look at dma_buf_map_attachment(), we still try to get the
>>>>>>> caching there for ARM.
>>>>>>>
>>>>>>> What we do here is to bidirectionally map the buffer to avoid the
>>>>>>> locking hydra when importer and exporter disagree on locking.
>>>>>>>
>>>>>>> So the ARM folks can easily avoid that by switching to dynamic locking
>>>>>>> for both.
>>>>> So you still break the contract between importer and exporter, except not
>>>>> for anything that's run in intel-gfx-ci so all is good?
>>>> No, the contract between importer and exporter stays exactly the same it
>>>> is currently as long as you don't switch to dynamic dma-buf handling.
>>>>
>>>> There is no functional change for the ARM folks here. The only change
>>>> which takes effect is between i915 and amdgpu and that is perfectly
>>>> covered by intel-gfx-ci.
>>> There's people who want to run amdgpu on ARM?
>> Sure there are, we even recently fixed some bugs for this.
>>
>> But as far as I know there is no one currently which is affect by this
>> change on ARM with amdgpu.
> But don't you break them with this now?

No, we see the bidirectional attachment as compatible with the other ones.

> amdgpu will soon set the dynamic flag on exports, which forces the caching
> at create time (to avoid the locking fun), which will then result in a
> EBUSY at map_attachment time because we have a cached mapping, but it's
> the wrong type.

See the check in dma_buf_map_attachment():

     if (attach->dir != direction && attach->dir != DMA_BIDIRECTIONAL)
         return ERR_PTR(-EBUSY);

Regards,
Christian.

>
>>> Also, x86 doesn't have cache
>>> flushing in the dma-api, so naturally this isn't any issue for us (we
>>> still have cache flushing in actual hw, but that's a different topic). So
>>> "works on x86" isn't really a great way to justify what we do here I
>>> think.
>> Well it is the exact same caching we previously had as well, so there is
>> absolutely no functional change here except that we now explicitly note
>> that amdgpu always needs bidirectional mappings.
>>
>> I agree that we should get rid of this caching as soon as possible, but
>> we should not fix things which where broken before.
>>
>> On the other hand adding dma_sg_sync_for_cpu/device sounds like
>> something we could easily add separately to the caching if you think
>> that this will help.
> The current code maybe lacks some cache flushes, but we already require a
> fixed direction per attachment. So I guess not a real problem, probably.
>
> But with your patches I think we now fail with EBUSY. Not exactly nice ...
> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> The other issue with "we solve this with caching the mapping": Currently
>>>>> map/unmap flush (at least on arm, at least on cases where it matters). If
>>>>> you just return the cached sg, then we don't do the flushing anymore,
>>>>> which might break importers/exporters in exactly the same way as just
>>>>> giving them the wrong mapping. There's zero differences between a BIDI,
>>>>> TO_CPU, or TO_DEVICE mapping, the only places where this matters is for
>>>>> cache flushing.
>>>>>
>>>>> So here's something that could actually work:
>>>>> - We cache the mapping.
>>>>> - We cache a bidirectional mapping.
>>>>> - We put the right dma_sg_sync_for_cpu/device calls in place for map/unmap
>>>>>      to give current importers/exporters the same behaviour they're used to
>>>>>      now.
>>>>>
>>>>> And yes the caching we've lifted might have broken something somewhere
>>>>> already. But generally you only hear about that long time after because
>>>>> arm vendors roll forward once every few years. Or something like that.
>>>>> -Daniel
>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> That's why your previous version moved the caching into
>>>>>>>> map/unmap_sg, which resurrected the locking hydra.
>>>>>>>>
>>>>>>>> I think we're going a bit in circles here, and I don't have a good idea
>>>>>>>> either :-/
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>