[PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info
Christian König
christian.koenig at amd.com
Wed Aug 17 09:57:08 UTC 2022
On 16.08.22 19:29, Rob Clark wrote:
> On Tue, Aug 16, 2022 at 9:51 AM Christian König
> <christian.koenig at amd.com> wrote:
>> On 16.08.22 16:26, Rob Clark wrote:
>>> On Tue, Aug 16, 2022 at 1:27 AM Christian König
>>> <christian.koenig at amd.com> wrote:
>>>> On 15.08.22 23:15, Rob Clark wrote:
>>>>> From: Rob Clark <robdclark at chromium.org>
>>>>>
>>>>> This is a fairly narrowly focused interface, providing a way for a VMM
>>>>> in userspace to tell the guest kernel what pgprot settings to use when
>>>>> mapping a buffer to guest userspace.
>>>>>
>>>>> For buffers that get mapped into guest userspace, virglrenderer returns
>>>>> a dma-buf fd to the VMM (crosvm or qemu). In addition to mapping the
>>>>> pages into the guest VM, it needs to report to drm/virtio in the guest
>>>>> the cache settings to use for guest userspace. In particular, on some
>>>>> architectures, creating aliased mappings with different cache attributes
>>>>> is frowned upon, so it is important that the guest mappings have the
>>>>> same cache attributes as any potential host mappings.
>>>>>
>>>>> Signed-off-by: Rob Clark <robdclark at chromium.org>
>>>>> ---
>>>>> v2: Combine with coherency, as that is a related concept.. and it is
>>>>> relevant to the VMM whether coherent access without the SYNC ioctl
>>>>> is possible; set map_info at export time to make it more clear
>>>>> that it applies for the lifetime of the dma-buf (for any mmap
>>>>> created via the dma-buf)
>>>> Well, exactly that's a conceptual NAK from my side.
>>>>
>>>> The caching information can change at any time, for CPU mappings even
>>>> without further notice from the exporter.
>>> You should look before you criticize, as I left you a way out. The
>>> idea was that DMA_BUF_MAP_INCOHERENT should indicate that the buffer
>>> cannot be mapped to the guest. We could ofc add more DMA_BUF_MAP_*
>>> values if something else would suit you better. But the goal is to
>>> give the VMM enough information to dtrt, or return an error if mapping
>>> to the guest is not possible. That seems better than assuming mapping to
>>> the guest will work and guessing about cache attrs.
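
(For illustration only: roughly the VMM-side usage this is aiming at, as a
minimal sketch against the uapi proposed in the patch below. The helper name
is made up and nothing here is part of the patch itself.)

    #include <errno.h>
    #include <sys/ioctl.h>
    #include <linux/dma-buf.h>   /* with the patch below applied */

    /* Query the coherency/caching info once and decide how (or whether)
     * the buffer may be mapped into the guest. */
    static int pick_guest_map_info(int dmabuf_fd)
    {
            struct dma_buf_info info = {
                    .param = DMA_BUF_INFO_MAP_INFO,
            };

            if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_INFO, &info) < 0)
                    return -errno;

            switch (info.value) {
            case DMA_BUF_COHERENT_CACHED:  /* guest mapping can be cached */
            case DMA_BUF_COHERENT_WC:      /* guest mapping must be writecombine */
                    return (int)info.value;
            default:                       /* DMA_BUF_MAP_INCOHERENT: don't map to guest */
                    return -EINVAL;
            }
    }
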
>> Well, I'm not rejecting the implementation; I'm rejecting this from a
>> conceptual point of view.
>>
>> We intentionally gave the exporter full control over the CPU mappings.
>> This approach here breaks that now.
>>
>> I haven't seen a full, detailed reason why we should do that, and to be
>> honest, KVM seems to mess with things it is not supposed to touch.
>>
>> For example, messing with the page reference count of mappings marked with
>> VM_IO is a complete no-go. This is very strong evidence that this was based
>> on rather dangerous half-knowledge about the background of the handling here.
>>
>> So as long as I don't see a full explanation of why KVM is grabbing
>> references to pages while faulting them in, and why we need to manually
>> forward the caching while the hardware documentation indicates otherwise,
>> I will be rejecting this whole approach.
> Didn't we cover this on the previous iteration already? From a host
> kernel PoV, these are just normal host userspace mappings. The
> userspace VMM mapping becomes the "physical address" in the guest and
> the stage 2 translation tables map it to the guest userspace.
>
> The resulting cache attrs from the combination of the S1 and S2 translations
> can differ. So ideally we set up the S2 pgtables in the guest aligned with
> the host userspace mappings.
Well, exactly that is not very convincing.

What you want to do is to use one channel for the address and a
different one for the cache attrs; that's not something I would
recommend doing in general.

Instead, the client pgtables should be set up in a way that the host
can overwrite them.
Regards,
Christian.
>
> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> BR,
>>> -R
>>>
>>>> If the hardware can't use the caching information from the host CPU page
>>>> tables directly then that pretty much completely breaks the concept that
>>>> the exporter is responsible for setting up those page tables.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> drivers/dma-buf/dma-buf.c | 63 +++++++++++++++++++++++++++------
>>>>> include/linux/dma-buf.h | 11 ++++++
>>>>> include/uapi/linux/dma-buf.h | 68 ++++++++++++++++++++++++++++++++++++
>>>>> 3 files changed, 132 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>> index 32f55640890c..262c4706f721 100644
>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>> @@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = {
>>>>> .kill_sb = kill_anon_super,
>>>>> };
>>>>>
>>>>> +static int __dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
>>>>> +{
>>>>> + int ret;
>>>>> +
>>>>> + /* check if buffer supports mmap */
>>>>> + if (!dmabuf->ops->mmap)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + ret = dmabuf->ops->mmap(dmabuf, vma);
>>>>> +
>>>>> + /*
>>>>> + * If the exporter claims to support coherent access, ensure the
>>>>> + * pgprot flags match the claim.
>>>>> + */
>>>>> + if ((dmabuf->map_info != DMA_BUF_MAP_INCOHERENT) && !ret) {
>>>>> + pgprot_t wc_prot = pgprot_writecombine(vma->vm_page_prot);
>>>>> + if (dmabuf->map_info == DMA_BUF_COHERENT_WC) {
>>>>> + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) != pgprot_val(wc_prot));
>>>>> + } else {
>>>>> + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) == pgprot_val(wc_prot));
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
>>>>> {
>>>>> struct dma_buf *dmabuf;
>>>>> @@ -134,16 +160,12 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
>>>>>
>>>>> dmabuf = file->private_data;
>>>>>
>>>>> - /* check if buffer supports mmap */
>>>>> - if (!dmabuf->ops->mmap)
>>>>> - return -EINVAL;
>>>>> -
>>>>> /* check for overflowing the buffer's size */
>>>>> if (vma->vm_pgoff + vma_pages(vma) >
>>>>> dmabuf->size >> PAGE_SHIFT)
>>>>> return -EINVAL;
>>>>>
>>>>> - return dmabuf->ops->mmap(dmabuf, vma);
>>>>> + return __dma_buf_mmap(dmabuf, vma);
>>>>> }
>>>>>
>>>>> static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
>>>>> @@ -326,6 +348,27 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static long dma_buf_info(struct dma_buf *dmabuf, void __user *uarg)
>>>>> +{
>>>>> + struct dma_buf_info arg;
>>>>> +
>>>>> + if (copy_from_user(&arg, uarg, sizeof(arg)))
>>>>> + return -EFAULT;
>>>>> +
>>>>> + switch (arg.param) {
>>>>> + case DMA_BUF_INFO_MAP_INFO:
>>>>> + arg.value = dmabuf->map_info;
>>>>> + break;
>>>>> + default:
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + if (copy_to_user(uarg, &arg, sizeof(arg)))
>>>>> + return -EFAULT;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> static long dma_buf_ioctl(struct file *file,
>>>>> unsigned int cmd, unsigned long arg)
>>>>> {
>>>>> @@ -369,6 +412,9 @@ static long dma_buf_ioctl(struct file *file,
>>>>> case DMA_BUF_SET_NAME_B:
>>>>> return dma_buf_set_name(dmabuf, (const char __user *)arg);
>>>>>
>>>>> + case DMA_BUF_IOCTL_INFO:
>>>>> + return dma_buf_info(dmabuf, (void __user *)arg);
>>>>> +
>>>>> default:
>>>>> return -ENOTTY;
>>>>> }
>>>>> @@ -530,6 +576,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>>> dmabuf->priv = exp_info->priv;
>>>>> dmabuf->ops = exp_info->ops;
>>>>> dmabuf->size = exp_info->size;
>>>>> + dmabuf->map_info = exp_info->map_info;
>>>>> dmabuf->exp_name = exp_info->exp_name;
>>>>> dmabuf->owner = exp_info->owner;
>>>>> spin_lock_init(&dmabuf->name_lock);
>>>>> @@ -1245,10 +1292,6 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>>>>> if (WARN_ON(!dmabuf || !vma))
>>>>> return -EINVAL;
>>>>>
>>>>> - /* check if buffer supports mmap */
>>>>> - if (!dmabuf->ops->mmap)
>>>>> - return -EINVAL;
>>>>> -
>>>>> /* check for offset overflow */
>>>>> if (pgoff + vma_pages(vma) < pgoff)
>>>>> return -EOVERFLOW;
>>>>> @@ -1262,7 +1305,7 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>>>>> vma_set_file(vma, dmabuf->file);
>>>>> vma->vm_pgoff = pgoff;
>>>>>
>>>>> - return dmabuf->ops->mmap(dmabuf, vma);
>>>>> + return __dma_buf_mmap(dmabuf, vma);
>>>>> }
>>>>> EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
>>>>>
>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>>> index 71731796c8c3..37923c8d5c24 100644
>>>>> --- a/include/linux/dma-buf.h
>>>>> +++ b/include/linux/dma-buf.h
>>>>> @@ -23,6 +23,8 @@
>>>>> #include <linux/dma-fence.h>
>>>>> #include <linux/wait.h>
>>>>>
>>>>> +#include <uapi/linux/dma-buf.h>
>>>>> +
>>>>> struct device;
>>>>> struct dma_buf;
>>>>> struct dma_buf_attachment;
>>>>> @@ -307,6 +309,13 @@ struct dma_buf {
>>>>> */
>>>>> size_t size;
>>>>>
>>>>> + /**
>>>>> + * @map_info:
>>>>> + *
>>>>> + * CPU mapping/coherency information for the buffer.
>>>>> + */
>>>>> + enum dma_buf_map_info map_info;
>>>>> +
>>>>> /**
>>>>> * @file:
>>>>> *
>>>>> @@ -533,6 +542,7 @@ struct dma_buf_attachment {
>>>>> * @ops: Attach allocator-defined dma buf ops to the new buffer
>>>>> * @size: Size of the buffer - invariant over the lifetime of the buffer
>>>>> * @flags: mode flags for the file
>>>>> + * @map_info: CPU mapping/coherency information for the buffer
>>>>> * @resv: reservation-object, NULL to allocate default one
>>>>> * @priv: Attach private data of allocator to this buffer
>>>>> *
>>>>> @@ -545,6 +555,7 @@ struct dma_buf_export_info {
>>>>> const struct dma_buf_ops *ops;
>>>>> size_t size;
>>>>> int flags;
>>>>> + enum dma_buf_map_info map_info;
>>>>> struct dma_resv *resv;
>>>>> void *priv;
>>>>> };
>>>>> diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
>>>>> index b1523cb8ab30..07b403ffdb43 100644
>>>>> --- a/include/uapi/linux/dma-buf.h
>>>>> +++ b/include/uapi/linux/dma-buf.h
>>>>> @@ -85,6 +85,72 @@ struct dma_buf_sync {
>>>>>
>>>>> #define DMA_BUF_NAME_LEN 32
>>>>>
>>>>> +/**
>>>>> + * enum dma_buf_map_info - CPU mapping info
>>>>> + *
>>>>> + * This enum describes coherency of a userspace mapping of the dmabuf.
>>>>> + *
>>>>> + * Importing devices should check dma_buf::map_info flag and reject an
>>>>> + * import if unsupported. For example, if the exporting device uses
>>>>> + * @DMA_BUF_COHERENT_CACHED but the importing device does not support
>>>>> + * CPU cache coherency, the dma-buf import should fail.
>>>>> + */
>>>>> +enum dma_buf_map_info {
>>>>> + /**
>>>>> + * @DMA_BUF_MAP_INCOHERENT: CPU mapping is incoherent.
>>>>> + *
>>>>> + * Use of DMA_BUF_IOCTL_SYNC is required for CPU managed coherency.
>>>>> + */
>>>>> + DMA_BUF_MAP_INCOHERENT,
>>>>> +
>>>>> + /**
>>>>> + * @DMA_BUF_COHERENT_WC: CPU mapping is coherent but not cached.
>>>>> + *
>>>>> + * A cpu mmap'ing is coherent, and DMA_BUF_IOCTL_SYNC is not required.
>>>>> + * However fences may be still required for synchronizing access. Ie.
>>>>> + * coherency can only be relied upon by an explicit-fencing userspace.
>>>>> + * An implicit-sync userspace must still use DMA_BUF_IOCTL_SYNC.
>>>>> + *
>>>>> + * The cpu mapping is writecombine.
>>>>> + */
>>>>> + DMA_BUF_COHERENT_WC,
>>>>> +
>>>>> + /**
>>>>> + * @DMA_BUF_COHERENT_CACHED: CPU mapping is coherent and CPU cached.
>>>>> + *
>>>>> + * A cpu mmap'ing is coherent, and DMA_BUF_IOCTL_SYNC is not required.
>>>>> + * However fences may be still required for synchronizing access. Ie.
>>>>> + * coherency can only be relied upon by an explicit-fencing userspace.
>>>>> + * An implicit-sync userspace must still use DMA_BUF_IOCTL_SYNC.
>>>>> + *
>>>>> + * The cpu mapping is cached.
>>>>> + */
>>>>> + DMA_BUF_COHERENT_CACHED,
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct dma_buf_info - Query info about the buffer.
>>>>> + */
>>>>> +struct dma_buf_info {
>>>>> +
>>>>> +#define DMA_BUF_INFO_MAP_INFO 1
>>>>> +
>>>>> + /**
>>>>> + * @param: Which param to query
>>>>> + *
>>>>> + * DMA_BUF_INFO_MAP_INFO:
>>>>> + * Returns enum dma_buf_map_info, describing the coherency and
>>>>> + * caching of a CPU mapping of the buffer.
>>>>> + */
>>>>> + __u32 param;
>>>>> + __u32 pad;
>>>>> +
>>>>> + /**
>>>>> + * @value: Return value of the query.
>>>>> + */
>>>>> + __u64 value;
>>>>> +};
>>>>> +
>>>>> #define DMA_BUF_BASE 'b'
>>>>> #define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
>>>>>
>>>>> @@ -95,4 +161,6 @@ struct dma_buf_sync {
>>>>> #define DMA_BUF_SET_NAME_A _IOW(DMA_BUF_BASE, 1, __u32)
>>>>> #define DMA_BUF_SET_NAME_B _IOW(DMA_BUF_BASE, 1, __u64)
>>>>>
>>>>> +#define DMA_BUF_IOCTL_INFO _IOWR(DMA_BUF_BASE, 2, struct dma_buf_info)
>>>>> +
>>>>> #endif
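
(For completeness, an exporter-side sketch of what declaring map_info at
export time could look like, assuming the dma_buf_export_info.map_info field
added above. "struct my_bo", "bo->cached" and "my_dmabuf_ops" are hypothetical
driver names, not from the patch.)

    #include <linux/dma-buf.h>
    #include <linux/fcntl.h>

    static struct dma_buf *my_export(struct my_bo *bo)
    {
            DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

            exp_info.ops   = &my_dmabuf_ops;
            exp_info.size  = bo->size;
            exp_info.flags = O_RDWR;
            exp_info.priv  = bo;
            /* Declared once, for the lifetime of the dma-buf; it has to match
             * the pgprot the exporter will actually use in its mmap callback. */
            exp_info.map_info = bo->cached ? DMA_BUF_COHERENT_CACHED
                                           : DMA_BUF_COHERENT_WC;

            return dma_buf_export(&exp_info);
    }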