[PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info
Christian König
christian.koenig at amd.com
Thu Sep 8 06:01:24 UTC 2022
Am 07.09.22 um 18:43 schrieb Daniel Vetter:
> On Tue, Aug 16, 2022 at 06:50:54PM +0200, Christian König wrote:
>> Am 16.08.22 um 16:26 schrieb Rob Clark:
>>> On Tue, Aug 16, 2022 at 1:27 AM Christian König
>>> <christian.koenig at amd.com> wrote:
>>>> Am 15.08.22 um 23:15 schrieb Rob Clark:
>>>>> From: Rob Clark <robdclark at chromium.org>
>>>>>
>>>>> This is a fairly narrowly focused interface, providing a way for a VMM
>>>>> in userspace to tell the guest kernel what pgprot settings to use when
>>>>> mapping a buffer to guest userspace.
>>>>>
>>>>> For buffers that get mapped into guest userspace, virglrenderer returns
>>>>> a dma-buf fd to the VMM (crosvm or qemu). In addition to mapping the
>>>>> pages into the guest VM, it needs to report to drm/virtio in the guest
>>>>> the cache settings to use for guest userspace. In particular, on some
>>>>> architectures, creating aliased mappings with different cache attributes
>>>>> is frowned upon, so it is important that the guest mappings have the
>>>>> same cache attributes as any potential host mappings.
>>>>>
>>>>> Signed-off-by: Rob Clark <robdclark at chromium.org>
>>>>> ---
>>>>> v2: Combine with coherency, as that is a related concept.. and it is
>>>>> relevant to the VMM whether coherent access without the SYNC ioctl
>>>>> is possible; set map_info at export time to make it more clear
>>>>> that it applies for the lifetime of the dma-buf (for any mmap
>>>>> created via the dma-buf)
>>>> Well, exactly that's a conceptual NAK from my side.
>>>>
>>>> The caching information can change at any time. For CPU mappings even
>>>> without further notice from the exporter.
>>> You should look before you criticize, as I left you a way out.. the
>>> idea was that DMA_BUF_MAP_INCOHERENT should indicate that the buffer
>>> cannot be mapped to the guest. We could ofc add more DMA_BUF_MAP_*
>>> values if something else would suit you better. But the goal is to
>>> give the VMM enough information to dtrt, or return an error if mapping
>>> to guest is not possible. That seems better than assuming mapping to
>>> guest will work and guessing about cache attrs
>> Well I'm not rejecting the implementation, I'm rejecting this from the
>> conceptual point of view.
>>
>> We intentional gave the exporter full control over the CPU mappings. This
>> approach here breaks that now.
>>
>> I haven't seen the full detailed reason why we should do that and to be
>> honest KVM seems to mess with things it is not supposed to touch.
>>
>> For example the page reference count of mappings marked with VM_IO is a
>> complete no-go. This is a very strong evidence that this was based on rather
>> dangerous halve knowledge about the background of the handling here.
> Wut?
Yep, that was also my initial reaction.
> KVM grabs page references of VM_IO vma? I thought the issue was that we
> still had some bo/dma-buf vma that didn't set either VM_IO or VM_PFNMAP,
> and not that kvm was just outright breaking every core mm contract there
> is.
>
> Is this really what's going on in that other thread about "fixing" ttm?
At least it seems so. I haven't looked at the actual KVM code in detail,
but from what I've seen it looks like KVM implements similar
functionality to GUP to figure out which struct page a virtual address
points to, but without the correct checks GUP does.
I absolutely don't understand either the why and what. Especially for
the user case of KVM grabbing a page reference doesn't seem to make any
sense.
My suspicion is that this should have been an MMU notifier instead, but
this is really just a gut feeling. Maybe there is a good reason KVM
grabs this reference.
Christian.
> -Daniel
>
>> So as long as I don't see a full explanation why KVM is grabbing reference
>> to pages while faulting them and why we manually need to forward the caching
>> while the hardware documentation indicates otherwise I will be rejecting
>> this whole approach.
>>
>> Regards,
>> Christian.
>>
>>> BR,
>>> -R
>>>
>>>> If the hardware can't use the caching information from the host CPU page
>>>> tables directly then that pretty much completely breaks the concept that
>>>> the exporter is responsible for setting up those page tables.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> drivers/dma-buf/dma-buf.c | 63 +++++++++++++++++++++++++++------
>>>>> include/linux/dma-buf.h | 11 ++++++
>>>>> include/uapi/linux/dma-buf.h | 68 ++++++++++++++++++++++++++++++++++++
>>>>> 3 files changed, 132 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>> index 32f55640890c..262c4706f721 100644
>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>> @@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = {
>>>>> .kill_sb = kill_anon_super,
>>>>> };
>>>>>
>>>>> +static int __dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
>>>>> +{
>>>>> + int ret;
>>>>> +
>>>>> + /* check if buffer supports mmap */
>>>>> + if (!dmabuf->ops->mmap)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + ret = dmabuf->ops->mmap(dmabuf, vma);
>>>>> +
>>>>> + /*
>>>>> + * If the exporter claims to support coherent access, ensure the
>>>>> + * pgprot flags match the claim.
>>>>> + */
>>>>> + if ((dmabuf->map_info != DMA_BUF_MAP_INCOHERENT) && !ret) {
>>>>> + pgprot_t wc_prot = pgprot_writecombine(vma->vm_page_prot);
>>>>> + if (dmabuf->map_info == DMA_BUF_COHERENT_WC) {
>>>>> + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) != pgprot_val(wc_prot));
>>>>> + } else {
>>>>> + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) == pgprot_val(wc_prot));
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
>>>>> {
>>>>> struct dma_buf *dmabuf;
>>>>> @@ -134,16 +160,12 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
>>>>>
>>>>> dmabuf = file->private_data;
>>>>>
>>>>> - /* check if buffer supports mmap */
>>>>> - if (!dmabuf->ops->mmap)
>>>>> - return -EINVAL;
>>>>> -
>>>>> /* check for overflowing the buffer's size */
>>>>> if (vma->vm_pgoff + vma_pages(vma) >
>>>>> dmabuf->size >> PAGE_SHIFT)
>>>>> return -EINVAL;
>>>>>
>>>>> - return dmabuf->ops->mmap(dmabuf, vma);
>>>>> + return __dma_buf_mmap(dmabuf, vma);
>>>>> }
>>>>>
>>>>> static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
>>>>> @@ -326,6 +348,27 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static long dma_buf_info(struct dma_buf *dmabuf, void __user *uarg)
>>>>> +{
>>>>> + struct dma_buf_info arg;
>>>>> +
>>>>> + if (copy_from_user(&arg, uarg, sizeof(arg)))
>>>>> + return -EFAULT;
>>>>> +
>>>>> + switch (arg.param) {
>>>>> + case DMA_BUF_INFO_MAP_INFO:
>>>>> + arg.value = dmabuf->map_info;
>>>>> + break;
>>>>> + default:
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + if (copy_to_user(uarg, &arg, sizeof(arg)))
>>>>> + return -EFAULT;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> static long dma_buf_ioctl(struct file *file,
>>>>> unsigned int cmd, unsigned long arg)
>>>>> {
>>>>> @@ -369,6 +412,9 @@ static long dma_buf_ioctl(struct file *file,
>>>>> case DMA_BUF_SET_NAME_B:
>>>>> return dma_buf_set_name(dmabuf, (const char __user *)arg);
>>>>>
>>>>> + case DMA_BUF_IOCTL_INFO:
>>>>> + return dma_buf_info(dmabuf, (void __user *)arg);
>>>>> +
>>>>> default:
>>>>> return -ENOTTY;
>>>>> }
>>>>> @@ -530,6 +576,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>>> dmabuf->priv = exp_info->priv;
>>>>> dmabuf->ops = exp_info->ops;
>>>>> dmabuf->size = exp_info->size;
>>>>> + dmabuf->map_info = exp_info->map_info;
>>>>> dmabuf->exp_name = exp_info->exp_name;
>>>>> dmabuf->owner = exp_info->owner;
>>>>> spin_lock_init(&dmabuf->name_lock);
>>>>> @@ -1245,10 +1292,6 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>>>>> if (WARN_ON(!dmabuf || !vma))
>>>>> return -EINVAL;
>>>>>
>>>>> - /* check if buffer supports mmap */
>>>>> - if (!dmabuf->ops->mmap)
>>>>> - return -EINVAL;
>>>>> -
>>>>> /* check for offset overflow */
>>>>> if (pgoff + vma_pages(vma) < pgoff)
>>>>> return -EOVERFLOW;
>>>>> @@ -1262,7 +1305,7 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>>>>> vma_set_file(vma, dmabuf->file);
>>>>> vma->vm_pgoff = pgoff;
>>>>>
>>>>> - return dmabuf->ops->mmap(dmabuf, vma);
>>>>> + return __dma_buf_mmap(dmabuf, vma);
>>>>> }
>>>>> EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
>>>>>
>>>>> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
>>>>> index 71731796c8c3..37923c8d5c24 100644
>>>>> --- a/include/linux/dma-buf.h
>>>>> +++ b/include/linux/dma-buf.h
>>>>> @@ -23,6 +23,8 @@
>>>>> #include <linux/dma-fence.h>
>>>>> #include <linux/wait.h>
>>>>>
>>>>> +#include <uapi/linux/dma-buf.h>
>>>>> +
>>>>> struct device;
>>>>> struct dma_buf;
>>>>> struct dma_buf_attachment;
>>>>> @@ -307,6 +309,13 @@ struct dma_buf {
>>>>> */
>>>>> size_t size;
>>>>>
>>>>> + /**
>>>>> + * @map_info:
>>>>> + *
>>>>> + * CPU mapping/coherency information for the buffer.
>>>>> + */
>>>>> + enum dma_buf_map_info map_info;
>>>>> +
>>>>> /**
>>>>> * @file:
>>>>> *
>>>>> @@ -533,6 +542,7 @@ struct dma_buf_attachment {
>>>>> * @ops: Attach allocator-defined dma buf ops to the new buffer
>>>>> * @size: Size of the buffer - invariant over the lifetime of the buffer
>>>>> * @flags: mode flags for the file
>>>>> + * @map_info: CPU mapping/coherency information for the buffer
>>>>> * @resv: reservation-object, NULL to allocate default one
>>>>> * @priv: Attach private data of allocator to this buffer
>>>>> *
>>>>> @@ -545,6 +555,7 @@ struct dma_buf_export_info {
>>>>> const struct dma_buf_ops *ops;
>>>>> size_t size;
>>>>> int flags;
>>>>> + enum dma_buf_map_info map_info;
>>>>> struct dma_resv *resv;
>>>>> void *priv;
>>>>> };
>>>>> diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
>>>>> index b1523cb8ab30..07b403ffdb43 100644
>>>>> --- a/include/uapi/linux/dma-buf.h
>>>>> +++ b/include/uapi/linux/dma-buf.h
>>>>> @@ -85,6 +85,72 @@ struct dma_buf_sync {
>>>>>
>>>>> #define DMA_BUF_NAME_LEN 32
>>>>>
>>>>> +/**
>>>>> + * enum dma_buf_map_info - CPU mapping info
>>>>> + *
>>>>> + * This enum describes coherency of a userspace mapping of the dmabuf.
>>>>> + *
>>>>> + * Importing devices should check dma_buf::map_info flag and reject an
>>>>> + * import if unsupported. For example, if the exporting device uses
>>>>> + * @DMA_BUF_COHERENT_CACHED but the importing device does not support
>>>>> + * CPU cache coherency, the dma-buf import should fail.
>>>>> + */
>>>>> +enum dma_buf_map_info {
>>>>> + /**
>>>>> + * @DMA_BUF_MAP_INCOHERENT: CPU mapping is incoherent.
>>>>> + *
>>>>> + * Use of DMA_BUF_IOCTL_SYNC is required for CPU managed coherenency.
>>>>> + */
>>>>> + DMA_BUF_MAP_INCOHERENT,
>>>>> +
>>>>> + /**
>>>>> + * @DMA_BUF_COHERENT_WC: CPU mapping is coherent but not cached.
>>>>> + *
>>>>> + * A cpu mmap'ing is coherent, and DMA_BUF_IOCTL_SYNC is not required.
>>>>> + * However fences may be still required for synchronizing access. Ie.
>>>>> + * coherency can only be relied upon by an explicit-fencing userspace.
>>>>> + * An implicit-sync userspace must still use DMA_BUF_IOCTL_SYNC.
>>>>> + *
>>>>> + * The cpu mapping is writecombine.
>>>>> + */
>>>>> + DMA_BUF_COHERENT_WC,
>>>>> +
>>>>> + /**
>>>>> + * @DMA_BUF_COHERENT_CACHED: CPU mapping is coherent and CPU cached.
>>>>> + *
>>>>> + * A cpu mmap'ing is coherent, and DMA_BUF_IOCTL_SYNC is not required.
>>>>> + * However fences may be still required for synchronizing access. Ie.
>>>>> + * coherency can only be relied upon by an explicit-fencing userspace.
>>>>> + * An implicit-sync userspace must still use DMA_BUF_IOCTL_SYNC.
>>>>> + *
>>>>> + * The cpu mapping is cached.
>>>>> + */
>>>>> + DMA_BUF_COHERENT_CACHED,
>>>>> +};
>>>>> +
>>>>> +/**
>>>>> + * struct dma_buf_info - Query info about the buffer.
>>>>> + */
>>>>> +struct dma_buf_info {
>>>>> +
>>>>> +#define DMA_BUF_INFO_MAP_INFO 1
>>>>> +
>>>>> + /**
>>>>> + * @param: Which param to query
>>>>> + *
>>>>> + * DMA_BUF_INFO_MAP_INFO:
>>>>> + * Returns enum dma_buf_map_info, describing the coherency and
>>>>> + * caching of a CPU mapping of the buffer.
>>>>> + */
>>>>> + __u32 param;
>>>>> + __u32 pad;
>>>>> +
>>>>> + /**
>>>>> + * @value: Return value of the query.
>>>>> + */
>>>>> + __u64 value;
>>>>> +};
>>>>> +
>>>>> #define DMA_BUF_BASE 'b'
>>>>> #define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
>>>>>
>>>>> @@ -95,4 +161,6 @@ struct dma_buf_sync {
>>>>> #define DMA_BUF_SET_NAME_A _IOW(DMA_BUF_BASE, 1, __u32)
>>>>> #define DMA_BUF_SET_NAME_B _IOW(DMA_BUF_BASE, 1, __u64)
>>>>>
>>>>> +#define DMA_BUF_IOCTL_INFO _IOWR(DMA_BUF_BASE, 2, struct dma_buf_info)
>>>>> +
>>>>> #endif
More information about the dri-devel
mailing list