[Linaro-mm-sig] [RFCv2 PATCH 2/9 - 4/4] v4l: vb2-dma-contig: update and code refactoring
Subash Patel
subashrp at gmail.com
Thu Mar 22 07:52:21 PDT 2012
Hi Laurent,
On 03/22/2012 08:12 PM, Laurent Pinchart wrote:
> Hi Tomasz,
>
> On Thursday 22 March 2012 14:36:33 Tomasz Stanislawski wrote:
>> Hi Laurent,
>> Thank you very much for your comments and question.
>> They were very useful.
>
> You're welcome.
>
>> Please refer to the comments below.
>>
>> On 03/22/2012 11:50 AM, Laurent Pinchart wrote:
>>> On Thursday 22 March 2012 11:02:23 Laurent Pinchart wrote:
>>>> From: Tomasz Stanislawski <t.stanislaws at samsung.com>
>>>>
>>>> This patch combines updates and fixes to the dma-contig allocator.
>>>> Moreover, the allocator code was refactored.
>>>> The most important changes are:
>>>> - functions were reordered
>>>> - move compression of the scatterlist to a separate function
>>>> - add support for multi-chunk but contiguous scatterlists
>>>> - simplified implementation of the vb2-dma-contig context structure
>>>> - let the mmap method use dma_mmap_writecombine
>>>> - add support for scatterlist in userptr mode
>
> [snip]
>
>>>> diff --git a/drivers/media/video/videobuf2-dma-contig.c b/drivers/media/video/videobuf2-dma-contig.c
>>>> index c898e6f..9965465 100644
>>>> --- a/drivers/media/video/videobuf2-dma-contig.c
>>>> +++ b/drivers/media/video/videobuf2-dma-contig.c
>
> [snip]
>
>>>> static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size)
>>>> {
>>>>
>>>> struct device *dev = alloc_ctx;
>>>> struct vb2_dc_buf *buf;
>>>>
>>>> + int ret;
>>>> + int n_pages;
>>>> + struct page **pages = NULL;
>>>>
>>>> buf = kzalloc(sizeof *buf, GFP_KERNEL);
>>>> if (!buf)
>>>>
>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> - buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
>>>> GFP_KERNEL);
>>>> + buf->dev = dev;
>>>> + buf->size = size;
>>>> + buf->vaddr = dma_alloc_coherent(buf->dev, buf->size, &buf->dma_addr,
>>>> + GFP_KERNEL);
>>>> +
>>>> + ret = -ENOMEM;
>>>>
>>>> if (!buf->vaddr) {
>>>>
>>>> - dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
>>>> - kfree(buf);
>>>> - return ERR_PTR(-ENOMEM);
>>>> + dev_err(dev, "dma_alloc_coherent of size %ld failed\n",
>>>> + size);
>>>> + goto fail_buf;
>>>>
>>>> }
>>>>
>>>> - buf->dev = dev;
>>>> - buf->size = size;
>>>> + WARN_ON((unsigned long)buf->vaddr & ~PAGE_MASK);
>>>> + WARN_ON(buf->dma_addr & ~PAGE_MASK);
>>>> +
>>>> + n_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>>>> +
>>>> + pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);
>>>> + if (!pages) {
>>>> + printk(KERN_ERR "failed to alloc page table\n");
>>>> + goto fail_dma;
>>>> + }
>>>> +
>>>> + ret = dma_get_pages(dev, buf->vaddr, buf->dma_addr, pages, n_pages);
>>>
>>> As the only purpose of this is to retrieve a list of pages that will be
>>> used to create a single-entry sgt, wouldn't it be possible to shortcut
>>> the code and get the physical address of the buffer directly ?
>>
>> Physical addresses should not be used since they are meaningless in the
>> context of different devices. It seems that only the list of pages is
>> more-or-less portable between different drivers.
>
> The pages are physically contiguous. The physical address of the first page is
> thus all you need.
>
> struct page and physical addresses can be used interchangeably in this case if
> I'm not mistaken. If you want to go with pages, you could use the first page
> only instead of the physical buffer address.
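Just to make the single-entry idea concrete, this is roughly what the
shortcut in vb2_dc_alloc() could look like (untested sketch; it still reuses
pages[0] filled in by dma_get_pages() and the existing fail_pages label):

    struct sg_table *sgt;

    sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
    if (!sgt || sg_alloc_table(sgt, 1, GFP_KERNEL)) {
        kfree(sgt);
        ret = -ENOMEM;
        goto fail_pages;
    }

    /* the whole buffer is one physically contiguous chunk */
    sg_set_page(sgt->sgl, pages[0], PAGE_ALIGN(size), 0);
    buf->sgt_base = sgt;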
>
>> The physical address is already present in buf->dma_addr, but it is only
>> valid if the device has no MMU. Notice that vb2-dma-contig possesses no
>> knowledge of whether an MMU is present for a given device.
>
> That's why buf->dma_addr can't be considered as a physical address. It's only
> useful in the device context.
>
>> The sg list is not going to be single-entry if the device is provided with
>> its own MMU.
>
> There's something I don't get then. vb2-dma-contig deals with physically
> contiguous buffers. The buffer is backed by physically contiguous pages, so
> the sg list should have a single entry.
>
I think at present, vb2-dma-contig is abused for any kind of memory
allocation (contiguous or not). Wouldn't it be good to have a properly
working setup for videobuf2-dma-sg instead? A driver which needs contiguous
memory shall assign vb2_queue->mem_ops = vb2_dma_contig_memops, and devices
which know they have an MMU backing them can assign vb2_dma_sg_memops
instead, as illustrated below. But as of now, we try to use
vb2_dma_contig_memops for all kinds of operations. I have also made this
mistake, and wish I had repaired it and posted it before :(
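For example, the intended split in a driver's queue setup would simply be
(illustration only; vq is the driver's struct vb2_queue, the mem_ops symbols
are the existing videobuf2 ones):

    /* device really needs physically contiguous buffers (no IOMMU) */
    vq->mem_ops = &vb2_dma_contig_memops;

    /* device has its own IOMMU and can handle a scattered buffer */
    vq->mem_ops = &vb2_dma_sg_memops;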
>>>> + if (ret < 0) {
>>>> + printk(KERN_ERR "failed to get buffer pages from DMA API\n");
>>>> + goto fail_pages;
>>>> + }
>>>> + if (ret != n_pages) {
>>>> + ret = -EFAULT;
>>>> + printk(KERN_ERR "failed to get all pages from DMA API\n");
>>>> + goto fail_pages;
>>>> + }
>>>> +
>>>> + buf->sgt_base = vb2_dc_pages_to_sgt(pages, n_pages, 0, 0);
>>>> + if (IS_ERR(buf->sgt_base)) {
>>>> + ret = PTR_ERR(buf->sgt_base);
>>>> + printk(KERN_ERR "failed to prepare sg table\n");
>>>> + goto fail_pages;
>>>> + }
>>>
>>> buf->sgt_base isn't used in this patch. I would move the buf->sgt_base
>>> creation code to the patch that uses it then, or to its own patch just
>>> before the patch that uses it.
>>
>> Good point. The sgt_base is used by exporter only. Thanks for noticing it.
>>
>>>> +
>>>> + /* pages are no longer needed */
>>>> + kfree(pages);
>>>>
>>>> buf->handler.refcount = &buf->refcount;
>>>> buf->handler.put = vb2_dc_put;
>
> [snip]
>
>>>> /*********************************************/
>>>> /* callbacks for USERPTR buffers */
>>>> /*********************************************/
>>>>
>>>> +static inline int vma_is_io(struct vm_area_struct *vma)
>>>> +{
>>>> + return !!(vma->vm_flags & (VM_IO | VM_PFNMAP));
>>>
>>> Isn't VM_PFNMAP enough ? Wouldn't it be possible (at least in theory) to
>>> get a discontinuous physical range with VM_IO ?
>>
>> Frankly, I found that in get_user_pages the flags are checked against
>> (VM_IO | VM_PFNMAP). Probably in the noMMU (not no-IOMMU) case it is
>> possible to get a vma with VM_IO on and VM_PFNMAP off, isn't it?
>>
>> The problem is that this framework should work in both cases, so this
>> check was added just in case :).
>
> OK. We can leave it here and deal with problems if they arise :-)
>
>>>> +}
>>>> +
>>>> +static int vb2_dc_get_pages(unsigned long start, struct page **pages,
>>>> + int n_pages, struct vm_area_struct **copy_vma, int write)
>>>> +{
>>>> + struct vm_area_struct *vma;
>>>> + int n = 0; /* number of get pages */
>>>> + int ret = -EFAULT;
>>>> +
>>>> + /* entering critical section for mm access */
>>>> + down_read(&current->mm->mmap_sem);
>>>
>>> This will generate AB-BA deadlock warnings if lockdep is enabled. This
>>> function is called with the queue lock held, and the mmap() handler which
>>> takes the queue lock is called with current->mm->mmap_sem held.
>>>
>>> This is a known issue with videobuf2, not specific to this patch. The
>>> warning is usually a false positive (which we still need to fix, as it
>>> worries users), but can become a real issue if an MMAP queue and a
>>> USERPTR queue are created by a driver with the same queue lock.
>>
>> Good point. Do you know any good solution to this problem?
>
> http://patchwork.linuxtv.org/patch/8455/
>
> It seems QBUF is safe, but PREPAREBUF isn't (both call __buf_prepare, which
> ends up calling the memops get_userptr operation).
>
> I'll post a patch to fix it for PREPAREBUF. If I'm not mistaken, you can drop
> the down_read/up_read here.
>
>>>> + vma = find_vma(current->mm, start);
>>>> + if (!vma) {
>>>> + printk(KERN_ERR "no vma for address %lu\n", start);
>>>> + goto cleanup;
>>>> + }
>>>> +
>>>> + if (vma_is_io(vma)) {
>>>> + unsigned long pfn;
>>>> +
>>>> + if (vma->vm_end - start < n_pages * PAGE_SIZE) {
>>>> + printk(KERN_ERR "vma is too small\n");
>>>> + goto cleanup;
>>>> + }
>>>> +
>>>> + for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {
>>>> + ret = follow_pfn(vma, start, &pfn);
>>>> + if (ret) {
>>>> + printk(KERN_ERR "no page for address %lu\n",
>>>> + start);
>>>> + goto cleanup;
>>>> + }
>>>> + pages[n] = pfn_to_page(pfn);
>>>> + get_page(pages[n]);
>>>
>>> This worries me. When the VM_PFNMAP flag is set, the memory pages are not
>>> backed by a struct page. Creating a struct page pointer out of it can be
>>> an acceptable hack (for instance to store a page in an scatterlist with
>>> sg_set_page() and then retrieve its physical address with sg_phys()), but
>>> you should not expect the struct page to be valid for anything else.
>>> Calling get_page() on it will likely crash.
>>
>> You are completely right. This is the corner case where a list of pages is
>> not a portable way of describing the memory.
>> Maybe pfn_valid() should be used to check the validity of the page (pfn)
>> before getting it?
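i.e. something like this right after the follow_pfn() call above (sketch of
that suggestion, not tested):

    if (!pfn_valid(pfn)) {
        /* no struct page backs this pfn, so don't touch it */
        printk(KERN_ERR "no valid page for address %lu\n", start);
        goto cleanup;
    }
    pages[n] = pfn_to_page(pfn);
    get_page(pages[n]);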
>
> I think you should just drop the get_page() call. There's no page, so there's
> no need to get a reference count to it.
>
> The VM_PFNMAP flag is mostly used with memory out of the kernel allocator's
> control if I'm not mistaken. The main use case I've seen is memory reserved at
> boot time and used as a frame buffer, for instance. In that case the pages can't
> go away, as there is no page in the first place.
>
> This won't fix the DMA SG problem though (see below).
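With get_page() dropped, the VM_PFNMAP branch of vb2_dc_get_pages() would
shrink to something like this (untested sketch; the cleanup path then also
has to skip put_page() when vma_is_io() is true):

    for (n = 0; n < n_pages; ++n, start += PAGE_SIZE) {
        ret = follow_pfn(vma, start, &pfn);
        if (ret) {
            printk(KERN_ERR "no page for address %lu\n", start);
            goto cleanup;
        }
        /* no struct page behind a PFNMAP mapping, so no get_page() */
        pages[n] = pfn_to_page(pfn);
    }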
>
>>>> + }
>>>> + } else {
>>>> + n = get_user_pages(current, current->mm, start & PAGE_MASK,
>>>> + n_pages, write, 1, pages, NULL);
>>>> + if (n != n_pages) {
>>>> + printk(KERN_ERR "got only %d of %d user pages\n",
>>>> + n, n_pages);
>>>> + goto cleanup;
>>>> + }
>>>> + }
>>>> +
>>>> + *copy_vma = vb2_get_vma(vma);
>>>> + if (!*copy_vma) {
>>>> + printk(KERN_ERR "failed to copy vma\n");
>>>> + ret = -ENOMEM;
>>>> + goto cleanup;
>>>> + }
>>>
>>> Do we really need to make a copy of the VMA ? The only reason why we store
>>> a pointer to it is to check the flags in vb2_dc_put_userptr(). We could
>>> store the flags instead and avoid the vb2_get_vma()/vb2_put_vma() calls
>>> altogether.
>>
>> I remember that there was a very good reason for copying this vma structure.
>> You caught me doing 'cargo-cult' programming.
>> I will do some reverse engineering and try to answer this soon.
>
> OK :-) I'm not copying the VMA in the OMAP3 ISP driver, which is why this
> caught my eye. If you find the reason why copying it is needed, please add a
> comment to the code.
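If the copy does turn out to be unnecessary, vb2_dc_get_userptr() could
simply record the flags, e.g. (vm_flags would be a new, hypothetical field in
struct vb2_dc_buf):

    buf->vm_flags = vma->vm_flags;    /* instead of buf->vma = vb2_get_vma(vma) */

and vb2_dc_put_userptr() would then test buf->vm_flags where it now calls
vma_is_io(buf->vma), dropping the vb2_get_vma()/vb2_put_vma() pair.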
>
>>>> +
>>>> + /* leaving critical section for mm access */
>>>> + up_read(&current->mm->mmap_sem);
>>>> +
>>>> + return 0;
>>>> +
>>>> +cleanup:
>>>> + up_read(&current->mm->mmap_sem);
>>>> +
>>>> + /* putting user pages if used, can be done without the lock */
>>>> + while (n)
>>>> + put_page(pages[--n]);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>>
>>>> static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
>>>> - unsigned long size, int write)
>>>> + unsigned long size, int write)
>>>> {
>>>>
>>>> struct vb2_dc_buf *buf;
>>>>
>>>> - struct vm_area_struct *vma;
>>>> - dma_addr_t dma_addr = 0;
>>>> - int ret;
>>>> + unsigned long start, end, offset, offset2;
>>>> + struct page **pages;
>>>> + int n_pages;
>>>> + int ret = 0;
>>>> + struct sg_table *sgt;
>>>> + unsigned long contig_size;
>>>>
>>>> buf = kzalloc(sizeof *buf, GFP_KERNEL);
>>>> if (!buf)
>>>>
>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> - ret = vb2_get_contig_userptr(vaddr, size, &vma, &dma_addr);
>>>> + buf->dev = alloc_ctx;
>>>> + buf->dma_dir = write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
>>>> +
>>>> + start = (unsigned long)vaddr & PAGE_MASK;
>>>> + offset = (unsigned long)vaddr & ~PAGE_MASK;
>>>> + end = PAGE_ALIGN((unsigned long)vaddr + size);
>>>> + offset2 = end - (unsigned long)vaddr - size;
>>>> + n_pages = (end - start) >> PAGE_SHIFT;
>>>> +
>>>> + pages = kmalloc(n_pages * sizeof pages[0], GFP_KERNEL);
>>>> + if (!pages) {
>>>> + ret = -ENOMEM;
>>>> + printk(KERN_ERR "failed to allocate pages table\n");
>>>> + goto fail_buf;
>>>> + }
>>>> +
>>>> + /* extract page list from userspace mapping */
>>>> + ret = vb2_dc_get_pages(start, pages, n_pages, &buf->vma, write);
>>>>
>>>> if (ret) {
>>>>
>>>> - printk(KERN_ERR "Failed acquiring VMA for vaddr 0x%08lx\n",
>>>> - vaddr);
>>>> - kfree(buf);
>>>> - return ERR_PTR(ret);
>>>> + printk(KERN_ERR "failed to get user pages\n");
>>>> + goto fail_pages;
>>>> + }
>>>> +
>>>> + sgt = vb2_dc_pages_to_sgt(pages, n_pages, offset, offset2);
>>>> + if (!sgt) {
>>>> + printk(KERN_ERR "failed to create scatterlist table\n");
>>>> + ret = -ENOMEM;
>>>> + goto fail_get_pages;
>>>>
>>>> }
>>>
>>> This looks overly complex to me. You create a multi-chunk sgt out of the
>>> user pointer address and map it completely, and then check if it starts
>>> with a big enough contiguous chunk.
>>
>> Notice that vb2_dc_pages_to_sgt does compress contiguous ranges of pfns
>> (pages). So if the memory is contiguous, then a single-chunk sglist is
>> produced. The memory used to store the page list is just temporary; it is
>> freed after the sglist is created.
>
> That's exactly my point. The memory needs to be contiguous to be usable. If it
> isn't, vb2-dma-contig will only use the first contiguous chunk. We could thus
> simplify the code by hardcoding the single-chunk assumption. vb2-dma-contig
> would walk the user pages list (or the PFNs, depending on the VMA flags) and
> stop at the first discontinuity. It would then create a single-entry sg list
> and operate on that, without mapping or otherwise touching the rest of the
> VMA, which is unusable to the device anyway.
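In code, the hardcoded single-chunk version of the userptr path could be as
simple as this (rough sketch; assumes sgt has already been allocated, and the
PFN-walk variant plus the error unwinding are left out):

    int n;

    /* stop at the first discontinuity in the user mapping */
    for (n = 1; n < n_pages; ++n)
        if (page_to_pfn(pages[n]) != page_to_pfn(pages[n - 1]) + 1)
            break;

    /* the contiguous head must cover the whole requested buffer */
    if (n * PAGE_SIZE - offset < size) {
        ret = -EFAULT;
        goto fail_get_pages;
    }

    if (sg_alloc_table(sgt, 1, GFP_KERNEL)) {
        ret = -ENOMEM;
        goto fail_get_pages;
    }
    sg_set_page(sgt->sgl, pages[0], size, offset);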
>
>>> Why don't you create an sgt with a single continuous
>>> chunk then ? In the VM_PFNMAP case you could check whether the area is
>>> contiguous when you follow the PFNs, stop at the first discontinuity, and
>>> create an sgt with a single element right there. You would then need to
>>> call vb2_dc_pages_to_sgt() in the normal case only, and stop at the first
>>> discontinuity as well.
>>
>> Discontinuity of pfns is not a problem if a device has its own IOMMU.
>> vb2-dma-contig cannot know whether mapping this multi-chunk sglist will
>> succeed until dma_map_sg is called and its result checked.
>
> If the device has an IOMMU it won't need contiguous memory. Shouldn't it then
> use vb2-dma-sg instead ?
>
>> Why bother, if both VM_PFNMAP and non-VM_PFNMAP are handled in the same
>> way once the list of pages is obtained? Treating them the same way allows
>> us to reuse code and simplifies the program flow.
>>
>> The DMA framework does not provide any way to force a single-chunk mapping
>> in an sg list. If the device is capable of mapping discontiguous pages into
>> a single chunk, the DMA framework will probably merge the pages. The
>> documentation encourages merging the list but it is not obligatory.
>>
>> The reason is that if 'struct scatterlist' contains no dma_length field,
>> which is controlled by the CONFIG_NEED_SG_DMA_LENGTH macro, then the length
>> field is used instead. Chunk merging cannot be done in such a case.
>> This is the reason why I look for the longest contiguous block.
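For reference, the contiguity check after dma_map_sg() boils down to
something like this (my reading of what vb2_dc_get_contiguous_size() does,
not a copy of the code in the series):

    static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt)
    {
        struct scatterlist *s;
        dma_addr_t expected = sg_dma_address(sgt->sgl);
        unsigned long size = 0;
        unsigned int i;

        /* accumulate as long as the mapped chunks stay back-to-back */
        for_each_sg(sgt->sgl, s, sgt->nents, i) {
            if (sg_dma_address(s) != expected)
                break;
            expected = sg_dma_address(s) + sg_dma_len(s);
            size += sg_dma_len(s);
        }
        return size;
    }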
>>
>>>> + /* pages are no longer needed */
>>>> + kfree(pages);
>>>> + pages = NULL;
>>>> +
>>>> + sgt->nents = dma_map_sg(buf->dev, sgt->sgl, sgt->orig_nents,
>>>> + buf->dma_dir);
>>>> + if (sgt->nents <= 0) {
>>>> + printk(KERN_ERR "failed to map scatterlist\n");
>>>> + ret = -EIO;
>>>> + goto fail_sgt;
>>>> + }
>>>> +
>>>> + contig_size = vb2_dc_get_contiguous_size(sgt);
>>>> + if (contig_size < size) {
>>>> + printk(KERN_ERR "contiguous mapping is too small %lu/%lu\n",
>>>> + contig_size, size);
>>>> + ret = -EFAULT;
>>>> + goto fail_map_sg;
>>>> + }
>>>> +
>>>> + buf->dma_addr = sg_dma_address(sgt->sgl);
>>>>
>>>> buf->size = size;
>>>>
>>>> - buf->dma_addr = dma_addr;
>>>> - buf->vma = vma;
>>>> + buf->dma_sgt = sgt;
>>>> +
>>>> + atomic_inc(&buf->refcount);
>>>>
>>>> return buf;
>>>>
>>>> +
>>>> +fail_map_sg:
>>>> + dma_unmap_sg(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
>>>
>>> I think this will break in the VM_PFNMAP case on non-coherent
>>> architectures. arm_dma_unmap_page() will call __dma_page_dev_to_cpu() in
>>> that case, which can dereference struct page. As explained above, the
>>> struct page isn't valid with VM_PFNMAP. I haven't checked the dma_map_sg()
>>> and dma_sync_sg_*() calls, but chances are they might break as well.
>>
>> It will crash as long as it is true that there is no struct page behind the
>> given pfn. In practice, I found that VM_PFNMAP means that one cannot assume
>> that there is a 'struct page' behind the PFNs of a given mapping. Those
>> struct pages really exist for all our drivers. Anyway, I agree that using
>> those pages is a hack.
>
> They don't exist for the memory used as a frame buffer on the OMAP3 (or at
> least they didn't exist on the N900 and N9, I haven't checked since). This
> could become just a bad distant memory once drivers use CMA.
>
>> It could be avoided if vb2_dc_get_pages returned a list of PFNs. Anyway,
>> those PFNs have to be transformed into pages to create an sglist, and those
>> page pointers might be accessed somewhere deep inside the dma_map_sg
>> internals.
>>
>> A reasonably good solution would be to drop support for VM_PFNMAP mappings
>> since they cannot be handled reliably.
>
> We should either drop VM_PFNMAP support or fix the DMA SG mapping API to
> support VM_PFNMAP-style memory. I would vote for the former, as that's way
> simpler and we have no VM_PFNMAP use case right now.
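Dropping it would only need a small check right after the find_vma() call in
vb2_dc_get_pages(), replacing the whole VM_PFNMAP branch (sketch):

    if (vma_is_io(vma)) {
        printk(KERN_ERR "VM_PFNMAP/VM_IO mappings are not supported\n");
        ret = -EINVAL;
        goto cleanup;
    }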
>
>>>> +
>>>> +fail_sgt:
>>>> + vb2_dc_put_sgtable(sgt, 0);
>>>> +
>>>> +fail_get_pages:
>>>> + while (pages && n_pages)
>>>> + put_page(pages[--n_pages]);
>>>> + vb2_put_vma(buf->vma);
>>>> +
>>>> +fail_pages:
>>>> + kfree(pages); /* kfree is NULL-proof */
>>>> +
>>>> +fail_buf:
>>>> + kfree(buf);
>>>> +
>>>> + return ERR_PTR(ret);
>>>>
>>>> }
>
Regards,
Subash