[PATCH v2 00/10] omapdrm: GEM objects fixes

Laurent Pinchart laurent.pinchart at ideasonboard.com
Thu Apr 20 21:33:49 UTC 2017


Hello,

This patch series updates and extends the previously posted "[PATCH 1/7]
omapdrm: Fix GEM objects DMA unmapping" series.

Compared to v1, the series now includes an extra cleanup (06/10), a cache
handling performance improvement (09/10) and a dmabuf refcount fix (10/10).

Memory leaks have been reported when allocating a cached omap_bo (with
OMAP_BO_CACHED). Investigation showed that this can only come from the DMA
mapping debug layer, as on ARM32 the non-coherent, non-IOMMU DMA mapping code
doesn't allocate memory to map a page (and the kmemcheck facility is only
available on x86, so it can't be a source of memory leaks either on ARM).

The DMA debug layer pre-allocates DMA debugging entries and stores them in a
list. As the omapdrm driver maps cached buffer page by page, the list of 4096
pre-allocated entries is starved very soon. However, raising the number of DMA
mapping debug entries to 32 * 4096 (through the dma_debug_entries kernel
command line argument) led to more interesting results.

The number of entries being large enough to handle all the pages mapped by
kmstest, monitoring the DMA mapping statistics through
/sys/kernel/debug/dma-api/ showed that the number of free entries dropped
significantly when kmstest was started and didn't rise again when it was
stopped. In particular, running kmstest without flipping resulted in a drop of
1266 free entries, which corresponds to one 1440x900 framebuffer in XR24. That
proved that the pages backing the framebuffer, while freed when the
framebuffer was destroyed, were not unmapped.
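
For reference, this is how the DMA debug entry pool can be enlarged and
monitored. The snippet below is a sketch, assuming CONFIG_DMA_API_DEBUG is
enabled and debugfs is available; the file names are the ones exposed by the
DMA debug layer.

	# Raise the pre-allocated DMA debug entry count on the kernel
	# command line (32 * 4096 = 131072):
	#     dma_debug_entries=131072

	# With debugfs mounted, watch the entry pool around a test run:
	mount -t debugfs none /sys/kernel/debug 2>/dev/null
	cat /sys/kernel/debug/dma-api/num_free_entries
	cat /sys/kernel/debug/dma-api/min_free_entries

A free-entry count that drops while a client runs and never recovers after it
exits points at mappings that are created but never torn down.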

I've thus started investigating the driver's GEM implementation. After a few
confusing moments that resulted in the 01/10 to 07/10 cleanup patches, I wrote
patch 08/10, which fixes the issue. No memory leak is observed through
/sys/kernel/debug/dma-api/ or /proc/meminfo when running 'kmstest --flip' for
300 frames in a loop for one hour with cached buffers.

Test patches for kms++ and kmstest to support omapdrm cached mappings are
available at

	git://git.ideasonboard.com/renesas/kmsxx.git omap/cache

Patch 09/10 then improves performance by mapping pages for DMA in the correct
direction. As the device never writes to memory, DMA_TO_DEVICE can be used
instead of DMA_BIDIRECTIONAL.
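
The per-page mapping change boils down to something like the following
sketch (the function name is illustrative, not the driver's actual code):

	/* Illustrative sketch: map one buffer page for display scanout.
	 * The display engine only reads from memory, so DMA_TO_DEVICE is
	 * sufficient, and on ARM32 it lets the DMA mapping code skip the
	 * cache invalidation that DMA_BIDIRECTIONAL would imply.
	 */
	static dma_addr_t sketch_map_page(struct device *dev, struct page *page)
	{
		dma_addr_t addr;

		addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, addr))
			return 0;

		return addr;
	}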

Cache handling is still slow, but I don't think major performance improvements
can easily be achieved. Cache handling is all about trade-offs. The current
driver implementation tracks dirty pages and cleans the cache for them only.
When only a small part of the buffer has been touched this results in a
performance gain compared to cleaning the cache for the whole buffer. In
particular, when the buffer has not been touched by the CPU at all (which is
common when displaying frames captured from a camera device with buffers
shared through dma-buf), no cache clean operation is performed. This has been
verified using the kmscapture application from kms++ from the above branch.

On the other hand, when most of the buffer has been touched, the extra effort
to track dirty pages results in a performance loss. However, measurements with
perf and ftrace showed that the cost of dirty page tracking is small compared
to cache handling.
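
In sketch form, the dirty page tracking amounts to the loop below (names are
illustrative; here a zero DMA address stands in for "dirty or not yet mapped",
while the real driver keeps its own bookkeeping):

	/* Sync only the pages the CPU has touched since the last scanout,
	 * instead of cleaning the cache for the whole buffer.
	 */
	static void sketch_sync_dirty_pages(struct device *dev,
					    struct page **pages,
					    dma_addr_t *dma_addrs,
					    unsigned int npages)
	{
		unsigned int i;

		for (i = 0; i < npages; i++) {
			if (dma_addrs[i])
				continue;	/* already clean and mapped */

			dma_addrs[i] = dma_map_page(dev, pages[i], 0,
						    PAGE_SIZE, DMA_TO_DEVICE);
		}
	}

When no page is dirty (the dma-buf camera capture case), the loop falls
through without touching the cache at all.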

One optimization I tried resulted in a very large performance improvement
though. Replacing selective cache cleaning (through the DMA mapping API) by
a whole L1+L2 cache clean raised the frame rate from 30 fps to 60 fps when
running 'kmstest --flip' on my Pandaboard. However, as that requires calling

	flush_cache_all();
	local_irq_save(flags);
	outer_flush_all();
	local_irq_restore(flags);

from the omapdrm driver, there is no way this will be accepted upstream as-is.
We could explore the option of adding heuristics to the ARM32 DMA mapping
implementation to clean the whole cache when the buffer is large, but that
belongs to a separate patch series.

To close the patch series, patch 10/10 fixes a GEM object reference count
issue related to dma-buf. It turns out that GEM object dma-buf export has
suffered from a refcount underflow since kernel v3.5 without anyone noticing.
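
The shape of the fix is roughly the following sketch; create_dmabuf_for() is a
hypothetical placeholder for the export path, not a real helper:

	/* Keep the GEM object (and through it the DRM device) alive for as
	 * long as the exported dma-buf exists, by taking a reference at
	 * export time that is balanced in the dma-buf release callback.
	 */
	static struct dma_buf *sketch_prime_export(struct drm_gem_object *obj)
	{
		struct dma_buf *buf;

		buf = create_dmabuf_for(obj);	/* hypothetical helper */
		if (IS_ERR(buf))
			return buf;

		drm_gem_object_reference(obj);	/* dropped on release */
		return buf;
	}

Without that reference, the release path drops a reference the export path
never took, underflowing the object's refcount.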

Laurent Pinchart (10):
  drm: omapdrm: Remove remap argument to omap_gem_get_paddr()
  drm: omapdrm: Rename occurrences of paddr to dma_addr
  drm: omapdrm: Rename omap_gem_(get|put)_paddr() to omap_gem_(un)pin()
  drm: omapdrm: Lower indentation level in omap_gem_dma_sync_buffer()
  drm: omapdrm: Rename the omap_gem_object addrs field to dma_addrs
  drm: omapdrm: Rename GEM DMA sync functions
  drm: omapdrm: Fix incorrect usage of the term 'cache coherency'
  drm: omapdrm: DMA-unmap pages for all buffer types when freeing
    buffers
  drm: omapdrm: Map pages for DMA in DMA_TO_DEVICE direction
  drm: omapdrm: Take GEM obj and DRM dev references when exporting
    dmabuf

 drivers/gpu/drm/omapdrm/omap_drv.h        |  13 +-
 drivers/gpu/drm/omapdrm/omap_fb.c         |  27 ++--
 drivers/gpu/drm/omapdrm/omap_fbdev.c      |  15 +--
 drivers/gpu/drm/omapdrm/omap_gem.c        | 209 ++++++++++++++++--------------
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  39 +++---
 5 files changed, 163 insertions(+), 140 deletions(-)

-- 
Regards,

Laurent Pinchart


