[PATCH v2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP

Laurent Pinchart laurent.pinchart at ideasonboard.com
Wed Sep 22 22:12:19 UTC 2021


Hi Kieran,

On Wed, Sep 22, 2021 at 10:37:29PM +0100, Kieran Bingham wrote:
> On 30/07/2021 03:05, Laurent Pinchart wrote:
> > On R-Car Gen3, the DU uses a separate IP core named VSP to perform DMA
> > from memory and composition of planes. The DU hardware then only handles
> > the video timings and the interface with the encoders. This differs from
> > Gen2, where the DU included a composer with DMA engines.
> > 
> > When sourcing from the VSP, the DU hardware performs no memory access,
> > and thus has no requirements on imported dma-buf memory types. The GEM
> > CMA helpers however still create a DMA mapping to the DU device, which
> > isn't used. The mapping to the VSP is done when processing the atomic
> > commits, in the plane .prepare_fb() handler.
> > 
> > When the system uses an IOMMU, the VSP device is attached to it, which
> > enables the VSP to use non physically contiguous memory. The DU, as it
> > performs no memory access, isn't connected to the IOMMU. The GEM CMA
> > drm_gem_cma_prime_import_sg_table() helper will in that case fail to map
> > non-contiguous imported dma-bufs, as the DMA mapping to the DU device
> > will have multiple entries in its sgtable. The prevents using non
> > physically contiguous memory for display.
> > 
> > The DRM PRIME and GEM CMA helpers are designed to create the sgtable
> > when the dma-buf is imported. By default, the device referenced by the
> > drm_device is used to create the dma-buf attachment. Drivers can use a
> > different device by using the drm_gem_prime_import_dev() function. While
> > the DU has access to the VSP device, this won't help here, as different
> > CRTCs use different VSP instances, connected to different IOMMU
> > channels. The driver doesn't know at import time which CRTC a GEM object
> > will be used, and thus can't select the right VSP device to pass to
> > drm_gem_prime_import_dev().
> > 
> > To support non-contiguous memory, implement a custom
> > .gem_prime_import_sg_table() operation that accepts all imported dma-buf
> > regardless of the number of scatterlist entries. The sgtable will be
> > mapped to the VSP at .prepare_fb() time, which will reject the
> > framebuffer if the VSP isn't connected to an IOMMU.
> 
> Wow - quite a lot to digest there.

I tried to make it clear, but there's lots to explain :-S

> > Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas at ideasonboard.com>
> > ---
> > 
> > This can arguably be considered as a bit of a hack, as the GEM PRIME
> > import still maps the dma-buf attachment to the DU, which isn't
> > necessary. This is however not a big issue, as the DU isn't connected to
> > any IOMMU, the DMA mapping thus doesn't waste any resource such as I/O
> > memory space. Avoiding the mapping creation would require replacing the
> > helpers completely, resulting in lots of code duplication. If this type
> > of hardware setup was common we could create another set of helpers, but
> > I don't think it would be worth it to support a single device.
> > 
> > I have tested this patch with the cam application from libcamera, on a
> > R-Car H3 ES2.x Salvator-XS board, importing buffers from the vimc
> > driver:
> > 
> > cam -c 'platform/vimc.0 Sensor B' \
> > 	-s pixelformat=BGR888,width=1440,height=900 \
> > 	-C -D HDMI-A-1
> 
> Are VIMC buffers always physically non-contiguous to validate this test?

They're not, but with
https://lore.kernel.org/linux-media/20210730001939.30769-1-laurent.pinchart+renesas@ideasonboard.com/
they can be made to be, with a module parameter.

> > A set of patches to add DRM/KMS output support to cam has been posted.
> > Until it gets merged (hopefully soon), it can be found at [1].
> > 
> > As cam doesn't support DRM/KMS scaling and overlay planes yet, the
> > camera resolution needs to match the display resolution. Due to a
> > peculiarity of the vimc driver, the resolution has to be divisible by 3,
> > which may require changes to the resolution above depending on your
> > monitor.
> > 
> > A test patch is also needed for the kernel, to enable IOMMU support for
> > the VSP, which isn't done by default (yet ?) in mainline. I have pushed
> > a branch to [2] if anyone is interested.
> > 
> > [1] https://lists.libcamera.org/pipermail/libcamera-devel/2021-July/022815.html
> > [2] git://linuxtv.org/pinchartl/media.git drm/du/devel/gem/contig
> > 
> > ---
> > Changes since v1:
> > 
> > - Rewrote commit message to explain issue in more details
> > - Duplicate the imported scatter gather table in
> >   rcar_du_vsp_plane_prepare_fb()
> > - Use separate loop counter j to avoid overwritting i
> > - Update to latest drm_gem_cma API
> > ---
> >  drivers/gpu/drm/rcar-du/rcar_du_drv.c |  6 +++-
> >  drivers/gpu/drm/rcar-du/rcar_du_kms.c | 49 +++++++++++++++++++++++++++
> >  drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 ++++
> >  drivers/gpu/drm/rcar-du/rcar_du_vsp.c | 36 +++++++++++++++++---
> >  4 files changed, 92 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > index cb34b1e477bc..d1f8d51a10fe 100644
> > --- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > +++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > @@ -511,7 +511,11 @@ DEFINE_DRM_GEM_CMA_FOPS(rcar_du_fops);
> >  
> >  static const struct drm_driver rcar_du_driver = {
> >  	.driver_features	= DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC,
> > -	DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(rcar_du_dumb_create),
> > +	.dumb_create		= rcar_du_dumb_create,
> > +	.prime_handle_to_fd	= drm_gem_prime_handle_to_fd,
> > +	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
> > +	.gem_prime_import_sg_table = rcar_du_gem_prime_import_sg_table,
> > +	.gem_prime_mmap		= drm_gem_prime_mmap,
> >  	.fops			= &rcar_du_fops,
> >  	.name			= "rcar-du",
> >  	.desc			= "Renesas R-Car Display Unit",
> > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > index fdb8a0d127ad..7077af0886cf 100644
> > --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > @@ -19,6 +19,7 @@
> >  #include <drm/drm_vblank.h>
> >  
> >  #include <linux/device.h>
> > +#include <linux/dma-buf.h>
> >  #include <linux/of_graph.h>
> >  #include <linux/of_platform.h>
> >  #include <linux/wait.h>
> > @@ -325,6 +326,54 @@ const struct rcar_du_format_info *rcar_du_format_info(u32 fourcc)
> >   * Frame buffer
> >   */
> >  
> > +static const struct drm_gem_object_funcs rcar_du_gem_funcs = {
> > +	.free = drm_gem_cma_free_object,
> > +	.print_info = drm_gem_cma_print_info,
> > +	.get_sg_table = drm_gem_cma_get_sg_table,
> > +	.vmap = drm_gem_cma_vmap,
> > +	.mmap = drm_gem_cma_mmap,
> > +	.vm_ops = &drm_gem_cma_vm_ops,
> > +};
> > +
> > +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
> > +				struct dma_buf_attachment *attach,
> > +				struct sg_table *sgt)
> > +{
> > +	struct rcar_du_device *rcdu = to_rcar_du_device(dev);
> > +	struct drm_gem_cma_object *cma_obj;
> > +	struct drm_gem_object *gem_obj;
> > +	int ret;
> > +
> > +	if (!rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE))
> > +		return drm_gem_cma_prime_import_sg_table(dev, attach, sgt);
> > +
> > +	/* Create a CMA GEM buffer. */
> > +	cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL);
> > +	if (!cma_obj)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	gem_obj = &cma_obj->base;
> > +	gem_obj->funcs = &rcar_du_gem_funcs;
> > +
> > +	drm_gem_private_object_init(dev, gem_obj, attach->dmabuf->size);
> > +	cma_obj->map_noncoherent = false;
> > +
> > +	ret = drm_gem_create_mmap_offset(gem_obj);
> > +	if (ret) {
> > +		drm_gem_object_release(gem_obj);
> > +		goto error;
> > +	}
> > +
> > +	cma_obj->paddr = 0;
> > +	cma_obj->sgt = sgt;
> > +
> > +	return gem_obj;
> > +
> > +error:
> > +	kfree(cma_obj);
> > +	return ERR_PTR(ret);
> 
> Almost seems overkill to handle this here, rather than inline in the
> failure of drm_gem_create_mmap_offset() which would use 2 less lines in
> the function ..
> 
> But perhaps you're planning for it to be extended?
> 
> However then wouldn't drm_gem_object_release() need to be handled in
> error: as well?
> 
> Personally I'd just inline the kfree in place of the goto error here.

You're right, I'll do so.

> But as that's the worst I can find in here so far:
> 
> Reviewed-by: Kieran Bingham <kieran.bingham+renesas at ideasonboard.com>
> 
> > +}
> > +
> >  int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
> >  			struct drm_mode_create_dumb *args)
> >  {
> > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.h b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
> > index 8f5fff176754..789154e19535 100644
> > --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.h
> > +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
> > @@ -12,10 +12,13 @@
> >  
> >  #include <linux/types.h>
> >  
> > +struct dma_buf_attachment;
> >  struct drm_file;
> >  struct drm_device;
> > +struct drm_gem_object;
> >  struct drm_mode_create_dumb;
> >  struct rcar_du_device;
> > +struct sg_table;
> >  
> >  struct rcar_du_format_info {
> >  	u32 fourcc;
> > @@ -34,4 +37,8 @@ int rcar_du_modeset_init(struct rcar_du_device *rcdu);
> >  int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
> >  			struct drm_mode_create_dumb *args);
> >  
> > +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
> > +				struct dma_buf_attachment *attach,
> > +				struct sg_table *sgt);
> > +
> >  #endif /* __RCAR_DU_KMS_H__ */
> > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
> > index 23e41c83c875..b7fc5b069cbc 100644
> > --- a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
> > +++ b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
> > @@ -187,17 +187,43 @@ int rcar_du_vsp_map_fb(struct rcar_du_vsp *vsp, struct drm_framebuffer *fb,
> >  		       struct sg_table sg_tables[3])
> >  {
> >  	struct rcar_du_device *rcdu = vsp->dev;
> > -	unsigned int i;
> > +	unsigned int i, j;
> >  	int ret;
> >  
> >  	for (i = 0; i < fb->format->num_planes; ++i) {
> >  		struct drm_gem_cma_object *gem = drm_fb_cma_get_gem_obj(fb, i);
> >  		struct sg_table *sgt = &sg_tables[i];
> >  
> > -		ret = dma_get_sgtable(rcdu->dev, sgt, gem->vaddr, gem->paddr,
> > -				      gem->base.size);
> > -		if (ret)
> > -			goto fail;
> > +		if (gem->sgt) {
> > +			struct scatterlist *src;
> > +			struct scatterlist *dst;
> > +
> > +			/*
> > +			 * If the GEM buffer has a scatter gather table, it has
> > +			 * been imported from a dma-buf and has no physical
> > +			 * address as it might not be physically contiguous.
> > +			 * Copy the original scatter gather table to map it to
> > +			 * the VSP.
> > +			 */
> > +			ret = sg_alloc_table(sgt, gem->sgt->orig_nents,
> > +					     GFP_KERNEL);
> > +			if (ret)
> > +				goto fail;
> > +
> > +			src = gem->sgt->sgl;
> > +			dst = sgt->sgl;
> > +			for (j = 0; j < gem->sgt->orig_nents; ++j) {
> > +				sg_set_page(dst, sg_page(src), src->length,
> > +					    src->offset);
> > +				src = sg_next(src);
> > +				dst = sg_next(dst);
> > +			}
> > +		} else {
> > +			ret = dma_get_sgtable(rcdu->dev, sgt, gem->vaddr,
> > +					      gem->paddr, gem->base.size);
> > +			if (ret)
> > +				goto fail;
> > +		}
> >  
> >  		ret = vsp1_du_map_sg(vsp->vsp, sgt);
> >  		if (ret) {

-- 
Regards,

Laurent Pinchart


More information about the dri-devel mailing list