[PATCH] drm: enable render nodes wherever buffer sharing is supported

Peter Collingbourne pcc at google.com
Tue Apr 28 02:22:02 UTC 2020


On Mon, Apr 27, 2020 at 1:47 PM Eric Anholt <eric at anholt.net> wrote:
>
> On Mon, Apr 27, 2020 at 1:05 PM Peter Collingbourne <pcc at google.com> wrote:
> >
> > Render nodes are not just useful for devices supporting GPU hardware
> > acceleration. Even on devices that only support dumb frame buffers,
> > they are useful in situations where composition (using software
> > rasterization) and KMS are done in different processes with buffer
> > sharing being used to send frame buffers between them. This is the
> > situation on Android, where surfaceflinger is the compositor and the
> > composer HAL uses KMS to display the buffers. Thus it is beneficial
> > to expose render nodes on all devices that support buffer sharing.
> >
> > Because all drivers that currently support render nodes also support
> > buffer sharing, the DRIVER_RENDER flag is no longer necessary to mark
> > devices as supporting render nodes, so remove it and just rely on
> > the presence of a prime_handle_to_fd function pointer to determine
> > whether buffer sharing is supported.
>
> I'm definitely interested in seeing a patch like this land, as I think
> the current state is an ugly historical artifact.  We just have to be
> careful.
>
> Were there any instances of drivers with render engines exposing PRIME
> but not RENDER?  We should be careful to make sure that we're not
> exposing new privileges for those through adding render nodes.

These are the drivers that we'd be adding render nodes for with this change:

$ git grep -l prime_handle_to_fd \
    $(git grep -L DRIVER_RENDER $(git grep -l '\.driver_features'))
drivers/gpu/drm/arc/arcpgu_drv.c
drivers/gpu/drm/arm/display/komeda/komeda_kms.c
drivers/gpu/drm/arm/hdlcd_drv.c
drivers/gpu/drm/arm/malidp_drv.c
drivers/gpu/drm/armada/armada_drv.c
drivers/gpu/drm/aspeed/aspeed_gfx_drv.c
drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
drivers/gpu/drm/imx/imx-drm-core.c
drivers/gpu/drm/ingenic/ingenic-drm.c
drivers/gpu/drm/mcde/mcde_drv.c
drivers/gpu/drm/mediatek/mtk_drm_drv.c
drivers/gpu/drm/meson/meson_drv.c
drivers/gpu/drm/mxsfb/mxsfb_drv.c
drivers/gpu/drm/pl111/pl111_drv.c
drivers/gpu/drm/qxl/qxl_drv.c
drivers/gpu/drm/rcar-du/rcar_du_drv.c
drivers/gpu/drm/rockchip/rockchip_drm_drv.c
drivers/gpu/drm/shmobile/shmob_drm_drv.c
drivers/gpu/drm/sti/sti_drv.c
drivers/gpu/drm/stm/drv.c
drivers/gpu/drm/tilcdc/tilcdc_drv.c
drivers/gpu/drm/tve200/tve200_drv.c
drivers/gpu/drm/xen/xen_drm_front.c
drivers/gpu/drm/zte/zx_drm_drv.c
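
(For concreteness, the mechanism being proposed amounts to roughly the
following in the DRM core. This is only a sketch -- the helper name is
made up and the real patch may be shaped differently.)

static bool drm_dev_wants_render_node(struct drm_device *dev)
{
        /*
         * A driver that can export buffers via PRIME is useful through
         * a render node even if it cannot render, e.g. a dumb-buffer-only
         * KMS driver paired with a software compositor.
         */
        return dev->driver->prime_handle_to_fd != NULL;
}

/* In drm_dev_register(), roughly replacing the DRIVER_RENDER check: */
        if (drm_dev_wants_render_node(dev)) {
                ret = drm_minor_register(dev, DRM_MINOR_RENDER);
                if (ret)
                        goto err_minors;
        }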

Some of the drivers provide custom ioctls, but those are already
blocked on render nodes because they don't set DRM_RENDER_ALLOW (the
gating is sketched a bit further down). Another thing to check for is
drivers providing custom fops that might expose something undesirable
via the render node:

$ git grep -L 'DEFINE_DRM_GEM_CMA_FOPS\|DEFINE_DRM_GEM_FOPS' \
    $(git grep -l prime_handle_to_fd \
        $(git grep -L DRIVER_RENDER $(git grep -l '\.driver_features')))
drivers/gpu/drm/mediatek/mtk_drm_drv.c
drivers/gpu/drm/rockchip/rockchip_drm_drv.c
drivers/gpu/drm/xen/xen_drm_front.c

But looking at those drivers in detail, I see that each of them uses
the standard DRM fops handlers (which presumably already handle render
nodes correctly), with the exception of mmap, where they provide
wrappers that mostly just wrap drm_gem_mmap.
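
(Illustration only: the wrappers in question have roughly the shape
below. This is a simplified composite, not code copied from any of the
three drivers.)

static int foo_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
{
        int ret;

        /* drm_gem_mmap() does the GEM object lookup and the usual checks. */
        ret = drm_gem_mmap(filp, vma);
        if (ret)
                return ret;

        /* Driver-specific VMA tweak, e.g. caching attributes. */
        vma->vm_page_prot =
                pgprot_writecombine(vm_get_page_prot(vma->vm_flags));

        return 0;
}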

So unless I'm missing something significant (which seems likely -- I'm
not a DRM expert), I don't see a problem so far.
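
(For reference, the DRM_RENDER_ALLOW gating mentioned above looks
roughly like this in a driver's ioctl table; the driver and ioctl
names here are made up.)

static const struct drm_ioctl_desc foo_ioctls[] = {
        /* Reachable from both the primary and the render node: */
        DRM_IOCTL_DEF_DRV(FOO_GEM_CREATE, foo_gem_create_ioctl,
                          DRM_RENDER_ALLOW),
        /*
         * No DRM_RENDER_ALLOW: the DRM core returns -EACCES when this
         * ioctl is invoked on a render node.
         */
        DRM_IOCTL_DEF_DRV(FOO_SET_MODE, foo_set_mode_ioctl, 0),
};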

> There's a UAPI risk I see here.  Right now, on a system with a single
> renderer GPU, we can just open /dev/dri/renderD128 and get the GPU for
> rendering, and various things are relying on that (such as libwaffle,
> used in piglit among other things).  Adding render nodes for kms-only
> drivers could displace the actual GPU to 129, and the question is
> whether this will be disruptive.  For Mesa, I think this works out,
> because kmsro should load on the kms device's node and then share
> buffers over to the real GPU that it digs around to find at init time.
> Just saying, I'm not sure I know all of the userspace well enough to
> say "this should be safe despite that"
>
> (And, maybe, if we decide that it's not safe enough, we could punt
> kms-only drivers to a higher starting number?)

Android (minigbm) similarly tries to open /dev/dri/renderD$N in a loop
over 128 <= N < 192 and assumes that the first non-blacklisted (i.e.
not vgem) node it can open corresponds to the real GPU [1]. I think
the risk of breaking something on Android is low, since Android's
architecture already depends on a render node being present, and it
seems unlikely that a device would have more than one GPU where one of
them is non-functional.
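
(For concreteness, the probe loop described above is roughly the
following. This is a simplified C sketch of what the linked minigbm
code does, not a copy of the actual C++.)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <xf86drm.h>

/* Returns an fd for the first usable non-vgem render node, or -1. */
static int open_first_render_node(void)
{
        char path[64];
        int i;

        for (i = 128; i < 192; i++) {
                drmVersionPtr version;
                int fd;

                snprintf(path, sizeof(path), "/dev/dri/renderD%d", i);
                fd = open(path, O_RDWR | O_CLOEXEC);
                if (fd < 0)
                        continue;

                version = drmGetVersion(fd);
                if (version) {
                        int is_vgem = strcmp(version->name, "vgem") == 0;

                        drmFreeVersion(version);
                        if (!is_vgem)
                                return fd; /* first non-vgem node wins */
                }
                close(fd);
        }

        return -1;
}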

It's also worth bearing in mind that render nodes were added to vgem
in commit 3a6eb795 from 2018. If exposing additional render nodes were
going to cause widespread breakage, that change seems like a likely
way for it to have happened, since it presumably took many machines
from having one render node to having more than one. So perhaps the
argument can be made that, since we didn't see widespread breakage as
a result of that change, we're unlikely to see it as a result of this
one.

That argument is conditional on the userspace code in question not
blacklisting the vgem render node the way minigbm does -- at a glance
I couldn't find such code in Mesa (there does appear to be some code
that looks for the vgem driver name, but it seems to be used only on
primary nodes, not render nodes) or in libwaffle.

Peter

[1] https://cs.android.com/android/platform/superproject/+/master:external/minigbm/cros_gralloc/cros_gralloc_driver.cc;l=48

