[PATCH 2/4] drm/i2c: tda998x: Remove obsolete drm_connector_register() call

Russell King - ARM Linux linux at armlinux.org.uk
Mon Oct 31 00:09:23 UTC 2016


On Mon, Oct 24, 2016 at 08:53:04AM +0200, Daniel Vetter wrote:
> On Mon, Oct 24, 2016 at 11:58:00AM +0530, Archit Taneja wrote:
> > On 10/22/2016 03:25 PM, Russell King - ARM Linux wrote:
> > > Looking at drm_bridge_disable() and drm_bridge_enable(), the control
> > > model appears to be:
> > > 
> > > 	crtc -> encoder -> connector
> > >                  `-> bridge
> > > 		     `-> bridge
> > > 		         `-> bridge
> > > 
> > > Bridges are always attached to an encoder, and there can be multiple
> > > bridges attached to one encoder.  Bridges can't be attached to the
> > > connector.
> 
> In helpers connectors are no-op objects. We _never_ call any connector
> callbacks when doing a modeset. Connectors are only used to probe output
> state, and as the userspace-visisble endpoint representation. Hence the
> real graph is
> 
> crtc -> encoder [ -> bridge [ -> bridge [...]]] -> connector
> 
> with the last bridge owning the connector. And that last bridge probably
> needs to store a pointer to its connector(s).

That model can't work for TDA998x if the TDA998x is followed by
another "bridge" (eg, to convert the TDMS signals to something else)
unless there's some way to tell a bridge that it isn't the last in
the chain.

However, my graph is accurate as it's reflecting the software
modelling - it reflects how the various objects are bound together in
DRM.  The DRM encoder has pointers to the DRM bridge, which has a
pointer to the next DRM bridge.  The DRM connector doesn't have any
pointers to the connector, only to the DRM encoder.  So, DRM bridges
are childs of the encoder, and the encoder (and attached encoder
bridge chain) can be selected by the DRM connector.

However, you are correct that for different "tasks" like mode setting,
or output probing, the representation is somewhat different - that's
not really what I was talking about though, and I certainly was not
talking about the userspace representation.

What I'm 100% concerned about is how this stuff looks in kernel space
and what the driver(s) should look like.

> > > So, in the case of TDA998x, we end up with the TDA998x providing a
> > > connector, because it has connector functionality, and providing a
> > > bridge.  The encoder is left to the KMS driver, which adds additional
> > > complexity (100+ lines) to each and every KMS driver, requiring the
> > > KMS driver to have much more knowledge of what's attached to the
> > > "CRTC", so it can create these encoders itself.  I still think this
> > > is a backwards step - maybe one step forwards, two backwards.
> 
> We've stubbed out everything that's in an encoder, you definitely don't
> need hundreds of lines to write one any more. If there's still a callback
> left around drm_encoder which has not yet suitable default handling, then
> that's a bug.

Sorry, but I do need exactly what I've written above, I can talk rather
definitively because I've actually got the code in front of me.  Most of
the additional lines is due to the complexity added to the KMS driver to
locate (actually for a third time) all the components in the system,
specifically parsing the DT tree to find the "encoders" (or rather the
TDA998x in this case), creating the DRM encoder objects, and binding the
TDA998x bridge.

Here's the _exact_ diffstat for the hacky conversion so far (including
something like the 10 patches I posted last weekend, which haven't had
any comments yet):

 drivers/gpu/drm/armada/armada_drv.c | 125 +++++
 drivers/gpu/drm/i2c/tda998x_drv.c   | 904 +++++++++++++++++-------------------
 2 files changed, 560 insertions(+), 469 deletions(-)

The actual bridge conversion on its own is:

 drivers/gpu/drm/armada/armada_drv.c | 125 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i2c/tda998x_drv.c   | 139 ++++++++++++++----------------------
 2 files changed, 180 insertions(+), 84 deletions(-)

> Note: Only applies to atomic though, I'm not going to bother with old
> legacy crtc helpers. I guess armada needs to switch to atomic, otherwise
> encoders are indeed a bit a pain.

That's not going to happen - and you know exactly why that's not going
to happen - I've tried to do it and it failed misterably with all sorts
of problems.  The idea that it can be done piecemeal as per your guide
is a falacy - it can't be.  There is no progressive way to do a
conversion.  It seems that KMS drivers need to be rewritten from
scratch, and that means there is a high risk of introducing lots of
new bugs.

I'm just not prepared to go through that - I'd much rather have a stable
kernel driver that actually works than spend the next six months rewriting
and debugging stuff just for the latest ideas about how stuff should be
done.

_OR_ there could be more help from DRM to ease the transition pain from
non-atomic to atomic KMS drivers, so that it can be done in appropriately
sized steps, so that the driver can be adequately tested to ensure that
things don't totally fall apart... you know, like imx-drm has gone from
being a stable driver to keep on falling apart now that it's been
converted to atomic modeset.

> Imo encoders should be that part which is baked into your core ip. If
> there's nothing, then you're perfectly fine with a no-op encoder.

>From my point of view, the TDA998x _is_ an encoder - it takes RGB and
sync signals, and encodes them into the TDMS format for DVI or HDMI.
I guess what I call an encoder is not what DRM calls an encoder though.

What's in the Dove is effectively a pair of CRTCs, some muxes, a set of
VGA DACs and a parallel RGB bus with pixel clock and sync signals.
Apart from the VGA DACs (which aren't used in the TDA998x path) it's
pretty hard to imagine what piece of hardware could be called an
encoder.

So what does the DRM encoder represent, hardware wise in this case?
As I say, in my mind, the TDA998x _is_ the encoder.

> Maybe we
> could do a helper for creating those, if the few lines are copypasted too
> often. Then all the external IP should be bridges (and chained). And with
> chains either you need another bridge, or you're the last bridge, and then
> you're supposed to register the connector as the final endpoint.

Let me repeat: the "DRM connector" is part of the TDA998x - the TDA998x
provides the EDID reading capabilities, and the connection detection
capabilities.  It also provides the CEC communication capabilities as
well, but that's not too relevant to this discussion, apart from
illustrating that it's an all-in-one single chip solution to providing
a full HDMI source implementation.

The TDA998x is not a stand-alone "bridge" which just _encodes_ a parallel
RGB bus to TDMS signals, it's much more than that.  That's why I'm saying
we can't separate out the connector functionality from the encoder
functionality.

> > I do agree that it's a step backward that we now have to search for
> > a corresponding bridge, which we didn't have to do when the chip
> > was represented as an encoder.
> 
> You can still do the exact same thing with bridges as with encoders using
> the component framework. Should not be a step back at all.

Sorry, no you can't at the moment.  As I've already said, grep for
"bridge_list".  Read the code in drivers/gpu/drm/drm_bridge.c, and
notice that there's two places that this list is accessed:

1. inside drm_bridge_add()
2. inside of_drm_find_bridge() which is only available when CONFIG_OF
   is enabled, and requires a DT struct device_node pointer to perform
   the lookup.  struct device_node's do not exist without DT.

> > > There's another issue with TDA998x - we now have audio support in
> > > TDA998x, and converting TDA998x to be a DRM bridge introduces some
> > > fundamental (and as yet unsolved) races between the ASoC code and
> > > the attachment of the DRM bridge to the DRM encoder, and the detachment
> > > when the DRM bridge/connector gets cleaned up.  Right now, there's no
> > > bridge callback when the encoder or drm_device goes away, so doing
> > > stuff like:
> > > 
> > > static int tda998x_audio_get_eld(struct device *dev, void *data,
> > >                                  uint8_t *buf, size_t len)
> > > {
> > >         struct tda998x_priv *priv = dev_get_drvdata(dev);
> > >         struct drm_mode_config *config;
> > >         struct drm_connector *connector;
> > >         int ret = -ENODEV;
> > > 
> > >         /* FIXME: This is racy */
> > >         if (!priv->bridge.encoder || !priv->bridge.encoder->dev)
> > >                 return ret;
> > > 
> > >         config = &priv->bridge.encoder->dev->mode_config;
> > > 
> > >         mutex_lock(&config->mutex);
> > >         list_for_each_entry(connector, &config->connector_list, head) {
> > >                 if (priv->bridge.encoder == connector->encoder) {
> > >                         memcpy(buf, connector->eld,
> > >                                min(sizeof(connector->eld), len));
> > >                         ret = 0;
> > >                 }
> > >         }
> > >         mutex_unlock(&config->mutex);
> > > 
> > > feels very unsafe - nothing really guarantees the validity of the
> > > priv->bridge.encoder or priv->bridge.encoder->dev pointers.  The
> > > config->mutex lock does nothing to solve this.  The same problem
> > > also exists in tda998x_audio_hw_params().
> > 
> > Maybe we could ensure that the bridge attachment/detachment is
> > contained within drm_encoder_init/cleanup funcs, so that their
> > life is tied to the encoder drm_mode_object. It wouldn't be as
> > straightforward, since the drm_bridges create connectors too.
> > Will look more into this.
> 
> I don't see any issue with the above at all. Or well, if there is one
> there's a larger issue: You can't reach this code if you unregister your
> driver's interface _before_ you tear down anything. This is fixed by
> getting rid of the load/unload callbacks. And for additional interfaces
> there's new register/unregister callbacks on connectors (which the bridge
> also should own).

That's easy to say if you're into the "lets rewrite everything all at
the same time" mentality, which from your response I think sums up
your position on everything from atomic mode set to this problem.

Sorry, I really hate the rewrite mentality, that's not good programming
practice, especially when existing implementations work.  What's
instead required are a series of incremental steps to effect the
full outcome, especially when multiple drivers are involved.

If you look at the problems surrounding the removal of the
drm_connector_register() from TDA998x, you'll see why this is important:
it's not the drivers _with_ the mid-layer that's a problem here, but
those which were converted prematurely, or written without using the
mid-layer that are blocking the removal of drm_connector_register().

And the removal of drm_connector_register() from TDA998x blocks the
removal of the mid-layer from armada, because removing the mid-layer
_now_ causes the kernel to WARN - I know, I've tried it already:

[    1.933854] WARNING: CPU: 0 PID: 13 at /home/rmk/git/linux-cubox/lib/kobject.c:244 kobject_add_internal+0xfc/0x2d8
[    1.944286] kobject_add_internal failed for card0-HDMI-A-1 (error: -2 parent: card0)

But... the mid-layer issue you raise is a complete red herring, the
race has absolutely nothing to do with that.

What causes the race is that during the KMS driver's probing, we get
to the point where tda998x_bind() is called.  This registers the
DRM bridge so that the KMS driver can later find and attach to the
bridge.

However, just before creating the DRM bridge, it also creates a
platform device for the audio codec side.  As soon as that platform
device is registered, ASoC is free to bind the audio subsystem and
make it available to userspace.

This means that any of the tda998x_audio_* functions are able to
be called from the point that this platform device is registered.

At this point, priv->bridge.encoder will be NULL, which means that
some of the tda998x_audio_* functions should fail due to that.

KMS driver initialisation can continue, writing the various pointers,
and that happens without locking - but at that stage, it should only
be going from NULL pointers to non-NULL pointers pointing at valid
memory.  However, there are no barriers to ensure that the various
writes occur in the expected order (we're talking about writes in the
KMS driver being visible to reads in the TDA998x audio side, possibly
by another CPU, so locking isn't the answer - I can't see any way such
a lock could be shared between TDA998x and various KMS drivers, or
even some generic dummy DRM encoder helper.  Barriers may be the
answer, we need to ensure that encoder->dev is always valid before
bridge->encoder is valid.)

However, we need to also consider the initialisation failure and
error clean up paths, assuming we have got this far - and that's
where the worry is.  drm_encoder_cleanup() memsets the entire
encoder to zero.  So, from the above, a compiler is perfectly at
liberty to re-read the priv->bridge.encoder->dev pointer between
these two statements:

	if (!priv->bridge.encoder || !priv->bridge.encoder->dev)
		return ret;

	config = &priv->bridge.encoder->dev->mode_config;

and if such a re-read co-incides with the memset() in
drm_encoder_cleanup() becoming visible, this is a possible oops
waiting to happen.

It gets worse if the KMS driver is responsible for freeing the DRM
encoder that it created to attach to the TDA998x - if it frees that
memory before tda998x_unbind() has been called, the audio subsystem
will still be visible to userspace, and creates a potential
use-after-free.

So, none of this has anything what so ever to do with "is the KMS
driver mid-layered or not" - this problem can exist irrespective of
whether I have armada mid-layered or de-mid-layered.

NB. It doesn't actually exist with armada, because armada is not used
with the audio stuff on the cubox, we feed SPDIF to the TDA998x and
let the TDA998x sort itself out, no audio codec is required there,
but the point is that the complexities here are spread between
TDA998x and associated KMS drivers - both have to be doing the
right things for there not to be any subtle bugs here, and that is
a really bad model.  As I've already said, the problem does not
exist as the driver stands in mainline today, only once it is
converted to drm bridge, and it's purely down to the way the bridge
code works.  It is solvable, provided the connector remains part of
TDA998x.

So, like everything, we need to go through a series of steps to make
these changes, and these steps need to happen in the right order,
not as one huge great big lets-change-everything-at-once kind of
approach.

It's either going to take time, feeding changes into the kernel slowly,
or it's going to need a lot of co-operation between different device
driver authors, and sharing of stable commits between different git
trees.

Right now, the drm_connector_register() thing is basically blocking
everything, and that needs to be handled in a way that's acceptable to
all parties.  The drm bridge conversion is something that can only
happen once all the ducks are properly aligned - iow,
drm_connector_register() gone, audio problems solved (eg, via the
10 patch series) and we have a way to convert TDA998x to a bridge
without requiring every KMS user of TDA998x to simultaneously grow
its own drm encoders.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


More information about the dri-devel mailing list