[PATCH 2/4] drm/i2c: tda998x: Remove obsolete drm_connector_register() call

Daniel Vetter daniel at ffwll.ch
Tue Nov 8 09:21:57 UTC 2016


On Mon, Oct 31, 2016 at 12:09:23AM +0000, Russell King - ARM Linux wrote:
> On Mon, Oct 24, 2016 at 08:53:04AM +0200, Daniel Vetter wrote:
> > On Mon, Oct 24, 2016 at 11:58:00AM +0530, Archit Taneja wrote:
> > > On 10/22/2016 03:25 PM, Russell King - ARM Linux wrote:
> > > > Looking at drm_bridge_disable() and drm_bridge_enable(), the control
> > > > model appears to be:
> > > > 
> > > > 	crtc -> encoder -> connector
> > > >                  `-> bridge
> > > > 		     `-> bridge
> > > > 		         `-> bridge
> > > > 
> > > > Bridges are always attached to an encoder, and there can be multiple
> > > > bridges attached to one encoder.  Bridges can't be attached to the
> > > > connector.
> > 
> > In helpers connectors are no-op objects. We _never_ call any connector
> > callbacks when doing a modeset. Connectors are only used to probe output
> > state, and as the userspace-visible endpoint representation. Hence the
> > real graph is
> > 
> > crtc -> encoder [ -> bridge [ -> bridge [...]]] -> connector
> > 
> > with the last bridge owning the connector. And that last bridge probably
> > needs to store a pointer to its connector(s).
> 
> That model can't work for TDA998x if the TDA998x is followed by
> another "bridge" (eg, to convert the TMDS signals to something else)
> unless there's some way to tell a bridge that it isn't the last in
> the chain.
> 
> However, my graph is accurate as it's reflecting the software
> modelling - it reflects how the various objects are bound together in
> DRM.  The DRM encoder has pointers to the DRM bridge, which has a
> pointer to the next DRM bridge.  The DRM connector doesn't have any
> pointers to the bridge, only to the DRM encoder.  So, DRM bridges
> are children of the encoder, and the encoder (and attached encoder
> bridge chain) can be selected by the DRM connector.

Small note: The connector -> encoder pointer is only used for legacy
modesetting drivers. In atomic we shoveled it into drm_connector_state as
derived state of the connector->crtc link (which is what setCrtc and
atomic ioctl set).

> However, you are correct that for different "tasks" like mode setting,
> or output probing, the representation is somewhat different - that's
> not really what I was talking about though, and I certainly was not
> talking about the userspace representation.
> 
> What I'm 100% concerned about is how this stuff looks in kernel space
> and what the driver(s) should look like.

Ah, I missed that. Some shared code and pointers in generic drivers to
untangle which exact drm_bridge owns the connector would certainly be
useful. Otoh I'm not aware of any real-world chaining existing yet, I
guess that's why this is unsolved.
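
To make the "which bridge owns the connector" question concrete, here is a rough sketch of the chain walk I have in mind. The toy_* structs and functions are made-up stand-ins for illustration, not the real drm_encoder/drm_bridge/drm_connector definitions, and it assumes the rule from above that only the last bridge in a chain owns a connector:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real DRM structs. */
struct toy_connector {
	const char *name;
};

struct toy_bridge {
	struct toy_bridge *next;         /* next bridge in the chain, or NULL */
	struct toy_connector *connector; /* set only by the bridge owning it */
};

struct toy_encoder {
	struct toy_bridge *bridge;       /* first bridge in the chain, or NULL */
};

/* Walk encoder -> bridge -> bridge ... and return the last bridge. */
static struct toy_bridge *toy_last_bridge(const struct toy_encoder *enc)
{
	struct toy_bridge *b = enc->bridge;

	while (b && b->next)
		b = b->next;
	return b;
}

/* Append a bridge at the end of the encoder's chain. */
static void toy_bridge_attach(struct toy_encoder *enc, struct toy_bridge *bridge)
{
	struct toy_bridge *last = toy_last_bridge(enc);

	assert(bridge && !bridge->next);
	if (last)
		last->next = bridge;
	else
		enc->bridge = bridge;
}

/* Assumption: the connector hangs off the last bridge in the chain. */
static struct toy_connector *toy_chain_connector(const struct toy_encoder *enc)
{
	struct toy_bridge *last = toy_last_bridge(enc);

	return last ? last->connector : NULL;
}
```

With two bridges attached and the second one setting its connector pointer, toy_chain_connector() returns that connector. An encoder without any bridge yields NULL. A bridge that must work both as the last and as a middle element would need some way to learn its position, which is exactly the open problem.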
 
> > > > So, in the case of TDA998x, we end up with the TDA998x providing a
> > > > connector, because it has connector functionality, and providing a
> > > > bridge.  The encoder is left to the KMS driver, which adds additional
> > > > complexity (100+ lines) to each and every KMS driver, requiring the
> > > > KMS driver to have much more knowledge of what's attached to the
> > > > "CRTC", so it can create these encoders itself.  I still think this
> > > > is a backwards step - maybe one step forwards, two backwards.
> > 
> > We've stubbed out everything that's in an encoder, you definitely don't
> > need hundreds of lines to write one any more. If there's still a callback
> > left around drm_encoder which does not yet have suitable default handling, then
> > that's a bug.
> 
> Sorry, but I do need exactly what I've written above, I can talk rather
> definitively because I've actually got the code in front of me.  Most of
> the additional lines are due to the complexity added to the KMS driver to
> locate (actually for a third time) all the components in the system,
> specifically parsing the DT tree to find the "encoders" (or rather the
> TDA998x in this case), creating the DRM encoder objects, and binding the
> TDA998x bridge.
> 
> Here's the _exact_ diffstat for the hacky conversion so far (including
> something like the 10 patches I posted last weekend, which haven't had
> any comments yet):
> 
>  drivers/gpu/drm/armada/armada_drv.c | 125 +++++
>  drivers/gpu/drm/i2c/tda998x_drv.c   | 904 +++++++++++++++++-------------------
>  2 files changed, 560 insertions(+), 469 deletions(-)
> 
> The actual bridge conversion on its own is:
> 
>  drivers/gpu/drm/armada/armada_drv.c | 125 ++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i2c/tda998x_drv.c   | 139 ++++++++++++++----------------------
>  2 files changed, 180 insertions(+), 84 deletions(-)

Hm, this doesn't look good indeed ...

> > Note: Only applies to atomic though, I'm not going to bother with old
> > legacy crtc helpers. I guess armada needs to switch to atomic, otherwise
> > encoders are indeed a bit of a pain.

... but I can't justify the effort really in making non-atomic drivers
look good personally. Not going to reject patches from others of course.

> That's not going to happen - and you know exactly why that's not going
> to happen - I've tried to do it and it failed miserably with all sorts
> of problems.  The idea that it can be done piecemeal as per your guide
> is a fallacy - it can't be.  There is no progressive way to do a
> conversion.  It seems that KMS drivers need to be rewritten from
> scratch, and that means there is a high risk of introducing lots of
> new bugs.
> 
> I'm just not prepared to go through that - I'd much rather have a stable
> kernel driver that actually works than spend the next six months rewriting
> and debugging stuff just for the latest ideas about how stuff should be
> done.

GPUs unfortunately change the current idea of how stuff should work every few
years. It's part of the job ...

> _OR_ there could be more help from DRM to ease the transition pain from
> non-atomic to atomic KMS drivers, so that it can be done in appropriately
> sized steps, so that the driver can be adequately tested to ensure that
> things don't totally fall apart... you know, like imx-drm has gone from
> being a stable driver to one that keeps falling apart now that it's been
> converted to atomic modeset.

DRM changes fast and I'm not entirely surprised that the conversion guide
isn't fully up-to-date any more. I can help with debugging issues
(preferably on IRC), but I can't magically fix bugs I'm not aware of.

But in the end it's your call to either convert to atomic, refactor the
core/helpers to need less dummy code for old modeset code, or just carry
some dummy code around in your driver. Atomic is definitely the way to go
(since stuff like Android outright requires it to be usable).
 
> > Imo encoders should be that part which is baked into your core ip. If
> > there's nothing, then you're perfectly fine with a no-op encoder.
> 
> From my point of view, the TDA998x _is_ an encoder - it takes RGB and
> sync signals, and encodes them into the TMDS format for DVI or HDMI.
> I guess what I call an encoder is not what DRM calls an encoder though.
> 
> What's in the Dove is effectively a pair of CRTCs, some muxes, a set of
> VGA DACs and a parallel RGB bus with pixel clock and sync signals.
> Apart from the VGA DACs (which aren't used in the TDA998x path) it's
> pretty hard to imagine what piece of hardware could be called an
> encoder.
> 
> So what does the DRM encoder represent, hardware wise in this case?
> As I say, in my mind, the TDA998x _is_ the encoder.

Rule of thumb: If the encoder is created as an integrated part of the
overall display IP block, by the same IP company, it's probably best
represented by a drm_encoder. If otoh it's an external IP block, reused in
a bunch of places, it should be a drm_bridge. Think
s/drm_bridge/drm_non_integrated_encoder/ or similar. It would of course be
neat if the drm_encoder could be entirely no-op'ed out in that case, but
because drm_encoders are also part of the uabi (imo a design mistake) you
need to carry a dummy one around.

Other guideline: The split between drm_crtc and drm_encoder should
represent the display pipeline to outputs cross-bar (if you have one),
since that's how helpers handle different outputs. For some chips there's
a bit of generic per-output stuff, and hence it makes sense to have both
encoder code and a separate drm_bridge.
 
> > Maybe we
> > could do a helper for creating those, if the few lines are copypasted too
> > often. Then all the external IP should be bridges (and chained). And with
> > chains either you need another bridge, or you're the last bridge, and then
> > you're supposed to register the connector as the final endpoint.
> 
> Let me repeat: the "DRM connector" is part of the TDA998x - the TDA998x
> provides the EDID reading capabilities, and the connection detection
> capabilities.  It also provides the CEC communication capabilities as
> well, but that's not too relevant to this discussion, apart from
> illustrating that it's an all-in-one single chip solution to providing
> a full HDMI source implementation.
> 
> The TDA998x is not a stand-alone "bridge" which just _encodes_ a parallel
> RGB bus to TMDS signals, it's much more than that.  That's why I'm saying
> we can't separate out the connector functionality from the encoder
> functionality.

drm_bridge is meant to contain the connector, design-wise. Agreed that the
code and helpers leave a few things to be desired in this area.

> > > I do agree that it's a step backward that we now have to search for
> > > a corresponding bridge, which we didn't have to do when the chip
> > > was represented as an encoder.
> > 
> > You can still do the exact same thing with bridges as with encoders using
> > the component framework. Should not be a step back at all.
> 
> Sorry, no you can't at the moment.  As I've already said, grep for
> "bridge_list".  Read the code in drivers/gpu/drm/drm_bridge.c, and
> notice that there are two places that this list is accessed:
> 
> 1. inside drm_bridge_add()
> 2. inside of_drm_find_bridge() which is only available when CONFIG_OF
>    is enabled, and requires a DT struct device_node pointer to perform
>    the lookup.  struct device_node's do not exist without DT.

Well, then it needs to be added. It's open-source after all ;-)
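
Roughly what I'd imagine for a lookup that doesn't depend on DT, as a sketch only. Everything here is hypothetical: toy_find_bridge_by_owner() is not an existing DRM function, the "owner" key is an assumed stand-in for whatever non-DT handle (e.g. the providing struct device) would be used, and the locking that protects the real list is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in, NOT the real struct drm_bridge. */
struct toy_bridge {
	const void *owner;       /* hypothetical lookup key, e.g. the provider device */
	struct toy_bridge *next; /* link in the global registry */
};

/* Global registry; real code would need a lock around all accesses. */
static struct toy_bridge *toy_bridge_list;

/* Counterpart of the "add" side: make the bridge findable. */
static void toy_bridge_add(struct toy_bridge *bridge)
{
	assert(bridge && bridge->owner);
	bridge->next = toy_bridge_list;
	toy_bridge_list = bridge;
}

/* Hypothetical non-DT lookup: match on the owner key instead of a device_node. */
static struct toy_bridge *toy_find_bridge_by_owner(const void *owner)
{
	struct toy_bridge *b;

	for (b = toy_bridge_list; b; b = b->next)
		if (b->owner == owner)
			return b;
	return NULL;
}
```

The KMS driver would then pass in whatever handle it already holds for the bridge chip and get NULL back if the bridge driver hasn't registered yet, same as the DT variant.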

> > > > There's another issue with TDA998x - we now have audio support in
> > > > TDA998x, and converting TDA998x to be a DRM bridge introduces some
> > > > fundamental (and as yet unsolved) races between the ASoC code and
> > > > the attachment of the DRM bridge to the DRM encoder, and the detachment
> > > > when the DRM bridge/connector gets cleaned up.  Right now, there's no
> > > > bridge callback when the encoder or drm_device goes away, so doing
> > > > stuff like:
> > > > 
> > > > static int tda998x_audio_get_eld(struct device *dev, void *data,
> > > >                                  uint8_t *buf, size_t len)
> > > > {
> > > >         struct tda998x_priv *priv = dev_get_drvdata(dev);
> > > >         struct drm_mode_config *config;
> > > >         struct drm_connector *connector;
> > > >         int ret = -ENODEV;
> > > > 
> > > >         /* FIXME: This is racy */
> > > >         if (!priv->bridge.encoder || !priv->bridge.encoder->dev)
> > > >                 return ret;
> > > > 
> > > >         config = &priv->bridge.encoder->dev->mode_config;
> > > > 
> > > >         mutex_lock(&config->mutex);
> > > >         list_for_each_entry(connector, &config->connector_list, head) {
> > > >                 if (priv->bridge.encoder == connector->encoder) {
> > > >                         memcpy(buf, connector->eld,
> > > >                                min(sizeof(connector->eld), len));
> > > >                         ret = 0;
> > > >                 }
> > > >         }
> > > >         mutex_unlock(&config->mutex);
> > > > 
> > > > feels very unsafe - nothing really guarantees the validity of the
> > > > priv->bridge.encoder or priv->bridge.encoder->dev pointers.  The
> > > > config->mutex lock does nothing to solve this.  The same problem
> > > > also exists in tda998x_audio_hw_params().
> > > 
> > > Maybe we could ensure that the bridge attachment/detachment is
> > > contained within drm_encoder_init/cleanup funcs, so that their
> > > life is tied to the encoder drm_mode_object. It wouldn't be as
> > > straightforward, since the drm_bridges create connectors too.
> > > Will look more into this.
> > 
> > I don't see any issue with the above at all. Or well, if there is one
> > there's a larger issue: You can't reach this code if you unregister your
> > driver's interface _before_ you tear down anything. This is fixed by
> > getting rid of the load/unload callbacks. And for additional interfaces
> > there's new register/unregister callbacks on connectors (which the bridge
> > also should own).
> 
> That's easy to say if you're into the "lets rewrite everything all at
> the same time" mentality, which from your response I think sums up
> your position on everything from atomic mode set to this problem.
> 
> Sorry, I really hate the rewrite mentality, that's not good programming
> practice, especially when existing implementations work.  What's
> instead required are a series of incremental steps to effect the
> full outcome, especially when multiple drivers are involved.

We've been "rewriting" i915 to be atomic in small steps for 2 years now.
It works.

> If you look at the problems surrounding the removal of the
> drm_connector_register() from TDA998x, you'll see why this is important:
> it's not the drivers _with_ the mid-layer that are a problem here, but
> those which were converted prematurely, or written without using the
> mid-layer that are blocking the removal of drm_connector_register().
> 
> And the removal of drm_connector_register() from TDA998x blocks the
> removal of the mid-layer from armada, because removing the mid-layer
> _now_ causes the kernel to WARN - I know, I've tried it already:
> 
> [    1.933854] WARNING: CPU: 0 PID: 13 at /home/rmk/git/linux-cubox/lib/kobject.c:244 kobject_add_internal+0xfc/0x2d8
> [    1.944286] kobject_add_internal failed for card0-HDMI-A-1 (error: -2 parent: card0)
> 
> But... the mid-layer issue you raise is a complete red herring, the
> race has absolutely nothing to do with that.
> 
> What causes the race is that during the KMS driver's probing, we get
> to the point where tda998x_bind() is called.  This registers the
> DRM bridge so that the KMS driver can later find and attach to the
> bridge.
> 
> However, just before creating the DRM bridge, it also creates a
> platform device for the audio codec side.  As soon as that platform
> device is registered, ASoC is free to bind the audio subsystem and
> make it available to userspace.
> 
> This means that any of the tda998x_audio_* functions are able to
> be called from the point that this platform device is registered.
> 
> At this point, priv->bridge.encoder will be NULL, which means that
> some of the tda998x_audio_* functions should fail due to that.
> 
> KMS driver initialisation can continue, writing the various pointers,
> and that happens without locking - but at that stage, it should only
> be going from NULL pointers to non-NULL pointers pointing at valid
> memory.  However, there are no barriers to ensure that the various
> writes occur in the expected order (we're talking about writes in the
> KMS driver being visible to reads in the TDA998x audio side, possibly
> by another CPU, so locking isn't the answer - I can't see any way such
> a lock could be shared between TDA998x and various KMS drivers, or
> even some generic dummy DRM encoder helper.  Barriers may be the
> answer, we need to ensure that encoder->dev is always valid before
> bridge->encoder is valid.)
> 
> However, we need to also consider the initialisation failure and
> error clean up paths, assuming we have got this far - and that's
> where the worry is.  drm_encoder_cleanup() memsets the entire
> encoder to zero.  So, from the above, a compiler is perfectly at
> liberty to re-read the priv->bridge.encoder->dev pointer between
> these two statements:
> 
> 	if (!priv->bridge.encoder || !priv->bridge.encoder->dev)
> 		return ret;
> 
> 	config = &priv->bridge.encoder->dev->mode_config;
> 
> and if such a re-read coincides with the memset() in
> drm_encoder_cleanup() becoming visible, this is a possible oops
> waiting to happen.
> 
> It gets worse if the KMS driver is responsible for freeing the DRM
> encoder that it created to attach to the TDA998x - if it frees that
> memory before tda998x_unbind() has been called, the audio subsystem
> will still be visible to userspace, and creates a potential
> use-after-free.
> 
> So, none of this has anything whatsoever to do with "is the KMS
> driver mid-layered or not" - this problem can exist irrespective of
> whether I have armada mid-layered or de-mid-layered.

Not entirely clear to me from your description, but I think if the audio
platform device registration and unregistration are put into the new
connector late_register/early_unregister callbacks, and if the load/unload
sequence is fixed to register everything as the last step/unregister as
the first, then this should be fixed. And if the bridge owns the
connector, it can set these callbacks.
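
To illustrate the ordering I mean, a single-threaded toy model (all toy_* names are invented for this sketch, none of it is real tda998x or DRM code): the audio interface is exposed as the very last step of bind and hidden as the very first step of unbind, so a callback can only be reached while the rest of the state is valid.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the driver's private state. */
struct toy_hdmi {
	bool hw_ready;         /* encoder/bridge/connector fully set up */
	bool audio_registered; /* audio interface visible to callers */
	unsigned char eld[4];
};

/* Stand-in for an audio callback such as a get_eld() hook. */
static int toy_audio_get_eld(const struct toy_hdmi *h,
			     unsigned char *buf, size_t len)
{
	size_t i;

	if (!h->audio_registered)
		return -ENODEV;  /* interface not exposed */
	assert(h->hw_ready);     /* ordering invariant: exposed implies ready */
	for (i = 0; i < len && i < sizeof(h->eld); i++)
		buf[i] = h->eld[i];
	return 0;
}

static void toy_bind(struct toy_hdmi *h)
{
	h->eld[0] = 0x10;           /* 1. set up internal state */
	h->hw_ready = true;
	h->audio_registered = true; /* 2. expose interfaces last */
}

static void toy_unbind(struct toy_hdmi *h)
{
	h->audio_registered = false; /* 1. hide interfaces first */
	h->hw_ready = false;         /* 2. only then tear down */
}
```

Before toy_bind() and after toy_unbind() the callback returns -ENODEV; in between it copies the ELD. The model ignores concurrency entirely. In real code, unregistering the audio device would also have to wait for in-flight callbacks to finish.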

If it's not fixed then I need to take another look at your code, because
fixing this kind of issue was exactly the goal with the load/unload
reorg. We have a very similar problem in i915 on connectors, but with the
backlight interfaces.
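
On the barrier and re-read worry in the quoted text, the setup half could look something like the sketch below. It uses C11 atomics in userspace as a stand-in for the kernel's own primitives, and the toy_* types are invented for illustration. It only covers the NULL to non-NULL publication and the "read the pointer once" part. It does nothing for the teardown/use-after-free half, which still needs the ordering discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real DRM structs. */
struct toy_dev {
	int mode_config;
};

struct toy_encoder {
	struct toy_dev *dev;
};

struct toy_bridge {
	_Atomic(struct toy_encoder *) encoder;
};

/* Writer: fill in encoder->dev first, then publish the encoder pointer. */
static void toy_bridge_attach(struct toy_bridge *b, struct toy_encoder *enc,
			      struct toy_dev *dev)
{
	assert(enc && dev);
	enc->dev = dev;
	/* release: the enc->dev store is visible before the pointer is */
	atomic_store_explicit(&b->encoder, enc, memory_order_release);
}

/* Reader: load each pointer once into a local and only use the locals. */
static int *toy_get_mode_config(struct toy_bridge *b)
{
	struct toy_encoder *enc;
	struct toy_dev *dev;

	enc = atomic_load_explicit(&b->encoder, memory_order_acquire);
	if (!enc)
		return NULL;
	dev = enc->dev;
	if (!dev)
		return NULL;
	return &dev->mode_config;
}
```

A reader that observes a non-NULL encoder is then guaranteed to also observe the dev pointer written before it, and the check and the use operate on the same local copies.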

> NB. It doesn't actually exist with armada, because armada is not used
> with the audio stuff on the cubox, we feed SPDIF to the TDA998x and
> let the TDA998x sort itself out, no audio codec is required there,
> but the point is that the complexities here are spread between
> TDA998x and associated KMS drivers - both have to be doing the
> right things for there not to be any subtle bugs here, and that is
> a really bad model.  As I've already said, the problem does not
> exist as the driver stands in mainline today, only once it is
> converted to drm bridge, and it's purely down to the way the bridge
> code works.  It is solvable, provided the connector remains part of
> TDA998x.
> 
> So, like everything, we need to go through a series of steps to make
> these changes, and these steps need to happen in the right order,
> not as one huge great big lets-change-everything-at-once kind of
> approach.
> 
> It's either going to take time, feeding changes into the kernel slowly,
> or it's going to need a lot of co-operation between different device
> driver authors, and sharing of stable commits between different git
> trees.
> 
> Right now, the drm_connector_register() thing is basically blocking
> everything, and that needs to be handled in a way that's acceptable to
> all parties.  The drm bridge conversion is something that can only
> happen once all the ducks are properly aligned - iow,
> drm_connector_register() gone, audio problems solved (eg, via the
> 10 patch series) and we have a way to convert TDA998x to a bridge
> without requiring every KMS user of TDA998x to simultaneously grow
> its own drm encoders.

tbh I think I'm lost in all the actual conversion issues at hand here. I
jumped into the discussion since there seemed to be some confusion going
on at higher levels.

Aside: This should all be documented in kernel-doc somewhere. If not,
please raise this and I'll try to improve the docs; (RFC) doc patches are
very much welcome too, of course.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

