[PATCH 2/4] drm/dp_mst: Only create connector for connected end device

Lyude Paul lyude at redhat.com
Tue Oct 12 21:17:12 UTC 2021


OK - got sidetracked by an issue at work but I just resumed working on this
today, should hopefully have it done at the start of next week at the latest
(hooray for having time to do things upstream again! :).

On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude at redhat.com>
> > Sent: Thursday, September 2, 2021 6:00 AM
> > To: Lin, Wayne <Wayne.Lin at amd.com>; dri-devel at lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas at amd.com>; Wentland, Harry
> > <Harry.Wentland at amd.com>; Zuo, Jerry
> > <Jerry.Zuo at amd.com>; Wu, Hersen <hersenxs.wu at amd.com>; Juston Li
> > <juston.li at intel.com>; Imre Deak <imre.deak at intel.com>;
> > Ville Syrjälä <ville.syrjala at linux.intel.com>; Daniel Vetter
> > <daniel.vetter at ffwll.ch>; Sean Paul <sean at poorly.run>; Maarten Lankhorst
> > <maarten.lankhorst at linux.intel.com>; Maxime Ripard <mripard at kernel.org>;
> > Thomas Zimmermann <tzimmermann at suse.de>;
> > David Airlie <airlied at linux.ie>; Daniel Vetter <daniel at ffwll.ch>; Deucher,
> > Alexander <Alexander.Deucher at amd.com>; Siqueira,
> > Rodrigo <Rodrigo.Siqueira at amd.com>; Pillai, Aurabindo
> > <Aurabindo.Pillai at amd.com>; Bas Nieuwenhuizen
> > <bas at basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij at amd.com>; Jani
> > Nikula <jani.nikula at intel.com>; Manasi Navare
> > <manasi.d.navare at intel.com>; Ankit Nautiyal <ankit.k.nautiyal at intel.com>;
> > José Roberto de Souza <jose.souza at intel.com>; Sean
> > Paul <seanpaul at chromium.org>; Ben Skeggs <bskeggs at redhat.com>;
> > stable at vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > Actually - did some more thinking, and I think we shouldn't try to make
> > changes like this until we actually know what the problem is
> > here. I could try to figure out what the actual race conditions I was
> > facing before with trying to add/destroy connectors based on PDT,
> > but we still don't even actually have a clear idea of what's broken here.
> > I'd much rather us figure out exactly how this leak is
> > happening before considering making changes like this, because we have no
> > way of knowing if we've properly fixed it or not if we
> > don't know what the problem is in the first place.
> > 
> > I'm still happy to write up the topology debugging stuff I mentioned to
> > you if you think that would help you debug this issue - since
> > that would make it a lot easier for you to track down what references are
> > keeping a connector alive (and additkionally, where those
> > references were taken in code. thanks
> > stack_depot!)
> Hi Lyude,
> Sorry for late response. A bit busy on other stuff recently..
> 
> Really really thankful for all your help : ) I'm also glad to have the
> debugging tool if it won’t bother you too much. But before debugging,
> I need to have consensus with you about where do we expect to release
> resource allocated for a stream sink when it's reported as
> disconnected. Previous patch suggests releasing resource when connector is
> destroyed which will happen when topology refcount
> reaches zero (i.e. unplug mstb from topology). But when the case is
> receiving CSN notifying connection change, we don't try to destroy
> connector in this case now. And this is not caused by topology/malloc
> refcount leak since I don't expect neither one of them get
> decrease to zero under this case (topology of mstbs and ports is not
> changed). Hence, my plan was to also try to destroy connector under
> this case and the reason is reasonable to me as described in previous mail.
> With this patch set, I can see connectors eventually get
> successfully destroyed after userspace committing set_crtc() to free
> connectors (although also need a fix on the connector refcount
> grabbed by drm_client_modeset_probe() under specific scenario).
> 
> I think the main problem I encountered here is that I couldn't find a place
> that notify us to release resource allocated for a disconnected
> stream sink when receive CSN. If we decide not to destroy connector under
> this case, then I probably need some guidance about where
> to do the release work.
> 
> Thanks again Lyude!
> > 
> > On Tue, 2021-08-31 at 18:47 -0400, Lyude Paul wrote:
> > > (I am going to try responding to this tomorrow btw. I haven't been
> > > super busy this week, but this has been a surprisingly difficult email
> > > to respond to because I need to actually need to do a deep dive some
> > > of the MST helpers tomorrow to figure out more of the specifics on why
> > > I realized we couldn't just hot add/remove port->connector here).
> > > 
> > > On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote:
> > > > [Public]
> > > > 
> > > > > -----Original Message-----
> > > > > From: Lyude Paul <lyude at redhat.com>
> > > > > Sent: Tuesday, August 24, 2021 5:18 AM
> > > > > To: Lin, Wayne <Wayne.Lin at amd.com>;
> > > > > dri-devel at lists.freedesktop.org
> > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas at amd.com>; Wentland,
> > > > > Harry <Harry.Wentland at amd.com>; Zuo, Jerry <Jerry.Zuo at amd.com>;
> > > > > Wu, Hersen <hersenxs.wu at amd.com>; Juston Li <juston.li at intel.com>;
> > > > > Imre Deak <imre.deak at intel.com>; Ville Syrjälä
> > > > > <ville.syrjala at linux.intel.com>; Daniel Vetter
> > > > > <daniel.vetter at ffwll.ch>; Sean Paul <sean at poorly.run>; Maarten
> > > > > Lankhorst <maarten.lankhorst at linux.intel.com>; Maxime Ripard
> > > > > <mripard at kernel.org>; Thomas Zimmermann <tzimmermann at suse.de>;
> > > > > David Airlie <airlied at linux.ie>; Daniel Vetter <daniel at ffwll.ch>;
> > > > > Deucher, Alexander <Alexander.Deucher at amd.com>; Siqueira, Rodrigo
> > > > > <Rodrigo.Siqueira at amd.com>; Pillai, Aurabindo
> > > > > <Aurabindo.Pillai at amd.com>; Bas Nieuwenhuizen
> > > > > <bas at basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij at amd.com>;
> > > > > Jani Nikula <jani.nikula at intel.com>; Manasi Navare
> > > > > <manasi.d.navare at intel.com>; Ankit Nautiyal
> > > > > <ankit.k.nautiyal at intel.com>; José Roberto de Souza
> > > > > <jose.souza at intel.com>; Sean Paul <seanpaul at chromium.org>; Ben
> > > > > Skeggs <bskeggs at redhat.com>; stable at vger.kernel.org
> > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > connected end device
> > > > > 
> > > > > [snip]
> > > > > 
> > > > > I think I might still be misunderstanding something, some comments
> > > > > below
> > > > > 
> > > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > > > > > > Hi Lyude,
> > > > > > > > 
> > > > > > > > Really thankful for willing to explain in such details.
> > > > > > > > Really appreciate.
> > > > > > > > 
> > > > > > > > I'm trying to fix some problems that observed after these 2
> > > > > > > > patches
> > > > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove
> > > > > > > > ->destroy_connector() callback
> > > > > > > > * 72dc0f51591 drm/dp_mst: Remove
> > > > > > > > drm_dp_mst_topology_cbs.destroy_connector
> > > > > > > > 
> > > > > > > > With above patches, we now change to remove dc_sink when
> > > > > > > > connector is about to be destroyed. However, we found out
> > > > > > > > that connectors won't get destroyed after hotplugs. Thus,
> > > > > > > > after few times hotplugs, we won't create any new dc_sink
> > > > > > > > since number of sink is exceeding our limitation. As the
> > > > > > > > result of that, I'm trying to figure out why the refcount of
> > > > > > > > connectors won't get zero.
> > > > > > > > 
> > > > > > > > Based on my analysis, I found out that if we connect a sst
> > > > > > > > monitor to a mst hub then connect the hub to the system, and
> > > > > > > > then unplug the sst monitor from the hub. E.g.
> > > > > > > > src - mst hub - sst monitor => src - mst hub  (unplug) sst
> > > > > > > > monitor
> > > > > > > > 
> > > > > > > > Within this case, we won't try to put refcount of the sst
> > > > > > > > monitor.
> > > > > > > > Which is what I tried to resolve by [PATCH 3/4].
> > > > > > > > But here comes a problem which is confusing me that if I can
> > > > > > > > destroy connector in this case. By comparing to another
> > > > > > > > case, if now mst hub is connected with a mst monitor like
> > > > > > > > this:
> > > > > > > > src - mst hub - mst monitor => src - mst hub  (unplug) mst
> > > > > > > > monitor
> > > > > > > > 
> > > > > > > > We will put the topology refcount of mst monitor's branching
> > > > > > > > unit in and
> > > > > > > > drm_dp_port_set_pdt() and eventually call
> > > > > > > > drm_dp_delayed_destroy_port() to unregister the connector of
> > > > > > > > the logical port. So following the same rule, I think to
> > > > > > > > dynamically unregister a mst connector is what we want and
> > > > > > > > should be reasonable to also destroy sst connectors in my
> > > > > > > > case. But this conflicts the idea what we have here. We want
> > > > > > > > to create connectors for all output ports.
> > > > > > > > So if dynamically creating/destroying connectors is what we
> > > > > > > > want, when is the appropriate time for us to create one is
> > > > > > > > what I'm considering.
> > > > > > > > 
> > > > > > > > Take the StartTech hub DP 1to4 DP output ports for instance.
> > > > > > > > This hub, internally, is constructed by  3 1-to-2 mst branch
> > > > > > > > chips. 2 output ports of 1st chip are hardwired to another 2
> > > > > > > > chips. It's how it makes it to support 1-to-4 mst branching.
> > > > > > > > So within this case, the internal
> > > > > > > > 2 output ports of 1st chip is not connecting to a stream
> > > > > > > > sink and will never get connected to one.  Thus, I'm
> > > > > > > > thinking maybe the best timing to attach a connector to a
> > > > > > > > port is when the port is connected, and the connected PDT is
> > > > > > > > determined as a stream sink.
> > > > > > > > 
> > > > > > > > Sorry if I misunderstand anything here and really thanks for
> > > > > > > > your time to shed light on this : ) Thanks Lyude.
> > > > > > > 
> > > > > > > It's no problem, it is my job after all! Sorry for how long my
> > > > > > > responses have been taking, but my plate seems to be finally
> > > > > > > clearing up for the foreseeable future.
> > > > > > > 
> > > > > > > That being said - it sounds like with this we still aren't
> > > > > > > actually clear on where the topology refcount leak is
> > > > > > > happening - only when it's happening, which says to me that's
> > > > > > > the issue we really need to be figuring out the cause of as
> > > > > > > opposed to trying to workaround it.
> > > > > > > 
> > > > > > > Actually - refcount leaks is an issue I've ran into a number
> > > > > > > of times before in the past, so a while back I actually added
> > > > > > > some nice debugging features to assist with debugging leaks.
> > > > > > > If you enable the following options in your kernel config:
> > > > > > > 
> > > > > > > CONFIG_EXPERT=y # This must be set first before the next
> > > > > > > option CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > > > > > > 
> > > > > > > Unfortunately, I'm suddenly realizing after typing this that
> > > > > > > apparently I never bothered adding a way for us to debug the
> > > > 
> > > > > > > refcounts of ports/mstbs that haven't been released yet - only
> > > > > > > the ones for ones that have. This shouldn't be difficult at
> > > > > > > all for me to add, so I'll send you a patch either today or at
> > > > > > > the start of next week to try debugging with using this, and
> > > > > > > then we can figure out where this leak is really coming from.
> > > > > > 
> > > > > > Thanks Lyude!
> > > > > > 
> > > > > > Sorry to bother you, but I would like to clarify this again.  So
> > > > > > it sounds
> > > > > 
> > > > > It's no problem! It's my job and I'm happy to help :).
> > > > 
> > > > Thanks!
> > > > I would like to learn more from you as below : p
> > > > > 
> > > > > > like you also agree that we should destroy associated connector
> > > > > 
> > > > > Not quite. I think a better way of explaining this might be to
> > > > > point out that the lifetime of an MST port and its connector isn't
> > > > > supposed to be determined by whether or not it has something
> > > > > plugged into it - its lifetime is supposed to depend on whether
> > > > > there's a valid path from us down the MST topology to the port
> > > > > we're trying to reach. So an MSTB with ports that is unplugged
> > > > > would destroy all of its ports - but an unplugged port should just
> > > > > be the same as a disconnected DRM connector - even if the port
> > > > > itself is just hosting a branching device.
> > > > 
> > > > This is the part a bit difficult to me. I treat DRM connector as the
> > > > place where we associate with a stream sink. So if the statement is
> > > > "All DP mst output ports are places we connect with stream sink", I
> > > > would say false to this since I can find the negative example when
> > > > output port is connected with mst branch device. Thus, looks like we
> > > > could only determine whether to create a connector for an output
> > > > port when the peer device type is known?
> > > > > 
> > > > > Additionally - we don't want to try "delaying" connector creation
> > > > > either.
> > > > > In the modern world hotplugging is almost always reliable in
> > > > > normal situations, but even so there's still use cases for wanting
> > > > > force probing for analog devices on DP converters and just in
> > > > > general as it's a feature commonly used by developers or users
> > > > > working around monitors with problematic HPD issues or EDID issues.
> > > > 
> > > > I think I understand that why we want to create connectors for all
> > > > output ports here. But under these mentioned use cases, aren't we
> > > > still capable to force connector to enable stream? MST hub with
> > > > muti-functon capability, it will enumerate connected virtual DP peer
> > > > device.
> > > > For problematic HPD issues or EDID issues, their connection status
> > > > is also connected.
> > > > 
> > > > My understanding of output port is it is an internal node to help
> > > > construct an end-to-end virtual channel between a stream source
> > > > device and a stream sink device. Creating connectors for internal
> > > > nodes within a virtual channel is a bit hard for me to get the idea.
> > > > Please correct me if I misunderstand anything here. Thanks Lyude!
> > > > > 
> > > > > > when we unplug sst monitor from a mst hub in the case that I
> > > > > > described? In the case I described (unplug sst monitor), we only
> > > > > > receive CSN from the hub that notifying us the connection status
> > > > > > of one of its downstream output ports is changed to
> > > > > > disconnected. There is no topology refcount needed to be
> > > > > > decreased on this disconnected port but the malloc refcount.
> > > > > > Since the output port is still declared by
> > > > > 
> > > > > Apologies - I misunderstood your original mail as implying that
> > > > > topology refcounts were being leaked - but it sounds like it's
> > > > > actually malloc refcounts being leaked instead? In any case - that
> > > > > means we're still tracing down a leak, just a malloc ref leak.
> > > > > 
> > > > > But, this still doesn't totally make sense to me. Malloc refs only
> > > > > keep the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in
> > > > > memory.
> > > > > Nothing else is kept around, meaning the DRM connector (and I
> > > > > assume by proxy, the dc_sink) should both be getting dropped still
> > > > > and the only thing that should be leaked is a memory allocation.
> > > > > These things should instead be dropped once there's no longer any
> > > > > topology references around. So, are we _sure_ that the problem
> > > > > here is a missing
> > > > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
> > > > 
> > > > Just my two cents, I don't think it's leak of malloc ref neither. As
> > > > you said, malloc ref is dealing with the last step to free port/mstb.
> > > > If there is still topology refcount on port/mstb in my case, we
> > > > won't free port/mstb.
> > > > > 
> > > > > If we are unfortunately we don't have equivalent tools for
> > > > > malloc() tracing. I'm totally fine with trying to add some if we
> > > > > have trouble figuring out this issue, but I'm a bit suspicious of
> > > > > the commits you mentioned that introduced this problem. If the
> > > > > problem doesn't happen until those two commits, then it's
> > > > > something in the code changes there that are causing this problem.
> > > > 
> > > > I think we probably also have the problem before these commits, but
> > > > we didn't notice this before. Just when we change to clean up all
> > > > things in dm_dp_mst_connector_destroy(), I start to try to figure
> > > > out all these things out.
> > > > > 
> > > > > The main thing I'm suspicious of just from looking at changes in
> > > > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
> > > > > amdgpu_dm_update_freesync_caps() that was previously in
> > > > > dm_dp_destroy_mst_connector() appears to be dropped and not
> > > > > re-added in (oh dear, this is a /very/ confusingly similar
> > > > > function
> > > > 
> > > > Lol. I also have hard time on this..
> > > > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if this
> > > > > was intentional on my part, but does adding a call back to
> > > > > amdgpu_dm_update_freesync_caps() into
> > > > > dm_dp_destroy_mst_connector() right before the
> > > > > dc_link_remove_remote_sink() call fix anything?
> > > > > 
> > > > > As well, I'm far less suspicious of this one but does re-adding
> > > > > this
> > > > > hunk:
> > > > > 
> > > > >       aconnector->dc_sink = NULL;
> > > > >       aconnector->dc_link->cur_link_settings.lane_count = 0;
> > > > > 
> > > > > After dc_sink_release() fix anything either?
> > > > 
> > > > So the main problem is we don't have chance to call
> > > > dc_link_remove_remote_sink() in the unplugging SST case. We only
> > > > have chance to remove the remote sink of a link when unplug a mstb.
> > > > > 
> > > > > > the mst hub,  I think we shouldn't destroy the port. Actually,
> > > > > > no ports nor mst branch devices should get destroyed in this
> > > > > > case I think.
> > > > > > The result of LINK_ADDRESS is still the same before/after
> > > > > > removing the sst monitor except the
> > > > > > DisplayPort_Device_Plug_Status/ Legacy_Device_Plug_Status.
> > > > > > 
> > > > > > Hence, if you agree that we should put refcount of the connector
> > > > > > of the disconnected port within the unplugging sst monitor case
> > > > > > to release the allocated resource, it means we don't want to
> > > > > > create connectors for those disconnected ports. Which conflicts
> > > > > > current flow to create connectors for all declared output ports.
> > > > > > 
> > > > > > Thanks again for your time Lyude!
> > > > > 
> > > > > --
> > > > > Cheers,
> > > > >  Lyude Paul (she/her)
> > > > >  Software Engineer at Red Hat
> > > > --
> > > > Regards,
> > > > Wayne
> > > > 
> > > 
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> --
> Regards,
> Wayne Lin
> 

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



More information about the dri-devel mailing list