[PATCH] WIP: drm/dp_mst: Add support for dumping topology ref histories from debugfs
Lin, Wayne
Wayne.Lin at amd.com
Mon Feb 7 07:19:33 UTC 2022
[AMD Official Use Only]
Thank you Lyude!
So, in short, I'm trying to register a call back function to listen to unplug event within MST topology.
Removal of local monitor from system can be detected by HPD pin and it therefore triggers HW interrupt.
We then release corresponding resource for the specific stream sink in sequence.
However, unplug event within MST topology is broadcasted by CSN sideband message which is handled by
drm layer in drm_dp_mst_topology.c, not by our driver. In order to follow the same sequence as we have
for removal of local monitor, I want to know is there any code path to notify driver the unplug event within
MST topology.
Currently, removal of remote monitors by removing its connected mst branch device from the topology, such
unplug event eventually call drm_dp_delayed_destroy_port() and we can try to release resource for those
stream sinks under this sequence. However, there is no code path for the case when remove stream sinks
directly from the topology (i.e. not remove monitors by indirectly removing its connected branch device).
As the result, the rough idea that I want to do is adding up a notification for this case.
Thank you for your time!
> -----Original Message-----
> From: Lyude Paul <lyude at redhat.com>
> Sent: Thursday, February 3, 2022 4:12 AM
> To: Lin, Wayne <Wayne.Lin at amd.com>
> Cc: dri-devel at lists.freedesktop.org
> Subject: Re: [PATCH] WIP: drm/dp_mst: Add support for dumping topology ref histories from debugfs
>
> JFYI I should be able to respond to this soon, I think I'm following your
> issue a bit more (haven't gotten to go through the patches too so I'm probably
> going to see if I can reproduce it locally as well. Sorry for the delay -
> dealing with some X.org stuff and work issues at the moment
>
> On Tue, 2022-01-25 at 08:32 +0000, Lin, Wayne wrote:
> > [Public]
> >
> > Hey Lyude,
> >
> > Sorry, just got chance to get back to this and BIG THANKS to elaborating in
> > such details!
> > Following are some of my comments : )
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude at redhat.com>
> > > Sent: Wednesday, January 19, 2022 9:59 AM
> > > To: Lin, Wayne <Wayne.Lin at amd.com>
> > > Cc: dri-devel at lists.freedesktop.org
> > > Subject: Re: [PATCH] WIP: drm/dp_mst: Add support for dumping topology ref
> > > histories from debugfs
> > >
> > > Hey - so, the original issue here was that payloads were not always
> > > deleted when we expected them to be - correct?
> >
> > Actually, what I was trying to do is not relevant to payload maintenance.
> > What I wanted to do is having a way to notify
> > amdgpu to release its enumerated resource "dc_sink" when unplug event
> > occurs. Currently, under specific scenarios,
> > we can't release remote "dc_sink" and ends up can't allocate resource for
> > new connected monitors. I'll explain the scenario
> > underneath.
> >
> > So, get back to the main issue that I was trying to solve when I pushed the
> > patchset:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Fdri-
> devel%2Fcover%2F20210720160342.11415-1-
> Wayne.Lin%40amd.com%2F&data=04%7C01%7Cwayne.lin%40amd.com%7C2e19d9321b5f4e948c1208d9e6884f24%7C3dd8961fe488
> 4e608e11a82d994e183d%7C0%7C0%7C637794295422403301%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiL
> CJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=rvtwQioO3pT1BoHb7S532msRa61gP1vIyE8AB7M9I4E%3D&reserved=0
> >
> > I was trying to fix some problems that I observed after these 2 patches
> > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector() callback
> > * 72dc0f51591 drm/dp_mst: Remove drm_dp_mst_topology_cbs.destroy_connector
> >
> > With above patches, it changes our flow to remove dc_sink (by calling
> > dc_link_remote_sink()) when connector is about to be
> > destroyed. And let's see below use cases:
> > a). Disconnection happens between mst branch devices
> > e.g.
> > src - 1st_mstb - 2nd_mstb - sst_monitor => src - 1st_mstb (disconnect)
> > 2nd_mstb - sst_monitor
> >
> > In above case, after receiving CSN, we will put topology reference count of
> > 2nd mstb and its port which connected to the sst monitor.
> > As you suggested before, we will destroy 2nd mst branch device and its all
> > ports due to there is no more mst path can route to them.
> > As the result of that, we at end call drm_dp_delayed_destroy_port() and
> > unregister/put the drm connector and eventually call
> > dc_link_remote_sink() to release the dc_sinck we enumerate for the specific
> > stream sink of sst_monitor.
> >
> > However, in below case:
> > b). Disconnection happens between mst branch devices and SST monitor
> > e.g.
> > src - 1st_mstb - sst_monitor => src - 1st_mstb (disconnect) sst_monitor
> >
> > In this case, which is the case having problem, it definitely won't decrease
> > the topology references count of the port of 1st_mstb which
> > was connected to the sst monitor to zero since the port is still existing in
> > the topology. Same as the malloc reference since the port can't get
> > destroyed. Hence, the port still exists and we won't call
> > drm_dp_delayed_destroy_port() to unregister and put the drm connector. As
> > the
> > result of that, we can't release corresponding remote dc_sink under this
> > case. And after few times hotplugs, we can't create any new
> > dc_sink since the number of stream sink is exceeding our limitation.
> >
> > Hence, what I'm thinking here is to register a callback function for
> > notifying us that the remote sink is detached. This just aligns what we do
> > for handling long HPD event (unplug monitor from system) of legacy DP (none
> > mst). For which case, once we detect long HPD event
> > indicating the monitor is detached, we will immediately try to release the
> > dc_link->local_sink and fire hotplug event to upper layer.
> > Same as here, once receives a CSN message notifying a drm connector is
> > changed from connected to disconnected, trigger the callback
> > function and we can try to release the dc_sink resource.
> >
> > >
> > > If I'm remembering that correctly, then (and I'm not 100% sure on this
> > > yet) I might already have noticed something very possibly wonky in
> > > how we handle payload allocations currently, that I found while working on
> > > the non-atomic removal branch that I linked to you in my
> > > previous response. Basically, in the
> > > step1() function it looks like that we follow this general flow with
> > > updating
> > > payloads:
> > >
> > > * Loop through proposed payloads and compare to previously programmed
> > > payloads
> > > - If a payload has a different allocation then it previously did,
> > > update the
> > > payload
> > > - If the payload is new, create it
> > > - If a payload no longer has an allocation, remove the payload
> > >
> > > At first glance this seems totally correct - but I'm not actually entirely
> > > convinced it is? Particularly because it seems like the order in which
> > > we do creation/deletion of payloads is totally arbitrary. To explain what
> > > I mean by that, consider a state transition like this:
> > >
> > > vcpi_slots=15 vcpi_slots=35 vcpi_slots=14
> > > > 1 | 2 |xxxxxxxx|
> > >
> > > Let's say we want to increase payload #1 from 15 to 50, and disable
> > > payload #2 in the same atomic commit on DRM's side. If the order we
> > > update payloads is entirely arbitrary, we could end up doing this:
> > >
> > > * Increase VCPI slots payload #1 from 15->50 (total VCPI slots=99,
> > > overflow!)
> > > * Decrease VCPI slots payload #2 from 35->0 (total VCPI slots=50)
> > >
> > > Notice on the first step, we've technically overflowed the available
> > > number of VCPI slots in the payload table. This is still before step 2
> > > though, and I could be totally wrong here - perhaps this is entirely OK in
> > > the real world, and we're allowed to overflow VC slots as long as
> >
> > I think it's forbidden to allocate time slots beyond 64 when we're updating
> > payload ID table of the branch in the real world. Branch probably
> > will NACK the DPCD write.
> >
> > > we repair the issue before step 2. But so far I can't seem to find
> > > anything in the DP 2.0 spec that explicitly states this would be OK -
> > > which
> > > makes me think we might fail some payload allocations if we don't always
> > > ensure we avoid overflows in between VC payload table changes
> >
> > Agree.
> > For amdgpu, I think we always do deallocation (call hws-
> > >funcs.reset_hw_ctx_wrap()) before allocation
> > (apply_single_controller_ctx_to_hw()).
> >
> > > Note that avoiding overflows would be as simple as just making sure we
> > > send all VC payload table changes that free VC slots before sending
> > > any that take new slots.
> > >
> > > Again - I haven't actually confirmed this yet and am hoping to test stuff
> > > like this very soon as I'm pretty close finishing up the initial attempt
> > > at removing the non-atomic MST code in the DRM helpers now. If my theory
> > > ends up being correct here, I can fix this in my rewrite of the
> > > MST payload management code. But I figured it might be worth mentioning in
> > > the mean time in case you think it might be relevant to the
> > > problem here :).
> > >
> > > On Wed, 2022-01-12 at 16:47 -0500, Lyude Paul wrote:
> > > > (CC'ing this to dri-devel, since this is basically patch review)
> > > >
> > > > Alright - so first off sorry about the broken debugging patch! I
> > > > thought I had tested that but I guess I hadn't tested it well enough,
> > > > I'll get it fixed asap, but feel free to try getting to it before I
> > > > do.
> > > >
> > > > As for this patch... (comments below)
> > > >
> > > > On Mon, 2021-12-20 at 02:17 +0000, Lin, Wayne wrote:
> > > > > [AMD Official Use Only]
> > > > >
> > > > > Hi Lyude,
> > > > >
> > > > > No rush and thanks for your time! : ) Just want to help clarify what
> > > > > I mean that "registering a callback function"
> > > > > for CSN of disconnecting
> > > > > stream sink device (not mst branch case). Attach below the tentative
> > > > > patch that I planned to do. Thanks so much!
> > > > >
> > > > > Regards,
> > > > > Wayne
> > > > > ---
> > > > > .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 53
> > > > > +++++++++++++++++++
> > > > > drivers/gpu/drm/drm_dp_mst_topology.c | 16 +++++-
> > > > > include/drm/drm_dp_mst_helper.h | 1 +
> > > > > 3 files changed, 68 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git
> > > > > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > > > index cc34a35d0bcb..d7343c64908c 100644
> > > > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > > > @@ -472,8 +472,61 @@ dm_dp_add_mst_connector(struct
> > > > > drm_dp_mst_topology_mgr *mgr,
> > > > > return connector;
> > > > > }
> > > > >
> > > > > +void dm_dp_notify_csn_disconnection(struct drm_connector
> > > > > +*connector) {
> > > > > + struct amdgpu_dm_connector *aconnector =
> > > > > + to_amdgpu_dm_connector(connector);
> > > > > + struct dc_sink *dc_sink = aconnector->dc_sink;
> > > > > + struct dc_link *dc_link = aconnector->dc_link;
> > > > > + struct amdgpu_device *adev = drm_to_adev(ddev);
> > > > > +
> > > > > + ASSERT(dc_link);
> > > > > +
> > > > > + if (dc_sink) {
> > > > > + mutex_lock(&adev->dm.dc_lock);
> > > > > +
> > > > > + /*clear the remote sink of the link*/
> > > > > + dc_link_remove_remote_sink(dc_link, dc_sink);
> > > > > + dc_sink_release(dc_sink);
> > > > > + aconnector->dc_sink = NULL;
> > > > > +
> > > > > + mutex_unlock(&adev->dm.dc_lock);
> > > > > + }
> > > > > +}
> > > > > +
> > > > > static const struct drm_dp_mst_topology_cbs dm_mst_cbs = {
> > > > > .add_connector = dm_dp_add_mst_connector,
> > > > > + .notify_csn_disconnection = dm_dp_notify_csn_disconnection,
> > > > > };
> > > >
> > > > I still don't really think this is a good idea. This seems like it's
> > > > just adding another hotplugging path to the code in order to avoid
> > > > sending hotplugs for non-endpoint devices. In addition to the
> >
> > So, what I wanted to do is to add a callback function for endpoint devices
> > case as described above. For disconnection of non-endpoint devices is
> > already taken care by current code.
> >
> > > > drm_connector issues I mentioned before, we also really need to stop
> > > > doing any kind of payload maintence in hotplugging paths. The reality
> > > > is any kind of payload maintanence we do outside of normal modesetting
> > > > paths is a relic from legacy modesetting that I'm dropping ASAP (more
> > > > on this below), and we can't keep adding to it because it dramatically
> > > > complicates maintanence as well.
> >
> > > >
> > > > Sorry for repeating this point so often but - the biggest issue too is
> > > > I'm still not sure what it is we're even avoiding here. We know
> > > > resources aren't released consistently, and that we're able to
> > > > reproduce the behavior with repeated hotplugs. We also know that if we
> > > > skip sending certain hotplug events, that fixes the issue. And we know
> > > > we can workaround it by adding a special case for forcing a payload
> > > > release in DC. But none of those actually tell us exactly what piece
> > > > of code is leaking and why, which means that any workarounds we're
> > > > putting in to avoid this mysterious guilty section of code we don't
> > > > entirely understand either - which means we're just adding more code
> > > > in that no one actually fully understands. This just ends up making
> > > > maintence difficult because every change in code nearby workarounds
> > > > like this has to strugle to try to figure out said workarounds in
> > > > order to avoid breaking things.
> >
> > Thank you Lyude for the reminder and I totally agree with you by avoiding
> > adding workaround since it will get the code maintenance harder. And I
> > think the root cause of the issue that I observed before is quite concrete.
> > There is no code path for us to release dc_sink under the unplug scenario
> > that I described above.
> >
> > > >
> > > > I'm actually currently running into these "later" issues right now, as
> > > > recently I've (-actively-, finally!!!) been working on trying to
> > > > remove as much non-atomic MST as I can because. As it turns out - a
> > > > huge amount of the payload maintanence code just isn't needed at all
> > > > when we stop caring about non-atomic drivers and stick everything in
> > > > atomic state structs. Step 1 for updating updating the payload table,
> > > > e.g. drm_dp_update_payload_part1(), is a great example of how messy
> > > > things have become. Here's a small sample of some of the stuff I've
> > > > seen from just that one function so far that either just don't make
> > > > sense here or is totally redundant. I should note that a lot of these
> > > > things also come from patches I reviwed, but didn't really look at as
> > > > closely as I should have because I was swamped at work, some are
> > > > historical artifacts, and others are less-than-ideal patches I got
> > > > wrote myself when I was first started working on MST and didn't know
> > > > the code as well as I do
> > > > now:
> > > >
> > > > * We try to avoid some sort of userspace issue by using
> > > > drm_dp_mst_port_downstream_of_branch() to avoid releasing payloads
> > > > for a
> > > > branch if we can't prove it's downstream of the top of the
> > > > topology. This
> > > > seems to workaround either a userspace bug. This is a redundant,
> > > > since
> > > > that's what topology refs are already supposed to be doing to the
> > > > extent is
> > > > reasonably possibly. It's also unfortunately racy solution because
> > > > we have
> > > > to be able to handle the situation where a connector is removed
> > > > from under
> > > > us. That can happen at any time, including _immediately_ after we
> > > > call
> > > > drm_dp_mst_port_downstream_of_branch() - rendering the call not
> > > > really
> > > > useful.
> > > > * If we fail to validate the sink in drm_dp_update_payload_part, we
> > > > don't
> > > > update the payload table. I think at best this solution is racy and
> > > > not
> > > > useful, at worst it leaves us with a payload table that doesn't
> > > > match what
> > > > we attempted to set in the atomic state - which at worst brings us
> > > > into
> > > > undefined territory where we're just plain out of sync with the
> > > > reality in
> > > > hw.
> > > > * Actually fun fact - mgr->payloads and mgr->proposed_vcpis both can
> > > > and
> > > > definitely should be removed entirely. All of the info for
> > > > mgr->payloads
> > > > could just be in the atomic state, because that + the magic of
> > > > atomic state
> > > > duplication means we'll also have an accurate view of the previous
> > > > state's
> > > > payload allocations: which renders mgr->proposed_vcpis redundant.
> > > >
> > > > Apologies for the long explanation again, but I hope that explains my
> >
> > Really really thanks for your kindly help on this : )
> > I'll try my best to get to your WIP branch soon. Thank you Lyude!
> >
> > > > point here a bit. I'm going to be trying to get to moving amdgpu's DSC
> > > > code out of amdgpu and into DRM helpers as well soon, so I'm really
> > > > determined to clean stuff up beforehand as every time I've done so
> > > > it's become substantially easier to make changes to this code. Things
> > > > used to be even worse before I started cleaning things up 2 or 3 years
> > > > ago, where simple changes would end up getting me stuck spending hours
> > > > trying to dig through lockdep or memory manegement issues. As well, I
> > > > would be entirely unsurprised if bugs like this very payload leak
> > > > we're working on just disappear once we've gotten rid of all the
> > > > extraneous workarounds and state tracking here - especially with how
> > > > many special cases we have for maintaining the payload table right
> > > > now. That's certainly ended up being the case in the past with a
> > > > number of other difficult to track down issues I've dealt with in MST.
> > > >
> > > > Anyhow. I've been way more productive this year then last because I
> > > > had over a month off work and am finally not super burnt out from my
> > > > job, and so far I've been making progress on the payload state cleanup
> > > > far faster then I was last year :). I think if you can't figure out
> > > > where the leak is coming from even once I get the debugging patches I
> > > > mentioned fixed up, it might be a good idea for us to try again after
> > > > I've got some of this code cleaned up. I've got a currently WIP branch
> > > > here:
> > > >
> > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl
> > > > ab.freedesktop.org%2Flyudess%2Flinux%2F-%2Fcommit%2F624dd68fa804e64b5b
> > > > 2060e4735d7317090692b5&data=04%7C01%7Cwayne.lin%40amd.com%7Cd4bdc7
> > > > 59eb274bfccc8208d9daef41fe%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%
> > > > 7C637781543416430632%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQI
> > > > joiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=nFpVl%2BVoF
> > > > 40JPabfRMcR6Cz7ZHDwt%2BBpLDBNdXftJaA%3D&reserved=0
> > > >
> > > > >
> > > > > void amdgpu_dm_initialize_dp_connector(struct
> > > > > amdgpu_display_manager *dm, diff --git
> > > > > a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > index 857c5d15e81d..a70e26c5d084 100644
> > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > @@ -2508,6 +2508,7 @@ drm_dp_mst_handle_conn_stat(struct
> > > > > drm_dp_mst_branch *mstb,
> > > > > u8 new_pdt;
> > > > > bool new_mcs;
> > > > > bool dowork = false, create_connector = false;
> > > > > + bool notify_unplug_event = false;
> > > > >
> > > > > port = drm_dp_get_port(mstb, conn_stat->port_number);
> > > > > if (!port)
> > > > > @@ -2520,6 +2521,9 @@ drm_dp_mst_handle_conn_stat(struct
> > > > > drm_dp_mst_branch *mstb,
> > > > > * port, so just throw the port out and make
> > > > > sure we
> > > > > * reprobe the link address of it's parent
> > > > > MSTB
> > > > > */
> > > > > + /* should also consider notify_unplug_event
> > > > > here.
> > > > > + * but it's not a normal case for products
> > > > > +in the
> > > > > market
> > > > > + */
> > > > > drm_dp_mst_topology_unlink_port(mgr, port);
> > > > > mstb->link_address_sent = false;
> > > > > dowork = true; @@ -2541,10 +2545,14 @@
> > > > > drm_dp_mst_handle_conn_stat(struct
> > > > > drm_dp_mst_branch
> > > > > *mstb,
> > > > > port->ddps = conn_stat->displayport_device_plug_status;
> > > > >
> > > > > if (old_ddps != port->ddps) {
> > > > > - if (port->ddps && !port->input)
> > > > > + if (port->ddps && !port->input) {
> > > > > drm_dp_send_enum_path_resources(mgr, mstb,
> > > > > port);
> > > > > - else
> > > > > + } else {
> > > > > port->full_pbn = 0;
> > > > > + if (port->connector &&
> > > > > + drm_dp_mst_is_end_device(port->pdt,
> > > > > +port-
> > > > > > mcs))
> > > > > + notify_unplug_event = true;
> > > > > + }
> > > > > }
> > > > >
> > > > > new_pdt = port->input ? DP_PEER_DEVICE_NONE : conn_stat-
> > > > > > peer_device_type;
> > > > > @@ -2557,11 +2565,15 @@ drm_dp_mst_handle_conn_stat(struct
> > > > > drm_dp_mst_branch
> > > > > *mstb,
> > > > > dowork = false;
> > > > > }
> > > > >
> > > > > + if (notify_unplug_event &&
> > > > > +mgr->cbs->notify_csn_disconnection)
> > > > > + mgr->cbs->notify_csn_disconnection(port->connector);
> > > > > +
> > > > > if (port->connector)
> > > > > drm_modeset_unlock(&mgr->base.lock);
> > > > > else if (create_connector)
> > > > > drm_dp_mst_port_add_connector(mstb, port);
> > > > >
> > > > > +
> > > > > out:
> > > > > drm_dp_mst_topology_put_port(port);
> > > > > if (dowork)
> > > > > diff --git a/include/drm/drm_dp_mst_helper.h
> > > > > b/include/drm/drm_dp_mst_helper.h index 78044ac5b59b..ff9e47729841
> > > > > 100644
> > > > > --- a/include/drm/drm_dp_mst_helper.h
> > > > > +++ b/include/drm/drm_dp_mst_helper.h
> > > > > @@ -525,6 +525,7 @@ struct drm_dp_mst_topology_cbs {
> > > > > * IRQ pulse handler.
> > > > > */
> > > > > void (*poll_hpd_irq)(struct drm_dp_mst_topology_mgr *mgr);
> > > > > + void (*notify_csn_disconnection)(struct drm_connector
> > > > > +*connector);
> > > > > };
> > > > >
> > > > > #define DP_MAX_PAYLOAD (sizeof(unsigned long) * 8)
> > > > > --
> > > > > 2.31.0
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Lin, Wayne
> > > > > > Sent: Wednesday, December 8, 2021 11:39 AM
> > > > > > To: 'Lyude Paul' <lyude at redhat.com>
> > > > > > Subject: RE: [PATCH] WIP: drm/dp_mst: Add support for dumping
> > > > > > topology ref histories from debugfs
> > > > > >
> > > > > > No worries Lyude!
> > > > > > Thanks for keeping helping on this. Take your time : )
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Lyude Paul <lyude at redhat.com>
> > > > > > > Sent: Wednesday, December 8, 2021 7:05 AM
> > > > > > > To: Lin, Wayne <Wayne.Lin at amd.com>
> > > > > > > Subject: Re: [PATCH] WIP: drm/dp_mst: Add support for dumping
> > > > > > > topology ref histories from debugfs
> > > > > > >
> > > > > > > Sorry! I will try to get to this tomorrow, if not then sometime
> > > > > > > this week.
> > > > > > >
> > > > > > > On Tue, 2021-11-30 at 08:41 +0000, Lin, Wayne wrote:
> > > > > > > > [Public]
> > > > > > > >
> > > > > > > > Hi Lyude,
> > > > > > > >
> > > > > > > > Finally have some bandwidth to get back to this problem!
> > > > > > > > I roughly went through this patch and I'm just aware that we
> > > > > > > > already have such kind of convenient tool for a while.
> > > > > > > > I think it's definitely useful for us to track port/mstb
> > > > > > > > reference count issues and I'll start to embrace this feature
> > > > > > > > for cleaning up those issues. Thank you Lyude!
> > > > > > > >
> > > > > > > > However, I think the issue that I was trying to fix is not
> > > > > > > > related to what you suggested:
> > > > > > > > " The idea here is that if stream resources aren't being
> > > > > > > > released, my guess would be that we're not dropping topology
> > > > > > > > references for the port which means the connector never goes
> > > > > > > > away."
> > > > > > > > The issue I was trying to fix is about releasing
> > > > > > > > dc_link->remote_sinks while receiving a CSN message notifying
> > > > > > > > the connection status of a sst connector of a port changed
> > > > > > > > from connected to disconnected. Not the connection status
> > > > > > > > changed of a mst branch device.
> > > > > > > > e.g.
> > > > > > > > src - 1st_mstb - 2nd_mstb - sst_monitor => src - 1st_mstb
> > > > > > > > (disconnect) 2nd_mstb - sst_monitor
> > > > > > > >
> > > > > > > > In above case, after receiving CSN, we will put topology
> > > > > > > > references of 2nd mstb and its port which is connected with
> > > > > > > > the sst monitor. As the result of that, we can call
> > > > > > > > drm_dp_delayed_destroy_port() to unregister and put the drm
> > > > > > > > connector.
> > > > > > > >
> > > > > > > > However, in below case:
> > > > > > > > e.g.
> > > > > > > > src - 1st_mstb - sst_monitor => src - 1st_mstb (disconnect)
> > > > > > > > sst_monitor
> > > > > > > >
> > > > > > > > In this case, which is the case having problem, it definitely
> > > > > > > > won't decrease the topology references count of the port which
> > > > > > > > was connected to the sst monitor to zero since the port is
> > > > > > > > still existing in the topology. Same as the malloc reference
> > > > > > > > since the port can't get destroyed. Hence, the port still
> > > > > > > > exists and we won't call
> > > > > > > > drm_dp_delayed_destroy_port() to unregister and put the drm
> > > > > > > > connector.
> > > > > > > > I looked up the code and drm_dp_delayed_destroy_port() seems
> > > > > > > > like the only place to call drm_connector_put() which means we
> > > > > > > > can't put reference count of drm connector under this case and
> > > > > > > > can't release dc_sink resource by destroying drm connector.
> > > > > > > >
> > > > > > > > I would also like to point out that this resource
> > > > > > > > (remote_sinks) is specific to different stream sinks. So if
> > > > > > > > we're trying to release this dc_sink resource by destroying
> > > > > > > > the drm connector, it conflicts the idea that you suggested
> > > > > > > > before that we should always keep the drm connector until it's
> > > > > > > > no longer reachable in the topology.
> > > > > > > > Releasing dc_sink should be binding with the disconnection
> > > > > > > > event.
> > > > > > > >
> > > > > > > > I understand your concern that we should not just easily
> > > > > > > > change the logic here since it's the result after solving tons
> > > > > > > > of bugs before and might cause other side effect. So, just my
> > > > > > > > 2 cents, what I'm thinking is to register a callback function
> > > > > > > > for our driver to notify us that the remote sink is detached.
> > > > > > > > This just aligns our flow handling long HPD event of legacy
> > > > > > > > (sst) DP.
> > > > > > > > For sst case, once we detect long HPD event indicating the
> > > > > > > > monitor is detached, we will immediately try to release the
> > > > > > > > dc_link->local_sink and fire hotplug event to upper layer.
> > > > > > > > Same as here, once receives a CSN message notifying a drm
> > > > > > > > connector is changed from connected to disconnected, trigger
> > > > > > > > the callback function and we can try to release the dc_sink
> > > > > > > > resource.
> > > > > > > >
> > > > > > > > Would like to know your thought and insight please : )
> > > > > > > >
> > > > > > > > Btw, I got some errors and warnings while building and have
> > > > > > > > some code adjustments as below : ) Thank you Lyude for your
> > > > > > > > always kindly help!!
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Wayne
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Lyude Paul <lyude at redhat.com>
> > > > > > > > > Sent: Wednesday, November 3, 2021 7:15 AM
> > > > > > > > > To: Lin, Wayne <Wayne.Lin at amd.com>
> > > > > > > > > Subject: [PATCH] WIP: drm/dp_mst: Add support for dumping
> > > > > > > > > topology ref histories from debugfs
> > > > > > > > >
> > > > > > > > > TODO:
> > > > > > > > > * Implement support for i915
> > > > > > > > > * Finish writing this commit message
> > > > > > > > > * ???
> > > > > > > > > * profit
> > > > > > > > >
> > > > > > > > > Signed-off-by: Lyude Paul <lyude at redhat.com>
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Hey wayne! SO-hopefully if I did this correctly then this
> > > > > > > > > should just work on amdgpu. What this patch should do is add
> > > > > > > > > a debugfs file to amdgpu called
> > > > > > > > > "amdgpu_dp_mst_topology_refs", and when you read the file it
> > > > > > > > > should print out the topology reference history of every
> > > > > > > > > MSTB and Port in memory, along with how many times we've hit
> > > > > > > > > the codepath in each backtrace. An example:
> > > > > > > > >
> > > > > > > > > Port DP-5 (0000000005c37748) topology ref history:
> > > > > > > > > 1 gets (last at 58.468973):
> > > > > > > > > drm_dp_send_link_address+0x6a5/0xa00 [drm_kms_helper]
> > > > > > > > > drm_dp_check_and_send_link_address+0xad/0xd0
> > > > > > > > > [drm_kms_helper]
> > > > > > > > > drm_dp_mst_link_probe_work+0x14e/0x1a0 [drm_kms_helper]
> > > > > > > > > process_one_work+0x1e3/0x390
> > > > > > > > > worker_thread+0x50/0x3a0
> > > > > > > > > kthread+0x124/0x150
> > > > > > > > > ret_from_fork+0x1f/0x30
> > > > > > > > > 1 puts (last at 58.469357):
> > > > > > > > > drm_dp_mst_topology_put_port+0x6a/0x210
> > > > > > > > > [drm_kms_helper]
> > > > > > > > > drm_dp_send_link_address+0x39e/0xa00 [drm_kms_helper]
> > > > > > > > > drm_dp_check_and_send_link_address+0xad/0xd0
> > > > > > > > > [drm_kms_helper]
> > > > > > > > > drm_dp_mst_link_probe_work+0x14e/0x1a0 [drm_kms_helper]
> > > > > > > > > process_one_work+0x1e3/0x390
> > > > > > > > > worker_thread+0x50/0x3a0
> > > > > > > > > kthread+0x124/0x150
> > > > > > > > > ret_from_fork+0x1f/0x30
> > > > > > > > >
> > > > > > > > > The idea here is that if stream resources aren't being
> > > > > > > > > released, my guess would be that we're not dropping topology
> > > > > > > > > references for the port which means the connector never goes
> > > > > > > > > away. So, if that's really the case then once we unplug the
> > > > > > > > > offending connector we should be able to find a pair of
> > > > > > > > > gets/puts for the offending leaked connector where the
> > > > > > > > > get/put count doesn't match up. Also, if the frame count on
> > > > > > > > > the backtrace isn't long enough you can increase the value
> > > > > > > > > of STACK_DEPTH in drivers/gpu/drm/drm_dp_mst_topology.c and
> > > > > > > > > recompile to get more frames.
> > > > > > > > >
> > > > > > > > > To enable this, first enable CONFIG_EXPERT for your kernel
> > > > > > > > > which will unhide CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS.
> > > > > > > > > Then just enable CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS,
> > > > > > > > > recompile, and it should be good to go.
> > > > > > > > >
> > > > > > > > > Let me know if this works for you, and hopefully this should
> > > > > > > > > tell us exactly what the problem actually is here.
> > > > > > > > >
> > > > > > > > > .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 35 ++++
> > > > > > > > > drivers/gpu/drm/drm_dp_mst_topology.c | 173
> > > > > > > > > ++++++++++++++----
> > > > > > > > > drivers/gpu/drm/nouveau/nouveau_debugfs.c | 35 ++++
> > > > > > > > > include/drm/drm_dp_mst_helper.h | 18 ++
> > > > > > > > > 4 files changed, 228 insertions(+), 33 deletions(-)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > diff --git
> > > > > > > > > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
> > > > > > > > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
> > > > > > > > > index 1a68a674913c..1a14732c52b4 100644
> > > > > > > > > ---
> > > > > > > > > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugf
> > > > > > > > > +++ s.c
> > > > > > > > > @@ -3063,6 +3063,37 @@ static int mst_topo_show(struct
> > > > > > > > > seq_file *m, void
> > > > > > > > > *unused)
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +#ifdef CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS
> > > > > > > > > +static int mst_topology_ref_dump_show(struct seq_file *m,
> > > > > > > > > +void
> > > > > > > > > +*unused) {
> > > > > > > > > + struct amdgpu_device *adev = (struct amdgpu_device
> > > > > > > > > +*)m->private;
> > > > > > > > > + struct drm_device *dev = adev_to_drm(adev);
> > > > > > > > > + struct drm_connector *connector;
> > > > > > > > > + struct drm_connector_list_iter conn_iter;
> > > > > > > > > + struct amdgpu_dm_connector *aconnector;
> > > > > > > > > +
> > > > > > > > > + drm_connector_list_iter_begin(dev, &conn_iter);
> > > > > > > > > + drm_for_each_connector_iter(connector, &conn_iter) {
> > > > > > > > > + if (connector->connector_type !=
> > > > > > > > > DRM_MODE_CONNECTOR_DisplayPort)
> > > > > > > > > + continue;
> > > > > > > > > +
> > > > > > > > > + aconnector =
> > > > > > > > > +to_amdgpu_dm_connector(connector);
> > > > > > > > > +
> > > > > > > > > + /* Ensure we're only dumping the topology of a
> > > > > > > > > +root mst node
> > > > > > > > > */
> > > > > > > > > + if (!aconnector->mst_mgr.max_payloads)
> > > > > > > > > + continue;
> > > > > > > > > +
> > > > > > > > > + seq_printf(m, "\nMST topology for connector
> > > > > > > > > +%d\n",
> > > > > > > > > aconnector->connector_id);
> > > > > > > > > + drm_dp_mst_dump_topology_refs(m,
> > > > > > > > > +&aconnector->mst_mgr);
> > > > > > > > > + }
> > > > > > > > > + drm_connector_list_iter_end(&conn_iter);
> > > > > > > > > +
> > > > > > > > > + return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +DEFINE_SHOW_ATTRIBUTE(mst_topology_ref_dump);
> > > > > > > > > +#endif
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * Sets trigger hpd for MST topologies.
> > > > > > > > > * All connected connectors will be rediscovered and re
> > > > > > > > > started as needed if val of 1 is sent.
> > > > > > > > > @@ -3299,6 +3330,10 @@ void dtn_debugfs_init(struct
> > > > > > > > > amdgpu_device
> > > > > > > > > *adev)
> > > > > > > > >
> > > > > > > > > debugfs_create_file("amdgpu_mst_topology", 0444, root,
> > > > > > > > > adev, &mst_topo_fops);
> > > > > > > > > +#ifdef CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS
> > > > > > > > > + debugfs_create_file("amdgpu_dp_mst_topology_refs",
> > > > > > > > > +0444, root, adev,
> > > > > > > > > + &mst_topology_ref_dump_fops);
> > > > > > > > > +#endif
> > > > > > > > > debugfs_create_file("amdgpu_dm_dtn_log", 0644, root,
> > > > > > > > > adev,
> > > > > > > > > &dtn_log_fops);
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > > index 1aa8702383d4..0159828c494d 100644
> > > > > > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > > @@ -1366,23 +1366,6 @@ static int
> > > > > > > > > drm_dp_mst_wait_tx_reply(struct drm_dp_mst_branch *mstb,
> > > > > > > > > return ret;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > -static struct drm_dp_mst_branch
> > > > > > > > > *drm_dp_add_mst_branch_device(u8 lct, u8
> > > > > > > > > *rad) -{
> > > > > > > > > - struct drm_dp_mst_branch *mstb;
> > > > > > > > > -
> > > > > > > > > - mstb = kzalloc(sizeof(*mstb), GFP_KERNEL);
> > > > > > > > > - if (!mstb)
> > > > > > > > > - return NULL;
> > > > > > > > > -
> > > > > > > > > - mstb->lct = lct;
> > > > > > > > > - if (lct > 1)
> > > > > > > > > - memcpy(mstb->rad, rad, lct / 2);
> > > > > > > > > - INIT_LIST_HEAD(&mstb->ports);
> > > > > > > > > - kref_init(&mstb->topology_kref);
> > > > > > > > > - kref_init(&mstb->malloc_kref);
> > > > > > > > > - return mstb;
> > > > > > > > > -}
> > > > > > > > > -
> > > > > > > > > static void drm_dp_free_mst_branch_device(struct kref
> > > > > > > > > *kref) {
> > > > > > > > > struct drm_dp_mst_branch *mstb = @@ -1642,12 +1625,20
> > > > > > > > > @@ topology_ref_type_to_str(enum
> > > > > > > > > drm_dp_mst_topology_ref_type type)
> > > > > > > > > return "put";
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +static const char *topology_ref_history_type_to_str(enum
> > > > > > > > > +drm_dp_mst_topology_history_type type) {
> > > > > > > > > + if (type == DRM_DP_MST_TOPOLOGY_HISTORY_PORT)
> > > > > > > > > + return "Port";
> > > > > > > > > + else
> > > > > > > > > + return "MSTB"; }
> > > > > > > > > +
> > > > > > > > > static void
> > > > > > > > > -__dump_topology_ref_history(struct
> > > > > > > > > drm_dp_mst_topology_ref_history *history,
> > > > > > > > > - void *ptr, const char *type_str)
> > > > > > > > > +dump_topology_ref_history(struct
> > > > > > > > > +drm_dp_mst_topology_ref_history *history, struct
> > > > > > > > > +drm_printer p)
> > > > > > > > > {
> > > > > > > > > - struct drm_printer p = drm_debug_printer(DBG_PREFIX);
> > > > > > > > > + char *connector_name = NULL;
> > > > > > > > > char *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
> > > > > > > > > + void *ptr;
> > > > > > > > > int i;
> > > > > > > > >
> > > > > > > > > if (!buf)
> > > > > > > > > @@ -1656,14 +1647,29 @@ __dump_topology_ref_history(struct
> > > > > > > > > drm_dp_mst_topology_ref_history *history,
> > > > > > > > > if (!history->len)
> > > > > > > > > goto out;
> > > > > > > > >
> > > > > > > > > + /* Get a pointer to the actual MSTB/port so we can the
> > > > > > > > > +memory
> > > > > > > > > address to the kernel log */
> > > > > > > > > + if (history->type == DRM_DP_MST_TOPOLOGY_HISTORY_PORT)
> > > > > > > > > + ptr = container_of(history, struct
> > > > > > > > > +drm_dp_mst_port,
> > > > > > > > > topology_ref_history);
> > > > > > > > > + else
> > > > > > > > > + ptr = container_of(history, struct
> > > > > > > > > +drm_dp_mst_branch, topology_ref_history);
> > > > > > > > > +
> > > > > > > > > /* First, sort the list so that it goes from oldest to
> > > > > > > > > newest
> > > > > > > > > * reference entry
> > > > > > > > > */
> > > > > > > > > sort(history->entries, history->len,
> > > > > > > > > sizeof(*history->entries),
> > > > > > > > > topology_ref_history_cmp, NULL);
> > > > > > > > >
> > > > > > > > > - drm_printf(&p, "%s (%p) topology count reached 0,
> > > > > > > > > dumping history:\n",
> > > > > > > > > - type_str, ptr);
> > > > > > > > > + if (history->type == DRM_DP_MST_TOPOLOGY_HISTORY_PORT)
> > > > > > > > > +{
> > > > > > > > > + struct drm_dp_mst_port *port = ptr;
> > > > > > > > > +
> > > > > > > > > + if (port->connector)
> > > > > > > > > + connector_name =
> > > > > > > > > +port->connector->name;
> > > > > > > > > + }
> > > > > > > > > + if (connector_name)
> > > > > > > > > + drm_printf(&p, "Port %s (%p) topology ref
> > > > > > > > > +history:\n",
> > > > > > > > > connector_name, ptr);
> > > > > > > > > + else
> > > > > > > > > + drm_printf(&p, "%s (%p) topology ref
> > > > > > > > > +history:\n",
> > > > > > > > > +
> > > > > > > > > +topology_ref_history_type_to_str(history->type),
> > > > > > > > > ptr);
> > > > > > > > >
> > > > > > > > > for (i = 0; i < history->len; i++) {
> > > > > > > > > const struct drm_dp_mst_topology_ref_entry
> > > > > > > > > *entry = @@
> > > > > > > > > -
> > > > > > > > > 1682,24 +1688,92 @@ __dump_topology_ref_history(struct
> > > > > > > > > drm_dp_mst_topology_ref_history *history,
> > > > > > > > > ts_nsec, rem_nsec / 1000, buf);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > - /* Now free the history, since this is the only time
> > > > > > > > > we expose it */
> > > > > > > > > - kfree(history->entries);
> > > > > > > > > out:
> > > > > > > > > kfree(buf);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * drm_dp_mst_dump_topology_refs - helper function for
> > > > > > > > > +dumping the topology ref history
> > > > > > > > > + * @m: File to print to
> > > > > > > > > + * @mgr: &struct drm_dp_mst_topology_mgr to use
> > > > > > > > > + *
> > > > > > > > > + * Prints the topology ref history of all ports and MSTBs
> > > > > > > > > +on @mgr that are still in memory,
> > > > > > > > > + * regardless of whether they're actually still reachable
> > > > > > > > > +through the topology or not. Only enabled
> > > > > > > > > + * when %CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS is enabled.
> > > > > > > > > +Can be implemented by drivers to assist
> > > > > > > > > + * with debugging leaks in the DP MST helpers.
> > > > > > > > > + */
> > > > > > > > > +void drm_dp_mst_dump_topology_refs(struct seq_file *m,
> > > > > > > > > +struct drm_dp_mst_topology_mgr *mgr) {
> > > > > > > > > + struct drm_dp_mst_topology_ref_history *history;
> > > > > > > > > + struct drm_printer p = drm_seq_file_printer(m);
> > > > > > > > > +
> > > > > > > > > + mutex_lock(&mgr->topology_ref_history_lock);
> > > > > > > > > + list_for_each_entry(history,
> > > > > > > > > +&mgr->topology_ref_history_list,
> > > > > > > > > +node)
> > > > > > > > > + dump_topology_ref_history(history, p);
> > > > > > > > > + mutex_unlock(&mgr->topology_ref_history_lock);
> > > > > > > > > +}
> > > > > > > > > +EXPORT_SYMBOL(drm_dp_mst_dump_topology_refs);
> > > > > > > > > +
> > > > > > > > > +static void
> > > > > > > > > +__init_topology_ref_history(struct
> > > > > > > > > +drm_dp_mst_topology_ref_history
> > > > > > > > > *history,
> > > > > > > > > + struct drm_dp_mst_topology_mgr
> > > > > > > > > +*mgr,
> > > > > > > > > + enum
> > > > > > > > > +drm_dp_mst_topology_history_type
> > > > > > > > > +type) {
> > > > > > > > > + history->type = type;
> > > > > > > > > + INIT_LIST_HEAD(&history->node);
> > > > > > > > > +
> > > > > > > > > + mutex_lock(&mgr->topology_ref_history_lock);
> > > > > > > > > + list_add(&history->node,
> > > > > > > > > +&mgr->topology_ref_history_list);
> > > > > > > > > + mutex_unlock(&mgr->topology_ref_history_lock);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void
> > > > > > > > > +__destroy_topology_ref_history(struct
> > > > > > > > > +drm_dp_mst_topology_ref_history
> > > > > > > > > *history,
> > > > > > > > > + struct drm_dp_mst_topology_mgr
> > > > > > > > > +*mgr) {
> > > > > > > > > + mutex_lock(&mgr->topology_ref_history_lock);
> > > > > > > > > + list_del(&mgr->topology_ref_history_list);
> > > > > > > > > + mutex_unlock(&mgr->topology_ref_history_lock);
> > > > > > > > > +
> > > > > > > > > + kfree(history->entries); }
> > > > > > > > > +
> > > > > > > > > +static __always_inline void
> > > > > > > > > +init_port_topology_history(struct drm_dp_mst_topology_mgr
> > > > > > > > > +*mgr, struct drm_dp_mst_port *port) {
> > > > > > > > > +
> > > > > > > > > +__init_topology_ref_history(&port->topology_ref_history,
> > > > > > > > > +mgr,
> > > > > > > > > +
> > > > > > > > > +DRM_DP_MST_TOPOLOGY_HISTORY_PORT);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static __always_inline void
> > > > > > > > > +init_mstb_topology_history(struct drm_dp_mst_topology_mgr
> > > > > > > > > +*mgr, struct drm_dp_mst_branch *mstb) {
> > > > > > > > > +
> > > > > > > > > +__init_topology_ref_history(&mstb->topology_ref_history,
> > > > > > > > > +mgr,
> > > > > > > > > +
> > > > > > > > > +DRM_DP_MST_TOPOLOGY_HISTORY_MSTB);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static __always_inline void
> > > > > > > > > +destroy_port_topology_history(struct drm_dp_mst_port *port)
> > > > > > > > > +{
> > > > > > > > > +
> > > > > > > > > +__destroy_topology_ref_history(&port->topology_ref_history,
> > > > > > > > > +port->mgr); }
> > > > > > > > > +
> > > > > > > > > +static __always_inline void
> > > > > > > > > +destroy_mstb_topology_history(struct drm_dp_mst_branch
> > > > > > > > > +*mstb) {
> > > > > > > > > +
> > > > > > > > > +__destroy_topology_ref_history(&mstb->topology_ref_history,
> > > > > > > > > +mstb->mgr); }
> > > > > > > > > +
> > > > > > > > > static __always_inline void
> > > > > > > > > dump_mstb_topology_history(struct drm_dp_mst_branch *mstb)
> > > > > > > > > {
> > > > > > > > > -
> > > > > > > > > __dump_topology_ref_history(&mstb->topology_ref_history,
> > > > > > > > > mstb,
> > > > > > > > > - "MSTB");
> > > > > > > > > + dump_topology_ref_history(&mstb->topology_ref_history,
> > > > > > > > > +drm_debug_printer(DBG_PREFIX));
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static __always_inline void
> > > > > > > > > dump_port_topology_history(struct drm_dp_mst_port *port) {
> > > > > > > > > -
> > > > > > > > > __dump_topology_ref_history(&port->topology_ref_history,
> > > > > > > > > port,
> > > > > > > > > - "Port");
> > > > > > > > > + dump_topology_ref_history(&port->topology_ref_history,
> > > > > > > > > +drm_debug_printer(DBG_PREFIX));
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static __always_inline void @@ -1729,6 +1803,14 @@
> > > > > > > > > topology_ref_history_unlock(struct
> > > > > > > > > drm_dp_mst_topology_mgr *mgr) } #else static inline void
> > > > > > > > > +init_port_topology_history(struct drm_dp_mst_topology_mgr
> > > > > > > > > +*mgr, struct drm_dp_mst_port *port); static inline void
> > > > > > > > Should also add the bracket, otherwise will get warnings.
> > > > > > > > => static inline void init_port_topology_history(struct
> > > > > > > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
> > > > > > > > {};
> > > > > > > >
> > > > > > > > > +init_mstb_topology_history(struct drm_dp_mst_topology_mgr
> > > > > > > > > +*mgr, struct drm_dp_mst_branch *mstb); static inline void
> > > > > > > > Same as above
> > > > > > > > => static inline void init_mstb_topology_history(struct
> > > > > > > > drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_branch *mstb)
> > > > > > > > {};
> > > > > > > >
> > > > > > > > > +destroy_port_topology_history(struct
> > > > > > > > > +drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port
> > > > > > > > > +*port); static inline void
> > > > > > > > destroy_port_topology_history() takes one parameter only which
> > > > > > > > is port.
> > > > > > > > => destroy_port_topology_history(struct drm_dp_mst_port *port)
> > > > > > > > {};
> > > > > > > >
> > > > > > > > > +destroy_mstb_topology_history(struct
> > > > > > > > > +drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_branch
> > > > > > > > > +*mstb); static inline void
> > > > > > > > destroy_mstb_topology_history() takes one parameter only which
> > > > > > > > is mstb => destroy_mstb_topology_history(struct
> > > > > > > > drm_dp_mst_branch
> > > > > > > > *mstb) {};
> > > > > > > >
> > > > > > > > > topology_ref_history_lock(struct drm_dp_mst_topology_mgr
> > > > > > > > > *mgr) {} static inline void
> > > > > > > > > topology_ref_history_unlock(struct
> > > > > > > > > drm_dp_mst_topology_mgr *mgr) {} @@ -1740,6 +1822,25 @@
> > > > > > > > > dump_port_topology_history(struct drm_dp_mst_port *port) {}
> > > > > > > > > #define save_port_topology_ref(port, type) #endif
> > > > > > > > >
> > > > > > > > > +static struct drm_dp_mst_branch *
> > > > > > > > > +drm_dp_add_mst_branch_device(struct drm_dp_mst_topology_mgr
> > > > > > > > > +*mgr,
> > > > > > > > > +u8 lct, u8 *rad) {
> > > > > > > > > + struct drm_dp_mst_branch *mstb;
> > > > > > > > > +
> > > > > > > > > + mstb = kzalloc(sizeof(*mstb), GFP_KERNEL);
> > > > > > > > > + if (!mstb)
> > > > > > > > > + return NULL;
> > > > > > > > > +
> > > > > > > > > + mstb->lct = lct;
> > > > > > > > > + if (lct > 1)
> > > > > > > > > + memcpy(mstb->rad, rad, lct / 2);
> > > > > > > > > + INIT_LIST_HEAD(&mstb->ports);
> > > > > > > > > + kref_init(&mstb->topology_kref);
> > > > > > > > > + kref_init(&mstb->malloc_kref);
> > > > > > > > > + init_mstb_topology_history(mgr, mstb);
> > > > > > > > > + return mstb;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > static void drm_dp_destroy_mst_branch_device(struct kref
> > > > > > > > > *kref) {
> > > > > > > > > struct drm_dp_mst_branch *mstb = @@ -1747,6 +1848,7 @@
> > > > > > > > > static void drm_dp_destroy_mst_branch_device(struct
> > > > > > > > > kref *kref)
> > > > > > > > > struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
> > > > > > > > >
> > > > > > > > > dump_mstb_topology_history(mstb);
> > > > > > > > > + destroy_mstb_topology_history(mstb);
> > > > > > > > >
> > > > > > > > > INIT_LIST_HEAD(&mstb->destroy_next);
> > > > > > > > >
> > > > > > > > > @@ -1856,6 +1958,7 @@ static void drm_dp_destroy_port(struct
> > > > > > > > > kref
> > > > > > > > > *kref)
> > > > > > > > > struct drm_dp_mst_topology_mgr *mgr = port->mgr;
> > > > > > > > >
> > > > > > > > > dump_port_topology_history(port);
> > > > > > > > > + destroy_port_topology_history(port);
> > > > > > > > >
> > > > > > > > > /* There's nothing that needs locking to destroy an
> > > > > > > > > input port yet */
> > > > > > > > > if (port->input) {
> > > > > > > > > @@ -2135,7 +2238,7 @@ drm_dp_port_set_pdt(struct
> > > > > > > > > drm_dp_mst_port *port, u8 new_pdt,
> > > > > > > > > ret =
> > > > > > > > > drm_dp_mst_register_i2c_bus(port);
> > > > > > > > > } else {
> > > > > > > > > lct = drm_dp_calculate_rad(port, rad);
> > > > > > > > > - mstb =
> > > > > > > > > drm_dp_add_mst_branch_device(lct, rad);
> > > > > > > > > + mstb =
> > > > > > > > > +drm_dp_add_mst_branch_device(mgr, lct, rad);
> > > > > > > > > if (!mstb) {
> > > > > > > > > ret = -ENOMEM;
> > > > > > > > > drm_err(mgr->dev, "Failed to
> > > > > > > > > create MSTB for port %p", port); @@ -2366,6 +2469,8 @@
> > > > > > > > > drm_dp_mst_add_port(struct drm_device *dev,
> > > > > > > > > */
> > > > > > > > > drm_dp_mst_get_mstb_malloc(mstb);
> > > > > > > > >
> > > > > > > > > + init_port_topology_history(mgr, port);
> > > > > > > > > +
> > > > > > > > > return port;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > @@ -3745,7 +3850,7 @@ int
> > > > > > > > > drm_dp_mst_topology_mgr_set_mst(struct
> > > > > > > > > drm_dp_mst_topology_mgr *mgr, bool ms
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > /* add initial branch device at LCT 1 */
> > > > > > > > > - mstb = drm_dp_add_mst_branch_device(1, NULL);
> > > > > > > > > + mstb = drm_dp_add_mst_branch_device(mgr, 1,
> > > > > > > > > +NULL);
> > > > > > > > > if (mstb == NULL) {
> > > > > > > > > ret = -ENOMEM;
> > > > > > > > > goto out_unlock; @@ -5512,14 +5617,16
> > > > > > > > > @@ int drm_dp_mst_topology_mgr_init(struct
> > > > > > > > > drm_dp_mst_topology_mgr *mgr,
> > > > > > > > > mutex_init(&mgr->delayed_destroy_lock);
> > > > > > > > > mutex_init(&mgr->up_req_lock);
> > > > > > > > > mutex_init(&mgr->probe_lock); -#if
> > > > > > > > > IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
> > > > > > > > > - mutex_init(&mgr->topology_ref_history_lock);
> > > > > > > > > -#endif
> > > > > > > > > INIT_LIST_HEAD(&mgr->tx_msg_downq);
> > > > > > > > > INIT_LIST_HEAD(&mgr->destroy_port_list);
> > > > > > > > > INIT_LIST_HEAD(&mgr->destroy_branch_device_list);
> > > > > > > > > INIT_LIST_HEAD(&mgr->up_req_list);
> > > > > > > > >
> > > > > > > > > +#if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
> > > > > > > > > + mutex_init(&mgr->topology_ref_history_lock);
> > > > > > > > > + INIT_LIST_HEAD(&mgr->topology_ref_history_list);
> > > > > > > > > +#endif
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * delayed_destroy_work will be queued on a dedicated
> > > > > > > > > WQ, so that any
> > > > > > > > > * requeuing will be also flushed when deiniting the
> > > > > > > > > topology manager.
> > > > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
> > > > > > > > > b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
> > > > > > > > > index 1cbe01048b93..53ec7c1daada 100644
> > > > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c
> > > > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c
> > > > > > > > > @@ -29,9 +29,13 @@
> > > > > > > > > */
> > > > > > > > >
> > > > > > > > > #include <linux/debugfs.h>
> > > > > > > > > +#include <drm/drm_dp_mst_helper.h> #include
> > > > > > > > > +<drm/drm_encoder.h>
> > > > > > > > > #include <nvif/class.h>
> > > > > > > > > #include <nvif/if0001.h>
> > > > > > > > > +#include <nouveau_encoder.h>
> > > > > > > > > #include "nouveau_debugfs.h"
> > > > > > > > > +#include "nouveau_display.h"
> > > > > > > > > #include "nouveau_drv.h"
> > > > > > > > >
> > > > > > > > > static int
> > > > > > > > > @@ -68,6 +72,34 @@ nouveau_debugfs_strap_peek(struct
> > > > > > > > > seq_file *m, void
> > > > > > > > > *data)
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +#ifdef CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS
> > > > > > > > > +static int nouveau_debugfs_mst_topology_history(struct
> > > > > > > > > +seq_file *m, void *data) {
> > > > > > > > > + struct drm_info_node *node = m->private;
> > > > > > > > > + struct drm_device *dev = node->minor->dev;
> > > > > > > > > + struct drm_encoder *encoder;
> > > > > > > > > + struct nouveau_display *disp = nouveau_display(dev);
> > > > > > > > > +
> > > > > > > > > + if (disp->disp.object.oclass < GF110_DISP)
> > > > > > > > > + return -EINVAL;
> > > > > > > > > +
> > > > > > > > > + drm_for_each_encoder(encoder, dev) {
> > > > > > > > > + struct nv50_mstm *mstm;
> > > > > > > > > +
> > > > > > > > > + /* We need the real encoder for each MST mgr
> > > > > > > > > +*/
> > > > > > > > > + if (encoder->encoder_type !=
> > > > > > > > > +DRM_MODE_ENCODER_TMDS)
> > > > > > > > > + continue;
> > > > > > > > > + mstm = nouveau_encoder(encoder)->dp.mstm;
> > > > > > > > > + if (!mstm)
> > > > > > > > > + continue;
> > > > > > > > > +
> > > > > > > > > + seq_printf(m, "MSTM on %s:\n", encoder->name);
> > > > > > > > > + drm_dp_mst_dump_topology_refs(m, &mstm->mgr);
> > > > > > > > > + }
> > > > > > > > > + return 0;
> > > > > > > > > +}
> > > > > > > > > +#endif
> > > > > > > > > +
> > > > > > > > > static int
> > > > > > > > > nouveau_debugfs_pstate_get(struct seq_file *m, void *data)
> > > > > > > > > { @@
> > > > > > > > > -213,6
> > > > > > > > > +245,9 @@ static const struct file_operations
> > > > > > > > > nouveau_pstate_fops = { static struct drm_info_list
> > > > > > > > > nouveau_debugfs_list[] = {
> > > > > > > > > { "vbios.rom", nouveau_debugfs_vbios_image, 0, NULL
> > > > > > > > > },
> > > > > > > > > { "strap_peek", nouveau_debugfs_strap_peek, 0, NULL },
> > > > > > > > > +#ifdef CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS
> > > > > > > > > + { "dp_mst_topology_refs",
> > > > > > > > > +nouveau_debugfs_mst_topology_history, 0, NULL }, #endif
> > > > > > > > > };
> > > > > > > > > #define NOUVEAU_DEBUGFS_ENTRIES
> > > > > > > > > ARRAY_SIZE(nouveau_debugfs_list)
> > > > > > > > >
> > > > > > > > > diff --git a/include/drm/drm_dp_mst_helper.h
> > > > > > > > > b/include/drm/drm_dp_mst_helper.h index
> > > > > > > > > 78044ac5b59b..e3a73d8d97c0
> > > > > > > > > 100644
> > > > > > > > > --- a/include/drm/drm_dp_mst_helper.h
> > > > > > > > > +++ b/include/drm/drm_dp_mst_helper.h
> > > > > > > > > @@ -35,6 +35,11 @@ enum drm_dp_mst_topology_ref_type {
> > > > > > > > > DRM_DP_MST_TOPOLOGY_REF_PUT,
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > +enum drm_dp_mst_topology_history_type {
> > > > > > > > > + DRM_DP_MST_TOPOLOGY_HISTORY_PORT,
> > > > > > > > > + DRM_DP_MST_TOPOLOGY_HISTORY_MSTB, };
> > > > > > > > > +
> > > > > > > > > struct drm_dp_mst_topology_ref_history {
> > > > > > > > > struct drm_dp_mst_topology_ref_entry {
> > > > > > > > > enum drm_dp_mst_topology_ref_type type; @@
> > > > > > > > > -43,6
> > > > > > > > > +48,9 @@ struct drm_dp_mst_topology_ref_history {
> > > > > > > > > depot_stack_handle_t backtrace;
> > > > > > > > > } *entries;
> > > > > > > > > int len;
> > > > > > > > > +
> > > > > > > > > + enum drm_dp_mst_topology_history_type type;
> > > > > > > > > + struct list_head node;
> > > > > > > > > };
> > > > > > > > > #endif /* IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
> > > > > > > > > */
> > > > > > > > >
> > > > > > > > > @@ -769,6 +777,12 @@ struct drm_dp_mst_topology_mgr {
> > > > > > > > > * &drm_dp_mst_branch.topology_ref_history.
> > > > > > > > > */
> > > > > > > > > struct mutex topology_ref_history_lock;
> > > > > > > > > +
> > > > > > > > > + /**
> > > > > > > > > + * @topology_ref_history_list: contains a list of
> > > > > > > > > +topology ref
> > > > > > > > > histories for any MSTBs or
> > > > > > > > > + * ports that are still in memory
> > > > > > > > > + */
> > > > > > > > > + struct list_head topology_ref_history_list;
> > > > > > > > > #endif
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > @@ -873,6 +887,10 @@ void drm_dp_mst_put_port_malloc(struct
> > > > > > > > > drm_dp_mst_port *port);
> > > > > > > > >
> > > > > > > > > struct drm_dp_aux *drm_dp_mst_dsc_aux_for_port(struct
> > > > > > > > > drm_dp_mst_port *port);
> > > > > > > > >
> > > > > > > > > +#ifdef CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS
> > > > > > > > > +void drm_dp_mst_dump_topology_refs(struct seq_file *m,
> > > > > > > > > +struct drm_dp_mst_topology_mgr *mgr); #endif
> > > > > > > > > +
> > > > > > > > > extern const struct drm_private_state_funcs
> > > > > > > > > drm_dp_mst_topology_state_funcs;
> > > > > > > > >
> > > > > > > > > /**
> > > > > > > > > --
> > > > > > > > > 2.31.1
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Cheers,
> > > > > > > Lyude Paul (she/her)
> > > > > > > Software Engineer at Red Hat
> > > > >
> > > >
> > >
> > > --
> > > Cheers, Lyude Paul (she/her) Software Engineer at Red Hat
> > --
> > Regards,
> > Wayne Lin
> >
>
> --
> Cheers,
> Lyude Paul (she/her)
> Software Engineer at Red Hat
--
Regards,
Wayne Lin
More information about the dri-devel
mailing list