[PATCH] drm/msm/dp: move link_ready out of HPD event thread

Abhinav Kumar quic_abhinavk at quicinc.com
Mon Mar 18 18:01:25 UTC 2024



On 3/15/2024 8:57 AM, Johan Hovold wrote:
> On Thu, Mar 14, 2024 at 09:30:57AM -0700, Abhinav Kumar wrote:
>> On 3/14/2024 8:38 AM, Johan Hovold wrote:
>>> On Wed, Mar 13, 2024 at 10:24:08AM -0700, Abhinav Kumar wrote:
> 
>>> Perhaps I'm missing something in the race that you are trying to
>>> describe (and which I've asked you to describe in more detail so that I
>>> don't have to spend more time trying to come up with a reproducer
>>> myself).
> 
>> The race condition is between the time we get disconnect event and set
>> link_ready to false, a commit can come in. Because setting link_ready to
>> false happens in the event thread so it could be slightly delayed.
> 
> I get this part, just not why, or rather when, that becomes a problem.
> 
> Once the disconnect event is processed, dp_hpd_unplug_handle() will
> update the state to ST_DISCONNECT_PENDING, and queue a notification
> event. link_ready is (before this patch) still set to 1.
> 

This is the case I am thinking of:

1) Disconnect event happens which will call dp_hpd_unplug_handle() but 
link_ready is not false yet.

2) There is a commit with a modeset, which shall trigger 
atomic_disable() followed by an atomic_enable()

atomic_disable() will go through disable clocks and set hpd_state to 
ST_DISCONNECTED.

3) atomic_enable() will not go through because we will bail out because 
state was ST_DISCONNECTED.

         if (state != ST_DISPLAY_OFF && state != ST_MAINLINK_READY) {
                 mutex_unlock(&dp_display->event_mutex);
                 return;
         }

4) Now, if there is another commit with a modeset, it will go and crash 
at atomic_disable()

> Here a commit comes in; what exactly are you suggesting would trigger
> that? And in such a way that it breaks the state machine?
> 

Like we have seen, the commit can either come directly from userspace as 
one last frame (the original bug I had given the link to) or from the 
__drm_fb_helper_restore_fbdev_mode_unlocked() which happened in 
sc8280xp's case. This is totally independent of the hpd_thread() with no 
mutual exclusion.

This commit() can come before the link_ready was set to false. If it had 
come after link_ready was set to false, atomic_check() would have failed 
and no issue would have been seen.

My change is making the link_ready false sooner in the disconnect case.

> One way this could cause trouble is if you end up with a call to
> dp_bridge_atomic_post_disable() which updates the state to
> ST_DISCONNECTED. (1)
> 
> This would then need to be followed by another call to
> dp_bridge_atomic_enable() which bails out early with the link clock
> disabled. (2) (And if link_ready were to be set to 0 sooner, the
> likelihood of this is reduced.)
> 
> This in turn, would trigger a reset when dp_bridge_atomic_disable() is
> later called.
> 

Yes, this is exactly what I have written above.

> This is the kind of description of the race I expect to see in the
> commit message, and I'm still not sure what would trigger the call to
> dp_bridge_atomic_post_disable() and dp_bridge_atomic_enable() (i.e. (1)
> and (2) above) and whether this is a real issue or not.
> 

I have explained what triggers the disable/enable call below.

> Also note that the above scenario is quite different from the one I've
> hit and described earlier.
> 

Why is that so? Eventually it will also translate to the same scenario. 
I would like to understand why this is different. I think in your case, 
probably we do not know what triggers the modeset, but its a minor 
detail like I have written before.

>> It will be hard to reproduce this. Only way I can think of is to delay
>> the EV_NOTIFICATION for sometime and see in dp_bridge_hpd_notify()
>>
>>           else if (dp_display->link_ready && status ==
>> connector_status_disconnected)
>>                   dp_add_event(dp, EV_HPD_UNPLUG_INT, 0, 0);
>>
>> as dp_add_event() will add the event, then wakeup the event_q.
> 
> Sure that would increase the race window with the current code, but that
> alone isn't enough to trigger the bug AFAICT.
> 
>> Before the event thread wakes up and processes this unplug event, the
>> commit can come in. This is the race condition i was thinking of.
> 
> Yes, but what triggers the commit? And why would it lead to a mode set
> that disables the bridge?
> 

Commit was triggered from the userspace as it did not process the 
disconnect event on time and the userspace was triggering a couple of 
modesets by by changing the mode on the CRTC from 1080P to NONE to 1080P.

[drm:drm_atomic_helper_check_modeset] [CRTC:60:crtc-1] mode changed

> Johan


More information about the dri-devel mailing list