[RFC PATCH xserver] modesetting: re-set the crtc's mode when link-status goes BAD

Wed Feb 8 16:37:56 UTC 2017

On 06/02/17 17:50, Martin Peres wrote:
> On 03/02/17 10:04, Daniel Vetter wrote:
>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
>>> On 01/02/17 22:05, Manasi Navare wrote:
>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
>>>>> Jani Nikula <jani.nikula at linux.intel.com> writes:
>>>>>
>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric at anholt.net> wrote:
>>>>>>> Martin Peres <martin.peres at linux.intel.com> writes:
>>>>>>>
>>>>>>>> Despite all the careful planing of the kernel, a link may become
>>>>>>>> insufficient to handle the currently-set mode. At this point, the
>>>>>>>> kernel should mark this particular configuration as being broken
>>>>>>>> and potentially prune the mode before setting the offending
>>>>>>>> connector's
>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
>>>>>>>> happen right after a modeset or later on.
>>>>>>>>
>>>>>>>> When available, we should use the link-status information to reset
>>>>>>>> the wanted mode.
>>>>>>>>
>>>>>>>> Signed-off-by: Martin Peres <martin.peres at linux.intel.com>
>>>>>>> If I understand this right, there are two failure modes being
>>>>>>> handled:
>>>>>>>
>>>>>>> 1) A mode that won't actually work because the link isn't good
>>>>>>> enough.
>>>>>>>
>>>>>>> 2) A mode that should work, but link parameters were too
>>>>>>> optimistic and
>>>>>>> if we just ask the kernel to set the mode again it'll use more
>>>>>>> conservative parameters that work.
>>>>>>>
>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
>>>>>>> train again, and bring us back into this loop?  The only escape
>>>>>>> that I
>>>>>>> see would be some other userspace responding to the advertised
>>>>>>> mode list
>>>>>>> changing, and then asking X to modeset to something new.
>>>>>>>
>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
>>>>>>> point, and only re-setting if our mode still exists?
>>>>>> Disclaimer: I don't know anything about the internals of the
>>>>>> modesetting
>>>>>> driver.
>>>>>>
>>>>>> Perhaps we can identify the two cases now, but I'd put this more
>>>>>> generally: if the link status has gone bad, it's an indicator to
>>>>>> userspace that the circumstances may have changed, and userspace
>>>>>> should
>>>>>> query the kernel for the list of available modes again. It should no
>>>>>> longer trust information obtained prior to getting the bad link
>>>>>> status,
>>>>>> including the current mode.
>>>>>>
>>>>>> But specifically, I think you're right, and AFAICT asking for the
>>>>>> list
>>>>>> of modes again is the only way for the userspace to distinguish
>>>>>> between
>>>>>> the two cases. I don't think there's a shortcut for deciding the
>>>>>> current
>>>>>> mode is still valid.
>>>>> To avoid the busy-loop problem, I think I'd like this patch to
>>>>> re-query
>>>>> the kernel to ask about the current mode list, and only try to re-set
>>>>> the mode if our mode is still there.
>>>>>
>>>>> If the mode isn't there, then it's up to the DE to take action in
>>>>> response to the notification of new modes.  If you don't have a DE to
>>>>> take appropriate action, you're kind of out of luck.
>>>>>
>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
>>>>> about ABI was having to walk all the connectors on every uevent to see
>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
>>>>> the
>>>>> kernel could go ahead with the current plan.
>>>> Yes I agree. The kernel sets the link status BAD as soona s link
>>>> training fails
>>>> but does not prune the modes until a new modelist is requested by
>>>> the userspace.
>>>> So this patch should re query the mode list as soon as it sees the
>>>> link status
>>>> BAD in order for the kernel to validate the modes again based on new
>>>> link
>>>> paarmeters and send a new mode list back.
>>> Seems like a bad behaviour from the kernel, isn't it? It should return
>>> immediatly
>>> if the mode is gonna be pruned :s
>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
>> validate requested modes against that. The mode list is purely for
>> userspace's information. Which means updating it only when userspace asks
>> for it is fine.
>
> Hmm, ok. Fair enough!
>
>> I also thought some more about the loop when we're stuck on BAD, and I
>> think it shouldn't happen: When the link goes BAD we update the link
>> parameter values to the new limits, and the kernel will reject any mode
>> that won't fit anymore. Of course you might be unlucky, and the new link
>> limits are also not working, but eventually you're stuck at the "you're
>> link is broken, sry" stage, where the kernel rejects everything :-)
>>
>> So I think the busy-loop has strictly a limited amount of time until it
>> runs out of steam.
>
> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> on IRC.
>
> My current proposal is based on the idea that the kernel should reject a
> mode
> it knows it cannot set. This is already largely tested in real life: Try
> setting 4K at 60Hz on an HDMI 1.x monitor, it should immediately fail on
> prepare(). For this proposal to work, we would need to put in the ABI
> that a
> driver that sets the link-status to BAD should also make sure that whatever
> the userspace does, no infinite loop should be created (by changing the
> maximum link characteristics before setting the link-status property).
>
> Re-probing the list of modes and checking if the mode is still in there is
> inherently racy, as a new screen may be plugged between the moment the list
> is sent to the userspace and the moment when we try setting the mode. Or
> the
> DE may also have changed the mode in the mean time and the kernel would
> have reduced the limits even more. So, the userspace cannot be expected to
> always do the right thing here :s.
>
> From this point of view, I really do not like the idea of re-probing, since
> it increases the delay before the DE can handle the change and it does not
> bring any guarantee of working properly.
>
> Did I miss anything? Any opinions?

Any comments on this, Eric? Does it sound logical to you or did I miss 
something?

The kernel patches are stuck until this patch gets in. So far, you look 
like the only person who would be willing to review this patch (Ajax 
acked the patch, but that's the extent he was willing to go).

Feel free to suggest more potential reviewers if you don't want to 
commit to a R-b.

Cheers,
Martin