[Intel-gfx] [PATCH] [drm][i915] Increase LSPCON timeout

Rodrigo Vivi rodrigo.vivi at intel.com
Thu Aug 16 21:43:22 UTC 2018


On Thu, Aug 16, 2018 at 11:35:26PM +0200, fredrikschon at gmail.com wrote:
> tor 2018-08-16 klockan 11:23 -0700 skrev Rodrigo Vivi:
> > On Thu, Aug 16, 2018 at 10:36:43AM +0200, Fredrik Schön wrote:
> > > Shashank,
> > > 
> > > Den tors 16 aug. 2018 kl 10:15 skrev Sharma, Shashank
> > > <shashank.sharma at intel.com>:
> > > > 
> > > > Hey Chris,
> > > > 
> > > > 
> > > > On 8/16/2018 1:13 PM, Chris Wilson wrote:
> > > > > Quoting Sharma, Shashank (2018-08-16 08:33:36)
> > > > > > Regards
> > > > > > 
> > > > > > Shashank
> > > > > > 
> > > > > > 
> > > > > > On 8/16/2018 12:47 PM, Jani Nikula wrote:
> > > > > > > On Wed, 15 Aug 2018, Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > > > wrote:
> > > > > > > > On Wed, Aug 15, 2018 at 03:39:40PM -0700, Rodrigo Vivi
> > > > > > > > wrote:
> > > > > > > > > On Mon, Aug 13, 2018 at 07:51:33PM +0200, Fredrik Schön
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > First of all we need to fix the commit subject:
> > > > > > > > > 
> > > > > > > > > drm/i915: Increase LSPCON timeout
> > > > > > > > > 
> > > > > > > > > (this can be done when merging, no need to resend)
> > > > > > > > > 
> > > > > > > > > > 100 ms is not enough time for the LSPCON adapter on
> > > > > > > > > > Intel NUC devices to
> > > > > > > > > > settle. This causes dropped display modes at driver
> > > > > > > > > > initialisation.
> > > > > > > > > > 
> > > > > > > > > > Increase timeout to 1000 ms.
> > > > > > > > > > 
> > > > > > > > > > Fixes: 
> > > > > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1570392
> > > > > > > > > 
> > > > > > > > > Missusage of "Fixes:" tag, please read
> > > > > > > > > 
> > > > > > > > > 
> https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-intel.html#fixes
> > > > > > > > > 
> > > > > > > > > But also no need for resending... could be fixed when
> > > > > > > > > merging
> > > > > > > > > 
> > > > > > > > > The right one would be:
> > > > > > > > > 
> > > > > > > > > Bugzilla: 
> > > > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1570392
> > > > > > > > > Fixes: 357c0ae9198a ("drm/i915/lspcon: Wait for
> > > > > > > > > expected LSPCON mode to settle")
> > > > > > > > > Cc: Shashank Sharma <shashank.sharma at intel.com>
> > > > > > > > > Cc: Imre Deak <imre.deak at intel.com>
> > > > > > > > > Cc: Jani Nikula <jani.nikula at intel.com>
> > > > > > > > > Cc: <stable at vger.kernel.org> # v4.11+
> > > > > > > > > 
> > > > > > > > > Since initial 100 seemed to be empirical and this
> > > > > > > > > increase seems to
> > > > > > > > > help other cases I'm in favor of this move so
> > > > > > > > > 
> > > > > > > > > Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > > > > > 
> > > > > > > > > However I will wait a bit before merging it
> > > > > > > > > so Imre, Shashank, and/or Jani can take a look here...
> > > > > > > > 
> > > > > > > > now, really cc'ing them...
> > > > > > > 
> > > > > > > Shashank? Does this slow down non-LSPCON paths?
> > > > > > 
> > > > > > This will slow down the lspcon probing and resume part, but
> > > > > > both of them
> > > > > > happen only when LSPCON device is found. So to answer your
> > > > > > question,
> > > > > > this will not slow down the non-lspcon path, but will slow
> > > > > > down the
> > > > > > LSPCON connector resume and probe time. but I would
> > > > > > recommend, instead
> > > > > > of increasing it to 1000 ms in a single shot, we might want
> > > > > > to gradually
> > > > > > pick this up, on a wake-and-check way.
> > > > > 
> > > > > wait_for() checks every [10us, 1ms] until the condition is met,
> > > > > or it
> > > > > times out. So, so long as we don't enter this path for !LPSCON
> > > > > where we
> > > > > know that it will timeout, the wait_for() will only take as
> > > > > long as is
> > > > > required for the connector to settle.
> > > > 
> > > > We wont hit !LSPCON timeout case here, as we have already read
> > > > the
> > > > dongle signature successfully by now.  But I was thinking that,
> > > > if the
> > > > spec recommends max wait time as 100ms (which is of course
> > > > doesn't seem
> > > > enough), if we can't detect i2c-over-aux after first 500ms, I
> > > > guess we
> > > > wont be able to do that in next 500ms too. So is it really ok to
> > > > wait
> > > > this long in the resume sequence ?
> > > > 
> > > > I guess Fredrik can provide some inputs here on if there are some
> > > > experiments behind this number of 1000ms, or this is just a safe
> > > > bet ?
> > > > > 
> > > 
> > > My first patch attempt - which is attached to the Redhat and FDO
> > > Bugzilla
> > > bugs - added a retry loop around the original 100 ms timeout. The
> > > retry loop
> > > did trigger, but never more than once in a row in my testing.
> > > 
> > > So possibly 200 ms would be a sufficient timeout, but as the
> > > wait_for() loop
> > > terminates early on success I suggested a conservative value of
> > > 1000 ms.
> > 
> > Since Shashank mentioned 100us came from some spec, maybe the double
> > is already
> > a conservative value.
> > 
> > Since there is the concerns of delaying something when LSPCON fails
> > and we are possibly looping on connectors somewhere/somehow I believe
> > we need
> > to have a balanced approach here.
> > 
> > could you please try the 200 ms approach on your case there for a
> > while and
> > see how it goes?
> > 
> 
> I ran a few stress tests using Nicholas test case from [1]. I can
> quickly reproduce the failure with timeouts 100 ms, 110 ms, 130 ms, 150
> ms and 170 ms. I am unable to reproduce any failures with timeouts 190
> ms (n=18) and 200 ms (n=20+16).
> 
> So while 200 ms appears to work on my hardware with reasonable
> confidence, I wouldn't call 200 conservative. But then again, I do not
> know the specifications. I'm just being empirical.

I don't know this specification either and if that exists the empirical
shows that it is wrong or we have another bug somewhere else.

So... let's call 400 safe enough for now then?!

> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=107503#c15
> 
> > > 
> > > > > Can we do other connectors at the same time, or does probing
> > > > > LSPCON
> > > > > block the system?
> > > > 
> > > > We can do other connectors at the same time in DRM layer at-
> > > > least,
> > > > LSPCON blocks only this connector. I was curious if are we doing
> > > > this
> > > > during the resume scenario or is this in the sequential
> > > > get_connector()
> > > > type of call  ?
> > > > - Shashank
> > > > > -Chris
> > > 
> > > /F
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 


More information about the Intel-gfx mailing list