[Intel-gfx] [PATCH] [drm][i915] Increase LSPCON timeout

Fredrik Schön fredrikschon at gmail.com
Thu Aug 16 21:51:07 UTC 2018


tor 2018-08-16 klockan 14:43 -0700 skrev Rodrigo Vivi:
> On Thu, Aug 16, 2018 at 11:35:26PM +0200, fredrikschon at gmail.com
> wrote:
> > tor 2018-08-16 klockan 11:23 -0700 skrev Rodrigo Vivi:
> > > On Thu, Aug 16, 2018 at 10:36:43AM +0200, Fredrik Schön wrote:
> > > > Shashank,
> > > > 
> > > > Den tors 16 aug. 2018 kl 10:15 skrev Sharma, Shashank
> > > > <shashank.sharma at intel.com>:
> > > > > 
> > > > > Hey Chris,
> > > > > 
> > > > > 
> > > > > On 8/16/2018 1:13 PM, Chris Wilson wrote:
> > > > > > Quoting Sharma, Shashank (2018-08-16 08:33:36)
> > > > > > > Regards
> > > > > > > 
> > > > > > > Shashank
> > > > > > > 
> > > > > > > 
> > > > > > > On 8/16/2018 12:47 PM, Jani Nikula wrote:
> > > > > > > > On Wed, 15 Aug 2018, Rodrigo Vivi <
> > > > > > > > rodrigo.vivi at intel.com>
> > > > > > > > wrote:
> > > > > > > > > On Wed, Aug 15, 2018 at 03:39:40PM -0700, Rodrigo
> > > > > > > > > Vivi
> > > > > > > > > wrote:
> > > > > > > > > > On Mon, Aug 13, 2018 at 07:51:33PM +0200, Fredrik
> > > > > > > > > > Schön
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > First of all we need to fix the commit subject:
> > > > > > > > > > 
> > > > > > > > > > drm/i915: Increase LSPCON timeout
> > > > > > > > > > 
> > > > > > > > > > (this can be done when merging, no need to resend)
> > > > > > > > > > 
> > > > > > > > > > > 100 ms is not enough time for the LSPCON adapter
> > > > > > > > > > > on
> > > > > > > > > > > Intel NUC devices to
> > > > > > > > > > > settle. This causes dropped display modes at
> > > > > > > > > > > driver
> > > > > > > > > > > initialisation.
> > > > > > > > > > > 
> > > > > > > > > > > Increase timeout to 1000 ms.
> > > > > > > > > > > 
> > > > > > > > > > > Fixes: 
> > > > > > > > > > > 
https://bugzilla.redhat.com/show_bug.cgi?id=1570392
> > > > > > > > > > 
> > > > > > > > > > Missusage of "Fixes:" tag, please read
> > > > > > > > > > 
> > > > > > > > > > 
> > 
> > 
https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-intel.html#fixes
> > > > > > > > > > 
> > > > > > > > > > But also no need for resending... could be fixed
> > > > > > > > > > when
> > > > > > > > > > merging
> > > > > > > > > > 
> > > > > > > > > > The right one would be:
> > > > > > > > > > 
> > > > > > > > > > Bugzilla: 
> > > > > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1570392
> > > > > > > > > > Fixes: 357c0ae9198a ("drm/i915/lspcon: Wait for
> > > > > > > > > > expected LSPCON mode to settle")
> > > > > > > > > > Cc: Shashank Sharma <shashank.sharma at intel.com>
> > > > > > > > > > Cc: Imre Deak <imre.deak at intel.com>
> > > > > > > > > > Cc: Jani Nikula <jani.nikula at intel.com>
> > > > > > > > > > Cc: <stable at vger.kernel.org> # v4.11+
> > > > > > > > > > 
> > > > > > > > > > Since initial 100 seemed to be empirical and this
> > > > > > > > > > increase seems to
> > > > > > > > > > help other cases I'm in favor of this move so
> > > > > > > > > > 
> > > > > > > > > > Reviewed-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > > > > > > > 
> > > > > > > > > > However I will wait a bit before merging it
> > > > > > > > > > so Imre, Shashank, and/or Jani can take a look
> > > > > > > > > > here...
> > > > > > > > > 
> > > > > > > > > now, really cc'ing them...
> > > > > > > > 
> > > > > > > > Shashank? Does this slow down non-LSPCON paths?
> > > > > > > 
> > > > > > > This will slow down the lspcon probing and resume part,
> > > > > > > but
> > > > > > > both of them
> > > > > > > happen only when LSPCON device is found. So to answer
> > > > > > > your
> > > > > > > question,
> > > > > > > this will not slow down the non-lspcon path, but will
> > > > > > > slow
> > > > > > > down the
> > > > > > > LSPCON connector resume and probe time. but I would
> > > > > > > recommend, instead
> > > > > > > of increasing it to 1000 ms in a single shot, we might
> > > > > > > want
> > > > > > > to gradually
> > > > > > > pick this up, on a wake-and-check way.
> > > > > > 
> > > > > > wait_for() checks every [10us, 1ms] until the condition is
> > > > > > met,
> > > > > > or it
> > > > > > times out. So, so long as we don't enter this path for
> > > > > > !LPSCON
> > > > > > where we
> > > > > > know that it will timeout, the wait_for() will only take as
> > > > > > long as is
> > > > > > required for the connector to settle.
> > > > > 
> > > > > We wont hit !LSPCON timeout case here, as we have already
> > > > > read
> > > > > the
> > > > > dongle signature successfully by now.  But I was thinking
> > > > > that,
> > > > > if the
> > > > > spec recommends max wait time as 100ms (which is of course
> > > > > doesn't seem
> > > > > enough), if we can't detect i2c-over-aux after first 500ms, I
> > > > > guess we
> > > > > wont be able to do that in next 500ms too. So is it really ok
> > > > > to
> > > > > wait
> > > > > this long in the resume sequence ?
> > > > > 
> > > > > I guess Fredrik can provide some inputs here on if there are
> > > > > some
> > > > > experiments behind this number of 1000ms, or this is just a
> > > > > safe
> > > > > bet ?
> > > > > > 
> > > > 
> > > > My first patch attempt - which is attached to the Redhat and
> > > > FDO
> > > > Bugzilla
> > > > bugs - added a retry loop around the original 100 ms timeout.
> > > > The
> > > > retry loop
> > > > did trigger, but never more than once in a row in my testing.
> > > > 
> > > > So possibly 200 ms would be a sufficient timeout, but as the
> > > > wait_for() loop
> > > > terminates early on success I suggested a conservative value of
> > > > 1000 ms.
> > > 
> > > Since Shashank mentioned 100us came from some spec, maybe the
> > > double
> > > is already
> > > a conservative value.
> > > 
> > > Since there is the concerns of delaying something when LSPCON
> > > fails
> > > and we are possibly looping on connectors somewhere/somehow I
> > > believe
> > > we need
> > > to have a balanced approach here.
> > > 
> > > could you please try the 200 ms approach on your case there for a
> > > while and
> > > see how it goes?
> > > 
> > 
> > I ran a few stress tests using Nicholas test case from [1]. I can
> > quickly reproduce the failure with timeouts 100 ms, 110 ms, 130 ms,
> > 150
> > ms and 170 ms. I am unable to reproduce any failures with timeouts
> > 190
> > ms (n=18) and 200 ms (n=20+16).
> > 
> > So while 200 ms appears to work on my hardware with reasonable
> > confidence, I wouldn't call 200 conservative. But then again, I do
> > not
> > know the specifications. I'm just being empirical.
> 
> I don't know this specification either and if that exists the
> empirical
> shows that it is wrong or we have another bug somewhere else.
> 
> So... let's call 400 safe enough for now then?!
> 

Sound reasonable. Do you want me to respin the patch?
> > 
> > [1] https://bugs.freedesktop.org/show_bug.cgi?id=107503#c15
> > 
> > > > 
> > > > > > Can we do other connectors at the same time, or does
> > > > > > probing
> > > > > > LSPCON
> > > > > > block the system?
> > > > > 
> > > > > We can do other connectors at the same time in DRM layer at-
> > > > > least,
> > > > > LSPCON blocks only this connector. I was curious if are we
> > > > > doing
> > > > > this
> > > > > during the resume scenario or is this in the sequential
> > > > > get_connector()
> > > > > type of call  ?
> > > > > - Shashank
> > > > > > -Chris
> > > > 
> > > > /F
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx



More information about the Intel-gfx mailing list