[Intel-gfx] [PATCH i-g-t] kms_atomic_transition: Output more finegrained progress info to avoid CI watchdog timeout

Imre Deak imre.deak at intel.com
Fri Oct 20 12:33:37 UTC 2017


On Thu, Oct 19, 2017 at 06:48:30AM +0000, Lofstedt, Marta wrote:
> 
> 
> > -----Original Message-----
> > From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf
> > Of Daniel Vetter
> > Sent: Wednesday, October 18, 2017 5:36 PM
> > To: Latvala, Petri <petri.latvala at intel.com>
> > Cc: intel-gfx at lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH i-g-t] kms_atomic_transition: Output more
> > finegrained progress info to avoid CI watchdog timeout
> > 
> > On Wed, Oct 18, 2017 at 02:43:38PM +0300, Petri Latvala wrote:
> > > On Wed, Oct 18, 2017 at 02:29:33PM +0300, Imre Deak wrote:
> > > > The CI software watchdog (owatch) will timeout if the test doesn't
> > > > output anything for a long time on standard out or error. At least
> > > > the plane-all-modeset-transition and
> > > > plane-all-modeset-transition-fences
> > > > subtests run without any output longer than the watchdog timeout, so
> > > > output some more progress info.
> > >
> > > No, owatch is wrapping piglit, and pings the watchdog if _piglit_
> > > prints anything. Which it does on start/exit of a test.
> > 
> > tbh this sounds like owatch being dense and it shouldn't try to reboot this
> > quickly. What's the current owatch timeout?
> > 
> > Aside: What exactly does owatch give us? I thought jenkins also watches
> > machines and reboots them using the ac switch ... And owatch provides
> > spurious reboots?
> Daniel,
> Owatch gives us the knowledge that it was a test that took too long, i.e. we will know that it was not a system hang.
> We also know that the NMI watchdog didn't trigger.
> I believe this is extremely useful information when you are starting to debug the issue.
> 
> Imre, if you believe that owatch is preventing you from getting
> information to debug why these tests are taking such an extremely long time,
> it would be easy to increase the timeout or even do runs without it
> being enabled.

It would be useful to have a clear indication of whether the test really
hung or just took too long. Right now this isn't obvious without going
through the logs.
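
As a rough illustration of the kind of indication meant here, the sketch
below runs a command and returns an explicit verdict saying whether it
finished on its own or was killed after printing nothing for a while. It is
not how owatch or piglit are implemented; the function name, the verdict
strings and the timeout parameter are all hypothetical, and it assumes a
POSIX system where pipes can be polled.

```python
import os
import selectors
import subprocess


def run_with_output_watchdog(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Hypothetical helper, not part of owatch or piglit.
    Returns (verdict, returncode, output) where verdict is either
    "completed" or "idle-timeout" and output is the captured bytes.
    """
    # bufsize=0 keeps the pipe unbuffered on our side, so readiness
    # reported by the selector always matches what os.read() can return.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, bufsize=0)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    output = bytearray()
    try:
        while True:
            # An empty result means nothing arrived within idle_timeout.
            if not sel.select(timeout=idle_timeout):
                proc.kill()
                proc.wait()
                return "idle-timeout", proc.returncode, bytes(output)
            chunk = os.read(proc.stdout.fileno(), 4096)
            if not chunk:
                # EOF: the child closed its output; wait for it to exit.
                return "completed", proc.wait(), bytes(output)
            output.extend(chunk)
    finally:
        sel.close()
        proc.stdout.close()
```

A caller would log the verdict next to the test result, so that a silent but
still running test is reported differently from one that completed. Note
that this only detects missing output from the wrapped process; a full
system hang would still need an external watchdog.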

--Imre

