[Libreoffice-commits] core.git: coverity#1323754 we apparently can survive std::abort for a while

Fri Sep 11 07:47:40 PDT 2015

Hi Stephan,

On Fri, 2015-09-11 at 16:04 +0200, Stephan Bergmann wrote:
> But I doubt we want to make our code base more capricious than 
> necessary, to shield us from behavior exhibited by the Windows debugging 
> environment.

	Ah ;-) well - I saw std::abort not aborting, and I added that to make
it actually - die ;-> you recall the discussion: _exit() was the
solution.

	I rather suspect that while the process is being debugged and the
dialog is up, other threads are making progress - anyhow - the
Windows behavior is somewhat unusual here.

> > 	Which thread would you expect the signal to be delivered
> > to (I wonder) - it's all a bit interesting I suspect.
> 
> The case should be pretty clear for a synchronous, std::abort-generated 
> SIGABRT (hopefully even on Windows).

	I don't find much that's terribly clear about signal handling, and/or
the cross-thread synchronization mess that follows it around under the
covers =)

> > 	My hope was that the watchdog would carry on working in these cases &
> > kill us again more aggressively if necessary if people insist on
> > ignoring these guys.
> 
> But how should it do that?  Even if the SIGABRT-handling were done on 
> another thread, the watchdog thread just couldn't progress past the 
> std::abort() (notwithstanding cheating in a debugging environment).

	Good point =) so best to start a new watchdog instance in the abort
handler then.

> So there's only a single instance of the watchdog thread supposed to 
> ever run.  The odd "static bool bFired" in OpenGLWatchdogThread::execute 
> had fooled me to assume otherwise (for why else should the variable have 
> static storage duration).

	Ah - this was a reasonably harmless way to avoid using a variable in a
wider scope ;-) given that this class is a singleton.

> Anyway, generalizing that "watchdog the OpenGLWatchdogThread, in case 
> our signal handler gets stuck" idea obviously leads to a "watchdog our 
> signal handler, in case it gets stuck" feature, i.e., spawn a thread 
> early in our signal handler (assuming spawning an additional thread 
> doesn't make our violation of what a signal handler is supposed to be 
> allowed to do any worse), which will call _exit after a fixed amount of 
> time.

	Actually, I think that's a great idea =) I've ~often seen traces out of
bugzilla for hung processes (on Linux at least) where the hang was in the
recovery code running after a crash. That leads to these unfortunate dead
windows lingering around etc. and upset users.

>   The question just is, what is a reasonable value for that amount 
> of time.  Make it too short, and you'll prevent recovery of documents 
> that take long to save and for which our document recovery would 
> otherwise have happened to work fine.

	Right; hmm =) several of the traces I remember seeing were nasty ones
where eg. the malloc arena mutex was locked - making it rather hard to
make progress ;-) or we were blocked trying to get the solar-mutex.

	I guess if we were truly 31337 we would hook some interaction handler
that had a global progress-bar hook (so we would see the emergency
'save' making progress), and another that would ignore yielding waiting
for user-interaction (or do we not ask questions during the crash
handler - I forget - there is plenty of GUI stuff there still).

	It might work: I'd say if there is no progress-bar type update from a
file filter in 5 seconds of any kind, it is "really game-over" =)
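
	Roughly what I have in mind, as a sketch only: ProgressMonitor is a
hypothetical class, and the hook that would call notifyProgress() from the
file filters does not exist today. The watchdog thread would poll stalled()
and only then give up.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical helper: records when the emergency save last reported
// progress, and says whether it has been quiet for longer than `limit`.
class ProgressMonitor
{
public:
    using Clock = std::chrono::steady_clock;

    explicit ProgressMonitor(Clock::duration limit,
                             Clock::time_point now = Clock::now())
        : limit_(limit), last_(now.time_since_epoch().count())
    {
    }

    // Would be called from a progress-bar style hook in the save path.
    void notifyProgress(Clock::time_point now = Clock::now())
    {
        last_.store(now.time_since_epoch().count(), std::memory_order_relaxed);
    }

    // True once more than `limit` has passed since the last progress report.
    bool stalled(Clock::time_point now = Clock::now()) const
    {
        Clock::time_point last{
            Clock::duration{last_.load(std::memory_order_relaxed)}};
        return now - last > limit_;
    }

private:
    Clock::duration limit_;
    std::atomic<Clock::rep> last_; // tick count of the last progress report
};
```

	With ProgressMonitor m(std::chrono::seconds(5)), a slow but still
moving save keeps resetting the timer, while a save blocked on a mutex
trips stalled() after five quiet seconds.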

> And the true route ahead of course is to no longer put our document 
> recovery strategy at the mercy of a brittle, undefined-behavior--riddled 
> signal handler.

	Sure =) far more ideal would be to stream the keystroke / edits that
happen on the document and fsync them to an append-only file every few
keystrokes, and then re-play them on crash-recovery =) so "nothing can
ever be lost" - would be ideal.
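
	A toy version of such a journal, assuming POSIX (open/write/fsync) and
assuming each edit can be serialized as a single line without embedded
newlines. EditJournal and replayJournal are hypothetical names, and a real
format would want a length or checksum per record to detect a torn final
write.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <fstream>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical append-only edit log: one edit per line, fsync'd every
// `syncEvery` appends.
class EditJournal
{
public:
    explicit EditJournal(const std::string& path, unsigned syncEvery = 4)
        : fd_(::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0600)),
          syncEvery_(syncEvery)
    {
    }
    ~EditJournal()
    {
        if (fd_ >= 0) { ::fsync(fd_); ::close(fd_); }
    }
    EditJournal(const EditJournal&) = delete;
    EditJournal& operator=(const EditJournal&) = delete;

    // Append one edit; returns false on any I/O error.
    bool append(const std::string& edit)
    {
        if (fd_ < 0) return false;
        const std::string rec = edit + '\n';
        const char* p = rec.data();
        std::size_t left = rec.size();
        while (left > 0)
        {
            ssize_t n = ::write(fd_, p, left);
            if (n < 0)
            {
                if (errno == EINTR) continue; // interrupted, retry
                return false;
            }
            p += n;
            left -= static_cast<std::size_t>(n);
        }
        if (++pending_ >= syncEvery_)
        {
            pending_ = 0;
            return ::fsync(fd_) == 0; // force the batch to stable storage
        }
        return true;
    }

private:
    int fd_;
    unsigned syncEvery_;
    unsigned pending_ = 0;
};

// Read the journal back in order; the caller re-applies each edit.
inline std::vector<std::string> replayJournal(const std::string& path)
{
    std::vector<std::string> edits;
    std::ifstream in(path);
    for (std::string line; std::getline(in, line);)
        edits.push_back(line);
    return edits;
}
```

	At most the last syncEvery - 1 edits are exposed to loss if the
machine itself goes down; a plain process crash loses nothing that
write() had already accepted.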

	Only problem is - we need to implement something like a collaborative
editor first I think =)

	ATB,

		Michael.

-- 
 michael.meeks at collabora.com  <><, Pseudo Engineer, itinerant idiot