[Libreoffice-commits] core.git: coverity#1323754 we apparently can survive std::abort for a while
Michael Meeks
michael.meeks at collabora.com
Fri Sep 11 07:47:40 PDT 2015
Hi Stephan,
On Fri, 2015-09-11 at 16:04 +0200, Stephan Bergmann wrote:
> But I doubt we want to make our code base more capricious than
> necessary, to shield us from behavior exhibited by the Windows debugging
> environment.
Ah ;-) well - I saw std::abort not aborting, and I added that to make
it actually - die ;-> you recall the discussion: _exit() was the
solution.
I rather suspect that while the process is being debugged, and the
dialog is up that other threads are making progress - anyhow - the
windows behavior is somewhat unusual here.
> > Which thread would you expect the signal to be delivered
>> to (I wonder) - it's all a bit interesting I suspect.
>
> The case should be pretty clear for a synchronous, std::abort-generated
> SIGABRT (hopefully even on Windows).
I don't find much that's terribly clear about signal handling, and/or
the cross-thread synchronization mess that follows it around under the
covers =)
> > My hope was that the watchdog would carry on working in these cases &
> > kill us again more aggressively if necessary if people insist on
> > ignoring these guys.
>
> But how should it do that? Even if the SIGABRT-handling were done on
> another thread, the watchdog thread just couldn't progress past the
> std::abort() (notwithstanding cheating in a debugging environment).
Good point =) so best to start a new watchdog instance in the abort
handler then.
> So there's only a single instance of the watchdog thread supposed to
> ever run. The odd "static bool bFired" in OpenGLWatchdogThrad::execute
> had fooled me to assume otherwise (for why else should the variable have
> static storage duration).
Ah - this was a reasonably harmless way to avoid using a variable in a
wider scope ;-) given that this class is a singleton.
> Anyway, generalizing that "watchdog the OpenGLWatchdogThread, in case
> our signal handler gets stuck" idea obviously leads to a "watchdog our
> signal handler, in case it gets stuck" feature, i.e., spawn a thread
> early in our signal handler (assuming spawning an additional thread
> doesn't make our violation of what a signal handler is supposed to be
> allowed to do any worse), which will call _exit after a fixed amount of
> time.
Actually, I think that's a great idea =) I've ~often seen traces out of
bugzilla for hung processes (on Linux at least) where the hang was in a
crash from the recovery process. That leads to these unfortunate dead
windows lingering around etc. and upset users.
> The question just is, what is a reasonable value for that amount
> of time. Make it too short, and you'll prevent recovery of documents
> that take long to save and for which our document recovery would
> otherwise have happened to work fine.
Right; hmm =) several of the traces I remember seeing were nasty ones
where eg. the malloc arena mutex was locked - making it rather hard to
make progress ;-) or we were blocked trying to get the solar-mutex.
I guess if we were truly 31337 we would hook some interaction handler
that had a global progress-bar hook (so we would see the emergency
'save' making progress), and another that would ignore yielding waiting
for user-interaction (or do we not ask questions during the crash
handler - I forget - there is plenty of GUI stuff there still).
It might work: I'd say if there is no progress-bar type update from a
file filter in 5 seconds of any kind, it is "really game-over" =)
> And the true route ahead of course is to no longer put our document
> recovery strategy at the mercy of a brittle, undefined-behavior--riddled
> signal handler.
Sure =) far more ideal would be to stream the keystroke / edits that
happen on the document and fsync them to an append-only file ever few
keystrokes, and then re-play them on crash-recovery =) so "nothing can
ever be lost" - would be ideal.
Only problem is - we need to implement something like a collaborative
editor first I think =)
ATB,
Michael.
--
michael.meeks at collabora.com <><, Pseudo Engineer, itinerant idiot
More information about the LibreOffice
mailing list