tdf#109085: status

Mon Aug 14 10:56:09 UTC 2017

Recently I attempted to fix tdf#109085 [1]. As the attempt wasn't
successful, I'm posting this report so that others can suggest ideas,
or have some background if they decide to work on this.

The problem (as described in the issue) is caused by improper shutdown
of LO during Windows shutdown or logoff. Documents are not closed
properly (their recovery information is kept, even for files without
changes), and the user profile's lock file is left in place.

The first manifestation of this is that LibreOffice shows the recovery
dialog on next start. This happens e.g. when the user has saved all open
documents and then, without closing LibreOffice, initiated
logoff/shutdown. In that case the recovery dialog is unexpected, and
clearly wrong.

The second problem is visible in environments where the LibreOffice
user profile may be moved across the network together with the Windows
user profile: e.g., the Roaming profiles feature of an Active Directory
domain [2]. Normally, LibreOffice detects that the profile lock file was
left by error (checking that the system name and user name are the
same), and simply ignores the existing lock. But in case of a roaming
profile, the lock had been created on some workstation1, where the user
initiated the logoff. The LibreOffice profile (with the residual lock
file) gets synchronised to the server, and then to another workstation
where the user logs on next time. So LibreOffice, when started there,
detects that the system name is different, and assumes that the same
profile is being used simultaneously by different instances. It then
shows a warning:

> "Either another instance of Program is accessing your personal
> settings or personal settings being locked.
>
> Simultaneous access can lead to inconsistencies in your personal
> settings. Before continuing, you should make sure user 'X' closes
> Program of host 'Y'"

Of course, that confuses the user.
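To make the failure mode concrete, here is a simplified model of the
stale-lock check described above. It is only a sketch under
assumptions, not LibreOffice's actual code: the LockInfo struct, the
classifyLock function and the verdict names are hypothetical, and the
real lock file records more than just host and user name.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of what the profile lock file records.
struct LockInfo {
    std::string host;  // system (workstation) name
    std::string user;  // user name
};

enum class LockVerdict {
    NoLock,        // no lock file: just create our own
    StaleOwnLock,  // same host and user: left by error, silently ignored
    ForeignLock    // different host or user: warn about simultaneous access
};

// existing == nullptr means that no lock file was found.
LockVerdict classifyLock(const LockInfo* existing, const LockInfo& current) {
    if (existing == nullptr)
        return LockVerdict::NoLock;
    if (existing->host == current.host && existing->user == current.user)
        return LockVerdict::StaleOwnLock;
    return LockVerdict::ForeignLock;
}
```

With a roaming profile, the lock written on workstation1 is compared
against workstation2's name, so classifyLock(&lockFromWs1, ws2) yields
ForeignLock and the warning appears, although no other instance runs.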

When closed normally, LibreOffice performs multiple cleanup steps both
before and after terminating the program's main message loop. The
specific feature of the Windows shutdown/logoff sequence is that Windows
does not wait for programs to close themselves; it only waits for
programs to process two specific window messages, WM_QUERYENDSESSION and
WM_ENDSESSION, which are sent to all of the program's top-level windows.
After the handlers of these messages return, the system is free to
forcefully terminate the program at any moment. So the program must be
ready for that when it returns from the last WM_ENDSESSION handler.

In my research, I found that Windows performs the following steps when
it shuts down (tested with Win10):

1. It takes one of the programs with the highest shutdown priority (see
the SetProcessShutdownParameters function [3]).
1.1. It starts sending WM_QUERYENDSESSION messages to all of its
top-level windows, supposedly in LIFO order. It seems that it waits for
one window to return from the message before sending it to the next
window.
1.2. If a window fails to process this message within 5 seconds,
Windows shows UI telling that an app either does not respond, or is
waiting for user input (depending on the app's message queue status),
and the user may either cancel the shutdown, or continue (terminating
the app). In modern Windows (Vista+), this UI is obtrusive: it covers
the whole screen and does not allow seeing the applications. E.g., the
user will not see the application dialog asking whether the document
should be saved, unless the user cancels the shutdown in that UI.
1.3. If one of the messages returns FALSE, i.e., the program refuses to
be closed (e.g., the user answered "Cancel" to the application's request
to close a modified file), Windows shows UI telling that some app is
preventing shutdown. Again, the UI is obtrusive, and offers to cancel or
continue (terminating the app).
1.4. When all WM_QUERYENDSESSION messages have been processed, Windows
starts sending WM_ENDSESSION messages to the windows of the same program
in the same order. These messages carry the final decision (to shut down
or not). If all WM_QUERYENDSESSION handlers returned TRUE, the final
decision is, naturally, to shut down. If one of them returned FALSE, the
final decision depends on the user's choice ("cancel" leads to no
shutdown, "continue" leads to forced shutdown). Each window is given
another 5 seconds, and if it fails to return from the handler, the UI
about a hung program is shown to the user again.
1.5. Windows ignores the return value of the WM_ENDSESSION handler;
only the fact that the handler has completed matters.
2. When all WM_ENDSESSION messages have been processed, Windows
continues with the next application of the same shutdown priority, and
then with applications of lower priorities.
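The observed sequence can be summarised as a small, portable model,
useful for reasoning about which handlers run before the process may be
killed. This is a hypothetical sketch, not Windows code: the Window,
Process and simulateShutdown names are made up, the 5-second timeouts
and "hung app" UI are not modelled, and it assumes (unverified) that
querying continues after a window returns FALSE.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Window {
    std::string name;
    std::function<bool()> onQueryEndSession;  // WM_QUERYENDSESSION; true = allow
    std::function<void(bool)> onEndSession;   // WM_ENDSESSION; arg = final decision
};

struct Process {
    std::string name;
    unsigned level;               // shutdown level; higher levels go first
    std::vector<Window> windows;  // top-level windows in creation order
};

// userContinues: the user's choice in the "app is preventing shutdown" UI.
// Returns a log of the delivered messages.
std::vector<std::string> simulateShutdown(std::vector<Process> procs,
                                          bool userContinues) {
    std::vector<std::string> log;
    std::stable_sort(procs.begin(), procs.end(),
                     [](const Process& a, const Process& b) {
                         return a.level > b.level;
                     });
    for (Process& p : procs) {
        bool allAllowed = true;
        // Step 1.1: query windows one by one, newest first ("LIFO").
        for (auto it = p.windows.rbegin(); it != p.windows.rend(); ++it) {
            log.push_back("QUERY " + p.name + "/" + it->name);
            if (!it->onQueryEndSession())
                allAllowed = false;  // step 1.3: user must cancel or continue
        }
        // Step 1.4: deliver the final decision in the same order.
        const bool ending = allAllowed || userContinues;
        for (auto it = p.windows.rbegin(); it != p.windows.rend(); ++it) {
            log.push_back("END " + p.name + "/" + it->name);
            it->onEndSession(ending);
        }
        if (!ending) {
            log.push_back("CANCELLED");
            return log;
        }
        // From here on the process may be killed at any moment.
        log.push_back("TERMINATE " + p.name);
    }
    return log;  // step 2: all processes handled, highest level first
}
```

The stable sort keeps creation order for equal levels only to make the
model deterministic; real Windows ordering within a level is not
something this report establishes.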

Currently, in LibreOffice the messages are handled in SalFrameWndProc,
the window procedure for user-visible (document or Start Center)
windows. Only the first WM_QUERYENDSESSION handler call does the real
work. It emits SalEvent::Shutdown, which (in ImplWindowFrameProc) calls
GetpApp()->QueryExit() (followed by Application::Quit() on success).
QueryExit() tries to close all open frames (which may ask the user to
save changes), then starts the shutdown steps (its last task is to
terminate the main message loop). It returns false if the user decided
to cancel closing a document, and this is passed on as the
WM_QUERYENDSESSION handler's return value. So, by the end of the first
WM_QUERYENDSESSION handler, we either deny shutdown, or have all
documents already closed (!), so at least there should be no recovery
dialog on next launch. The subsequent WM_QUERYENDSESSION handlers do
nothing (naturally), as that should not be necessary. The WM_ENDSESSION
handler is only meant to reset the machinery in case the shutdown was
successfully interrupted (to be able to process it again in future).
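The once-only logic just described can be sketched like this. The
EndSessionGuard class and its members are hypothetical illustrations,
not the actual vcl code; queryExit stands in for GetpApp()->QueryExit(),
and returning true from the later handlers is an assumption of the
sketch.

```cpp
#include <cassert>
#include <functional>
#include <utility>

class EndSessionGuard {
public:
    // queryExit closes all frames (possibly asking the user) and returns
    // false if the user cancelled closing a document.
    explicit EndSessionGuard(std::function<bool()> queryExit)
        : queryExit_(std::move(queryExit)) {}

    // Value reported by the WM_QUERYENDSESSION handler (true = allow).
    bool onQueryEndSession() {
        if (inQueryEnd_)
            return true;     // later windows: nothing left to do
        inQueryEnd_ = true;  // first window: do the real work once
        return queryExit_();
    }

    // WM_ENDSESSION: if the session is not ending, re-arm for next time.
    void onEndSession(bool ending) {
        if (!ending)
            inQueryEnd_ = false;
    }

    bool armed() const { return !inQueryEnd_; }

private:
    std::function<bool()> queryExit_;
    bool inQueryEnd_ = false;
};
```

Note what the sketch does not cover: cleanup that runs only after the
main message loop ends is not awaited by any of these handlers, and the
process may be killed once the last WM_ENDSESSION handler returns.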

Everything looks OK (well, at least for documents, if not for the
profile lock), but real life shows that it just doesn't work.

One of my guesses was that Windows somehow detects that the window (to
which the current message was sent) was destroyed during the handler,
and that this leads to process termination (I can't say there's much
sense in that idea). I tried to restructure the processing: I only
closed the current window in its handler, and delegated the whole
application shutdown to the handler (in SalComWndProc) of LibreOffice's
special hidden service window. But that didn't solve the problem.
Sending the messages manually (from a utility like StefanTools'
SendMessage [4]) always succeeds, but actual tests fail again. Also, I
doubt that Windows would treat window destruction that way, and I
haven't found any evidence that Windows can terminate an application in
such circumstances (even in case of >5 s processing, it only shows UI
when the shutdown is not forced).

I also considered the possibility that the sequence somehow throws (or
segfaults) in LibreOffice, thus aborting the correct shutdown sequence.
But there's no notification about that on screen (nor from the crash
reporter), and sending the messages manually always succeeds, as I
mentioned. Nor had the system shut down any hypothetical services or
resources LibreOffice could depend on: this can be verified by
interrupting the shutdown with some program of lower shutdown priority
and then starting LibreOffice.

My attempts are available on gerrit [5]. I have paused further attempts
for a while. My next try will be to give our guard process on Windows
(soffice.exe) a higher shutdown priority, and to send the shutdown
message to soffice.bin from there. Maybe that would allow working around
the situation (though I still don't think that's a proper solution). It
would also lead to unclear messages when soffice.exe waits for
soffice.bin showing a "Save?" dialog: currently, Windows says that a
program is waiting for user input; with that change, it would say that a
program is hung. :(
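For the planned workaround, the relevant rule from [3] is that Windows
shuts processes down from high shutdown levels to low, with 0x100-0x3FF
reserved for applications and 0x280 as every process's starting level.
The sketch below only models choosing a level for the guard; the
guardLevelFor policy and all names are hypothetical, and the actual
Win32 call is shown in a comment rather than compiled.

```cpp
#include <cassert>
#include <cstdint>

// Application shutdown-level range documented for
// SetProcessShutdownParameters; higher levels are shut down first.
constexpr std::uint32_t kAppLevelMin  = 0x100;  // 0x100-0x1FF: shut down last
constexpr std::uint32_t kAppLevelMax  = 0x3FF;  // 0x300-0x3FF: shut down first
constexpr std::uint32_t kDefaultLevel = 0x280;  // level every process starts with

bool isApplicationLevel(std::uint32_t level) {
    return level >= kAppLevelMin && level <= kAppLevelMax;
}

// True if a process at level a gets WM_QUERYENDSESSION before one at b.
bool endsBefore(std::uint32_t a, std::uint32_t b) { return a > b; }

// Hypothetical policy: put the guard (soffice.exe) one step above
// soffice.bin so that it is asked first and can relay the request.
std::uint32_t guardLevelFor(std::uint32_t childLevel) {
    assert(isApplicationLevel(childLevel) && childLevel < kAppLevelMax);
    return childLevel + 1;
}

// On Windows the guard would then apply it (not compiled here):
//   SetProcessShutdownParameters(guardLevelFor(kDefaultLevel), 0);
```

A step of one is enough in this model because levels are compared
strictly; any application-range value above soffice.bin's would do.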

[1] https://bugs.documentfoundation.org/show_bug.cgi?id=109085
[2] https://en.wikipedia.org/wiki/Roaming_user_profile
[3] https://msdn.microsoft.com/en-us/library/ms686227
[4] http://stefanstools.sourceforge.net/SendMessage.html
[5] https://gerrit.libreoffice.org/39884

-- 
Best regards,
Mike Kaganski