[Libreoffice-commits] core.git: Branch 'libreoffice-5-4' - desktop/source
Stephan Bergmann
sbergman at redhat.com
Thu Oct 26 10:00:52 UTC 2017
desktop/source/app/officeipcthread.cxx | 83 +++++++++++++++++++++------------
1 file changed, 53 insertions(+), 30 deletions(-)
New commits:
commit 3ddbaa4132204cb4b6e92e9db6fff8e849dec4d6
Author: Stephan Bergmann <sbergman at redhat.com>
Date: Tue Oct 24 17:25:47 2017 +0200
Avoid race between DbusIpcThread::close and DbusIpcThread::execute
...that caused Flatpak'ed LO to sometimes not terminate properly after
243d743dae127fe11ecd7f807121da8815af8f01 "solenv/flatpak-manifest.in:
incorporate upstream sandboxing improvements" had removed --socket=session-bus
and thus introduced flatpak-dbus-proxy into the mix:
> Oct 24 15:25:16 <sberg_> I'm not sure who's at fault there; on the LO side I
> have a thread processing incoming dbus requests (DbusIpcThread::execute,
> <https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx#552>);
> that thread is typically waiting within dbus_connection_read_write_dispatch
> for new messages; if LO wants to close, it needs to get that thread out of
> dbus_connection_read_write_dispatch, and the only way I found back then is
> what's in DbusIpcThread::close (just following DbusIpcThread::execute), with
> a big "this apparently needs a more DBus-y design anyway" testament to my
> cluelesness re dbus; it calls dbus_bus_get_private to connect to itself, and
> send a Close message that'll cause DbusIpcThread::execute to fall out of the
> loop; the dbus_bus_get_private leads to a flatpak_proxy_incoming in the
> proxy, which returns TRUE, and after that I see no further activity in the
> proxy (but not sure what functions I should all put breakpoints on), and the
> LO side is hung in dbus_bus_release_name a few lines futher down
> Oct 24 15:26:20 <alexlarsson> sberg_: i ran this:
> https://github.com/sgh/dbus-examples/blob/master/dbus-signal.c
> Oct 24 15:26:27 <alexlarsson> sberg_: which calls dbus_bus_release_name
> Oct 24 15:26:30 <alexlarsson> and it seems to work for me
> Oct 24 15:26:39 <alexlarsson> If i --own-name=org.DBusTest.SignalTest
> Oct 24 15:27:50 <sberg_> so maybe it's related to my unorthodox way of trying
> to quit the dbus loop there; if anybody with a clue about dbus would be
> willing to help me with the above, I'd be very grateful :)
> Oct 24 15:28:06 <alexlarsson> I don't see how the proxy should affect it tho
> Oct 24 15:28:11 <alexlarsson> other than maybe timing?
> Oct 24 15:29:06 <sberg_> maybe; I think I've seen things actually succeed once
> when I stepped through the proxy somewhat manually
> Oct 24 15:31:01 <alexlarsson> sberg_: eeeentresting
> Oct 24 15:31:07 <alexlarsson> sberg_: so, it does an own-call
> Oct 24 15:31:13 <alexlarsson> sberg_: maybe that is what breaks?
> Oct 24 15:31:33 <sberg_> yeah, that's my dumb speculation
> Oct 24 15:34:38 <alexlarsson> So, you send the close from another thread, not
> waiting for the reply
> Oct 24 15:34:56 <alexlarsson> then you start working on the main connection
> immediately
> Oct 24 15:35:02 <alexlarsson> shouldn
> Oct 24 15:35:25 <alexlarsson> shouldn't you grab the mutex after that?
> Oct 24 15:36:35 <alexlarsson> This whole thing smells racy
> Oct 24 15:38:18 <alexlarsson> My guess is that with regular dbus, the flush
> call will schedule dbus daemon which sends back the message, waking up the
> other thread before continuing
> Oct 24 15:38:22 <sberg_> alexlarsson, ah, yeah, I see; apparently didn't
> occur to me back then that I mustn't call dbus_bus_release_name when the
> execute loop may still be busy
> Oct 24 15:38:25 <alexlarsson> But now you have two context switches
> Oct 24 15:38:37 <alexlarsson> so we continue before we get the reply
> Oct 24 15:38:44 <alexlarsson> eh, not reply
> Oct 24 15:38:46 <alexlarsson> the close message
> Oct 24 15:38:52 <alexlarsson> and then you race
> Oct 24 15:39:11 <alexlarsson> Honestly i don't remember the exact rules for
> dbus threadedness
> Oct 24 15:39:58 <sberg_> ...so looks like something that can be corrected on
> the LO side after all; I'll give it a try
> Oct 24 15:40:17 <alexlarsson> you're calling dbus_threads_init_default(), so
> it should be nominally threadsafe
> Oct 24 15:40:29 <alexlarsson> still, I imagine it can still deadlock
> Oct 24 15:43:12 <alexlarsson> sberg_: i still don't see the exact deadlock
> though
> Oct 24 15:43:32 <alexlarsson> i imagine working on the separate private
> connection is threadsafe
> Oct 24 15:43:35 <alexlarsson> and we flush that
> Oct 24 15:44:08 <alexlarsson> then we call dbus_bus_release_name, which i
> imagine would block if the dbus thread is in read_write_dispatch
> Oct 24 15:44:35 <alexlarsson> The question is, why is the close message not
> delivered?
> Oct 24 15:44:44 <alexlarsson> I mean, we flushed it...
> Oct 24 15:45:48 <alexlarsson> sberg_: oh, i think i know
> Oct 24 15:45:53 <alexlarsson> sberg_: release_name probably sent some message,
> but then the other thread pop:ed the reply
> Oct 24 15:46:04 <alexlarsson> sberg_: queue very long wait
> Oct 24 15:51:53 <sberg_> alexlarsson, yeah, sounds plausible; thanks!
> Oct 24 15:52:14 <sberg_> (now that I want to reproduce it, closing succeeds
> cleanly every time I try, of course...)
> Oct 24 15:54:48 <alexlarsson> sberg_: obviously
> Oct 24 15:55:30 <alexlarsson> sberg_: that just makes the race theory more
> valid tho
> Oct 24 15:57:31 <sberg_> alexlarsson, yup, after stopping the unrelated
> background LO build that had put the machine under full load, I see it hang
> again in release_name, while the execute loop is still happily running too;
> that nicely confirms your theory
> Oct 24 15:58:28 <alexlarsson> sberg_: alternatively you could tell your users
> to always build LO in the background
> Oct 24 15:59:04 <sberg_> I'll try to put that into the release notes; lets
> see...
(cherry picked from commit 38081c0884b64ed1132047973b4dccc42d548c89, plus
follow-up dc3ff192f3f048059cf149f416d0b237eb33b014 "Avoid further unwanted
interference of DbusIpcThread::execute/close")
Change-Id: I2a8a58f9259d2854f42f4aa3db5bb232cf70845d
Reviewed-on: https://gerrit.libreoffice.org/43867
Tested-by: Jenkins <ci at libreoffice.org>
Reviewed-by: Tor Lillqvist <tml at collabora.com>
diff --git a/desktop/source/app/officeipcthread.cxx b/desktop/source/app/officeipcthread.cxx
index 24252716bd73..6b797d49aa57 100644
--- a/desktop/source/app/officeipcthread.cxx
+++ b/desktop/source/app/officeipcthread.cxx
@@ -452,6 +452,7 @@ private:
void close() override;
DbusConnectionHolder connection_;
+ osl::Condition closeDone_;
};
RequestHandler::Status DbusIpcThread::enable(rtl::Reference<IpcThread> * thread)
@@ -572,6 +573,19 @@ void DbusIpcThread::execute()
if (dbus_message_is_method_call(
msg.message, "org.libreoffice.LibreOfficeIpcIfc0", "Close"))
{
+ DbusMessageHolder repl(dbus_message_new_method_return(msg.message));
+ if (repl.message == nullptr) {
+ SAL_WARN(
+ "desktop.app", "dbus_message_new_method_return failed");
+ } else {
+ dbus_uint32_t serial = 0;
+ if (!dbus_connection_send(
+ connection_.connection, repl.message, &serial)) {
+ SAL_WARN("desktop.app", "dbus_connection_send failed");
+ } else {
+ dbus_connection_flush(connection_.connection);
+ }
+ }
break;
}
if (!dbus_message_is_method_call(
@@ -615,13 +629,38 @@ void DbusIpcThread::execute()
}
dbus_connection_flush(connection_.connection);
}
+ closeDone_.wait();
+ DBusError e;
+ dbus_error_init(&e);
+ int n = dbus_bus_release_name(
+ connection_.connection, "org.libreoffice.LibreOfficeIpc0", &e);
+ assert((n == -1) == bool(dbus_error_is_set(&e)));
+ switch (n) {
+ case -1:
+ SAL_WARN(
+ "desktop.app",
+ "dbus_bus_release_name failed with: " << e.name << ": "
+ << e.message);
+ dbus_error_free(&e);
+ break;
+ case DBUS_RELEASE_NAME_REPLY_RELEASED:
+ break;
+ case DBUS_RELEASE_NAME_REPLY_NOT_OWNER:
+ case DBUS_RELEASE_NAME_REPLY_NON_EXISTENT:
+ SAL_WARN(
+ "desktop.app",
+ "dbus_bus_release_name failed with unexpected " << +n);
+ break;
+ default:
+ for (;;) std::abort();
+ }
}
void DbusIpcThread::close() {
- assert(connection_.connection != nullptr);
- DBusError e;
- dbus_error_init(&e);
{
+ assert(connection_.connection != nullptr);
+ DBusError e;
+ dbus_error_init(&e);
// Let DbusIpcThread::execute return from dbus_connection_read_write;
// for now, just abort on failure (the process would otherwise block,
// with DbusIpcThread::execute hanging in dbus_connection_read_write);
@@ -645,35 +684,19 @@ void DbusIpcThread::close() {
SAL_WARN("desktop.app", "dbus_message_new_method_call failed");
std::abort();
}
- if (!dbus_connection_send(con.connection, msg.message, nullptr))
- {
- SAL_WARN("desktop.app", "dbus_connection_send failed");
- std::abort();
+ DbusMessageHolder repl(
+ dbus_connection_send_with_reply_and_block(
+ con.connection, msg.message, 0x7FFFFFFF, &e));
+ assert((repl.message == nullptr) == bool(dbus_error_is_set(&e)));
+ if (repl.message == nullptr) {
+ SAL_INFO(
+ "desktop.app",
+ "dbus_connection_send_with_reply_and_block failed with: "
+ << e.name << ": " << e.message);
+ dbus_error_free(&e);
}
- dbus_connection_flush(con.connection);
- }
- int n = dbus_bus_release_name(
- connection_.connection, "org.libreoffice.LibreOfficeIpc0", &e);
- assert((n == -1) == bool(dbus_error_is_set(&e)));
- switch (n) {
- case -1:
- SAL_WARN(
- "desktop.app",
- "dbus_bus_release_name failed with: " << e.name << ": "
- << e.message);
- dbus_error_free(&e);
- break;
- case DBUS_RELEASE_NAME_REPLY_RELEASED:
- break;
- case DBUS_RELEASE_NAME_REPLY_NOT_OWNER:
- case DBUS_RELEASE_NAME_REPLY_NON_EXISTENT:
- SAL_WARN(
- "desktop.app",
- "dbus_bus_release_name failed with unexpected " << +n);
- break;
- default:
- for (;;) std::abort();
}
+ closeDone_.set();
}
#endif
More information about the Libreoffice-commits
mailing list