ignoring a process' last words: SIGPIPE ...

Thu Mar 27 19:23:20 PDT 2008

Hi Avery,

	Thanks for your replies! Glad we came to the same conclusion :-)

On Thu, 2008-03-27 at 14:40 -0400, Avery Pennarun wrote:
> On Thu, Mar 27, 2008 at 2:13 PM, Havoc Pennington <hp at pobox.com> wrote:
> >  Both ends need to deal with the other end crashing or disconnecting. I
> >  would think adding a "clean" way to disconnect just creates another
> >  special case. It's a bug if a "dirty" or "abrupt" disconnect doesn't
> >  work, so the single, only codepath to test may as well be "just close
> >  the socket"
> 
> This seems to be the going theory nowadays, with http as well.  I
> guess even Jon Postel isn't perfect :)

	Heh; of course long-term if d-bus gets used in anger over the network -
we will get no reliable hangups, and need a 'close connection' msg, but
I completely agree making the hangup case work perfectly is what we
need.

> >  What I meant on this btw was not to use --print-reply, but make
> >  dbus-send block for reply even if it will not be printing the reply.
> 
> Be careful that this won't work when sending signals, though.

	Quite - IMHO, while it may work for method calls, it is a band-aid to
dbus-send: we will still be losing other processes' last messages for no
good reason :-)

> The nice thing about --print-reply is it already exists, so the daemon
> with the problem can be easily updated that way without waiting for a
> new dbus release.

	True; and perhaps this is a quick solution for all the NetworkManager
dbus-send-ing shell scripts that are malfunctioning intermittently.

> >  This may be part of the solution. I would be pretty worried about
> >  introducing bugs, though, if this were not done very carefully. It
> >  would definitely need test cases to be added and all uses of
> >  connection_get_connected(), transport_get_connected(), etc. audited to
> >  see if they "really meant" connected for read or for write.
> 
> I'm always an advocate of being careful, but in this case I wouldn't
> expect to find a problem.  Fundamentally, even if write() succeeds,
> you don't know if the remote end got it or not.  They might have
> disconnected *after* you did the write() but before they read the
> response.  So if anything in dbus assumes success just because write()
> was successful, it's a bug anyhow and always has been.

	Agreed.
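
	A minimal sketch of that point, using a plain Unix socketpair rather
than anything from libdbus (the function name is made up for illustration):
the first write is accepted by the kernel even though the peer never reads
it, and only a later write reports the disconnect.

```python
import socket


def write_then_peer_vanishes():
    """Show that a successful send() says nothing about delivery."""
    # A connected pair of stream sockets stands in for the two D-Bus ends.
    sender, receiver = socket.socketpair()
    try:
        # The kernel queues the bytes, so this call "succeeds" ...
        sent = sender.send(b"reply")
        # ... but the peer goes away without ever reading them.
        receiver.close()
        # Only a subsequent write reveals the disconnect (EPIPE or
        # ECONNRESET, both surfaced by Python as OSError subclasses).
        try:
            sender.send(b"more")
            later_write_failed = False
        except OSError:
            later_write_failed = True
        return sent, later_write_failed
    finally:
        sender.close()


if __name__ == "__main__":
    print(write_then_peer_vanishes())
```

	Python ignores SIGPIPE by default, so the failure shows up as an
exception here; a C program with default signal handling would be killed
by the signal instead.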

> >  Perhaps it is less work than figuring out how to force immediate
> >  read-and-dispatch of remaining incoming messages on disconnect.

	But the patch AFAICS doesn't force anything, let alone anything
immediate - it just defers closing the connection until we hit the
polling mainloop & after we have processed any pending reads - which is
what we want - right?
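
	To illustrate the idea only (this is not the patch, and the helper
names are hypothetical): a failed write is noted but does not close the
connection; whatever the peer sent before exiting is read until EOF first.

```python
import socket


def drain_before_close(conn):
    """Read everything still queued on conn until EOF, then close it."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # EOF: the peer closed and nothing is left to read
            break
        chunks.append(data)
    conn.close()
    return b"".join(chunks)


def daemon_side_demo():
    """A client says its last words and exits; the 'daemon' still hears them."""
    daemon, client = socket.socketpair()
    client.sendall(b"last words")
    client.close()

    # The daemon tries to write first and finds the write side dead.
    write_ok = True
    try:
        daemon.send(b"hello")
    except OSError:
        write_ok = False  # remember the failure, but do not close yet

    # Deferred close: process the pending reads, then hang up.
    received = drain_before_close(daemon)
    return write_ok, received


if __name__ == "__main__":
    print(daemon_side_demo())
```

	Closing as soon as the write fails would have thrown the queued
message away, which is the data loss being discussed.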

	Of course, how this integrates with dbus-glib, and whether we need a
similar fix in unix_do_iteration (etc.) is unclear to me; I haven't read
that. But clearly the main issue with data loss here is in the daemon,
which AFAICS uses dbus-mainloop.c (right?).

	Having said that - it's true that a regression test is nice & might
help clarify the problem & solution; I'll look for the best place to add
one.

	Regards,

		Michael.

-- 
 michael.meeks at novell.com  <><, Pseudo Engineer, itinerant idiot