[PATCH] port-serial: don't close tty file descriptor twice
Brian, Sam
Samuel.Brian at digi.com
Thu Jun 21 01:11:05 UTC 2018
Hi Aleksander,
On 20/06/18 22:56, Aleksander Morgado wrote:
> Hey,
>
> On Wed, Jun 20, 2018 at 12:25 AM, Brian, Sam <Samuel.Brian at digi.com> wrote:
>> After the tty is opened, the fd is passed off to a g_io_channel which
>> manages it. When the serial port is closed with _close_internal(),
>> g_io_channel_shutdown() is called which then calls g_io_unix_close() which
>> in turn close()s the fd.
>> Currently, _close_internal() performs a second explicit close() after the
>> g_io_channel shutdown which *in the best case scenario* will fail with EBADF.
>> This is a race condition that can result in other file descriptors being
>> closed.
>>
>> This change avoids double closing the serial port fd.
>> ---
>>
>> Hey,
>> I had this problem with a Telit LE910-EU1. What do you think of the fix?
>>
>> To give some more detail about the problem...
>>
>> The g_io_channel_shutdown() calling close() can be verified with
>> this short program:
>>
>> $ cat test.c
>> #include <glib.h>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> int main (int argc, char* argv[])
>> {
>> int ret;
>> GIOChannel *channel;
>>
>> channel = g_io_channel_unix_new(STDIN_FILENO);
>> g_io_channel_shutdown(channel, FALSE, NULL);
>>
>> ret = close(STDIN_FILENO);
>> printf("close() -> %d: %m\n", ret);
>> return 0;
>> }
>>
>> $ gcc -o test test.c -g $(pkg-config --cflags --libs gio-2.0)
>> $ ./test
>> close() -> -1: Bad file descriptor
>>
>> Here is a snippet of strace from ModemManager that demonstrates when the
>> race causes another fd to be closed:
>>
>> ### Close the tty (from g_io_channel_shutdown)
>> [pid 413] close(9 <unfinished ...>
>> [pid 687] <... poll resumed> ) = 1 ([{fd=7, revents=POLLIN}])
>> ...
>> ### tty's closed fd is reused by eventfd2
>> [pid 687] SYS_356(0, 0x80800, 0xc6c4c8, 0, 0xc6c4d0) = 9
>> ...
>> [pid 687] poll([{fd=7, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}], 3, -1 <unfinished ...>
>> ### First close succeeds (tty fd)
>> [pid 413] <... close resumed> ) = 0
>> ### Second (explicit) close succeeds (eventfd fd, oops)
>> [pid 413] close(9) = 0
>>
>> PID 413 is ModemManager, PID 687 is a clone process created with the flag
>> CLONE_FILES so the fd table is shared between parent and clone.
>> From the clone's read/writes, it appears to be used to communicate with
>> dbus.
>>
>> Strangely, closing the eventfd didn't always cause a noticable failure.
>> It was only when *somehow* the port-serial code tried to use an eventfd as
>> a tty and AT commands fail.
>>
>> src/mm-port-serial.c | 24 +++++++++++++-----------
>> 1 file changed, 13 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/mm-port-serial.c b/src/mm-port-serial.c
>> index 44699a76..9ba4e5d2 100644
>> --- a/src/mm-port-serial.c
>> +++ b/src/mm-port-serial.c
>> @@ -1262,6 +1262,12 @@ success:
>> error:
>> mm_warn ("(%s) failed to open serial device", device);
>>
>> + if (self->priv->fd >= 0) {
>> + if (!self->priv->iochannel)
>> + close (self->priv->fd);
>> + self->priv->fd = -1;
>> + }
>> +
>> if (self->priv->iochannel) {
>> g_io_channel_shutdown (self->priv->iochannel, FALSE, NULL);
>> g_io_channel_unref (self->priv->iochannel);
>> @@ -1274,11 +1280,6 @@ error:
>> self->priv->socket = NULL;
>> }
>>
>> - if (self->priv->fd >= 0) {
>> - close (self->priv->fd);
>> - self->priv->fd = -1;
>> - }
>> -
>> return FALSE;
>> }
>>
>> @@ -1348,6 +1349,13 @@ _close_internal (MMPortSerial *self, gboolean force)
>> tcflush (self->priv->fd, TCIOFLUSH);
>> }
>>
>> + /* Close fd if not part of the iochannel (shutdown closes fd) */
>> + if (self->priv->fd >= 0) {
>> + if (!self->priv->iochannel)
>> + close (self->priv->fd);
>> + self->priv->fd = -1;
>> + }
>> +
>> /* Destroy channel */
>> if (self->priv->iochannel) {
>> data_watch_enable (self, FALSE);
>> @@ -1356,12 +1364,6 @@ _close_internal (MMPortSerial *self, gboolean force)
>> self->priv->iochannel = NULL;
>> }
>>
>> - /* Close fd, if any */
>> - if (self->priv->fd >= 0) {
>> - close (self->priv->fd);
>> - self->priv->fd = -1;
>> - }
>> -
>> /* Destroy socket */
>> if (self->priv->socket) {
>> data_watch_enable (self, FALSE);
>
> I believe the analysis is good indeed. When we create a GIOChannel
> from a FD, there is nothing in the documentation saying that
> g_io_channel_shutdown() won't close the underlying FD. There is an
> explicit mention to g_io_channel_unref() not implicitly closing it,
> but nothing about shutdown. Following the GLib code, I see how
> g_io_channel_shutdown() will call the "io_close" method in the
> GIOFuncs table, which for the unix channel will run g_io_unix_close(),
> always doing an explicit close().
I didn't find the g_io_channel_shutdown() documentation immediately
obvious either. The strace tipped me off, and then I followed the code.
>
> Why not change the logic and instead of not closing when there is no
> iochannel, we just remove the g_io_channel_shutdown() calls? We know
> that g_io_channel_unref() won't do any implicit close, so we can
> safely destroy the iochannel regardless of whether we leave the
> underlying FD open, and then close the FD with close() ourselves
> unconditionally. That would probably be easier to understand as well?
> Instead of the shutdown() call we could place a big fat /* comment */
> explaining why we don't call shutdown. >
> What do you think?
>
If we directly close() instead of using g_io_channel_shutdown() then
(1) the channel buffer won't be flushed before closing (which is
probably fine?), and
(2) if there are other refs to the channel and the "closing"
g_io_channel_unref() doesn't destroy it, then the channel's internal
record keeping will think that the closed fd is still open and
readable/writable.
I know (2) is not a problem today because there are no other
g_io_channel_ref() calls, but it could be overlooked if one was added in
future.
Perhaps a better way would be to only use g_io_channel_shutdown() and
replace priv->fd with g_io_channel_unix_get_fd() where required for the
tcgetattr/tcsetattr/ioctl?
Removing priv->fd would mean the fd is tracked in only one place. The
"self->priv->fd >= 0" checks would become the "self->priv->iochannel"
check so the fd being open would be tied directly to the iochannel being
present.
Sam
More information about the ModemManager-devel
mailing list