[systemd-devel] Fail to reset-failed as user

Olivier Brunel jjk at jjacky.com
Sat Feb 14 10:37:00 PST 2015


On 02/11/15 21:13, Lennart Poettering wrote:
> On Thu, 05.02.15 19:20, Olivier Brunel (jjk at jjacky.com) wrote:
> 
>> On 02/03/15 22:17, Lennart Poettering wrote:
>>> On Fri, 12.12.14 16:06, Olivier Brunel (jjk at jjacky.com) wrote:
>>>
>>> Sorry for resurrecting this old thread this late. Is this still an
>>> issue? Does this work on current git?
>>
>> Still an issue w/ 218 yes, haven't actually had time to try with current
>> git. I'll try to do that over the weekend.
>>
>>>> Today I had one unit in failed state, and after taking care of things I
>>>> wanted to simply reset its state (to inactive) w/out having to start it.
>>>>
>>>> Looking up the man page, I see there's a command reset-failed for this
>>>> exact purpose, awesome. So I go:
>>>>
>>>> % systemctl reset-failed backups2.service
>>>> Failed to reset failed state of unit backups2.service: No such device or
>>>> address
>>>
>>> Hmm, did you issue this from some weird environment (su/sudo context,
>>> from a system service context or so?)
>>>
>>> If this is still an issue, could you try to reproduce this after
>>> issuing "systemd-analyze set-log-level debug"? Then please attach the
>>> log output this generates!
>>
>> Meanwhile, this is what I get today: http://ix.io/gaR
>> This is not from some weird environment no (or, not that I'm aware of),
>> but an (almost) up-to-date Arch Linux x64, systemd 218.
> 
> Puzzled. Don't see how this can happen. Also, works fine here...
> 
> If you can reproduce this on git, it would be good to gdb this. For
> that:
> 
> a) start gdb, type "attach 1", to attach to PID 1
> 
> b) add a breakpoint on method_reset_failed_unit, by issuing "b
>    method_reset_failed_unit"
> 
> c) Continue execution of PID 1, by typing in the line "c"
> 
> d) trigger the issue, gdb should break at that instant. 
> 
> e) now, check which call fails by stepping through the function with
>    "n". As soon as the function is left, use "c" to continue
>    execution. Note that the function will be executed twice, one after
>    the other. The first invocation will be without PolicyKit privs,
>    the second one with PolicyKit privs. The second invocation is the
>    interesting one. Check why it exits non-zero, and whether
>    unit_reset_failed() is invoked at all (which actually does the
>    interesting work).
> 
> f) post your findings here

Alright so I did some testing, here's what I found:

- on that second invocation, method_reset_failed_unit() fails from its
call to bus_unit_method_reset_failed(), and that comes down to
bus_message_enter_struct() returning -ENXIO.

- I don't know how this whole thing is supposed to work, but what I
noticed is that bus_message_enter_struct() is called twice within
method_reset_failed_unit(): first via bus_verify_manage_unit_async(),
then via bus_unit_method_reset_failed(). Details follow:

First, when bus_verify_manage_unit_async() is called:

#0  bus_message_enter_struct (m=0x7f5fb0cb88b0, c=0x7f5fb0cb8aa0,
    contents=0x7f5faef0d152 "bba{ss}", item_size=0x7fffcebd4928, offsets=0x7fffcebd4918,
    n_offsets=0x7fffcebd4920) at src/libsystemd/sd-bus/bus-message.c:3865
#1  0x00007f5faee80136 in sd_bus_message_enter_container (m=0x7f5fb0cb88b0, type=114 'r',
    contents=0x7f5faef0d152 "bba{ss}") at src/libsystemd/sd-bus/bus-message.c:4012
#2  0x00007f5faee8e00d in bus_verify_polkit_async (call=0x7f5fb0ca59a0, capability=21,
    action=0x7f5faeef05f8 "org.freedesktop.systemd1.manage-units", interactive=false,
    registry=0x7f5fb0c0a890, error=0x7fffcebd4ad0) at src/libsystemd/sd-bus/bus-util.c:374
#3  0x00007f5faee0aa00 in bus_verify_manage_unit_async (m=0x7f5fb0c0a460, call=0x7f5fb0ca59a0,
    error=0x7fffcebd4ad0) at src/core/dbus.c:1196
#4  0x00007f5faee0c801 in method_reset_failed_unit (bus=0x7f5fb0ca32f0, message=0x7f5fb0ca59a0,
    userdata=0x7f5fb0c0a460, error=0x7fffcebd4ad0) at src/core/dbus-manager.c:574

(gdb) p *c
$38 = {enclosing = 0 '\000', need_offsets = false, index = 0, saved_index = 0,
  signature = 0x7f5fb0c09110 "(bba{ss})", before = 0, begin = 0, end = 133, array_size = 0x0,
  offsets = 0x0, n_offsets = 0, offsets_allocated = 0, offset_index = 0, item_size = 133,
  peeked_signature = 0x0}
(gdb) p contents
$39 = 0x7f5faef0d152 "bba{ss}"

It eventually returns 1.

Then it gets called from bus_unit_method_reset_failed():

#0  bus_message_enter_struct (m=0x7f5fb0cb88b0, c=0x7f5fb0cb8250,
    contents=0x7f5faef0d152 "bba{ss}", item_size=0x7fffcebd48e8, offsets=0x7fffcebd48d8,
    n_offsets=0x7fffcebd48e0) at src/libsystemd/sd-bus/bus-message.c:3865
#1  0x00007f5faee80136 in sd_bus_message_enter_container (m=0x7f5fb0cb88b0, type=114 'r',
    contents=0x7f5faef0d152 "bba{ss}") at src/libsystemd/sd-bus/bus-message.c:4012
#2  0x00007f5faee8e00d in bus_verify_polkit_async (call=0x7f5fb0ca59a0, capability=21,
    action=0x7f5faeef05f8 "org.freedesktop.systemd1.manage-units", interactive=false,
    registry=0x7f5fb0c0a890, error=0x7fffcebd4ad0) at src/libsystemd/sd-bus/bus-util.c:374
#3  0x00007f5faee0aa00 in bus_verify_manage_unit_async (m=0x7f5fb0c0a460, call=0x7f5fb0ca59a0,
    error=0x7fffcebd4ad0) at src/core/dbus.c:1196
#4  0x00007f5faee12feb in bus_unit_method_reset_failed (bus=0x7f5fb0ca32f0,
    message=0x7f5fb0ca59a0, userdata=0x7f5fb0cc7ff0, error=0x7fffcebd4ad0)
    at src/core/dbus-unit.c:496
#5  0x00007f5faee0c8aa in method_reset_failed_unit (bus=0x7f5fb0ca32f0, message=0x7f5fb0ca59a0,
    userdata=0x7f5fb0c0a460, error=0x7fffcebd4ad0) at src/core/dbus-manager.c:588

(gdb) p *c
$40 = {enclosing = 114 'r', need_offsets = true, index = 2, saved_index = 2,
  signature = 0x7f5fb0ca3ec0 "bba{ss}", before = 0, begin = 0, end = 133, array_size = 0x0,
  offsets = 0x0, n_offsets = 0, offsets_allocated = 8391685410159683651, offset_index = 0,
  item_size = 0, peeked_signature = 0x0}
(gdb) p contents
$41 = 0x7f5faef0d152 "bba{ss}"

And this will fail on:
if (c->signature[c->index] != SD_BUS_TYPE_STRUCT_BEGIN ||
and return -ENXIO.


Hope this can be helpful,
-j

> 
> g) leave gdb again with ^D
> 
> Don't do much more than this at the same time. Since you stop
> execution of PID 1 a lot of things will be slow and potentially time
> out while you run all this.
> 
> Thanks,
> 
> Lennart
> 


