[systemd-devel] Errorneous detection of degraded array

NeilBrown neilb at suse.com
Mon Jan 30 22:19:48 UTC 2017


On Mon, Jan 30 2017, Andrei Borzenkov wrote:

> On Mon, Jan 30, 2017 at 9:36 AM, NeilBrown <neilb at suse.com> wrote:
> ...
>>>>>>
>>>>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>>>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>>>>> systemd[1]: Starting Activate md array even though degraded...
>>>>>> systemd[1]: Stopped target Local File Systems.
>>>>>> systemd[1]: Stopping Local File Systems.
>>>>>> systemd[1]: Unmounting /share...
>>>>>> systemd[1]: Stopped (with error) /dev/md0.
>>>>
> ...
>>
>> The race is, I think, that one I mentioned.  If the md device is started
>> before udev tells systemd to start the timer, the Conflicts dependencies
>> goes the "wrong" way and stops the wrong thing.
>>
>
> From the logs provided it is unclear whether it is *timer* or
> *service*. If it is timer - I do not understand why it is started
> exactly 30 seconds after device apparently appears. This would match
> starting service.

My guess is that the timer is triggered immediately after the device is
started, but before it is mounted.
The Conflicts directive tries to stop the device, but it cannot stop the
device and there are no dependencies yet, so nothing happens.
After the timer fires (30 seconds later) the .service starts.  It also
has a Conflicts directive, so systemd tries to stop the device again.
Now that it has been mounted, there is a dependency that can be
stopped, and the device gets unmounted.
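
To make that sequence concrete, here is a rough sketch of the two units
involved.  It is not a verbatim copy of the shipped files: only the
Conflicts line, the 30 second delay and the service description come from
this thread; the file names, the Type and the mdadm invocation are
assumptions for illustration.

  # mdadm-last-resort@.timer (sketch)
  [Unit]
  Conflicts=sys-devices-virtual-block-%i.device

  [Timer]
  OnActiveSec=30

  # mdadm-last-resort@.service (sketch)
  [Unit]
  Description=Activate md array even though degraded
  Conflicts=sys-devices-virtual-block-%i.device

  [Service]
  Type=oneshot
  # assumed command: --run starts a partially assembled array
  ExecStart=/sbin/mdadm --run /dev/%i

Each unit carries its own Conflicts line, so the device is asked to stop
twice: once when the timer starts and once when the service starts.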

>
> Yet another case where system logging is hopelessly unfriendly for
> troubleshooting :(
>
>> It would be nice to be able to reliably stop the timer when the device
>> starts, without risking having the device get stopped when the timer
>> starts, but I don't think we can reliably do that.
>>
>
> Well, let's wait until we can get some more information about what happens.
>
>> Changing the
>>   Conflicts=sys-devices-virtual-block-%i.device
>> lines to
>>   ConditionPathExists=/sys/devices/virtual/block/%i
>> might make the problem go away, without any negative consequences.
>>
>
> Ugly, but yes, may be this is the only way using current systemd.
>
>> The primary purpose of having the 'Conflicts' directives was so that
>> systemd wouldn't log
>>   Starting Activate md array even though degraded
>> after the array was successfully started.
>
> This looks like cosmetic problem. What will happen if last resort
> service is started when array is fully assembled? Will it do any harm?

Yes, it could be seen as cosmetic, but cosmetic issues can be important
too.  Confusing messages in logs can be harmful.

In all likely cases, running the last-resort service won't cause any
harm.
If, during the 30 seconds, the array is started, then deliberately
stopped, then partially assembled again, then when the last-resort
service finally starts it might do the wrong thing.
So it would be cleanest if the timer was killed as soon as the device
is started.  But I don't think there is a practical concern.

I guess I could make a udev rule that fires when the array is started, and
that runs "systemctl stop mdadm-last-resort@md0.timer"
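
Such a rule might look something like the sketch below.  This is untested
and only illustrative: the file name, the use of md/array_state to decide
that the array has been started, and the systemctl path are all
assumptions, and %k is used so that the rule is not tied to md0.

  # hypothetical file: /etc/udev/rules.d/65-md-stop-last-resort.rules
  SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", \
    ATTR{md/array_state}!="clear|inactive", \
    RUN+="/usr/bin/systemctl --no-block stop mdadm-last-resort@%k.timer"

--no-block is there so that the udev worker does not wait for systemd to
finish the stop job.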

NeilBrown


>
>> Hopefully it won't do that when the Condition fails.
>>