[Authentication] Delay in update of AD msDS-KeyVersionNumber after computer password change via "adcli join"
Stef Walter
stefw at gnome.org
Fri Dec 18 08:01:33 PST 2015
On 18.12.2015 16:15, Chris Rutledge wrote:
> Hello,
>
>
>
> About 2 months ago we started having issues when using adcli to join our
> Windows AD domain. The symptom we first noticed was not being able to
> log into our stateless HPC compute nodes and messages in the logs
> stating the kvno was mismatched.
>
>
>
> As our compute cluster nodes are stateless, every time they are
> rebooted, they rejoin the domain upon boot via "adcli join".
> Historically this has worked great. We did discover we could work
> around the issue by repeating the adcli join command until we finally
> received the latest kvno. The number of attempts would vary from node to
> node – timing and luck I suspect.
>
>
>
> Yesterday, I decided to download the latest version of adcli from GitHub
> to debug.
>
>
>
> Here is what I have found:
>
>
>
> 1) The adcli command is confirmed to connect to any one of the 3
> domain controllers and stay connected to that server throughout the session.
>
> 2) With unmodified versions of adcli, there was a very large chance
> we would get the old kvno value after the password change.
>
> a. We could see the server we are talking to does not yet have
> this changed value by observing the msDS-KeyVersionNumber via ldapsearch
> against all 3 DCs.
>
> 3) Not until I entered a sleep(30) statement in adcli after the
> password update and before we retrieve the new kvno did things start to
> work reliably.
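
As an alternative to the fixed sleep(30) described above, here is a rough sketch of polling until the kvno moves past its old value, with an upper time limit. This is not adcli code. The names wait_for_new_kvno and fetch_kvno are hypothetical; fetch_kvno stands in for whatever reads msDS-KeyVersionNumber from the one DC in use (for example an ldapsearch wrapper). It also assumes the kvno only increases when the password changes.

```python
import time
from typing import Callable, Optional


def wait_for_new_kvno(
    fetch_kvno: Callable[[], Optional[int]],  # hypothetical: reads the kvno from one DC
    old_kvno: int,
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until the kvno is greater than old_kvno, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        kvno = fetch_kvno()
        # None means the attribute (or the computer object) is not visible yet.
        if kvno is not None and kvno > old_kvno:
            return kvno
        if clock() >= deadline:
            raise TimeoutError(
                f"kvno still {kvno!r} after {timeout}s (expected > {old_kvno})"
            )
        sleep(interval)
```

The sleep and clock parameters are only there so the loop can be exercised without real delays. On a healthy DC the first read should already return the new value, so the loop costs nothing in the normal case.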
>
>
>
> I would understand the need to sleep if adcli would attempt to retrieve
> the updated kvno value from any one of the 3 DCs. However, it is my
> understanding that there is code in there to make sure we talk to only
> the one and the expectation is once the password change has been made
> the kvno value should reflect this – immediately.
>
>
>
> Also, if we delete the computer object from the domain we get an error
> when setting the password on the first join attempt. I suspect the
> same timing issue here…the computer object does not exist yet on this
> server.
>
>
>
> Based on my testing and observations, this smells like a performance or
> configuration issue on the Windows AD side. Others are not so convinced
> of this and think that perhaps adcli should delay between operations for
> replication to complete.
>
>
>
> So I figured I would ask the experts, should adcli delay this retrieval
> of the new kvno or are we looking at an AD issue? If you too suspect an
> issue with AD, any idea where to begin?
Does adcli 0.8.0 fix the issue?
http://lists.freedesktop.org/archives/authentication/2015-December/000321.html
It's pretty new, but please try it out if you have a chance. In particular:
https://bugs.freedesktop.org/show_bug.cgi?id=91185
Similar case seems to be described here:
https://bugs.freedesktop.org/show_bug.cgi?id=91185#c4
Stef