[Authentication] Delay in update of AD msDS-KeyVersionNumber after computer password change via "adcli join"

Chris Rutledge crutledge at renci.org
Fri Dec 18 07:15:07 PST 2015


Hello,

About 2 months ago we started having issues when using adcli to join our Windows AD domain. The first symptoms we noticed were that we could not log in to our stateless HPC compute nodes and that the logs reported a kvno mismatch.

As our compute cluster nodes are stateless, they rejoin the domain via "adcli join" every time they boot. Historically this has worked great. We did discover that we could work around the issue by repeating the adcli join command until we finally received the latest kvno. The number of attempts varied from node to node; timing and luck, I suspect.
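
A minimal sketch of that retry workaround, for illustration only. The three callables (join, keytab_kvno, directory_kvno) are hypothetical placeholders, not part of adcli: one would run the join, and the other two would read the kvno from the local keytab and from the directory.

```python
from typing import Callable, Optional


def join_until_kvno_matches(
    join: Callable[[], None],
    keytab_kvno: Callable[[], int],
    directory_kvno: Callable[[], int],
    max_attempts: int = 10,
) -> Optional[int]:
    """Repeat the join until the keytab kvno equals the directory kvno.

    Returns the number of attempts used, or None if every attempt
    still ended with a mismatch.
    """
    for attempt in range(1, max_attempts + 1):
        join()  # hypothetical wrapper that performs one join
        if keytab_kvno() == directory_kvno():
            return attempt
    return None
```

Capping the attempts keeps a node from looping forever at boot if the mismatch never clears.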

Yesterday, I decided to download the latest version of adcli from GitHub to debug.

Here is what I have found:


1) The adcli command is confirmed to connect to one of the 3 domain controllers and to stay connected to that server throughout the session.

2) With unmodified versions of adcli, there was a high chance we would get the old kvno value after the password change.

   a. By querying msDS-KeyVersionNumber via ldapsearch against all 3 DCs, we could see that the server adcli was talking to did not yet have the changed value.

3) Things only started to work reliably once I added a sleep(30) call in adcli between the password update and the retrieval of the new kvno.
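
As an alternative to a fixed sleep(30), the same delay could be expressed as a bounded poll. This is only a sketch of the idea in Python, not adcli's code (adcli is written in C). The read_kvno callable is a hypothetical stand-in for whatever reads msDS-KeyVersionNumber from the connected DC.

```python
import time
from typing import Callable, Optional


def wait_for_kvno_change(
    read_kvno: Callable[[], int],
    old_kvno: int,
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Poll until the kvno rises above old_kvno or the timeout expires.

    Returns the new kvno, or None if it never changed in time.
    """
    deadline = clock() + timeout
    while True:
        current = read_kvno()  # hypothetical lookup against one DC
        if current > old_kvno:
            return current
        if clock() >= deadline:
            return None
        sleep(interval)
```

Compared with an unconditional sleep, this returns as soon as the new value is visible and still gives up after the same 30 seconds.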

I would understand the need to sleep if adcli retrieved the updated kvno from an arbitrary one of the 3 DCs. However, it is my understanding that there is code in there to make sure we talk to only one DC, so the expectation is that once the password change has been made, the kvno value on that DC reflects it immediately.
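
To tell ordinary replication lag apart from a delay on the connected DC itself, the per-DC values can be compared directly. A small sketch, assuming the msDS-KeyVersionNumber values have already been collected (for example with ldapsearch) into a dictionary keyed by DC name. The function name and the DC names in the usage note are hypothetical.

```python
from typing import Dict, List


def lagging_dcs(kvno_by_dc: Dict[str, int]) -> List[str]:
    """Return the DCs whose kvno is lower than the highest value seen."""
    if not kvno_by_dc:
        return []
    newest = max(kvno_by_dc.values())
    return sorted(dc for dc, kvno in kvno_by_dc.items() if kvno < newest)
```

For instance, lagging_dcs({"dc1": 5, "dc2": 4, "dc3": 5}) returns ["dc2"]. If the DC that adcli was connected to shows up in this list right after the password change, the delay is on that server rather than in replication to the others.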

Also, if we delete the computer object from the domain, the first join attempt afterwards fails with an error while setting the password. I suspect the same timing issue here: the computer object does not exist yet on this server.

Based on my testing and observations, this smells like a performance or configuration issue on the Windows AD side. Others are not so convinced of this and think that perhaps adcli should delay between operations for replication to complete.

So I figured I would ask the experts: should adcli delay this retrieval of the new kvno, or are we looking at an AD issue? If you too suspect an issue with AD, any idea where to begin?


Thanks in advance,
Chris