[Authentication] Delay in update of AD msDS-KeyVersionNumber after computer password change via "adcli join"

Chris Rutledge crutledge at renci.org
Fri Dec 18 08:27:40 PST 2015


Hello Stef,

That bug was one of the first documented issues we came across when this started, and it looked like a match for ours. I went back to it yesterday and, seeing that a change had been checked in for it, did a git clone.

It did not help in our case.
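
For anyone who wants to reproduce the workaround outside adcli: the fixed sleep(30) described below can be approximated by polling the one DC until it reports a newer kvno, giving up after a timeout. This is only a sketch, not adcli code. The names wait_for_kvno and fetch_kvno are hypothetical; fetch_kvno stands in for whatever reads msDS-KeyVersionNumber from the DC (an ldapsearch, for example), and the sketch assumes the kvno only ever increases after a password change.

```python
import time
from typing import Callable, Optional


def wait_for_kvno(
    fetch_kvno: Callable[[], Optional[int]],  # hypothetical: reads msDS-KeyVersionNumber from one DC
    old_kvno: int,
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Poll until the DC reports a kvno greater than old_kvno.

    Returns the new kvno, or None if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    while True:
        kvno = fetch_kvno()
        # Assumption: a password change always increments the kvno.
        if kvno is not None and kvno > old_kvno:
            return kvno
        if clock() >= deadline:
            return None
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting. A None result would mean the DC never showed the new value within the timeout, which is the case we keep hitting without the sleep.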

-Chris

-----Original Message-----
From: Authentication [mailto:authentication-bounces at lists.freedesktop.org] On Behalf Of Stef Walter
Sent: Friday, December 18, 2015 11:02 AM
To: Cross-desktop authentication and single sign-on
Subject: Re: [Authentication] Delay in update of AD msDS-KeyVersionNumber after computer password change via "adcli join"

On 18.12.2015 16:15, Chris Rutledge wrote:
> Hello,
> 
>  
> 
> About 2 months ago we started having issues when using adcli to join 
> our Windows AD domain. The symptom we first noticed was not being able 
> to log into our stateless HPC compute nodes and messages in the logs 
> stating the kvno was mismatched.
> 
>  
> 
> As our compute cluster nodes are stateless, every time they are
> rebooted, they rejoin the domain upon boot via "adcli join".
> Historically this has worked great. We did discover we could work
> around the issue by repeating the adcli join command until we finally
> received the latest kvno. The number of attempts would vary from node
> to node – timing and luck, I suspect.
> 
>  
> 
> Yesterday, I decided to download the latest version of adcli from 
> GitHub to debug.
> 
>  
> 
> Here is what I have found:
> 
>  
> 
> 1)      The adcli command is confirmed to connect to any one of the 3
> domain controllers and stay connected to that server throughout the session.
> 
> 2)      With unmodified versions of adcli, there was a very large chance
> we would get the old kvno value after the password change.
> 
> a.       We could see the server we are talking to does not yet have
> this changed value by observing the msDS-KeyVersionNumber via 
> ldapsearch against all 3 DCs.
> 
> 3)      Not until I entered a sleep(30) statement in adcli after the
> password update and before we retrieve the new kvno did things start 
> to work reliably.
> 
>  
> 
> I would understand the need to sleep if adcli would attempt to 
> retrieve the updated kvno value from any one of the 3 DCs. However, it 
> is my understanding that there is code in there to make sure we talk 
> to only the one and the expectation is once the password change has 
> been made the kvno value should reflect this – immediately.
> 
>  
> 
> Also, if we delete the computer object from the domain, we get an error
> when setting the password the first time we attempt to join. I suspect the
> same timing issue here…the computer object does not yet exist on this
> server.
> 
>  
> 
> Based on my testing and observations, this smells like a performance 
> or configuration issue on the Windows AD side. Others are not so 
> convinced of this and think that perhaps adcli should delay between 
> operations for replication to complete.
> 
>  
> 
> So I figured I would ask the experts: should adcli delay this
> retrieval of the new kvno, or are we looking at an AD issue? If you
> too suspect an issue with AD, any idea where to begin?

Does adcli 0.8.0 fix the issue?

http://lists.freedesktop.org/archives/authentication/2015-December/000321.html

It's pretty new, but please try it out if you have a chance. In particular:

https://bugs.freedesktop.org/show_bug.cgi?id=91185

Similar case seems to be described here:

https://bugs.freedesktop.org/show_bug.cgi?id=91185#c4

Stef