If you use Active Directory (AD), it is probably the most important system you've got. Without it, users can't log in, they probably can't browse the web, machines can't communicate and finance won't be able to generate their latest report. Be sure you keep AD in tip-top shape with an Active Directory health check script!
In this article, you'll learn how to gather information from AD with PowerShell. This will help you recognize, report on and ultimately fix any issues related to your AD environment.
This is Part I of a two-part series. In this part, you'll learn what to check and how to build the individual tests that will ultimately go into an Active Directory health check script. If you'd prefer to download the completed script now and learn how to build reports, feel free to check out Building an Active Directory Health Check Tool [In-Depth]: Part II.
Active Directory Health is an Extensive Topic
There's a lot to the term "Active Directory health" and, as such, a lot of ways to validate its operational safety. It's not uncommon for organizations to monitor only one, or even none, of the AD health attributes in the table below. But what these organizations don't know is that every one of them is critical!
| Category | Description |
|---|---|
| Replication | This is how the domain controllers synchronize their data. Without it, users, computers, DNS and many more attributes would be out of sync. |
| Connection | Having a good connection and the right ports open is important for AD and its clients to function properly. |
| Event Log | Errors and signs of problems are sometimes shown in event logs. |
| DNS | DNS is vital to Active Directory for service discovery and communication. |
| Duplicate attributes | Duplicate attributes in your database won't harm AD itself, but they can cause trouble such as users who can't log in or services you can't connect to. |
You will learn how to perform various checks for these categories in this article.
Build Routines; Don't Just Run Scripts
There are a lot of checks to do when it comes to AD health. You'll probably never catch them all. Using the material you'll learn in this post, you can catch most of the errors that occur. But the tactics you're going to learn in this article are useless if you don't have routines to back them up in case of an AD failure or security breach.
If you want a reliable and resilient Active Directory environment, first ask yourself these questions:
- When was the last time I did a backup test?
- When was the last time I performed a full forest recovery in a disaster recovery (DR) environment?
- Is the hardware separated from the AD service so that if the hypervisor goes down AD still operates?
- Are the site links balanced so that the other sites can handle it if the main site goes down?
- Can the domain controllers at the other sites handle the extra load?
- Do I have a Windows Server backup of my domain controllers (the only backup solution supported by Microsoft)?
If you answered no to many of these, you should speak to your Microsoft rep (or Microsoft partners) about a program called ADRAP (Proactive AD Maintenance) and ADRES (Active Directory Disaster Recovery Training).
The ADRAP will do a painstakingly elaborate inspection of your AD health, security and routines revolving around it. ADRES is for learning and establishing routines around disaster recovery of Active Directory in case of a meltdown or breach.
If you're in a large organization, these two programs can prove invaluable.
Prerequisites to Follow Along
In this article, you're going to learn how to build Active Directory health checks using PowerShell and a few other tools. If you'd like to follow along, make sure you meet the following prerequisites before you get too deep.
- One or more 2016 Active Directory domain controllers (DCs) (may work with older but not tested)
- The account running the script will need to be a member of the Domain Admins group or equivalent.
- At least WSMan and RPC ports opened on DCs. If you're unsure, check out this article for testing RPC ports with PowerShell and use the Test-WSMan PowerShell command to test WSMan remote capabilities.
Running DcDiag Tests (with a PowerShell Boost)
Dcdiag (domain controller diagnostics) is the Microsoft-approved way of validating Active Directory services. It's installed by default on all servers with the Active Directory Domain Services role and on Windows 10 computers with the Remote Server Administration Tools (RSAT) package installed.
Admins have used dcdiag for a long time, but it has one big drawback: it creates a report but doesn't automatically signal when something is wrong.
Just to provide an idea of how extensive dcdiag is, check out each set of tests it can run and a brief explanation below.
| Test | Description |
|---|---|
| Advertising | Validates that the function that locates the DC works properly and that the DC can properly announce itself back. |
| CheckSDRefDom | Validates that the SDReferenceDomain attribute in the partition cross reference object contains the right domain names. |
| CheckSecurityError | Performs many different tests. This set of tests is not performed by default. This suite of tests ensures at least one KDC is online, UDP packets do not fragment, a DC's computer account exists and contains the correct attributes and minimum SPN configuration, a DC's computer object is replicated correctly and no replication or KCC errors have occurred for connected partners. |
| Connectivity | Tests connectivity to a DC's services. Always performed before every test. |
| CrossRefValidation | Validates that CnName, dnsRoot and NetBiosName in naming contexts are correct. |
| CutOffServers | Ensures that all the DCs have working connection objects for replication. |
| DcPromo | Tests the possibility to promote a new DC. Not executed by default. |
| DNS | A battery of tests to validate that the DNS service is working properly. |
| SysVolCheck | Reads the SysvolReady registry key to validate that SYSVOL is advertised. |
| LocatorCheck | Validates that the DCLocator method advertises the five capabilities that a domain must contain and does so correctly. These capabilities are global catalog, PDC emulator, time server, preferred time server and the KDC. |
| Intersite | Checks for conditions that might prevent inter-site AD replication. |
| KccEvent | Checks for KCC errors that have occurred within the last 15 minutes. |
| KnowsOfRoleHolders | Tests for the advertisement of FSMO role holders. |
| MachineAccount | Validates a DC's machine account OU, UAC, ServerReference and SPN. |
| NCSecDesc | Validates permissions on all naming contexts. |
| ObjectsReplicated | Validates that key objects in AD are up to date. |
| OutboundSecureChannels | Tests external trusts. Not run by default. |
| Replications | Checks all AD replication objects for errors and ensures they aren't disabled. |
| RidManager | Validates the RID Master FSMO role can be contacted and has valid RID pool values. |
| Services | Validates that critical services are working correctly. |
| SystemLog | Checks for errors in a DC's Windows System event log that have occurred in the last 60 minutes. |
| Topology | Validates that AD replication topology is fully connected. |
| VerifyEnterpriseReferences | Validates all DCs' computer reference attributes. |
| VerifyReferences | Validates a DC's computer reference attributes. |
| VerifyReplicas | Verifies that the specified DC contains the application partitions that it should have. |
For a complete reference to dcdiag, be sure to check out this blog post on Technet from the Microsoft Directory Services team. This article gathers in-depth knowledge from the developers since many of its functions can be quite cryptic.
Using DCDIAG with PowerShell
Dcdiag, although a handy utility, has a major drawback - the output. The output is old school in that it returns a loose string that's not easily parseable. To incorporate dcdiag into a large PowerShell AD health check script, you need to transform that output into a PowerShell object.
Parsing dcdiag output with PowerShell converts the result into objects that you can then send to reports, monitoring systems, test frameworks and so on.
The key to marrying PowerShell and dcdiag is running each of the dcdiag tests separately with the /test:<testname> argument. By separating out tests like this, it's much easier to distinguish a failed test from a passed one.
Note that dcdiag also performs tests that query a DC's event log. These tests return a different structure of output and should be parsed with other techniques that you will learn a little later.
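To make the idea concrete, here is a minimal sketch of turning dcdiag result lines into objects. It is not the author's function. ConvertFrom-DcDiagOutput is a hypothetical name, and the sketch assumes English-language dcdiag output where each result line reads like "......................... DC01 passed test Advertising".

```powershell
# Sketch only: parse dcdiag result lines into objects.
# Assumption: English output in the form
#   "......................... DC01 passed test Advertising"
function ConvertFrom-DcDiagOutput {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]      # dcdiag output contains blank lines
        [string]$Line
    )
    process {
        $pattern = '\.{3,}\s+(?<Target>\S+)\s+(?<Result>passed|failed)\s+test\s+(?<Test>\S+)'
        if ($Line -match $pattern) {
            [pscustomobject]@{
                Target = $Matches.Target
                Test   = $Matches.Test
                Passed = ($Matches.Result -eq 'passed')
            }
        }
    }
}

# Example usage on a DC (DC01 is a placeholder name):
#   dcdiag /s:DC01 /test:Advertising | ConvertFrom-DcDiagOutput
```

Because every test becomes one object with a boolean Passed property, filtering for failures is a simple Where-Object call instead of a text search.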
Setting up a Custom Test-AdHcDcDiag PowerShell Function
To save you some time building your own dcdiag parsing script, you can use a custom-built PowerShell function called Test-AdHcDcDiag. You can download this function from a GitHub Gist.
The Test-AdHcDcDiag function executes all dcdiag tests except for the DFSREvent, FRSEvent and SystemLog tests. Some of the tests that dcdiag does not perform by default are available via parameters that should work in all AD environments.
For this article, I'll be assuming you have the Test-AdHcDcDiag function in a file called Test-AdhcDcDiag.ps1 on one of your DCs.
You can make this function available in a PowerShell console by dot-sourcing as shown below.
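As a quick illustration, dot-sourcing looks like the following. This assumes the Test-AdhcDcDiag.ps1 file sits in your current working directory; adjust the path if you saved it elsewhere.

```powershell
# Load the function into the current session (note the leading dot and space).
# Assumes Test-AdhcDcDiag.ps1 is in the current directory.
. .\Test-AdhcDcDiag.ps1

# Confirm the function is now available
Get-Command -Name Test-AdHcDcDiag
```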
If you'd rather not dot source the PowerShell script, you can copy and paste the code directly into a PowerShell console for the same effect.
Test-AdHcDcDiag PowerShell Function
Once the function is available, you can run it without parameters. You should then see an output like below. This assumes you're executing dcdiag (the utility the function calls behind the scenes) on a domain controller.
But it'd be too much trouble to run the function on all DCs manually. To avoid this, you can either supply other DCs by using the ComputerName parameter or pipe them to the function as shown below.
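A hedged example of both approaches is below. DC01 and DC02 are placeholder names, and the exact output depends on the version of the function you downloaded.

```powershell
# Target a remote DC with the ComputerName parameter (DC01 is a placeholder)
Test-AdHcDcDiag -ComputerName 'DC01'

# Supply several DCs through the pipeline instead
'DC01', 'DC02' | Test-AdHcDcDiag
```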
The Test-AdHcDcDiag function also allows you to run specific tests by explicitly specifying them via the Tests parameter or excluding them via the ExcludeTests parameter. An example of this syntax is shown below.
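A sketch of that syntax follows. It assumes both parameters accept the dcdiag test names from the table earlier in this article, and DC01 is again a placeholder.

```powershell
# Run only the listed tests (assumes dcdiag test names are accepted)
Test-AdHcDcDiag -ComputerName 'DC01' -Tests 'Advertising', 'Replications'

# Run the default set but skip the DNS tests
Test-AdHcDcDiag -ComputerName 'DC01' -ExcludeTests 'DNS'
```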
Testing Windows Event Logs for Errors
Looking in the event log for errors is an important, proactive step for early discovery of potential problems. Dcdiag includes tests that query the Windows event logs, but they are noisy and will report false positives, such as when DFSR is paused for backup.
Building your own PowerShell function that filters out the noise will save you the headache of dealing with false positives.
Note that it's better to ship domain controller logs to a third-party logging solution that can produce a near instant alarm. After all, some of these events can be quite serious and require you to act quickly. One example is the ESENT event 508 error - the faint sound a SAN makes when it can't failover and decides to take a couple of DCs with it.
Common Errors in the DFS Replication Event Log
The DFSR service on every DC is responsible for replicating SYSVOL folders that contain group policies. One indicator of an unhealthy DFSR is the existence of error events in the DFS Replication event log.
Any event log error is something to look at, but just because an error exists doesn't mean that it's serious. For example, event ID 5014 is a totally normal error, but only if error ID 9036 (paused for backup) is defined in the event.
Dcdiag does not check for the error ID 9036 in the event, so it creates noise and will always report failure if run at the wrong moment.
The most reliable way of tracking down this specific error is with PowerShell. You can use the Get-WinEvent cmdlet to read the DFS Replication log and filter out any events with ID 5014 that have an error ID of 9036 in the event's seventh replacement string. You can see an example of this below.
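Here is a minimal sketch of that filter. The function names are hypothetical and the 24-hour default window is an assumption; the seventh replacement string is read as index 6 of the event's Properties collection.

```powershell
# Sketch only: DFS Replication errors, minus 5014 events caused by a backup pause.
function Test-IsDfsrBackupPause {
    param([Parameter(Mandatory)] $EventRecord)
    # Seventh replacement string = index 6 of the Properties collection
    $EventRecord.Id -eq 5014 -and $EventRecord.Properties[6].Value -eq 9036
}

function Get-DfsrError {
    param(
        # Assumption: only look at the last 24 hours by default
        [datetime]$After = (Get-Date).AddHours(-24)
    )
    Get-WinEvent -FilterHashtable @{
        LogName   = 'DFS Replication'
        Level     = 2            # 2 = Error
        StartTime = $After
    } -ErrorAction SilentlyContinue |
        Where-Object { -not (Test-IsDfsrBackupPause -EventRecord $_) }
}

# Example usage on a DC:
#   Get-DfsrError | Select-Object TimeCreated, Id, Message
```

The -ErrorAction SilentlyContinue is there because Get-WinEvent writes an error when no events match, which is the healthy case here.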
Filtering Out Common False Positives in the System Event Log
Searching the system event log for errors is almost identical to searching the DFS Replication event log. The major difference is that there's simply more to filter out depending on your environment.
If an error event is registered in the system event log, there are a few cases you may want to filter out to prevent false positives in your report.
- Netlogon event IDs 5722, 5723 and 5805 (deactivated or removed computers that are trying to contact the domain)
- KDC event ID 16 (if DES encryption is disabled)
- KDC event ID 11 (duplicate service principal name)
- Common safe-to-ignore DCOM events that occur when a Microsoft component tries to access DCOM components without proper permissions.
You can account for each of these false positives by creating a PowerShell scriptblock excluding each scenario. You can see below an example of querying the system event log for errors and excluding the common scenarios just referenced.
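A sketch of such a scriptblock is shown here. The provider name patterns and DCOM event ID 10016 are assumptions based on how these events commonly appear, so confirm them against events in your own environment. The one-hour default window simply mirrors dcdiag's SystemLog test.

```powershell
# Sketch only: System log errors with common false positives removed.
# Assumptions: provider name patterns and DCOM event ID 10016.
$systemLogFilter = {
    -not (
        ($_.ProviderName -match 'NETLOGON' -and $_.Id -in 5722, 5723, 5805) -or
        ($_.ProviderName -match 'Kerberos-Key-Distribution-Center' -and $_.Id -in 11, 16) -or
        ($_.ProviderName -match 'DistributedCOM' -and $_.Id -eq 10016)
    )
}

function Get-SystemLogError {
    param([datetime]$After = (Get-Date).AddHours(-1))
    Get-WinEvent -FilterHashtable @{
        LogName   = 'System'
        Level     = 2            # 2 = Error
        StartTime = $After
    } -ErrorAction SilentlyContinue |
        Where-Object $systemLogFilter
}

# Example usage on a DC:
#   Get-SystemLogError | Select-Object TimeCreated, ProviderName, Id, Message
```

Keeping the exclusions in one scriptblock means you can add an environment-specific rule by appending another -or line.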
Checking for Duplicate AD Attributes
Duplicate AD attributes, although not necessarily a problem, can turn into a nightmare under the right circumstances. For example, duplicate UPNs, mail or ProxyAddresses attributes will cause errors with Azure AD, Office 365 and general authentication issues. Duplicate attributes can cause the synchronization of that user's AD attributes to fail.
Duplicate SPNs can cause even worse problems with Kerberos and authentication-related errors. For example, when a user attempts to authenticate using Kerberos referencing a specific SPN, authentication will fail if duplicate SPNs exist. The attempt will fail because Kerberos only expects one SPN.
Speeding up Duplicate AD Attribute Discovery
Using PowerShell, there are a few different ways to search for duplicate AD attributes. You can query AD using cmdlets like Get-ADObject and then use the Group-Object cmdlet to group attributes together, but there are better ways to do it. Instead, you can use a hashtable to speed up your search.
To use this method:
- Get all values of the attribute in an array
- Define a hashtable with no keys
- Use each attribute value as a hashtable key and store the number of times it occurs as the key's value
- Find all hashtable values that are greater than one
Below is the example PowerShell syntax.
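What follows is a minimal sketch of those four steps rather than the original script. The function names are hypothetical, and the AD query assumes the ActiveDirectory module is installed.

```powershell
# Sketch only: count attribute values with a hashtable and report duplicates.
function Get-DuplicateValue {
    param([AllowNull()][object[]]$Value)
    $counts = @{}                        # empty hashtable, keys are case-insensitive
    foreach ($item in $Value) {
        if ($null -ne $item -and "$item" -ne '') {
            $counts["$item"]++           # a missing key starts at 1
        }
    }
    $counts.GetEnumerator() |
        Where-Object { $_.Value -gt 1 } |
        ForEach-Object { [pscustomobject]@{ Value = $_.Key; Count = $_.Value } }
}

function Get-DuplicateAdAttribute {
    param([Parameter(Mandatory)][string]$AttributeName)
    # Requires the ActiveDirectory module; multi-valued attributes are flattened
    $values = Get-ADObject -LDAPFilter "($AttributeName=*)" -Properties $AttributeName |
        ForEach-Object { $_.$AttributeName }
    Get-DuplicateValue -Value $values
}

$adAttributeName = 'userPrincipalName'
# Example usage:
#   Get-DuplicateAdAttribute -AttributeName $adAttributeName
```

The hashtable needs only one pass over the values, which is why it tends to beat Group-Object on large directories.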
By simply changing the value of the $adAttributeName variable, you can quickly track down duplicates of any AD attribute.
Common Problematic AD Attribute Duplicates
There are a few attributes that are known to cause problems:
- User principal names (UPNs) - can cause overall authentication errors with AD, ADFS, certificates and more
- Service principal names (SPNs) - can cause Kerberos authentication errors
- mail attributes - can cause the problems mentioned earlier in Office 365 and Exchange
- proxyAddresses attributes - can create problems with Office 365 or Exchange, since the address keeps working for the user who originally had it but the other user who gets the same proxyAddress won't get the update
Using the search technique described in the previous section, you can plug these attributes into the script below to find duplicate values for all of them.
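One way to sketch this is a variation that also records which objects share a value, so you know where to start fixing. Find-DuplicateAttributeOwner is a hypothetical helper, and the commented loop assumes the ActiveDirectory module is available.

```powershell
# Sketch only: map each attribute value to the objects that hold it.
function Find-DuplicateAttributeOwner {
    param(
        [object[]]$AdObject,
        [Parameter(Mandatory)][string]$AttributeName
    )
    $owners = @{}   # attribute value -> distinguished names that hold it
    foreach ($object in $AdObject) {
        foreach ($value in $object.$AttributeName) {
            $key = [string]$value
            if (-not $owners.ContainsKey($key)) {
                $owners[$key] = [System.Collections.Generic.List[string]]::new()
            }
            $owners[$key].Add($object.DistinguishedName)
        }
    }
    foreach ($entry in $owners.GetEnumerator()) {
        if ($entry.Value.Count -gt 1) {
            [pscustomobject]@{
                Attribute = $AttributeName
                Value     = $entry.Key
                Objects   = $entry.Value.ToArray()
            }
        }
    }
}

# Example usage for the four attributes listed above:
#   foreach ($attribute in 'userPrincipalName', 'servicePrincipalName', 'mail', 'proxyAddresses') {
#       $objects = Get-ADObject -LDAPFilter "($attribute=*)" -Properties $attribute
#       Find-DuplicateAttributeOwner -AdObject $objects -AttributeName $attribute
#   }
```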
Bloated User Kerberos Tokens
Sometimes an AD user account's Kerberos token becomes larger than the maximum token size. This is called a "bloated Kerberos token". A Kerberos token includes an entry for every group the user account belongs to. These groups can be directly assigned or inherited through nested groups.
The issue of bloated user Kerberos tokens usually starts with older versions of Windows because the MaxTokenSize registry value (the maximum size a Kerberos token can be) has grown with newer versions. You'll find that this is a common problem in larger environments.
Finding the Kerberos Maximum Token Size
To find the maximum token size in your environment, you can query the registry on any DC for the MaxTokenSize value. Below you can see an example of doing this via PowerShell.
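A minimal sketch of that registry query is below, meant to be run locally on a DC. If the value has never been set explicitly, the key returns nothing and the operating system default applies (48000 bytes on Windows Server 2012 and later).

```powershell
# Sketch only: read MaxTokenSize from the local registry (run on a DC).
$kerberosKey = 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters'
$setting = Get-ItemProperty -Path $kerberosKey -Name 'MaxTokenSize' -ErrorAction SilentlyContinue

if ($setting) {
    $setting.MaxTokenSize
}
else {
    'MaxTokenSize is not set explicitly; the operating system default applies.'
}
```

To check a remote DC, you could wrap the same lines in Invoke-Command with the ComputerName parameter, assuming WSMan is open as listed in the prerequisites.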
Finding Account Token Size Across User Accounts
You can use PowerShell to find the Kerberos token size associated with a user account, but it's not a straightforward process, and the result is only an estimate.
WARNING: The process to find the token size usually takes eight to ten seconds per user account. During this time, it also puts the DC being queried under a heavy load due to the use of the LDAP_MATCHING_RULE_IN_CHAIN LDAP filter. This process attempts to recursively get a user's nested group membership.
Below you can see a PowerShell snippet that will find user account token sizes for all members of the Domain Admins group. Change the value of $groupName if you'd like to run this snippet on another group.
The example below calculates the approximate token size for each user and outputs everyone that has a calculated token size of over 48000.
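The sketch below is one way to approximate this, not the original snippet. It uses Microsoft's published estimate, TokenSize = 1200 + 40d + 8s, where d counts domain local groups and sIDHistory entries and s counts global and universal groups. As a simplifying assumption it treats every universal group as belonging to the user's own domain. The function names are hypothetical and the ActiveDirectory module is required.

```powershell
# Sketch only: estimate token size with TokenSize = 1200 + 40d + 8s.
function Get-EstimatedTokenSize {
    param(
        [int]$DomainLocalCount,
        [int]$GlobalCount,
        [int]$UniversalCount,
        [int]$SidHistoryCount = 0
    )
    $d = $DomainLocalCount + $SidHistoryCount   # 40 bytes each
    $s = $GlobalCount + $UniversalCount         # 8 bytes each (single-domain assumption)
    1200 + (40 * $d) + (8 * $s)
}

function Get-GroupMemberTokenSize {
    param(
        [string]$GroupName = 'Domain Admins',
        [int]$Threshold = 48000
    )
    $members = Get-ADGroupMember -Identity $GroupName |
        Where-Object { $_.objectClass -eq 'user' }
    foreach ($member in $members) {
        $user = Get-ADUser -Identity $member.distinguishedName -Properties sIDHistory
        # Escape characters that are special inside an LDAP filter value
        $dn = $user.DistinguishedName -replace '\\', '\5c' -replace '\*', '\2a' `
            -replace '\(', '\28' -replace '\)', '\29'
        # OID 1.2.840.113556.1.4.1941 is LDAP_MATCHING_RULE_IN_CHAIN (nested membership)
        $groups = Get-ADGroup -LDAPFilter "(member:1.2.840.113556.1.4.1941:=$dn)"
        $counts = @{
            DomainLocalCount = ($groups | Where-Object { $_.GroupScope -eq 'DomainLocal' } | Measure-Object).Count
            GlobalCount      = ($groups | Where-Object { $_.GroupScope -eq 'Global' } | Measure-Object).Count
            UniversalCount   = ($groups | Where-Object { $_.GroupScope -eq 'Universal' } | Measure-Object).Count
            SidHistoryCount  = ($user.sIDHistory | Measure-Object).Count
        }
        $size = Get-EstimatedTokenSize @counts
        if ($size -gt $Threshold) {
            [pscustomobject]@{ User = $user.SamAccountName; EstimatedTokenSize = $size }
        }
    }
}

$groupName = 'Domain Admins'
# Example usage:
#   Get-GroupMemberTokenSize -GroupName $groupName
```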
Remedying Bloated Kerberos Tokens
There are a few different ways you can proactively stop oversized Kerberos tokens:
- Remove users from groups
- Practice a well-designed group nesting strategy
- Add user accounts to Domain Local groups, as Domain Local groups allocate a smaller portion of the token
- Increase the Kerberos token size
Oversized or bloated Kerberos tokens can be a pain to track down. It's rarely clear that the issues you see are directly correlated to this problem. However, you should now have some code to track down this issue and fix it proactively.
Finding Roaming Clients
Missing subnets can cause many problems for the DCLocator service on domain controllers. A missing subnet can cause clients to select the wrong DCs, DFS shares and anything else that relies on AD sites.
When a subnet is missing, clients will "roam", meaning they can't find a site to associate with. When a computer cannot find a site, AD will generate a message in a DC's netlogon.log file.
If a client can't find a site (roaming), it will generate a log message with the line NO_CLIENT_SITE. You could manually search each netlogon.log file on each DC, or you could use the below PowerShell snippet to find them. This snippet outputs all the events logged in the netlogon.log file that contain the string NO_CLIENT_SITE: followed by the computer name and IP address.
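A minimal sketch of such a snippet is shown here. ConvertFrom-NetlogonNoClientSite is a hypothetical name, and the sketch assumes the log lives in the default location under the Windows debug folder.

```powershell
# Sketch only: extract roaming clients from netlogon.log lines.
# Assumption: entries contain "NO_CLIENT_SITE: <computer> <ip>".
function ConvertFrom-NetlogonNoClientSite {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string]$Line
    )
    process {
        if ($Line -match 'NO_CLIENT_SITE:\s+(?<Computer>\S+)\s+(?<IPAddress>\S+)') {
            [pscustomobject]@{
                Computer  = $Matches.Computer
                IPAddress = $Matches.IPAddress
            }
        }
    }
}

# Example usage on a DC (default log location assumed):
#   Get-Content "$env:SystemRoot\debug\netlogon.log" |
#       ConvertFrom-NetlogonNoClientSite |
#       Sort-Object -Property IPAddress -Unique
```

Sorting by unique IP address gives you a short list of addresses to compare against the subnets defined in AD Sites and Services.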
You would run this on each DC.
For a deeper dive into this subject including a PowerShell script to read all netlogon.log files across all DCs, check out the Active Directory computers with no site ATA blog post.
Group Policy, delivered through Group Policy Objects (GPOs), is a large part of Active Directory and how we configure domain-joined computers. Group Policy stores files in the SYSVOL share of all DCs, and SYSVOL is replicated with DFSR. Because of this, you've already learned a little about Group Policy health: how to report on SYSVOL replication issues.
GPO health isn't limited to the SYSVOL shares on DCs. There are a few other AD health checks you can run to ensure Group Policy is in tip-top shape, like finding unlinked GPOs and passwords stored in GPOs.
Checking for Unlinked GPOs
If a GPO is not linked to an organizational unit (OU), it has no effect. It's just taking up room in your AD database. If a GPO isn't linked and it's not being used as a template, it's recommended to remove it.
Below you will find a PowerShell script that queries AD for all GPOs in XML format and finds all GPOs that are not linked to an OU.
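As a sketch of this check, the code below inspects each GPO's XML report for a LinksTo element, which is present once per link. The function names are hypothetical, and Get-GPO and Get-GPOReport assume the GroupPolicy module is installed.

```powershell
# Sketch only: list GPOs that have no links at all.
function Test-GpoReportIsLinked {
    param([Parameter(Mandatory)][string]$ReportXml)
    $report = [xml]$ReportXml
    # The XML report contains one <LinksTo> element per link
    $null -ne $report.GPO.LinksTo
}

function Get-UnlinkedGpo {
    # Requires the GroupPolicy module
    Get-GPO -All | Where-Object {
        $xml = Get-GPOReport -Guid $_.Id -ReportType Xml
        -not (Test-GpoReportIsLinked -ReportXml $xml)
    }
}

# Example usage:
#   Get-UnlinkedGpo | Select-Object DisplayName, Id, ModificationTime
```

Generating one report per GPO is slower than a single bulk report, but it keeps each XML document small and easy to parse.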
Finding GPOs Containing Decryptable Passwords
I consider AD health to be more than just break/fix issues and performance. I also consider AD security to be part of AD health. GPOs containing decryptable passwords are a major security risk, and checking for them should be part of your AD health check script.
Unfortunately, passwords stored in Group Policy Preferences have been decryptable since the key was leaked years ago. Some people still aren't aware of this and risk exposing sensitive passwords.
GPOs are stored in the XML format. You can notice this by the command used in the last section. You can define passwords in a GPO preference which AD will then store in that GPO's XML file.
When a password is stored in a GPO preference, it's stored in an attribute called CPassword. Unfortunately for the world, this is a security risk, but fortunately for you, you can use PowerShell to read each GPO's XML file and determine if the CPassword XML attribute is being used.
Below you will find a modified version of a PowerShell script from TechNet. When run on a domain controller, this script reads all GPO XML files in the SYSVOL folder and searches for the CPassword XML attribute with regular expressions.
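The following is a simplified sketch of that idea, not the TechNet script itself. Find-GpoCPassword is a hypothetical name, and the example path assumes SYSVOL is reachable through the domain's DNS name.

```powershell
# Sketch only: find GPO XML files that contain a non-empty cpassword attribute.
function Find-GpoCPassword {
    param([Parameter(Mandatory)][string]$Path)
    Get-ChildItem -Path $Path -Recurse -Filter '*.xml' -File -ErrorAction SilentlyContinue |
        Select-String -Pattern 'cpassword="[^"]+"' |
        Select-Object -Property Path, LineNumber
}

# Example usage (assumes the logged-on user's DNS domain hosts SYSVOL):
#   Find-GpoCPassword -Path "\\$env:USERDNSDOMAIN\SYSVOL\$env:USERDNSDOMAIN\Policies"
```

The output deliberately lists only the file path and line number, so the encrypted password itself never ends up in a report.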
In Part I of this series, you learned some components of AD to monitor and how to query those components with PowerShell. If you'd like to continue reading how to report on and monitor the Active Directory health check tests you learned about in this article, head over to Part II.