Building an Active Directory Health Check Tool: Part I

Alex Asplund

If you use Active Directory (AD), it is probably the most important system you’ve got. Without it, users can’t log in, they probably can’t browse the web, machines can’t communicate and finance won’t be able to generate their latest report. Be sure you can test Active Directory with a health check script!

In this article, you’ll learn how to gather information from AD with PowerShell. This will help you recognize, report on and ultimately fix any issues related to your AD environment.

This is Part I of a two-part series. In this part, you’ll learn what to check and how to build the individual tests that will ultimately go into an Active Directory health check script. If you’d prefer to download the completed script now and learn how to build reports, feel free to check out Building an Active Directory Health Check Tool [In-Depth]: Part II.

Active Directory Health is an Extensive Topic

There’s a lot to the term “Active Directory health” and, as such, a lot of ways to validate its operational safety. It’s not uncommon for organizations to monitor only one, or even none, of the AD health categories in the table below. What these organizations don’t know is that every one of them is critical!

Name – Description
Replication – This is how the domain controllers synchronize their data. Without it, users, computers, DNS and many more attributes would be out of sync.
Connection – Having a good connection and the right ports open is important for AD and its clients to function properly.
Event Log – Errors and signs of problems are sometimes shown in event logs.
DNS – Testing DNS is vital because Active Directory relies on it for service discovery and communication.
Duplicate attributes – Duplicate attributes in your database won’t harm AD itself, but they can leave users unable to log in or services that you can’t connect to.

You will learn how to perform various checks for these categories in this article.

Active Directory Health Check with Routines

There are a lot of checks to do when it comes to AD health. You’ll probably never catch them all. Using the material you’ll learn in this post, you can catch most of the errors that occur. But the tactics you’re going to learn in this article are useless if you don’t have routines to back them in case of an AD failure or security breach.

If you want a reliable and resilient Active Directory environment, first ask yourself these questions:

  • When was the last time I did a backup test?
  • When was the last time I performed a full forest recovery in a disaster recovery (DR) environment?
  • Is the hardware separated from the AD service so that if the hypervisor goes down AD still operates?
  • Are the site links balanced so that the other sites can handle it if the main site goes down?
  • Can the domain controllers at the other sites handle the extra load?
  • Do I have a Windows Server 2003 or later backup of my domain controllers (the only backup solution supported by Microsoft)?

If you answered no to many of these, you should speak to your Microsoft rep (or Microsoft partners) about a program called ADRAP (Proactive AD Maintenance) and ADRES (Active Directory Disaster Recovery Training).

The ADRAP will do a painstakingly elaborate inspection of your AD health, security and routines revolving around it. ADRES is for learning and establishing routines around disaster recovery of Active Directory in case of a meltdown or breach.

If you’re in a large organization, these two programs can prove invaluable.

Prerequisites to Follow Along

In this article, you’re going to learn how to build Active Directory health checks using PowerShell and a few other tools. If you’d like to follow along, make sure you meet the following prerequisites before you get too deep.

  • One or more 2016 Active Directory domain controllers (DCs) (may work with older but not tested)
  • The account running the script will need to be a member of the Domain Admins group or equivalent.
  • At least WSMan and RPC ports opened on DCs. If you’re unsure, check out this article for testing RPC ports with PowerShell and use the Test-WSMan PowerShell command to test WSMan remote capabilities.

Running DcDiag Command Tests (with a PowerShell Boost)

Dcdiag (domain controller diagnostics) is the Microsoft-approved way of validating Active Directory services. It’s installed by default on all servers with the Active Directory Domain Services role and on Microsoft Windows 10 computers with the Remote Server Administration Tools (RSAT) package installed.

Related: Installing the Active Directory Module

Admins have used dcdiag for a long time, but it has one big drawback: it only creates a report and doesn’t automatically signal when something is wrong.

Just to provide an idea of how extensive dcdiag is, check out each set of tests it can run and a brief explanation below.

Test – Description
Advertising – Validates that the function that locates the DC works properly and that the DC can properly announce itself back.
CheckSDRefDom – Validates that the SDReferenceDomain attribute in the partition cross reference object contains the right domain names.
CheckSecurityError – Performs many different tests. This set of tests is not performed by default. This suite of tests ensures that at least one KDC is online, that UDP packets do not fragment, that a DC’s computer account exists and contains the correct attributes and minimum SPN configuration, that a DC’s computer object is replicated correctly and that no replication or KCC errors have occurred for connected partners.
Connectivity – Tests connectivity to a DC’s services. Always performed before every test.
CrossRefValidation – Validates that CnName, dnsRoot and NetBiosName in naming contexts are correct.
CutOffServers – Ensures that all the DCs have working connection objects for replication.
DcPromo – Tests the possibility to promote a new DC. Not executed by default.
DNS – A battery of tests to validate that the DNS service is working properly.
SysVolCheck – Reads the SysvolReady registry key to validate that SYSVOL is advertised.
LocatorCheck – Validates that the DCLocator method advertises the five capabilities that a domain must contain and does so correctly. These capabilities are global catalog, PDCEmulator, time server, preferred time server and the KDC.
Intersite – Checks for conditions that might prevent inter-site AD replication.
KccEvent – Checks for KCC errors that have occurred within the last 15 minutes.
KnowsOfRoleHolders – Tests for the advertisement of FSMO role holders.
MachineAccount – Validates a DC’s machine account OU, UAC, ServerReference and SPN.
NCSecDesc – Validates permissions on all naming contexts.
ObjectsReplicated – Validates that key objects in AD are up to date.
OutboundSecureChannels – Tests external trusts. Not run by default.
Replications – Checks all AD replication objects for errors and ensures they aren’t disabled.
RidManager – Validates that the RID Master FSMO role can be contacted and has valid RID pool values.
Services – Validates that critical services are working correctly.
SystemLog – Checks for errors in a DC’s Windows System event log that have occurred in the last 60 minutes.
Topology – Validates that AD replication topology is fully connected.
VerifyEnterpriseReferences – Validates all DCs’ computer reference attributes.
VerifyReferences – Validates a DC’s computer reference attributes.
VerifyReplicas – Verifies that the specified DC contains the application partitions that it should have.

For a complete reference to dcdiag, be sure to check out this blog post on Technet from the Microsoft Directory Services team. This article gathers in-depth knowledge from the developers since many of its functions can be quite cryptic.

Using DCDIAG to Test Active Directory with PowerShell

Dcdiag, although a handy utility and a great addition to any Active Directory health check, has a major drawback: the output. The output is old school in that it returns loose strings that aren’t easily parseable. To incorporate dcdiag into a large PowerShell AD health check script, you need to transform that output into a PowerShell object.

Parsing dcdiag output with PowerShell converts the result to an object that you can then send to reports, monitoring systems, test frameworks and so on.

The key to marrying PowerShell and dcdiag is running each of the dcdiag tests separately with the /test:<testname> argument. By separating out tests like this, it’s much easier to distinguish a failed test from a passed one.
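To make the idea concrete, here is a minimal sketch of turning dcdiag's text into objects. It is not the Test-AdHcDcDiag function discussed in this article; the function name ConvertFrom-DcDiagOutput is hypothetical, and the sketch assumes the English-language summary lines dcdiag prints (for example "DC01 passed test Advertising"). The sample lines are invented so the snippet runs without a domain controller.

```powershell
function ConvertFrom-DcDiagOutput {
    # Hypothetical helper: converts dcdiag summary lines into objects.
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string[]]$Line
    )
    process {
        foreach ($text in $Line) {
            # Assumes English summary lines such as "... DC01 passed test Advertising"
            if ($text -match '(?<Target>\S+) (?<Result>passed|failed) test (?<Test>\S+)') {
                [PSCustomObject]@{
                    Target   = $Matches.Target
                    TestName = $Matches.Test
                    Pass     = $Matches.Result -eq 'passed'
                }
            }
        }
    }
}

# Invented sample text in the shape dcdiag prints
$sample = @(
    '      Starting test: Advertising'
    '         ......................... DC01 passed test Advertising'
    ''
    '      Starting test: Services'
    '         ......................... DC01 failed test Services'
)

$results = $sample | ConvertFrom-DcDiagOutput
$results
```

Against a real DC, you could pipe the utility's output in directly, for example dcdiag /s:dc01 /test:Advertising | ConvertFrom-DcDiagOutput, and then filter on the Pass property.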

Note that dcdiag also performs tests that query a DC’s event log. These tests return a different output structure and should be parsed with other techniques that you will learn a little later.

Setting up a Custom Test-AdHcDcDiag PowerShell Function

To save you some time building your own dcdiag parsing script, you can use a custom-built PowerShell function called Test-AdHcDcDiag. You can download this function via a GitHub Gist.

The Test-AdHcDcDiag function executes all dcdiag tests except for the DFSREvent, FRSEvent and SystemLog tests. Some of the tests that dcdiag does not perform by default are available via parameters that should work in all AD environments.

For this article, I’ll be assuming you have the Test-AdHcDcDiag function in a file called Test-AdhcDcDiag.ps1 on one of your DCs.

You can make this function available in a PowerShell console by dot-sourcing as shown below.

PS51> . C:\Path\To\Test-AdhcDcDiag.ps1

If you’d rather not dot source the PowerShell script, you can copy and paste the code directly into a PowerShell console for the same effect.

Running the Test-AdHcDcDiag PowerShell Function

Once the function is available, you can run it without parameters. You should then see output like the example below. This assumes you’d like to execute dcdiag (the utility the function calls behind the scenes) on a domain controller.

PS51> Test-AdhcDCDiag

Source      : dc01
TestName    : DCDiagCheckSecurityError
Pass        : True
Was         : {0, 0}
ShouldBe    : {0, 0}
Category    : DCDIAG
SubCategory : CheckSecurityError
Message     :
Data        : {........................ Test passed}
Tags        : {DCDIAG, CheckSecurityError}

But it’d be too much trouble to run the function on all DCs manually. To avoid this, you can supply other DCs either with the ComputerName parameter or through the pipeline, as shown below.

PS51> Test-AdhcDCDiag -ComputerName dc02
PS51> "DC01","DC02" | Test-AdhcDCDiag

The Test-AdHcDcDiag function in your Active Directory health check also allows you to run specific tests by explicitly specifying them via the Tests parameter or excluding them via the ExcludeTests parameter. An example of this syntax is shown below.

# Only running
PS51> Test-AdhcDCDiag -Tests DCPromo,RegisterInDNS

# Excluding
PS51> Test-AdhcDCDiag -ExcludeTests DCPromo
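Because the function returns objects, a report only needs to look at the Pass property. The sketch below uses invented sample objects that mimic a few of the properties shown in the earlier output (Source, TestName, Pass) so it can run without the function or a DC; the test names and results are made up for illustration.

```powershell
# Invented sample results in the same shape as the output shown earlier
$results = @(
    [PSCustomObject]@{ Source = 'dc01'; TestName = 'DCDiagAdvertising'; Pass = $true }
    [PSCustomObject]@{ Source = 'dc01'; TestName = 'DCDiagServices';    Pass = $false }
    [PSCustomObject]@{ Source = 'dc02'; TestName = 'DCDiagAdvertising'; Pass = $true }
)

# Keep only the failed tests
$failed = $results | Where-Object { -not $_.Pass }

# Count failures per domain controller
$summary = $failed | Group-Object -Property Source | Select-Object Name, Count
$summary
```

With the real function loaded, replacing the sample array with the output of "DC01","DC02" | Test-AdhcDCDiag should give the same kind of summary.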

Testing Windows Event Logs for Errors

Looking in the event log for errors is an important, proactive step for early discovery of potential problems. Dcdiag includes tests that query the Windows event logs but they are noisy and will report false positives such as if DFSR is paused for backup.

Building your own PowerShell function that filters out the noise will save you the headache of dealing with false positives while testing Active Directory.

Note that it’s better to ship domain controller logs to a third-party logging solution that can produce a near instant alarm. After all, some of these events can be quite serious and require you to act quickly. One example is the ESENT event 508 error – the faint sound a SAN makes when it can’t failover and decides to take a couple of DCs with it.

Common Errors in the DFS Replication Event Log

The DFSR service on every DC is responsible for replicating SYSVOL folders that contain group policies. One indicator of an unhealthy DFSR is the existence of error events in the DFS Replication event log.

Any event log error is something to look at, but just because an error exists doesn’t mean that it’s serious. For example, event ID 5014 is a totally normal error, but only if error ID 9036 (paused for backup) is recorded in the event.

Dcdiag does not check for error ID 9036 in the event, so it creates noise and will report failure whenever it runs at the wrong moment.

The most reliable way of tracking down this specific error is with PowerShell. You can use the Get-EventLog or Get-WinEvent cmdlet to read the DFS Replication log and filter out any events with ID 5014 that have an error ID of 9036 in the event’s seventh replacement string. You can see an example of this below.

PS51> $Events = Get-EventLog -LogName "DFS Replication" -EntryType Error -After (Get-Date).AddDays(-1)
PS51> $Events | Where-Object { -not ($_.EventID -eq 5014 -and $_.ReplacementStrings[6] -eq 9036) }
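The rule described above ("ignore 5014 only when it carries 9036") can also be wrapped in a small predicate so it is easy to test. This is a sketch: the function name Test-DfsrBackupPauseEvent is hypothetical, and the sample events are invented objects that only imitate the EventID and ReplacementStrings properties of a real event log entry.

```powershell
function Test-DfsrBackupPauseEvent {
    # Hypothetical helper: returns $true only for event 5014 with error ID 9036
    # in the seventh replacement string (index 6).
    param(
        [Parameter(Mandatory)]
        $LogEntry
    )
    $LogEntry.EventID -eq 5014 -and @($LogEntry.ReplacementStrings)[6] -eq '9036'
}

# Invented sample events imitating Get-EventLog output
$sample = @(
    [PSCustomObject]@{ EventID = 5014; ReplacementStrings = @('a', 'b', 'c', 'd', 'e', 'f', '9036') }
    [PSCustomObject]@{ EventID = 5014; ReplacementStrings = @('a', 'b', 'c', 'd', 'e', 'f', '1726') }
    [PSCustomObject]@{ EventID = 2213; ReplacementStrings = @('x') }
)

# Everything except the "paused for backup" event remains
$actionable = $sample | Where-Object { -not (Test-DfsrBackupPauseEvent -LogEntry $_) }
$actionable
```

Only the first sample event is dropped. A 5014 event with a different error ID still shows up, which is the behavior you want from a health check.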

Filtering Out Common False Positives in the System Event Log

Searching the system event log for errors is a critical part of testing Active Directory and is almost identical to searching the DFS Replication event log. The major difference is that there’s simply more to filter out depending on your environment.

If an error event is registered in the system event log, there are a few cases you may want to filter out to prevent false positives in your report.

  • Netlogon event IDs 5722, 5723 and 5805 (deactivated or removed computers that are trying to contact the domain)
  • KDC event ID 16 (if DES encryption is disabled)
  • KDC event ID 11 (duplicate service principal name)
  • Common safe-to-ignore DCOM events that occur when a Microsoft component tries to access DCOM components without proper permissions

You can account for each of these false positives by creating a PowerShell scriptblock excluding each scenario. You can see below an example of querying the system event log for errors and excluding the common scenarios just referenced.

$Filter = {
    # Filter out computers unable to contact domain because they're removed or disabled
    !($_.Source -eq 'NetLogon' -and $_.EventId -in @(5805, 5723, 5722)) -and

    # Filter TGS/TGT events
    !($_.Source -eq 'KDC' -and $_.EventId -in @(16, 11)) -and

    # Filter out DCOM errors
    !($_.Source -eq 'DCOM' -and $_.EventId -eq 10016)
}

## Assuming this is run on a DC, it queries the system event log for errors excluding all common false positive situations
Get-EventLog -LogName "System" -EntryType Error | Where-Object $Filter

Checking for Duplicate AD Attributes

Duplicate AD attributes, although not necessarily a problem, can turn into a nightmare under the right circumstances. For example, duplicate UPNs, mail or ProxyAddresses attributes will cause errors with Azure AD, Office 365 and general authentication issues. Duplicate attributes can cause the synchronization of that user’s AD attributes to fail.

Duplicate SPNs can cause even worse problems with Kerberos and authentication-related errors if you don’t properly test Active Directory. For example, when a user attempts to authenticate using Kerberos referencing a specific SPN, authentication will fail if duplicate SPNs exist. The attempt will fail because Kerberos only expects one SPN.

Speeding up Duplicate AD Attribute Discovery

Using PowerShell, there are a few different ways to search for duplicate AD attributes. You can query AD using cmdlets like Get-ADObject and then use the Group-Object cmdlet to group attributes together but there are better ways to do it. Instead, you can use a hashtable to speed up your search.

To use this method:

  • Get all values of the attribute in an array
  • Define an empty hashtable
  • Use each attribute value as a hashtable key and increment its count for every instance that exists
  • Find all hashtable entries with a count greater than one

Below is the example PowerShell syntax.

$adAttributeName = 'AttributeNameHere'
$adAttributes = (Get-AdObject -LDAPFilter "$adAttributeName=*" -Properties $adAttributeName).$adAttributeName

$hashtable = @{}

$adAttributes.foreach({ $hashtable[$_]++ })

$hashtable.GetEnumerator().where({ $_.Value -gt 1 }).Key

By simply changing the value of the $adAttributeName variable, you can quickly track down any duplicate AD attribute.
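The counting approach tells you which values are duplicated, but not which objects hold them. As a variation, the sketch below stores the owning distinguished names per value instead of a count. The function name Find-DuplicateAttributeValue is hypothetical, and the sample objects are invented so the snippet runs without AD. A hashtable created with @{} compares string keys case-insensitively, which is why the two differently cased addresses in the sample collide.

```powershell
function Find-DuplicateAttributeValue {
    # Hypothetical helper: lists each duplicated value with the objects that hold it.
    param(
        [Parameter(Mandatory)]
        [object[]]$InputObject,

        [Parameter(Mandatory)]
        [string]$AttributeName
    )

    $owners = @{}
    foreach ($obj in $InputObject) {
        # @() also handles multi-valued attributes
        foreach ($value in @($obj.$AttributeName)) {
            if ($null -eq $value -or $value -eq '') { continue }
            $key = [string]$value
            if (-not $owners.ContainsKey($key)) {
                $owners[$key] = [System.Collections.Generic.List[string]]::new()
            }
            $owners[$key].Add($obj.DistinguishedName)
        }
    }

    foreach ($entry in $owners.GetEnumerator()) {
        if ($entry.Value.Count -gt 1) {
            [PSCustomObject]@{
                Value  = $entry.Key
                Owners = $entry.Value.ToArray()
            }
        }
    }
}

# Invented sample objects standing in for Get-ADObject output
$sample = @(
    [PSCustomObject]@{ DistinguishedName = 'CN=Ann,DC=contoso,DC=com'; mail = 'ann@contoso.com' }
    [PSCustomObject]@{ DistinguishedName = 'CN=Bob,DC=contoso,DC=com'; mail = 'ANN@contoso.com' }
    [PSCustomObject]@{ DistinguishedName = 'CN=Cy,DC=contoso,DC=com';  mail = 'cy@contoso.com' }
)

$duplicates = Find-DuplicateAttributeValue -InputObject $sample -AttributeName mail
$duplicates
```

In a real environment, the input would be the objects returned by Get-ADObject with the attribute requested via -Properties, rather than the sample array.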

Common Problematic AD Attribute Duplicates

There are a few attributes that are known to cause problems:

  • User principal names (UPNs) – can cause overall authentication errors with AD, ADFS, certificates and more
  • Service principal names (SPNs) – can cause Kerberos authentication errors
  • mail attributes – can cause the problems mentioned earlier in Office 365 and Exchange
  • proxyAddresses attributes – can create problems with Office 365 or Exchange since it will work for the user originally having the ProxyAddress but the other user that gets the same ProxyAddress won’t get the update

Using the search technique described in the previous section, you can plug these attributes into the script below to find duplicate values for all of them.

$adAttributeNames = 'UserPrincipalName','ServicePrincipalName','mail','ProxyAddresses'

foreach ($adAttrib in $adAttributeNames) {
	# Collect every value of the current attribute
	$adAttributes = (Get-AdObject -LDAPFilter "$adAttrib=*" -Properties $adAttrib).$adAttrib

	$hashtable = @{}

	# Count how many times each value occurs
	$adAttributes.foreach({ $hashtable[$_]++ })

	# Output the values that occur more than once
	$hashtable.GetEnumerator().where({ $_.Value -gt 1 }).Key
}

Bloated User Kerberos Tokens

Sometimes an AD user account’s Kerberos token becomes larger than the maximum token size. This is called a “bloated Kerberos token”. A Kerberos token is a combination of each group a user account is under. These groups can be directly assigned or inherited from other groups.

The issue of bloated user Kerberos tokens usually starts with older versions of Windows because the default MaxTokenSize registry value (the maximum size a Kerberos token can be) has grown with newer versions. You’ll find that this is a common problem in larger environments.

Finding the Kerberos Maximum Token Size

To find the maximum token size in your environment, you can query the registry on any DC for the MaxTokenSize value. Below you can see an example of doing this via PowerShell.

PS51> Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters -Name MaxTokenSize
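If the MaxTokenSize value has never been set explicitly, the query above returns an error instead of a number. The sketch below wraps the lookup and falls back to an assumed default. The function name Get-KerberosMaxTokenSize and its parameters are hypothetical, and the fallback of 48000 is taken from the 2012 R2 default mentioned later in this article, so adjust it for older operating systems.

```powershell
function Get-KerberosMaxTokenSize {
    # Hypothetical helper: reads MaxTokenSize, or reports an assumed default
    # when the registry value (or key) does not exist.
    param(
        [string]$Path = 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters',

        # Assumption: 48000, the 2012 R2 default referenced in this article
        [int]$DefaultSize = 48000
    )

    try {
        $value = (Get-ItemProperty -Path $Path -Name MaxTokenSize -ErrorAction Stop).MaxTokenSize
        [PSCustomObject]@{ MaxTokenSize = [int]$value; Source = 'Registry' }
    }
    catch {
        [PSCustomObject]@{ MaxTokenSize = $DefaultSize; Source = 'AssumedDefault' }
    }
}

Get-KerberosMaxTokenSize
```

The Source property makes it visible in a report whether the number came from the registry or from the assumed default.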

Finding Account Token Size Across User Accounts

You can use PowerShell to find the Kerberos token size associated with a user account, but it’s not a straightforward process, and the result is only an estimate of the token size.

WARNING: The process to find the token size usually takes eight to ten seconds per user account. During this time, it also puts the DC being queried under a heavy load due to the use of the LDAP_MATCHING_RULE_IN_CHAIN LDAP filter. This process attempts to recursively get a user’s nested group membership.

Below you can see a PowerShell snippet that will find user account token sizes for all members of the Domain Admins group. Change the value of $groupName if you’d like to run this snippet on another group.

The example below calculates the approximate token size for each user and outputs everyone that has a calculated token size of over 48000.

## The group name to find user accounts in
$groupName = 'Domain Admins'

## Find distinguished names of all accounts in the group
$UserDNs = (Get-ADGroupMember $groupName).DistinguishedName

## Define an array to store token sizes in
$TokenSizes = @()

Foreach($UserDN in $UserDNs) {
    # Get all nested groups using LDAP_MATCHING_RULE_IN_CHAIN (1.2.840.113556.1.4.1941)
    $Groups = Get-ADGroup -LDAPFilter "(member:1.2.840.113556.1.4.1941:=$UserDN)" -Properties sIDHistory

    ## Create an object to output, starting at the base estimate of 1200
    $Object = [PSCustomObject]@{
        DistinguishedName = $UserDN
        UserTokenSize = 1200
    }

    ## Process each group the user is a part of
    foreach ($Group in $Groups){
        if ($Group.SIDHistory.Count -ge 1){
            # Groups with sidhistory always counts as +40
            $Object.UserTokenSize += 40
        }
        else {
            # Otherwise the cost depends on the group scope
            switch ($Group.GroupScope){
                'Global' {$Object.UserTokenSize+=8}
                'Universal' {$Object.UserTokenSize+=8}
                'DomainLocal' {$Object.UserTokenSize+=40}
            }
        }
    }
    $TokenSizes += $Object
}

# Restrict token sizes output to any greater than 48000. The max default token size for 2012R2 is 48000
$TokenSizes | Where-Object {$_.UserTokenSize -gt 48000}
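The weights used in the snippet above boil down to simple arithmetic: a base of 1200, plus 40 for every domain local group or group with SID history, plus 8 for every global or universal group. The sketch below isolates that arithmetic so you can try a few numbers without querying AD. The function name Get-EstimatedTokenSize and its parameter names are hypothetical.

```powershell
function Get-EstimatedTokenSize {
    # Hypothetical helper: applies the same weights as the snippet above.
    param(
        # Domain local groups plus groups that carry SID history (40 each)
        [int]$DomainLocalOrSidHistoryGroups = 0,

        # Global and universal groups (8 each)
        [int]$GlobalOrUniversalGroups = 0
    )

    1200 + (40 * $DomainLocalOrSidHistoryGroups) + (8 * $GlobalOrUniversalGroups)
}

# 1200 + (40 * 20) + (8 * 100) = 2800
Get-EstimatedTokenSize -DomainLocalOrSidHistoryGroups 20 -GlobalOrUniversalGroups 100
```

Domain local groups and groups with SID history weigh five times as much as global or universal groups, so they are worth reviewing first when an estimate approaches the limit.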

Remedying Bloated Kerberos Tokens

There are a few different ways you can proactively stop oversized Kerberos tokens:

Oversized or bloated Kerberos tokens can be a pain to track down. It’s rarely clear that the issues you see are directly correlated to this problem. However, you should now have some code to track down this issue and fix it proactively.

Finding Roaming Clients

Missing subnets can cause many problems for the DCLocator service on domain controllers. A missing subnet can cause clients to select the wrong DCs, DFS shares and anything else that relies on AD sites.

When a subnet is missing, clients will “roam” meaning they can’t find a site to associate with. When a computer cannot find a site, AD will generate a message in a DC’s netlogon.log file.

If a client can’t find a site (roaming), it will generate a log message with the line NO_CLIENT_SITE. You could manually search each netlogon.log file on each DC or you could use the below PowerShell snippet to find them. This snippet outputs all the events logged in the netlogon.log file that contains the string NO_CLIENT_SITE: followed by computer name and IP address.

You would run this on each DC.

$NetLogonLog = Import-Csv "$env:SystemRoot\Debug\netlogon.log" -Delimiter " " -Header Date,Time,Pid,Domain,Message,ComputerName,IpAddress
$NoClientSite = $NetlogonLog | Where-Object Message -eq "NO_CLIENT_SITE:" | Select ComputerName,IpAddress
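Once you have the roaming clients, grouping their addresses gives a quick hint about which subnets are missing. The sketch below groups IPv4 addresses by their first three octets. The function name Get-RoamingSubnetCandidate is hypothetical, the /24 boundary is only an assumption for illustration (your real subnets may be larger or smaller), and the sample clients are invented.

```powershell
function Get-RoamingSubnetCandidate {
    # Hypothetical helper: counts roaming clients per assumed /24 network.
    param(
        [Parameter(Mandatory)]
        [object[]]$Client
    )

    $Client |
        # Only IPv4 addresses are considered in this sketch
        Where-Object { $_.IpAddress -match '^(\d{1,3}\.){3}\d{1,3}$' } |
        Group-Object -Property { ($_.IpAddress -split '\.')[0..2] -join '.' } |
        Sort-Object -Property Count -Descending |
        ForEach-Object {
            [PSCustomObject]@{
                Subnet  = "$($_.Name).0/24"
                Clients = $_.Count
            }
        }
}

# Invented sample in the same shape as $NoClientSite
$sample = @(
    [PSCustomObject]@{ ComputerName = 'PC01'; IpAddress = '10.1.5.20' }
    [PSCustomObject]@{ ComputerName = 'PC02'; IpAddress = '10.1.5.31' }
    [PSCustomObject]@{ ComputerName = 'PC03'; IpAddress = '10.2.0.7' }
)

$candidates = Get-RoamingSubnetCandidate -Client $sample
$candidates
```

Passing $NoClientSite from the snippet above instead of the sample array should work the same way, since it exposes the same ComputerName and IpAddress properties.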

For a deeper dive into this subject including a PowerShell script to read all netlogon.log files across all DCs, check out the Active Directory computers with no site ATA blog post.

Group Policy

Group policy is a large part of Active Directory and is how we configure domain-joined computers. Group policy objects (GPOs) store their files in the SYSVOL share of all DCs, and SYSVOL is replicated with DFSR. Because of this, you’ve already learned a little about group policy health: you know how to report on SYSVOL replication issues.

GPO health isn’t just limited to the SYSVOL DC shares. While testing Active Directory, there are a few other AD health checks you can run to ensure group policy is in tip-top shape like unlinked GPOs and passwords stored in GPOs.

Checking for Unlinked GPOs

If a GPO is not linked to an organizational unit (OU), it has no effect. It’s just taking up room in your AD database. If a GPO isn’t linked and it’s not being used as a template, it’s recommended to remove it.

Below you will find a PowerShell script that queries AD for all GPOs in XML format and finds all GPOs that are not linked to an OU.

[xml]$GPOXmlReport = Get-GPOReport -All -ReportType Xml
($GPOXmlReport.GPOS.GPO | Where-Object {$_.LinksTo -eq $null}).Name

Finding GPOs Containing Decryptable Passwords

I consider AD health to be more than just break/fix issues and performance; it also covers AD security. GPOs containing decryptable passwords are a major security risk, and checking for them should be part of your AD health check script.

Unfortunately, Group Policy Preferences containing passwords have been decryptable since the key was leaked years ago. Some people still aren’t aware of this and run a major risk of exposing sensitive passwords.

GPO settings are stored in XML files, as the command used in the last section hints. You can define passwords in a GPO preference, which AD will then store in that GPO’s XML file.

When a password is stored in a GPO preference, it’s stored in an attribute called CPassword. Unfortunately for the world, this is a security risk but fortunately for you, you can use PowerShell to read each GPO’s XML file and determine if the CPassword XML attribute is being used.

Below you will find a modified version of a PowerShell script from TechNet. When run on a domain controller, this script reads all GPO XML files in the SYSVOL folder and searches for the CPassword XML attribute with regular expressions.

$Path = "C:\Windows\SYSVOL\domain\Policies\"

# Get all GPO XMLs
$XMLs = Get-ChildItem $Path -recurse -Filter *.xml

# GPO's containing cpasswords
$cPasswordGPOs = @()
# Loop through all XMLs and use regex to parse out cpassword
# Return GPO display name if it returns
Foreach($XMLFile in $XMLs){
    $Content = Get-Content -Raw -Path $XMLFile.FullName
    # Only parse files that mention cpassword at all
    if($Content -match 'cpassword'){
        [string]$CPassword = [regex]::matches($Content,'(cpassword=).+?(?=\")')
        $CPassword = $CPassword.split('(\")')[1]
        # Skip empty cpassword attributes
        if($CPassword){
            [string]$GPOguid = [regex]::matches($XMLFile.DirectoryName,'(?<=\{).+?(?=\})')
            $GPODetail = Get-GPO -guid $GPOguid
            $cPasswordGPOs += $GPODetail.DisplayName
        }
    }
}

# Output the display names of all GPOs containing a cpassword
$cPasswordGPOs
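If you want to check the matching logic without touching SYSVOL, you can run a regular expression against a sample string. The sketch below is a simplified variation, not the TechNet script: the function name Get-CPasswordValue is hypothetical, and the XML fragment and its cpassword value are invented placeholders.

```powershell
function Get-CPasswordValue {
    # Hypothetical helper: returns every non-empty cpassword attribute value in the text.
    param(
        [Parameter(Mandatory)]
        [string]$XmlText
    )

    [regex]::Matches($XmlText, 'cpassword="(?<value>[^"]+)"', 'IgnoreCase') |
        ForEach-Object { $_.Groups['value'].Value }
}

# Invented fragment; the value is a placeholder, not a real encrypted password
$sample = '<Properties action="U" userName="svc-backup" cpassword="EXAMPLEONLYnotARealValue" />'

$found = Get-CPasswordValue -XmlText $sample
$found
```

An empty attribute (cpassword="") produces no output, so only preferences that actually store a password are reported.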

Continue Reading

In Part I of this series, you learned some components of AD to monitor and how to query those components with PowerShell. If you’d like to continue reading how to report on and monitor the Active Directory health check tests you learned about in this article, head over to Part II.
