Data discovery and classification doesn’t require high-end, expensive software. Using the PowerShell scripting language, you can build custom tools, for free, that rival features you’d otherwise pay thousands of dollars for!
Figuring out where all your data resides can be a daunting task. There are so many data sources: databases, the cloud, file servers, SharePoint, Active Directory, and the list goes on. One tool that can work with all of them is PowerShell. Through its native cmdlets and a variety of vendor modules, PowerShell can discover data just about anywhere it lives.
PowerShell has native support for file servers through many different cmdlets, such as Get-ChildItem, which enumerates files and folders on a file server, and the File Server Resource Manager (FSRM) cmdlets, which let you script nearly every facet of FSRM functionality.
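As a quick illustration of Get-ChildItem for discovery, the sketch below inventories a folder tree by file extension, reporting how many files of each type exist and how much space they take. The demo folder under the temp directory, and the sample files it creates, are assumptions made only so the sketch runs anywhere; in practice you would point `$root` at a real path or UNC share.

```powershell
# Inventory a folder tree by file extension: count and total size per extension.
# $root is a demo location (an assumption); replace it with a real path or UNC share.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'discovery-demo'

# Start clean and create a few sample files so the sketch is self-contained.
Remove-Item -Path $root -Recurse -Force -ErrorAction SilentlyContinue
$sub = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.txt') -Value 'hello'
Set-Content -Path (Join-Path $sub 'b.txt') -Value 'world'
Set-Content -Path (Join-Path $root 'c.csv') -Value 'x,y'

# Enumerate every file recursively, group by extension, and summarize each group.
$inventory = Get-ChildItem -Path $root -File -Recurse |
    Group-Object -Property Extension |
    ForEach-Object {
        [pscustomobject]@{
            Extension = $_.Name
            Count     = $_.Count
            TotalKB   = [math]::Round(($_.Group | Measure-Object -Property Length -Sum).Sum / 1KB, 2)
        }
    } |
    Sort-Object -Property Count -Descending

$inventory | Format-Table -AutoSize
```

Grouping on `DirectoryName` instead of `Extension` gives a per-folder view of the same data.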
The FSRM cmdlets become available once you install the FS-Resource-Manager Windows feature, which you can do in PowerShell using:
PS> Install-WindowsFeature -Name FS-Resource-Manager -IncludeManagementTools
You can perform lots of different functions with the FSRM cmdlets. One example is creating folder quotas. Let’s say you’ve got a folder on a server and want to track its size. You want to know, and perhaps take action, once it approaches a set limit. FSRM lets you not only track the size but also run an operation once a threshold is met.
Because we need to perform an action when the folder nears its limit, we can create an action object using the New-FsrmAction command. As an example, the action below runs a PowerShell command that writes a string to the file C:\log.txt.
PS> $action = New-FsrmAction -Type Command -Command 'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe' -CommandParameters "-Command `"Add-Content -Value 'threshold met' -Path 'C:\log.txt'`"" -ShouldLogError
On its own, this action doesn’t do much. We need to associate it with a threshold, which we do using New-FsrmQuotaThreshold. Below, I’m setting the threshold at 90% of whatever size I give the quota.
PS> $threshold = New-FsrmQuotaThreshold -Percentage 90 -Action $action
Finally, we can create the quota using the New-FsrmQuota command below. This creates a 10GB quota, logs an entry to the text file once the folder passes 9GB (90% of the 10GB limit), and, because -SoftLimit is specified, allows the folder to grow beyond 10GB.
PS> New-FsrmQuota -Path 'C:\Folder' -Description 'Quota at 10GB' -Size 10.0GB -Threshold $threshold -SoftLimit
Through the ActiveDirectory module, available with the Remote Server Administration Tools (RSAT), we can use cmdlets such as Get-ADComputer and a wide array of other Get commands to pull information from Active Directory. Need to find an obscure attribute on an obscure AD object? You can always use Get-ADObject to pull anything that’s in the AD database.
With the SqlServer PowerShell module, available via the PowerShell Gallery, you have dozens of commands at your disposal to pull data from SQL tables, execute discovery stored procedures, create SQL views, and a whole lot more.
PowerShell isn’t limited to on-premises systems; it can interact with anything that exposes an API. Major cloud vendors like Microsoft, AWS, and Google Cloud all offer extensive APIs and even provide supported PowerShell modules for their clouds. Each module works a little differently, but considering how different each cloud is, this is to be expected.
AWS’s PowerShell support is available as the AWSPowerShell module from the PowerShell Gallery, or you can download it as part of the AWS Tools for Windows package.
The Microsoft Azure PowerShell module, AzureRM (since superseded by the Az module), is also available via the PowerShell Gallery. As you’d expect from the company that built PowerShell, it supports every facet of Azure. You never need to open the Azure portal if you don’t want to!
Finally, Google Cloud Platform (GCP) has a PowerShell module of its own. Like the AWS and Azure modules, it wraps Google’s API calls to give you an easy way to pull data from Google Cloud.
Did you know that many actions taken in Windows register an event you can tap into with PowerShell? These are called WMI events, and they let you notice, within seconds, when a file is created, a file is moved, a folder is deleted, and so on. By using the Register-WmiEvent cmdlet (available in Windows PowerShell; PowerShell 7 offers Register-CimIndicationEvent instead), we can “subscribe” to specific events and perform some action only when that event happens. WMI events are useful when you need to know right away that an action has been taken on some data.
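Because Register-WmiEvent exists only in Windows PowerShell, the sketch below swaps in a different, cross-platform technique to show the same subscribe-and-react idea: a .NET System.IO.FileSystemWatcher whose Created event is subscribed with Register-ObjectEvent. This is not WMI. The watched demo folder and the source identifier name are assumptions for illustration.

```powershell
# Subscribe to file-creation events in a folder and react when one fires.
# NOTE: uses System.IO.FileSystemWatcher + Register-ObjectEvent, not WMI.
# $watchDir is a demo location (an assumption); point it at the folder you care about.
$watchDir = Join-Path ([System.IO.Path]::GetTempPath()) 'watch-demo'
Remove-Item -Path $watchDir -Recurse -Force -ErrorAction SilentlyContinue
New-Item -ItemType Directory -Path $watchDir -Force | Out-Null

$watcher = [System.IO.FileSystemWatcher]::new($watchDir)
$watcher.EnableRaisingEvents = $true

# Queue "Created" events under a name we can wait on (hypothetical identifier).
Register-ObjectEvent -InputObject $watcher -EventName Created -SourceIdentifier 'DemoFileCreated' | Out-Null

$createdName = $null
try {
    # Simulate someone dropping a new file into the folder.
    Set-Content -Path (Join-Path $watchDir 'new-data.txt') -Value 'payload'

    # Block for up to 10 seconds waiting for the queued event.
    $evt = Wait-Event -SourceIdentifier 'DemoFileCreated' -Timeout 10
    if ($evt) {
        # SourceEventArgs is the FileSystemEventArgs raised by the watcher.
        $createdName = $evt.SourceEventArgs.Name
        Write-Output "File created: $createdName"
        Remove-Event -EventIdentifier $evt.EventIdentifier
    }
}
finally {
    # Always tear down the subscription and release the watcher.
    Unregister-Event -SourceIdentifier 'DemoFileCreated'
    $watcher.Dispose()
}
```

In a long-running monitoring script you would typically pass an -Action script block to Register-ObjectEvent rather than blocking on Wait-Event.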
Just Getting Started
Wherever data lives, PowerShell can get to it. Even if the vendor doesn’t provide a nicely packaged module, you can still find plenty of community modules to help you get to whatever information you’re looking for. The most popular place to find them is the PowerShell Gallery, which hosts thousands of modules supporting tons of different data sources, vendor products, and more.
Another popular place is GitHub, which, at the time of writing, hosted over 14,000 repositories of PowerShell code. Between vendors, the PowerShell Gallery, and GitHub, you’re unlikely to spend much time crafting your own unique scripts. Don’t reinvent the wheel!
Once you’ve put together some PowerShell code to discover where all your data lives, it’s time to classify it. Data classification can be done many different ways with PowerShell. You can roll your own scripts using commands such as Get-Content to read files and perhaps Select-String, or any of PowerShell’s other regular expression features, to search for patterns inside those files. If you have the PowerShell skills, you can do just about anything.
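To make the roll-your-own approach concrete, here is a minimal sketch that tags files by regular-expression matches. It pipes files straight into Select-String, which reads their content itself, so Get-Content isn’t needed here. The two patterns (a US Social Security number format and a 16-digit card-number format) are simplistic, illustrative assumptions rather than validated detection rules, and the demo folder and files exist only so the sketch runs anywhere.

```powershell
# Minimal regex-based classification sketch.
# The patterns are illustrative assumptions, not a complete or validated rule set.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'classify-demo'

# Create a clean demo folder with sample files (remove this for real use).
Remove-Item -Path $root -Recurse -Force -ErrorAction SilentlyContinue
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'hr.txt') -Value 'Employee SSN: 123-45-6789'
Set-Content -Path (Join-Path $root 'notes.txt') -Value 'Meeting moved to 3pm.'
Set-Content -Path (Join-Path $root 'billing.txt') -Value 'Card on file: 4111 1111 1111 1111'

# Classification label -> regular expression (hypothetical labels).
$patterns = @{
    'US-SSN'     = '\b\d{3}-\d{2}-\d{4}\b'
    'CreditCard' = '\b(?:\d{4}[ -]?){3}\d{4}\b'
}

# For each label, scan every file and emit one record per matching line.
$results = foreach ($label in $patterns.Keys) {
    Get-ChildItem -Path $root -File -Recurse |
        Select-String -Pattern $patterns[$label] |
        ForEach-Object {
            [pscustomobject]@{
                Classification = $label
                File           = $_.Path
                LineNumber     = $_.LineNumber
            }
        }
}

$results | Sort-Object -Property File | Format-Table -AutoSize
```

Each record ties a label to a file and line, which you could then export with Export-Csv or feed into a tagging step.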
However, if you’d rather save some time and use other free tools to classify data, PowerShell can also manage the File Classification Infrastructure (FCI) feature in Windows Server through the FSRM cmdlets. Anything you can do in the GUI, you can script with PowerShell.
In the same vein, Microsoft also provides the Data Classification Toolkit, which gives you various tools to help you identify and classify data, configure central access policies to manage classification rules, and more. As an added bonus, it comes with a PowerShell module too!
Since you probably want to see this toolkit in action, let’s work through a quick sample script using that PowerShell module. To use the module, you’ll need the FSRM Windows feature installed. As described in the file server discussion above, we can install this feature using:
PS> Install-WindowsFeature -Name FS-Resource-Manager -IncludeManagementTools
Once installed, the toolkit creates a few predefined configuration packages in C:\Program Files (x86)\Microsoft\Data Classification Toolkit. Each package’s file name ends in Example.xml, like the PCI-DSS package below. You can import these packages into FSRM, and then edit them in the GUI, using the Import-FileClassificationPackage cmdlet:
PS> Import-FileClassificationPackage -Path 'C:\Program Files (x86)\Microsoft\Data Classification Toolkit\PCI-DSS Classification Package Example.xml' -Scope AllShares
You’ll also find various reporting cmdlets in this module.
A great example of just how multi-faceted PowerShell can be is this article from Microsoft. In it, you’ll learn how to change Active Directory to support resource definitions, create file classification rules, and create actions that populate those Active Directory resources once a specific type of data is recognized.
If your budget is tight but you still need to implement some kind of data discovery and classification strategy, take a look at PowerShell. With its rich ecosystem of modules, its presence on every modern Windows system, and its availability on Linux and macOS too, it’s hard to beat. If you have the PowerShell skills, you’ll find that the sky is the limit when it comes to building tools to discover and define exactly what kind of data you have out there.